January, 1992
EDITORIAL


"It's Our Turn!" Yell Baby Bells




Jonathan Erickson


Like spoiled children everywhere, the nation's seven Baby Bells are stamping
their feet and crying "More! Now!" It's not enough that the Regional Bell
Operating Companies (RBOCs for short) have a monopoly, enabling them to amass
annual collective revenues in excess of $80 billion. Now they want to put
their aggregate hooks into nonbasic-telephone businesses such as online
information services. And the Supreme Court has given them the okey-dokey to
do so.
It hasn't always been this way. Back in 1984, when the consent decree
governing the breakup of the Bell System was implemented, Judge Harold Greene
prohibited the RBOCs from getting into the information service business,
citing his fear (and telephone company history) of unfair competition. At
issue was that the RBOCs would control both the information content and the
means of distribution. As Judge Greene said, "If the regional companies were
permitted to generate information and to transmit it, they would...appear to
be the only entities in the developed world to have this kind of stranglehold
on information." A federal Court of Appeals overrode Judge Greene, however,
forcing him to lift the ban. The Supreme Court seconded the Appeals Court.
Surprisingly, established nationwide online services (such as CompuServe) seem
blasé about the possibility of Baby Bell online services. If anything, they
see opportunity waiting in the wings, maybe in terms of joint ventures.
Likewise, cable TV vendors and independent database publishers are quiet, even
though the phone companies pose a long-term threat to them all.
Newspapers, which have been tinkering with online services for the past decade
or so, are squealing the most. Few, if any, of them have enjoyed any success
with these services, even though they've had nearly ten years to do so.
They've had about as much luck as, well, the phone companies had selling
computer systems. The American Newspaper Publishers Association (ANPA),
spokesgroup for the dailies, cites a litany of laments: The newspapers will
have to buy services from their direct competitors, the phone companies will
have a window into their competitors' business, and so on. The RBOCs are
responding with a nasty ad campaign which takes shots at newspapers that have
taken a stand against them. But deep down, the real concern is advertising
revenue.
This brings us to questions about the kind of services the RBOCs might offer.
The "editorial" material would probably be sports reports, stock quotes,
school lunch and restaurant menus, alarm monitoring, and the like. Phone
companies also say that their information services would be "great equalizers,"
providing electronic access for the elderly, the physically disabled, and
small business.
Still, the initial forays will likely provide "electronic yellow pages" that
let callers dial up information about sales and services of local businesses
or "electronic classified ads," thus directly competing with advertising in
local newspapers. Nevertheless, the ANPA maintains that the big issue is that
phone companies "will be able to take advantage of [their] monopoly position
to hamper other information services."
I'd guess that within the next six months or so we'll start seeing tentative
toe-dipping into phone-accessible, phone-sponsored services such as news
summaries, retail information, and so on. One RBOC (Pacific Bell) has already
launched its efforts by creating new divisions and naming executives. Much to
the dismay of other newspapers, Pulitzer Publishing and its flagship St. Louis
Post-Dispatch have announced the possibility of a joint venture with
Southwestern Bell to develop an information service (news and classified ads)
for 10,000 to 12,000 home computer users.
But the RBOCs will go slow because: 1. Some Baby Bells have already had their
fingers burnt when dipping into nonphone pockets (remember videotex); 2. there
just isn't that much business out there anyway (RBOCs counter that initially
there wasn't a market for phones, either); 3. the telephone network currently
isn't capable of carrying the kind of information that people will eventually
demand (broadcast-quality video, for instance); and 4. some state and federal
rules remain to be dealt with.
In the meantime, the Telecommunications Act of 1991 (H.R. 3515) that's before
Congress is a compromise of sorts. It would allow RBOCs to get into the online
service game, but only if they follow fair rules of play.
Frankly, I have a problem with the idea that a local monopoly -- one whose
government-established rates and profits are guaranteed and paid for by you
and me -- will be out there competing with companies that have no such
guarantee. The trend, in Judge Greene's words, toward "the concentration of
the sources of information of the American people into just a few, dominant
collaborative conglomerates" is equally disconcerting.
The question of whether or not the Information Age is upon us isn't at stake
here, nor is the question of what it will look like. It's here and it will
evolve. What's at issue is who will provide that information to us and at what
cost -- monetary and (more importantly) otherwise. And I'm going to call my
congressman about this -- as soon as our phones are working again.

January, 1992
LETTERS







Polymorphic Gates


Dear DDJ,
Michael Swaine's flame "Beyond Babblegate" (November 1991) contained more
overloaded operators than he thought. Actually, "the latest Applegate in the
making" pertains neither to copy protection nor to a hardware company.
Applegate could be construed as an AppleGate, however, in that it opens a
significant WindowsGate. Applegate Software's OptiMem for Windows is a DLL
that solves Windows memory problems--"gates" as formidable as any that shackle
Windows applications.
Arthur D. Applegate
Applegate Software
Redmond, Washington

Dear DDJ,
A few comments on Michael Swaine's enjoyable Babblegate piece in his "Swaine's
Flames" (November 1991). I too had the pleasure of technically reviewing the
manuscript of John Barry's Technobabble (MIT Press, 1991) and share Michael's
belief that it will help us all to do gooder English. However, it is unfair to
John, to the terminological and sociolinguistic professions, and to our noble
language to describe this entertaining book as scholarly. For example, Webster
III gives three meanings for the verb to mouse dating back to the 16th century
if not earlier. That babbler, W. Shakespeare, wrote "death feasts, mousing the
flesh of men," and the Restoration dramatist Wycherley (1641-1715) tells of
"naughty women whom they toused and moused." Similarly, to version (vt),
meaning "to make a version of" has a well-attested heritage, predating the DP
industry by many centuries. "Parts of speech" are identified from usage, not
imposed by elitist axioms on morphological structure. Natural language has
always and will always confound the prescriptive grammarians. Each
generation's "standard" English is a ratbag of "useful solecisms" surviving
from earlier generations.
Ironically, both Michael and John are alarmed by what they see as excessive in
the computer industry, whereas the real problem is the lack of neologisms for
genuinely new concepts. Apart from acronyms-as-words (another device as
ancient as written language), the DP lexicon has coined very few original
terms (byte and software spring to mind), preferring to coopt and overload
lexemes from other, often anthropocentric, vocabularies. It is this polysemous
glut of objects, memories (extended and expanded, but which is which?),
platforms, and the like, which leads to imprecision and misunderstanding. John
Barry often complains about synonyms, but these are not the culprits.
Independent discoveries naturally attract a slew of distinct descriptors
(many-one mappings) with no ambiguity. Eventually, one of the terms (the
best!) usually dominates. It's the one-many mess (take the term static in
C/C++, please) that confounds our discourse and documentation unless we remain
vigilant.
While on the subject of DP English, may I comment on some phrases in the
well-written PenPoint article by Ray Valdes? In "the ability to mount and
dismount volumes," I think I prefer dismount to the more prevalent unmount.
The rider/disk dismounts [from] the horse/system leaving all four unmounted!
In "the coding of message sends," we have a construct similar to that
discussed by Michael: "The install is straightforward." "Message sends" may
offend the purist, but I see it as technically precise (send as a function
name) and economical. I also like "front-ending the team with high-powered
talent," as a colorful and effective antianthropomorphism. I have two tiny
qualms over "detachable networking reconnectable volumes without peer in
mainframe environments." Not just the ever-present, vacuous environment, but
the potential ambiguity of peer when used in a networking context.
Stan Kelly-Bootle
Mill Valley, California


Davis Debated


Dear DDJ,
Howard Davis's rambling defense of software patents ("Letter," November 1991)
is wrong on almost every point. In particular, software patents were formerly
disallowed because algorithms were thought to be in the realm of abstract
ideas--not because of a lack of utility. When the Supreme Court ruled patents
may cover algorithms, it was not based on constitutional grounds as Davis
asserts.
The U.S. Constitution says in Article I, Section 8, "The Congress shall have
power to promote the progress of science and useful arts, by securing for
limited times to authors and inventors the exclusive right to their respective
writings and discoveries."
This is the basis of copyright and patent law, and the League for Programming
Freedom has no quarrel with it. Congress revises the criteria for patents from
time to time, and has always exempted mathematical formulas and laws of nature
on the theory that ownership of these would not promote progress. At issue is
whether algorithms implemented in software are in the same category.
We've had ten years of software patents now. Does anyone seriously think any
patents have promoted progress?
Roger Schlafly
Soquel, California

Dear DDJ,
This is in reply to the letter from Howard Davis in your November 1991 issue.
Sorry to burst your delusion, Mr. Davis, but computers have been able to
display text since before the 1960s. I might laud a more efficient method of
displaying text, but displaying text is not a new innovation in computer
software.
I'm a relatively new subscriber to DDJ, so I didn't get to read that article
by the League for Programming Freedom, but from the tone of your letter, they
seem like a fine bunch of people to me! Computer software, despite the fact
that when "read" to a computer it can make that computer behave in different
ways, is written matter no different from the work of a novelist or poet.
Being written matter, it offers nothing to patent! You can no more patent
software than a novelist can patent a book. Claiming that your software is
patentable because it is unique is akin to a writer discovering, and
patenting, a genre that no other writer has explored before.
So even without reading the LPF's article, I would concur that patenting
software is patently absurd.
Bernie Gallagher
Somerset, New Jersey


Looking for Free Speech


Dear DDJ,
Your magazine has always been helpful to me as a programmer, and perhaps you
or one of your readers may be able to help me locate an algorithm to send
voice through the IBM PC's speaker.
A common way to reproduce voice is to feed the AM signal into an 8-bit A-to-D
converter, which is then sampled and the values stored in sequential order. To
"play back," the values are sent to a D-to-A converter, which reproduces the
voice accurately enough to recognize a person's voice.
Unfortunately, the PC's speaker is driven by digital logic circuitry and
current is either on or off in the speaker coil. It is impossible, therefore,
to send an AM signal to the PC's speaker. However, one can send a constant
amplitude, variable-width pulse train. The algorithm I need would convert a
string of 8-bit numbers representing an AM signal into a list of numbers
representing the variable-width string of pulses the PC's speaker needs, with
minimal loss of information.
From what I have been able to gather, the AM waveform needs some high-pass
filtering and then is sent through a differentiator. The output should be a
logical 0 if the differentiator's output is positive and 1 if negative. While
it is easy to see how to build such a system in hardware, I want to do it with
software. Using the 8-bit code representing the AM information as input, how
does one get the "differentiated" output? The speech-type programs found in
shareware sources have not been of much help. Can someone point me in the
right direction?
Theron Wierenga
Muskegon, Michigan


The Chip is Worse



Dear DDJ,
In his July 1991 "Structured Programming" column, Jeff Duntemann presents a
technique for detecting the presence of a serial port. From our experience
with the popular German asynch toolbox V.24 Tools Plus, let me add some
thoughts and hints.
First, I recommend not to use the UART's loopback mode at all. Why? Simply
because this mode is faulty in about 10-20 percent of all 8250s all over the
world. The problem with the loopback mode has to do with the interrupt line:
With interrupts enabled, some UARTs fail to generate an interrupt request even
when you can read in the Interrupt ID Register (IIR) that the chip detected an
interrupt-causing event. The related software problem is that it's not easy
to recover from this situation. Although Jeff doesn't use interrupts for
reading the response from the UART, a programmer could easily run into this
situation when having serial interrupts enabled before calling Jeff's
DetectComPort() function. There is yet another argument against using the
loopback mode: Some internal modem cards that simulate an 8250 fail to
simulate the loopback mode and can even enter an unpredictable state when the
loopback bit is set. We experienced this problem with an internal modem for a
Toshiba laptop based on the Sierra chipset.
Second, Jeff's solution only works with true standard PC asynch adapters
(COM1..COM4). On the other hand, the programmer is more and more often
required to support PS/2s (on which Jeff's code doesn't work for COM3 and
COM4) and even multiport adapters. In order to handle these fairly common
situations, you first have to deal with I/O addresses. I/O addresses from
multiport adapters (like DigiBoard, AST, etc.) are well-known and can
therefore easily be tested using an address table. But when testing arbitrary
I/O ports by simply writing certain bytes to the port and expecting other
bytes in return, you can easily lock up the computer. Why is this? Because if
a board other than the expected asynch adapter is located at the specific I/O
address, you can easily put that board into an unpredictable state by writing
byte values that are (for this board) random.
To get around this, it is wise first to make sure, using read-only port
accesses, that there is a good chance of an asynch adapter at that specific
location.
If all read-only tests pass, we write a value to the divisor latch port and
then simply read it back. This is, in essence, the same as setting a specific
baud rate and then verifying that the baud rate could be set. The sequence of
events is shown in Example 1. (I use pseudocode because I refuse to write in
Turbo Pascal.)
Example 1

 get a base I/O address.
 read the value of the Interrupt Enable Register.
 if bit 4, 5, 6, or 7 in the Interrupt Enable Register is set, this can't be
 a UART.
 read the value of the Modem Control Register.
 if bit 5, 6, or 7 in the Modem Control Register is set, this can't be a
 UART.
 read the value of the Interrupt Identification Register.
 if bit 4 or 5 in the Interrupt Identification Register is set, this can't
 be a UART.
 try to set 19,200 bps.
 read the current baud rate.
 if the current baud rate is 19,200 bps, there is a UART at this base I/O
 address.

Even if multiport adapters don't need to be supported, an application should
use additional code to obtain valid base I/O addresses in order to make proper
use of COM3 and COM4 in a PS/2. The problem with the PS/2 is that IBM decided
not to use 03e8H and 02e8H (the standard PC values) for COM3 and COM4, but
3220H and 3228H instead. One can easily test whether these I/O addresses are
plausible by reading the BIOS data area. The PS/2 BIOS loads the data area
at offset 0040H:0000H with the I/O addresses of all installed asynch adapters
at boot time, so peeking into this area can tell whether it makes sense to
test for a PS/2 asynch adapter or not. Since the PC BIOS fills this area with
0s if it is too dumb to detect COM3 and COM4, the logic in Example 2 will
give you the base addresses of COM1..COM4 on PCs and PS/2s.
Example 2

 peek into the BIOS data area at offset 0040H:x, where x is the word index:
 0 for COM1, 1 for COM2, 2 for COM3, and 3 for COM4.
 if there is a value not equal to zero, return it as the base address.
 if the value is zero, return 03f8H for COM1, 02f8H for COM2, 03e8H for
 COM3, and 02e8H for COM4.

One last hint with regard to the PS/2: IBM not only chose nonstandard base
addresses for COM3 and COM4, but a nonstandard IRQ for COM3, as well. On the
PS/2, COM3 uses IRQ3, just like COM2 and COM4. It is therefore wise to use a
statement such as "if the base address for COM3 is 3220H, then use IRQ3;
otherwise, use IRQ4" to properly set up asynch interrupts.
The logic presented here has been used by our customers on several thousand
PCs and has not been reported to have failed even once.
Ralph Langner
Hamburg, Germany


Regarding Matrices


Dear DDJ,
However, programmers (including myself) should beware of using floating-point
arithmetic to compute integer results. For instance, the 80x87 math
coprocessor guarantees the precision of floating-point operations up to 15
decimal places only. This may seem sufficient, until you realize how fast
Fibonacci numbers grow. In fact, Fibonacci(71) and greater are beyond the
accuracy of the 80x87 math coprocessor.
This is illustrated in Listing One: The program simply compares the Fibonacci
numbers computed using various methods. When Fibonacci numbers beyond
Fibonacci(71) are computed, small differences caused by inaccuracies of the
floating-point computations become apparent.
Thus, a better way to compute Fibonacci numbers would be to come up with an
integer algorithm that dynamically expands the number of bits used in the
computation on an as-needed basis; or simply overkill by using some large
number of bits (like 512 or 1024); or, precompute the floating-point constants
and then come up with a software emulation of higher-precision floating-point
operations; or, come up with an algorithm that always produces enough
precision to guarantee an accurate integer result (since that's all we need
for Fibonacci number computations).
Victor J. Duvanenko
Indianapolis, Indiana

January, 1992
PARALLEL DSP FOR DESIGNING ADAPTIVE FILTERS


Paralleled DSP chips implement the filter; here's how to program them




Daniel Chen


Daniel is an engineer with Texas Instruments. He can be reached there at 12203
S.W. Freeway, Houston, TX 77477.


A new generation of advanced computer applications consists of programs whose
required execution speeds exceed what the hardware can deliver. In such
situations, designers must abandon the classic,
single-processor, serial, Von Neumann computer architecture in favor of some
form of parallel architecture. Today, it is possible to design a parallel
system in which multiple processors are connected to work concurrently on
different parts of a problem, dramatically increasing the speed at which
instructions can be executed.
The conventional parallel architecture goes by the acronym SIMD, for Single
Instruction, Multiple Data stream. A SIMD computer's instruction sequence is
similar to that of a Von Neumann machine but the instructions are executed in
parallel on multiple sets of data by multiple processors. For problems whose
structure is unsuitable for SIMD techniques, designers can go to the MIMD
(Multiple Instruction, Multiple Data stream) architecture.
A MIMD computer uses several independent instruction sequences, each acting on
a separate data stream. MIMD offers greater parallelism and flexibility than
SIMD but is more complex in terms of the synchronization needed among
instructions.
SIMD and MIMD computers can be implemented with general-purpose microprocessor
elements, but it is now possible to get higher performance by using special
Digital Signal Processing (DSP) chips. For a number of high-performance,
computationally intensive applications such as 3-D graphics,
telecommunications, video conferencing, and neural networks, DSP techniques
are preferable to those used with a conventional microprocessor. And when such
chips are interconnected in parallel DSP (pDSP) architectures, instruction
execution times are much faster than with general-purpose microprocessors.
Two factors are driving the trend toward pDSP. One is that DSP algorithms are
inherently suited to task partitioning, which means that paralleled processors
can be assigned to individual tasks. The second is the dramatic increase in
DSP chip performance coupled with sharply lower prices compared to when these
devices were introduced ten years ago. The result is that pDSP is becoming an
increasingly cost-effective approach to achieving very high performance.
In addition to advanced applications (graphics, telecom, and so on),
traditional DSP problems can be solved by multiple processor implementations.
For example, filtering, correlation, and Fast Fourier Transforms (FFTs) are
all functions representable by a signal-flow graph. Such flow graphs identify
lower-level functions and their parallel interactions. Any problem that can be
symbolized in this manner is a candidate for parallel processing.
Practical pDSP architectures can be implemented using the Texas Instruments
TMS320C40, the first DSP chip designed specifically for parallel processing.
High-performance systems can be designed because a virtually unlimited number
of C40s can be interconnected.
The TMS320C40 incorporates the on-chip hardware necessary to meet the three
main requirements of parallel processing systems: efficient interprocessor
communication, high information throughput, and a high-performance Central
Processing Unit (CPU). These requirements are met through parallel
communication ports for high speed and direct--no glue logic--interprocessor
communications, a multichannel DMA (Direct Memory Access) coprocessor for
concurrent I/O and immense throughput, and a high-performance, 32-bit floating
point CPU (see Figure 1). Backing up the hardware features are a full range of
software development tools specifically designed for parallel processing
systems.


Canceling Echoes with Adaptive Filters


A variety of telecommunication system problems are concerned with echo
cancellation. These problems crop up in long distance telephone voice
communications, full-duplex voiceband data modems, and high-performance,
"hands-free" audio-conferencing systems. In each case, practical echo
canceling circuitry is based on the principles of adaptive filtering. Recent
advances in DSP devices such as the C40 have led to the design of all digital
echo cancelers for both desktop and large systems.
An adaptive filter, upon which echo-cancellation hardware is based, is one
whose coefficients can be updated by an adaptive algorithm that optimizes the
filter's response to suit a desired performance criterion. A filter's
coefficients determine its characteristics and output function. In circuits
such as echo cancelers, the coefficients required to produce a given output
cannot be determined when the input signal is presented because the
coefficients depend on changing line or transmission conditions. Thus there is
a need for an adaptive filter that can alter its coefficients to match the
electrical and physical environments of the phone line.
Figure 2 illustrates the basic form of an adaptive filter.
The filter consists of two distinct parts: a filter structure designed to
perform the signal processing function and an adaptive algorithm for altering
the coefficients to suit the environment. An incoming signal, x(n), is
weighted in a digital filter to produce an output, y(n). The adaptive
algorithm adjusts the weights in the filter structure to minimize the error,
e(n), between the output, y(n), and the desired response of the filter, d(n).
In a real-time application such as echo cancellation (adaptive prediction,
noise cancellation, and channel equalization are others), an adaptive filter
implementation based on a programmable DSP device such as the C40 has many
advantages over a hard-wired filter. Not only do DSP chips consume less power
and space, they simplify manufacturing requirements. And the programmability
feature provides flexibility for system and software upgrades.
Adaptive filters require an implementation that provides fast multiplication
(a parallel hardware multiplier), high-speed data flow (a pipelined
architecture), and large storage capacity. The C40 meets these requirements
because its CPU contains a 40-bit floating-point multiplier and transfers data
at 320 Mbytes/second with a 40-nanosecond cycle time. Moreover, the chip has a
total memory space of 16 gigabytes, with 8 Kbytes of RAM and 16 Kbytes of ROM
packed onto its own silicon. A 512-byte program cache and boot code ROM round
out the C40's on-chip memory.
The DMA coprocessor runs concurrently with the CPU to maximize processing
speed and throughput. Six high-speed parallel communication ports offer
bidirectional data transfer rates between C40s of 20 Mbytes/second. Each port
has its own FIFO buffers and arbitration logic. And bandwidth is broad
because multiple communication ports can be connected between processors
without the need for glue logic.
While adaptive filters can be implemented in a variety of ways, the C40-based
design to be described here uses a transversal filter and the LMS (Least Mean
Square) update algorithm. LMS is relatively simple to design and implement and
is well suited for many applications. The transversal filter--also known as a
tapped delay line--is an FIR (Finite Impulse Response) type which offers
greater stability than IIR (Infinite Impulse Response) types.
The general architecture of an adaptive filter used in an echo canceler system
consists of four TMS320C40 DSP devices operating in parallel. One such LMS
implementation is shown in Figure 3, where the leftmost C40 performs the
operation of convolution and the remaining three processors handle the
updating of filter coefficients.
Convolution here reduces to the inner product of two vectors; in an adaptive
filter, the vectors are an input vector (the input signal) and a weight
vector that holds the filter coefficients. This system also uses a central or
global memory to
store coefficient data, but this architecture is not optimal for the highest
performance.
To improve filter performance, each C40 takes advantage of its captive on-chip
memory, as illustrated in the parallel architecture of Figure 4. Unlike the
circuit in Figure 3, each processor carries out the convolution task and also
handles updating of filter coefficients.
Each C40's internal 8-Kbyte Static RAM (SRAM) stores the routine for executing
the convolution function and the data for coefficient updates. The boot-code
ROMs contain information that initializes the pointers and arrays within the
chip. This permits the setting of addresses for the ports that permit
communication between C40s, and the setting of addresses for storing filter
weight information and the filter's output data. The type of data stored and
its function within the filter is shown in the detailed assembly language code
provided with this article.


The Adaptive Filter Implemented


Figure 5 shows the signal flow between the four interconnected C40 DSP devices
that make up the adaptive filter. Lines of communication between the
processors are illustrated by the lines with arrowheads. The C40 communication
ports are used to send signals among the processors.
In this transversal filter, the system input signal or input vector to C40 #1
is denoted by x(n) and the system output signal is y(n). Along with x(n), the
desired response of the filter, d(n), is fed to the input of C40 #1. The error
signal, e(n), is developed in C40 #1 and distributed to the other C40s in the
system. C40s #2, #3, and #4 develop their own output signals y2(n), y3(n), and
y4(n), which are returned to C40 #1 to form the system output signal y(n).
The function of DSP devices #2, #3, and #4 in Figure 5 is to make an
output-signal calculation based on input filter weights, the error signal and
the input signal from the previous stage. That calculation for each DSP device
is then sent to C40 #1, which returns the error signal e(n) to the devices.
Each DSP (#2, #3, #4) then updates the filter weights and passes a new input
signal to the following stage. The pseudo C code for executing these steps in
DSPs #2, #3, and #4 is given in Listings Two, Three, and Four (page 74).
The basic procedure is to initialize the filter weights, compute the value of
output signal y, receive an input x from the preceding stage, make an updated
calculation of y, and pass that value back to C40 #1. When the error signal is
received from C40 #1, the individual stage can update the filter weights.
A more extensive set of computations is carried out in C40 #1, which not only
calculates its own y output signal but receives the y outputs from stages #2,
#3, and #4. These values must be summed together to form the total output
signal y(n). This stage also computes the error signal, e, which is derived by
subtracting the output y from the desired value d. The pseudo C code program
for executing the functions of C40 #1 is given in Listing One (page 74).
Listings Six through Nine (pages 74 to 77) respectively are the C40 assembly
code versions of Listings One through Four. (Listing Five, page 74, is
CONST.H, the file that sets up the constant for Listings Six through Nine.) In
each listing, the program begins with an initialization routine to set the
initial inputs and filter weights and to set the pointers for the
communication ports.
The primary instructions for accomplishing these operations are LDI (Load
Integer) and STI (Store Integer). The LDP instruction in Listing Six is an
alternate form of LDI used to load the data-page pointer register. STF is the
command to store a floating-point value in an internal memory location.
To perform the computation of the filter output signal in any stage (y1(n),
y(n)), the assembly code uses the RPTBD command. This command allows a block
of instructions to be repeated a number of times without incurring any penalty
for looping.
The architecture of the C40 devices allows for the execution of parallel
instructions which simplifies programming and speeds execution. Thus,
instructions MPYF3||SUBF3 and MPYF3||ADDF3 allow a floating-point multiplication
and floating-point subtraction, or a floating-point multiplication and
floating-point addition, to be carried out in parallel or simultaneously.
Together with the RPTBD command previously mentioned, the output values y1(n)
and y(n) can be calculated on a continuous basis with updated data.
To update the adaptive filter coefficients, a program using the block repeat
instruction (RPTBD) and the parallel multiply commands provides a simple and
concise routine. See the portion of the code, "Update weights w(n)" in
Listings Six through Nine. The simplicity of the code is due to the powerful
architecture of the C40 DSP devices. Note that the parallel command used in
this subroutine is MPYF3||STF, which permits a simultaneous multiplication
and store of a floating-point value.
Programs for the C40 can be written in the ANSI C language and translated
directly into the highly optimized assembly code used in this adaptive filter
example. This is accomplished through the TI TMS320C40 Optimizing C Compiler,
which allows C programs to be linked with assembly language routines, and
allows for handcoding of time-critical routines directly in C40 Assembly
language. The compiler conforms exactly to the ANSI C specification and
contains a C-shell program to facilitate a one-step translation from C source
code to executable code.
Also available for the C40 is SPOX, a hardware-independent software base for
a real-time DSP operating system. SPOX features a set of high-level C-callable
software functions that are independent of the underlying hardware platform,
thus insulating real-time DSP applications from numerous low-level system
details. The SPOX operating system plays an integral role in application
development, from the conception of new algorithms to the integration of
application software into production hardware.


_PARALLEL DSP FOR DESIGNING ADAPTIVE FILTERS_
by Daniel Chen




[LISTING ONE]

/******* PSEUDO C CODE FOR CASCADE ADAPTIVE FILTER #1 *******/
/* Initialization */
 xptr = &x[0];
 wptr = &w[0];

 for (i=0;i<N1;i++){
 *xptr++ = 0.0;
 *wptr++ = 0.0;
 }
/* N1-1
* Compute y1 = SUM w[i] * x[i]
* i=0
*/
 xptr = &x[0];
 wptr = &w[0];
 input(x); /* input x from A/D converter */
 *xptr = x;
 input (d); /* input d from A/D converter */

 for (i=0;i<N1;i++)
 y1 += *xptr++ * *wptr++;
/* Compute y = y1 + y2 + y3 + y4 */
 receive(y2,y3,y4); /* receive y2, y3, y4 from processors 2, 3, 4 */
 y = y1 + y2 + y3 + y4;
/* Compute error signal e */
 e = d - y;
 output(y); /* output y to D/A converter */
 pass(e); /* pass e to processor 2, 3, 4 */
/* Update filter weights w[] */
 xptr = &x[N1-1];
 wptr = &w[N1-1];
 pass (*xptr); /* pass x(n-N1) to processor #2 */
 for (i=N1;i>0;i--){
 *wptr-- += mu * e * *xptr--;
 *(xptr+1) = *xptr; /* delayed tap is implemented in circular buffer */
 }





[LISTING TWO]

/******* PSEUDO C CODE FOR CASCADE ADAPTIVE FILTER #2 *******/
/* Initialization */
 xptr = &x[0];
 wptr = &w[0];
 for (i=0;i<N2;i++){
 *xptr++ = 0.0;
 *wptr++ = 0.0;
 }

/* N2-1
* Compute y2 = SUM w[i] * x[i]
* i=0
*/
 xptr = &x[0];
 wptr = &w[0];
 receive(x); /* receive x(n-N1) from processor #1 */
 *xptr = x;
 for (i=0;i<N2;i++)
 y2 += *xptr++ * *wptr++;
/* pass y2 and receive e */
 pass(y2); /* pass y2 to processor #1 */
 receive(e); /* receive e(n) from processor #1 */
/* Update filter weights w[] */
 xptr = &x[N2-1];
 wptr = &w[N2-1];
 pass (*xptr); /* pass x(n-N1-N2) to processor #3 */
 for (i=N2;i>0;i--){
 *wptr-- += mu * e * *xptr--;
 *(xptr+1) = *xptr; /* delayed tap is implemented in circular buffer */
 }






[LISTING THREE]

/****** PSEUDO C CODE FOR CASCADE ADAPTIVE FILTER #3 ******/
/* Initialization */
 xptr = &x[0];
 wptr = &w[0];

 for (i=0;i<N3;i++){
 *xptr++ = 0.0;
 *wptr++ = 0.0;
 }
/* N3-1
* Compute y3 = SUM w[i] * x[i]
* i=0
*/
 xptr = &x[0];
 wptr = &w[0];
 receive(x); /* receive x(n-N1-N2) from processor #2 */
 *xptr = x;

 for (i=0;i<N3;i++)
 y3 += *xptr++ * *wptr++;
/* pass y3 and receive e */
 pass(y3); /* pass y3 to processor #1 */
 receive(e); /* receive e(n) from processor #1 */

/* Update filter weights w[] */
 xptr = &x[N3-1];
 wptr = &w[N3-1];
 pass (*xptr); /* pass x(n-N1-N2-N3) to processor #4 */
 for (i=N3;i>0;i--){
 *wptr-- += mu * e * *xptr--;

 *(xptr+1) = *xptr; /* delayed tap is implemented
 in circular buffer */
 }






[LISTING FOUR]

/****** PSEUDO C CODE FOR CASCADE ADAPTIVE FILTER #4 ******/
/* Initialization */
 xptr = &x[0];
 wptr = &w[0];

 for (i=0;i<N4;i++){
 *xptr++ = 0.0;
 *wptr++ = 0.0;
 }
/* N4-1
* Compute y4 = SUM w[i] * x[i]
* i=0
*/
 xptr = &x[0];
 wptr = &w[0];
 receive(x); /* receive x(n-N1-N2-N3) from processor #3 */
 *xptr = x;

 for (i=0;i<N4;i++)
 y4 += *xptr++ * *wptr++;
/* pass y4 and receive e */
 pass(y4); /* pass y4 to processor #1 */
 receive(e); /* receive e(n) from processor #1 */

/* Update filter weights w[] */
 xptr = &x[N4-1];
 wptr = &w[N4-1];
 for (i=N4;i>0;i--){
 *wptr-- += mu * e * *xptr--;
 *(xptr+1) = *xptr; /* delayed tap is implemented
 in circular buffer */
 }






[LISTING FIVE]

**********************************************************************
* CONST.H - This file sets up the constants for the Cascade TMS320C40
* Adaptive Filter programs: LMS1.ASM LMS2.ASM LMS3.ASM LMS4.ASM
**********************************************************************
order1 .set N1 ; filter order for #1 C40
order2 .set N2 ; filter order for #2 C40
order3 .set N3 ; filter order for #3 C40
order4 .set N4 ; filter order for #4 C40

mu .set 0.01 ; step size
io_port .set 0100081h ; data I/O comm port addr for d, x, & y
C40_1_2 .set 0100041h ; comm port address from #1 to #2 C40
C40_1_3 .set 0100051h ; comm port address from #1 to #3 C40
C40_1_4 .set 0100061h ; comm port address from #1 to #4 C40
C40_2_1 .set 0100071h ; comm port address from #2 to #1 C40
C40_2_3 .set 0100061h ; comm port address from #2 to #3 C40
C40_2_4 .set 0100051h ; comm port address from #2 to #4 C40
C40_3_1 .set 0100081h ; comm port address from #3 to #1 C40
C40_3_2 .set 0100071h ; comm port address from #3 to #2 C40
C40_3_4 .set 0100061h ; comm port address from #3 to #4 C40
C40_4_1 .set 0100071h ; comm port address from #4 to #1 C40
C40_4_2 .set 0100081h ; comm port address from #4 to #2 C40
C40_4_3 .set 0100091h ; comm port address from #4 to #3 C40






[LISTING SIX]

******************************************************************
* LMS1 : Cascade TMS320C40 adaptive filter #1 Using Transversal
* Structure and LMS Algorithm, Looped Code
* Configuration:
* d(n) --------------------------+
* 
* e(n) +
* +-----<-----(SUM)
* -
* --------+-------- 
* x(n) ----Adaptive Filter-----+--------> y(n)
* -----------------
* +--------<-------+-------<--------+-------<--------+
* y2(n) y3(n) y4(n)
* y(n)<-+ 
* +----+----+ +----+----+ +----+----+ +----+----+
* +--TMS320C40x(n1) TMS320C40x(n2) TMS320C40x(n3) TMS320C40
* x(n)----> -----> -----> -----> 
* +-> # 1 # 2 # 3 # 4 
* +----+----+ +----+----+ +----+----+ +----+----+
* d(n)--+ 
* e(n) 
* +-------->-------+------->--------+------->--------+
* where n1 = n-N1, n2 = n-N1-N2, and n3 = n-N1-N2-N3
* Algorithm for processor #1:
* N1-1
* y1(n) = SUM w(k)*x(n-k) k=0,1,2,...,N1-1
* k=0
* y(n) = y1(n) + y2(n) + y3(n) + y4(n)
* e(n) = d(n) - y(n)
* w(k) = w(k) + u*e(n)*x(n-k) k=0,1,2,...,N1-1
* where filter order N = N1 + N2 + N3 + N4 and u is the step size mu,
**********************************************************************
 .include "const.h" ; include the constant definition file
 .sect "vector"
reset .word begin
; Initialize pointers and arrays

; xptr = &x[0];
; wptr = &w[0];
; for (i=0;i<N1;i++){
; *xptr++ = 0.0;
; *wptr++ = 0.0;
; }
 .text
begin .set $
 LDP @io_addr ; set data page
 LDI 0,R2 ; R2 = 0
 LDF 0.0,R1 ; R1 = 0.0
 LDI @io_addr,AR4 ; set pointer for data I/O
 LDI @C40addr2,AR5 ; set pointer for #2 C40 comm port
 LDI @C40addr3,AR6 ; set pointer for #3 C40 comm port
 LDI @C40addr4,AR7 ; set pointer for #4 C40 comm port
 LDI @xn_addr,AR0 ; set pointer for x[]
 LDI @wn_addr,AR1 ; set pointer for w[]
 STI R2,*-AR5(1) ; enable #2 C40 comm port
 STI R2,*-AR6(1) ; enable #3 C40 comm port
 STI R2,*-AR7(1) ; enable #4 C40 comm port
 STF R1,*+AR5(1) ; start #2 C40
 RPTS order1-1
 STF R1,*AR0++(1)% ; x[] = 0.
 STF R1,*AR1++(1)% ; w[] = 0.
 LDI order1,BK ; set up circular buffer
input:
; Compute filter output y1(n)
; xptr = &x[0];
; wptr = &w[0];
; input(x); /* input x from A/D converter */
; input (d); /* input d from A/D converter */
; *xptr = x;
; for (i=0;i<N1;i++)
; y1 += *xptr++ * *wptr++;
 LDI order1-2,RC
 RPTBD filter
 LDF *AR4,R6 ; input x(n)
 LDF *AR4,R7 ; input d(n)
 STF R6,*AR0 ; insert x(n) to buffer
 MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 SUBF3 R2,R2,R2 ; R2 = 0.0
filter MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 ADDF3 R1,R2,R2 ; y1(n) = w[].x[]
 ADDF R1,R2 ; include last result
; compute y(n) signals
; receive(y2,y3,y4); /* receive y2, y3, y4 from processors 2, 3, 4 */
; y = y1 + y2 + y3 + y4;
 ADDF *AR5,R2 ; add y2(n)
 ADDF *AR6,R2 ; add y3(n)
 ADDF *AR7,R2 ; add y4(n)
; Compute error signal e(n)
; e = d - y;
; pass(e); /* pass e to processor 2, 3, 4 */
 SUBF R2,R7 ; e(n) = d(n) - y(n)
 MPYF @u,R7 ; R7 = err = e(n) * u
; Output y(n) signal and e(n)
; output(y); /* output y to D/A converter */
; pass(e); /* pass e to processor 2, 3, 4 */
 STF R7,*+AR5(1) ; send out e(n)

 STF R7,*+AR6(1) ; send out e(n)
 STF R2,*+AR4(1) ; send out y(n)
 STF R7,*+AR7(1) ; send out e(n)
; Update weights w(n)
; xptr = &x[N1-1];
; wptr = &w[N1-1];
; pass (*xptr); /* pass x(n-N1) to processor #2 */
; for (i=N1;i>0;i--){
; *wptr-- += mu * e * *xptr--;
; *(xptr+1) = *xptr; /* delayed tap is implemented
; in circular buffer */
; }
 LDI order1-3,RC ; initialize repeat counter
 RPTBD weight ; do i = 0, N-3
 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n)
 ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n)
 NOP

 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n-i-1)
 STF R2,*AR1++(1)% ; update wi(n+1)
weight ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n-i)
 LDF *AR0,R6
 STF R2,*AR1++(1)% ; update wi(n+1)
 BD input ; delay branch
 MPYF3 R7,*AR0,R1 ; R1 = err * x(n-N+1)
 STF R6,*+AR5(1) ; shift x(n-N) to #2 C40
 ADDF3 R1,*AR1,R2 ; R2 = wi(n-N+1) + err * x(n-N+1)
 STF R2,*AR1++(1)% ; update last w

; Define constants
xn .usect "buffer",order1
wn .usect "coeffs",order1
 .data
io_addr .word io_port
C40addr2 .word C40_1_2
C40addr3 .word C40_1_3
C40addr4 .word C40_1_4
xn_addr .word xn
wn_addr .word wn
u .float mu
 .end






[LISTING SEVEN]

******************************************************************
* LMS2 : Cascade TMS320C40 adaptive filter #2 Using Transversal
* Structure and LMS Algorithm, Looped Code
* Configuration:
* d(n) --------------------------+
* 
* e(n) +
* +-----<-----(SUM)
* -
* --------+-------- 

* x(n) ----Adaptive Filter-----+--------> y(n)
* -----------------
* +--------<-------+-------<--------+-------<--------+
* y2(n) y3(n) y4(n)
* y(n)<-+ 
* +----+----+ +----+----+ +----+----+ +----+----+
* +--TMS320C40x(n1) TMS320C40x(n2) TMS320C40x(n3) TMS320C40
* x(n)----> -----> -----> -----> 
* +-> # 1 # 2 # 3 # 4 
* +----+----+ +----+----+ +----+----+ +----+----+
* d(n)--+ 
* e(n) 
* +-------->-------+------->--------+------->--------+
* where n1 = n-N1, n2 = n-N1-N2, and n3 = n-N1-N2-N3
* Algorithm for processor #2:
* N2-1
* y2(n) = SUM w(N1+k)*x(n-N1-k) k=0,1,2,...,N2-1
* k=0
* w(N1+k) = w(N1+k) + u*e(n)*x(n-N1-k) k=0,1,2,...,N2-1
* where filter order N = N1 + N2 + N3 + N4 and u is the step size mu.
**********************************************************************
 .include "const.h" ; include the constant definition file
 .sect "vector"
reset .word begin
; Initialize pointers and arrays
; xptr = &x[0];
; wptr = &w[0];
; for (i=0;i<N2;i++){
; *xptr++ = 0.0;
; *wptr++ = 0.0;
; }
 .text
begin .set $
 LDP @C40addr1 ; set data page
 LDI 0,R2 ; R2 = 0
 LDF 0.0,R1 ; R1 = 0.0
 LDI @C40addr1,AR5 ; set pointer for #1 C40 comm port
 LDI @C40addr3,AR6 ; set pointer for #3 C40 comm port
 LDI @C40addr4,AR7 ; set pointer for #4 C40 comm port
 LDI @xn_addr,AR0 ; set pointer for x[]
 LDI @wn_addr,AR1 ; set pointer for w[]
 STI R2,*-AR6(1) ; enable #3 C40 comm port
 STI R2,*-AR5(1) ; enable #1 C40 comm port
 STI R2,*-AR7(1) ; enable #4 C40 comm port
 STF R1,*+AR6(1) ; start #3 C40
 RPTS order2-1
 STF R1,*AR0++(1)% ; x[] = 0.
 STF R1,*AR1++(1)% ; w[] = 0.
 LDI order2,BK ; set up circular buffer
input:
; Compute filter output y(n)
; xptr = &x[0];
; wptr = &w[0];
; receive(x); /* receive x(n-N1) from processor #1 */
; *xptr = x;
; for (i=0;i<N2;i++)
; y2 += *xptr++ * *wptr++;
 LDI order2-2,RC
 RPTBD filter

 LDF *AR5,R6 ; input x(n)
 STF R6,*AR0 ; insert x(n) to buffer
 MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 SUBF3 R2,R2,R2 ; R2 = 0.0
filter MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 ADDF3 R1,R2,R2 ; y2(n) = w[].x[]
 ADDF R1,R2 ; include last result
; Output y2(n) signals
; pass(y2); /* pass y2 to processor #1 */
 STF R2,*+AR5(1) ; send y2(n) to #1 C40
; Input error signal e(n)
; receive(e); /* receive e(n) from processor #1 */
 LDF *AR5,R7 ; load e(n) from #1 C40
; Update weights w(n)
; xptr = &x[N2-1];
; wptr = &w[N2-1];
; pass (*xptr); /* pass x(n-N1-N2) to processor #3 */
; for (i=N2;i>0;i--){
; *wptr-- += mu * e * *xptr--;
; *(xptr+1) = *xptr; /* delayed tap is implemented
; in circular buffer */
; }
;
 LDI order2-3,RC ; initialize repeat counter
 RPTBD weight ; do i = 0, N2-3
 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n)
 ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n)
 NOP

 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n-i-1)
 STF R2,*AR1++(1)% ; update wi(n+1)
weight ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n-i)

 LDF *AR0,R6
 STF R2,*AR1++(1)% ; update wi(n+1)
 BD input ; delay branch
 MPYF3 R7,*AR0,R1 ; R1 = err * x(n-N+1)
 STF R6,*+AR6(1) ; shift x(n-N) to #3 C40
 ADDF3 R1,*AR1,R2 ; R2 = wi(n-N+1) + err * x(n-N+1)
 STF R2,*AR1++(1)% ; update last w

; Define constants
xn .usect "buffer",order2
wn .usect "coeffs",order2
 .data
C40addr1 .word C40_2_1
C40addr3 .word C40_2_3
C40addr4 .word C40_2_4
xn_addr .word xn
wn_addr .word wn
 .end






[LISTING EIGHT]


******************************************************************
* LMS3 : Cascade TMS320C40 adaptive filter #3 Using Transversal
* Structure and LMS Algorithm, Looped Code
* Configuration:
* d(n) --------------------------+
* 
* e(n) +
* +-----<-----(SUM)
* -
* --------+-------- 
* x(n) ----Adaptive Filter-----+--------> y(n)
* -----------------
* +--------<-------+-------<--------+-------<--------+
* y2(n) y3(n) y4(n)
* y(n)<-+ 
* +----+----+ +----+----+ +----+----+ +----+----+
* +--TMS320C40x(n1) TMS320C40x(n2) TMS320C40x(n3) TMS320C40
* x(n)----> -----> -----> -----> 
* +-> # 1 # 2 # 3 # 4 
* +----+----+ +----+----+ +----+----+ +----+----+
* d(n)--+ 
* e(n) 
* +-------->-------+------->--------+------->--------+
* where n1 = n-N1, n2 = n-N1-N2, and n3 = n-N1-N2-N3
* Algorithm for processor #3:
* N3-1
* y3(n) = SUM w(N1+N2+k)*x(n-N1-N2-k) k=0,1,2,...,N3-1
* k=0
* w(N1+N2+k) = w(N1+N2+k) + u*e(n)*x(n-N1-N2-k) k=0,1,2,...,N3-1
* where filter order N = N1 + N2 + N3 + N4 and u is the step size mu.
**********************************************************************
 .include "const.h" ; include the constant definition file
 .sect "vector"
reset .word begin
; Initialize pointers and arrays
; xptr = &x[0];
; wptr = &w[0];
; for (i=0;i<N3;i++){
; *xptr++ = 0.0;
; *wptr++ = 0.0;
; }
 .text
begin .set $
 LDP @C40addr1 ; set data page
 LDI 0,R2 ; R2 = 0
 LDF 0.0,R1 ; R1 = 0.0
 LDI @C40addr1,AR5 ; set pointer for #1 C40 comm port
 LDI @C40addr2,AR6 ; set pointer for #2 C40 comm port
 LDI @C40addr4,AR7 ; set pointer for #4 C40 comm port
 LDI @xn_addr,AR0 ; set pointer for x[]
 LDI @wn_addr,AR1 ; set pointer for w[]
 STI R2,*-AR7(1) ; enable #4 C40 comm port
 STI R2,*-AR6(1) ; enable #2 C40 comm port
 STI R2,*-AR5(1) ; enable #1 C40 comm port
 STF R1,*+AR7(1) ; start #4 C40
 RPTS order3-1
 STF R1,*AR0++(1)% ; x[] = 0.
 STF R1,*AR1++(1)% ; w[] = 0.
 LDI order3,BK ; set up circular buffer

input:
; Compute filter output y(n)
; xptr = &x[0];
; wptr = &w[0];
; receive(x); /* receive x(n-N1-N2) from processor #2 */
; *xptr = x;
; for (i=0;i<N3;i++)
; y3 += *xptr++ * *wptr++;
 LDI order3-2,RC
 RPTBD filter
 LDF *AR6,R6 ; input x(n)
 STF R6,*AR0 ; insert x(n) to buffer
 MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 SUBF3 R2,R2,R2 ; R2 = 0.0
filter MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 ADDF3 R1,R2,R2 ; y3(n) = w[].x[]
 ADDF R1,R2 ; include last result
; Output y3(n) signals
; pass(y3); /* pass y3 to processor #1 */
 STF R2,*+AR5(1) ; send y3(n) to #1 C40
; Input error signal e(n)
; receive(e); /* receive e(n) from processor #1 */
 LDF *AR5,R7 ; load e(n) from #1 C40
; Update weights w(n)
; xptr = &x[N3-1];
; wptr = &w[N3-1];
; pass (*xptr); /* pass x(n-N1-N2-N3) to processor #4 */
; for (i=N3;i>0;i--){
; *wptr-- += mu * e * *xptr--;
; *(xptr+1) = *xptr; /* delayed tap is implemented
; in circular buffer */
; }
;
 LDI order3-3,RC ; initialize repeat counter
 RPTBD weight ; do i = 0, N3-3
 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n)
 ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n)
 NOP

 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n-i-1)
 STF R2,*AR1++(1)% ; update wi(n+1)
weight ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n-i)

 LDF *AR0,R6
 STF R2,*AR1++(1)% ; update wi(n+1)
 BD input ; delay branch
 MPYF3 R7,*AR0,R1 ; R1 = err * x(n-N+1)
 STF R6,*+AR7(1) ; shift x(n-N) to #4 C40
 ADDF3 R1,*AR1,R2 ; R2 = wi(n-N+1) + err * x(n-N+1)
 STF R2,*AR1++(1)% ; update last w

; Define constants
xn .usect "buffer",order3
wn .usect "coeffs",order3
 .data
C40addr1 .word C40_3_1
C40addr2 .word C40_3_2
C40addr4 .word C40_3_4
xn_addr .word xn

wn_addr .word wn
 .end






[LISTING NINE]

******************************************************************
* LMS4 : Cascade TMS320C40 adaptive filter #4 Using Transversal
* Structure and LMS Algorithm, Looped Code
* Configuration:
* d(n) --------------------------+
* 
* e(n) +
* +-----<-----(SUM)
* -
* --------+-------- 
* x(n) ----Adaptive Filter-----+--------> y(n)
* -----------------
* +--------<-------+-------<--------+-------<--------+
* y2(n) y3(n) y4(n)
* y(n)<-+ 
* +----+----+ +----+----+ +----+----+ +----+----+
* +--TMS320C40x(n1) TMS320C40x(n2) TMS320C40x(n3) TMS320C40
* x(n)----> -----> -----> -----> 
* +-> # 1 # 2 # 3 # 4 
* +----+----+ +----+----+ +----+----+ +----+----+
* d(n)--+ 
* e(n) 
* +-------->-------+------->--------+------->--------+
* where n1 = n-N1, n2 = n-N1-N2, and n3 = n-N1-N2-N3
* Algorithm for processor #4:
* N4-1
* y4(n) = SUM w(N1+N2+N3+k)*x(n-N1-N2-N3-k) k=0,1,2,...,N4-1
* k=0
* w(N1+N2+N3+k) = w(N1+N2+N3+k) + u*e(n)*x(n-N1-N2-N3-k) k=0,1,2,...,N4-1
* where filter order N = N1 + N2 + N3 + N4 and u is the step size mu.
**********************************************************************
 .include "const.h" ; include the constant definition file
 .sect "vector"
reset .word begin
; Initialize pointers and arrays
; xptr = &x[0];
; wptr = &w[0];
; for (i=0;i<N4;i++){
; *xptr++ = 0.0;
; *wptr++ = 0.0;
; }
 .text
begin .set $
 LDP @C40addr1 ; set data page
 LDI 0,R2 ; R2 = 0
 LDF 0.0,R1 ; R1 = 0.0
 LDI @C40addr1,AR5 ; set pointer for #1 C40 comm port
 LDI @C40addr2,AR6 ; set pointer for #2 C40 comm port
 LDI @C40addr3,AR7 ; set pointer for #3 C40 comm port

 LDI @xn_addr,AR0 ; set pointer for x[]
 LDI @wn_addr,AR1 ; set pointer for w[]
 STI R2,*-AR5(1) ; enable #1 C40 comm port
 STI R2,*-AR6(1) ; enable #2 C40 comm port
 STI R2,*-AR7(1) ; enable #3 C40 comm port
 RPTS order4-1
 STF R1,*AR0++(1)% ; x[] = 0.
 STF R1,*AR1++(1)% ; w[] = 0.
 LDI order4,BK ; set up circular buffer
input:
; Compute filter output y(n)
; xptr = &x[0];
; wptr = &w[0];
; receive(x); /* receive x(n-N1-N2-N3) from processor #3 */
; *xptr = x;
; for (i=0;i<N4;i++)
; y4 += *xptr++ * *wptr++;
 LDI order4-2,RC
 RPTBD filter
 LDF *AR7,R6 ; input x(n)
 STF R6,*AR0 ; insert x(n) to buffer
 MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 SUBF3 R2,R2,R2 ; R2 = 0.0
filter MPYF3 *AR0++(1)%,*AR1++(1)%,R1
 ADDF3 R1,R2,R2 ; y4(n) = w[].x[]
 ADDF R1,R2 ; include last result
; Output y4(n) signals
; pass(y4); /* pass y4 to processor #1 */
 STF R2,*+AR5(1) ; send y4(n) to #1 C40
; Input error signal e(n)
; receive(e); /* receive e(n) from processor #1 */
 LDF *AR5,R7 ; load e(n) from #1 C40
; Update weights w(n)
; xptr = &x[N4-1];
; wptr = &w[N4-1];
; for (i=N4;i>0;i--){
; *wptr-- += mu * e * *xptr--;
; *(xptr+1) = *xptr; /* delayed tap is implemented
; in circular buffer */
; }
 LDI order4-3,RC ; initialize repeat counter
 RPTBD weight ; do i = 0, N4-3
 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n)
 ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n)
 NOP

 MPYF3 R7,*AR0++(1)%,R1 ; R1 = err * x(n-i-1)
 STF R2,*AR1++(1)% ; update wi(n+1)
weight ADDF3 R1,*AR1,R2 ; R2 = wi(n) + err * x(n-i)

 BD input ; delay branch
 MPYF3 R7,*AR0,R1 ; R1 = err * x(n-N+1)
 STF R2,*AR1++(1)% ; update wi(n+1)
 ADDF3 R1,*AR1,R2 ; R2 = wi(n-N+1) + err * x(n-N+1)
 STF R2,*AR1++(1)% ; update last w

; Define constants
xn .usect "buffer",order4
wn .usect "coeffs",order4

 .data
C40addr1 .word C40_4_1
C40addr2 .word C40_4_2
C40addr3 .word C40_4_3
xn_addr .word xn
wn_addr .word wn
 .end























































January, 1992
THE FIVE LEVELS OF RAID


Mike Wiebel and Steve Johnson


Mike is a development engineer for StorageTek's Louisville Modeling Group. He
studies the performance of advanced disk and tape system architectures through
simulation and can be reached via Internet at mikew-@stortek.com or at
303-673-7082. Steve manages the Longmont Modeling Group for StorageTek. He is
involved in the simulation of sophisticated real-time systems and teaches
programming at the University of Colorado. Steve can be contacted at
StorageTek.


In the competitive world of sophisticated, real-time fault-tolerant systems,
it is imperative to know what kind of performance a system will give before
proceeding with its development. Imagine investing in a new computer system
only to discover that it doesn't perform any better than the old one!
Fortunately, programmers can use discrete-event simulation to model both
existing and projected real-time system performance, thus saving development
time and money. This article describes a revolutionary concept in data storage
and retrieval known as RAID (Redundant Arrays of Inexpensive Disks) and shows
how discrete-event simulation can be used not only to predict the performance
of different RAID architectures, but also to project performance over a
product's life cycle.


RAID


In the last five years, CPU speeds have dramatically increased, memory costs
have significantly dropped, and storage capacity has more than doubled. By
contrast, mechanical magnetic-disk access times, dominated by seek and latency
delays, have improved at a snail's pace compared to electronic device speeds.
From 1971 to 1981, the raw seek time for a high-end IBM disk
improved by a factor of only two, while the rotation time did not change.
Clearly, users are paying for speedy CPUs that spend too much time waiting for
I/O devices. RAID meets the challenge of improving I/O performance by
improving the disk subsystem back-end bandwidth while simultaneously improving
the reliability of the disk storage subsystem.
RAID technology consists of a disk subsystem built from the smaller, less
expensive disks developed for the PC market. Several small disks grouped together
into an array form the virtual image of a larger, more expensive disk. Because
the failure rate of small disks is higher than that of large disks, the data
must be protected to guard against its loss. Several methods exist for
protecting the data in the subsystem, from simple mirroring to more
sophisticated error correction codes.


Five Levels of RAID


A few years ago, researchers at the University of California, Berkeley
published a paper describing five classifications of RAID. Each RAID level offers
different advantages in reliability and performance. Combining many small
disks forms an array of disks. Dividing this array into "parity" groups
improves the system performance by reducing error recovery time and the length
of disk accesses needed to perform an I/O. A parity group contains data disks
and additional data check disks that contain redundant data. The redundant
information on the backup disk recreates the lost data when a disk failure
occurs. To speed recovery time, the backup disk is "hot," residing in the
subsystem and activating immediately after failure occurs. The failed disk's
replacement then becomes the new hot backup disk. The following overview of
the five levels of RAID points out the advantages of each architecture.
The first level of RAID (shown in Figure 1) is a mirrored disk architecture,
in which every data disk has a duplicate backup disk. Synchronously writing
data to both disks ensures the availability and reliability of the data when
disk failure occurs. In a pure mirrored system, the check disks are invisible
to the software. Adding additional disk controllers and data paths makes these
disks visible and effectively increases the bandwidth of the subsystem. This
allows read operations to occur in parallel, slightly increasing subsystem
performance. Still, on the average, multiple arms seeking to the same cylinder
and waiting for the specified sector to rotate underneath the head take longer
than a single arm performing the same task, as the entire process waits for
the slowest disk. The real value of level one lies in its ability to protect
valuable data. The tremendous reliability advantage of mirrored disks has a
significant price, as half the storage space is effectively unused. This
expense was the driving force behind other RAID organizations.
The Hamming code and other forward error detection/correction codes are
veteran performers in the battle against data transmission errors. Second
level RAID architectures (see Figure 2) apply Hamming code principles to the
disk array to reduce the overhead of data reliability and allow for some error
correction without retransmission delays. The Hamming code associates
even-parity bits with unique combinations of data bits. By bit-interleaving
the data across the disks of a group and adding a sufficient number of check
disks, this technique can be duplicated in a RAID architecture. According to
Hamming code principles, ten data disks require four check disks, while 25
data disks require five. There is a significant performance advantage in
simultaneously streaming data from multiple drives. Level-two RAID outperforms
level one in large write and read-modify-write transactions and falls slightly
behind level one in doing large reads. Level-one architectures outclass level
two in all small data transfers. This is due to level two's need to access all
disks in a group to modify part of a sector, causing a reduction in the number
of small transactions processed in parallel. Level-two architectures are more
appropriate for supercomputers than intensive transaction-oriented
applications.
Level-three RAID (single check disk per group) eliminates most of the overhead
associated with error detection. Level two used parity checking to detect the
erroneous disk and correct it. For a disk subsystem, this is grossly
redundant, because the disk controller can determine if its disk failed. A
single check disk recovers the lost data when level three determines a disk
has failed (see Figure 3). Level-three RAID does not perform better than a
level-two architecture; its advantage is the greater storage capacity brought
about by the reduced overhead. On a per-disk basis, level three is slightly
better than level two because fewer check disks lead to better disk
performance. Level three is a poor performer in intensive,
transaction-oriented applications for the same reason as level two: dismally
small I/O capabilities.
In levels two and three, the data is bit-interleaved across the disks in the
group. The Hamming code can then be utilized in the detection and correction
of errors. Level-three RAID advances the technology by using the disk
controller to detect and correct errors. The inability to do more than one I/O
at a time handicaps level-two and level-three RAIDs. Level four (see Figure 4)
stores data in block interleave fashion (a single transfer unit in a single
sector), enabling multiple data reads in the same group at one time.
Additionally, a single read only accesses one disk, so the entire transfer
takes place at device speeds. Small writes are expensive in this architecture,
due to the number of disk accesses required. A small data transfer requires
at least four accesses: a read of the old data and old parity, and a write of
the new data and new parity. All writes must access the check disk, so the
check disk clearly becomes the bottleneck in this architecture.


Fifth-Level RAID: No Single Check Disk


The level-five architecture (Figure 5) distributes check information across
all disks, giving level five the ability to do parallel writes. Writes to
different sectors are accompanied by check information written to different
disks. This removes the bottleneck of a single check disk. The improvement in
small read-modify-write events makes level five a viable alternative in
transaction-processing orientations. Level five also inherits the excellent
large read/write performance level achieved by previous architectures, making
it ideal for supercomputer activities as well. Another improvement, not
specific to level five, can be made in the disk subsystem. Adding buffers to
each drive and increasing the size of the smallest transfer unit to the size
of a physical track renders the time lost to rotational delays insignificant.
Instead of waiting for the requested sector to rotate into position, the
drive begins the transfer with the next sector and transfers the entire
track, making allowances for data received out of order. Another advantage is that if the next
sector requested was on the track just read and still resides in a buffer, no
disk access is needed to obtain that sector.


Study


StorageTek, the company we work for, is currently developing a RAID device and
is successfully applying discrete-event simulation to the development process.
By modeling subsystems, we can determine their potential benefits during the
development phase and project the effects of design decisions. The StorageTek
Corporate Modeling Group develops its models in C and CSIM, a portable C
function library that supports discrete-event simulation constructs such as
multiple threads of execution. Because CSIM is a C function library, C
programmers can create models without having to struggle with unknown
languages and support tools. Using C in discrete-event simulation modeling
lets us take advantage of the language's power, portability, and efficiency.
Developing models of RAID levels one, three, and five and comparing the
results to those of a typical disk-device model demonstrates the usefulness of
discrete-event simulation in the development of real-time systems. Of the five
architectures discussed here, levels one, three, and five hold the most
promise. Running simulations of each during subsystem backup aids in
demonstrating their potential. Even in RAID architectures, backup is necessary
due to the potential hazards associated with human interaction (delete all,
for example). In mainframe environments, MIS managers try to shorten the
window of downtime due to backups, allowing them to maximize their
multimillion-dollar investment for commercial applications. Disk performance
is the limiting factor in performing backups, and RAID technology significantly
reduces backup time requirements.


Model Descriptions


SLED (Single Large Expensive Disk) simulates the performance of a typical
single large expensive disk, such as the IBM 3390, during backup. The code
implementing SLED is shown in sled.c (Listing One, page 78). The system
boundaries for this experiment extend only around the disk subsystem, its
controller, and the access to these devices. This simulation has one
transaction that enters the subsystem, reads a track from the device, and
sends the data out through the bus to the control unit and out through the
channel toward the host. The transaction continues to read tracks until it
finishes backing them all up. In the model, LATENCY defines the time to read
one track from the device, because the maximum rate at which a track can be
read from the disk is the time it takes the disk to rotate once. Predicting
the finish time analytically using the formula Backup time = number_cylinders
* track_per_cylinder * LATENCY gives us an expected time of 7.9 minutes. The
simulation, which also includes latency delays incurred during cylinder
switching, predicts a finish time of 8.09 minutes, well within range of the
analytical prediction. Total device capacity is 1.90 gigabytes, so the device
backs up at a rate of 3.9 Mbytes per second.
Lvl1.c (Listing Two, page 78) is a simulation model of a level-one subsystem.
It involves two identical devices in a backup simulation, one a mirror of the
other. Adding controllers and channels to the subsystem allows
access to different tracks simultaneously, aiding in the realization of the
subsystem's potential. In this model, each device has its own thread of
execution operating upon it. Each thread reads the next track to be backed up
and sends it up through the control unit toward the host. As in sled.c, the
rotational speed of the device limits the transfer times. The advantage lies
in this subsystem's ability to transfer two tracks simultaneously, nearly
doubling the back-up rate of sled.c. Total device capacity is 1.90 gigabytes
as in sled.c, but increased bandwidth on the back end reduces backup time to
4.48 minutes. RAID level one backs up at a rate of 7.06 Mbytes per second, an
80 percent improvement over the typical device.
Lvl3.c (Listing Three, page 79) simulates a level-three implementation of
RAID. This model has a data group of four disks with one check disk. The disks
are smaller and slightly slower than the disks used in the previous models but
their combined storage capacity is nearly three times that of sled.c and
lvl1.c. The model operates along the same principles as the previous
simulations. The model generates a separate thread of execution for each disk
device in the subsystem. These threads then access the track to be backed up
and transfer the data toward the host. Lvl3.c differs from the earlier models
at the control-unit level: the typical disk (SLED) and level one have one
control unit for each storage device, while the RAID architectures used here
have one main control unit that oversees activity on all disks in the array.
Previous models used
4.5-Mbytes-per-second channels to transfer the data from the control unit to
the host. These models had one channel for each disk device. Level three uses
only two channels to transfer information to the host. Replacing the 4.5-Mbyte
channels with 10-Mbyte channels prevents the transfer to the host from
becoming a bottleneck. A transaction must wait at the control unit for a
channel to become available before transferring the track toward the host. One
of the five devices contains only redundant check data, so these devices
contain 5.77 gigabytes of data transferred in 12.55 minutes--a rate of 7.66
Mbytes per second. At this point, the need for buffering at the control unit
level becomes clear. By buffering the tracks at the control-unit level it is
possible to improve the backup speed. In this simulation, the same amount of
data can be transferred in 10.15 minutes, or 9.47 Mbytes per second. This is a
significant improvement over both the typical device and the level-one
architecture.
The level-five advantage is parallelism: More than one transfer can take place
at a time. The concept of "staging" takes advantage of this new capability.
Reading data from the RAID device causes the loading, or staging, of the data
next to the desired data. If the next piece of information called for resides
in the buffer, then it eliminates a disk access and the transfer takes place
at channel speeds. In lvl5.c (Listing Four, page 80), two separate operations
take place concurrently. A staging operation joins the backup operation. The
backup operation performs as before, obtaining data (tracks) and moving it
through the system toward the host. The staging operation is constantly
monitoring the control-unit buffer to decide if another track should be
staged. For this operation, when the track lies in the buffer, device speeds
are no longer the limiting factor; the track transfers at channel speeds.
Factors that affect the performance of the backup operation are group size,
buffer size, and the number of channels. Increasing the group size and the
buffer size reduces the time spent waiting for data from the device. If the
data is on one device, the speed of the transfer is the speed of the device.
By spreading the data out over many devices, the speed of the transfer is the
combined power of the devices. This concept is known as "scaling." A
level-five architecture simulation with a group size of 50 devices using
staging achieved a backup rate of 20 Mbytes per second, a 150-percent
improvement over a staged level five with a group size of five devices.


Conclusion


Discrete-event simulation greatly simplifies the prediction and comparison of
the five levels of RAID. Through discrete-event simulation, this article
examined the performance capabilities of three levels of RAID architectures
and compared the results with a model of a SLED device performing the same
operation. Simulation makes it possible to determine the effect of channel
speed improvements, increased group sizes, greater track capacities, and many
important known and unknown design decisions. The effect of many of these
changes can be seen here using the simplified models presented in this
article; most modifications are as easy as changing a variable.
Simulation used to be an expensive luxury because of the long runtimes and
enormous memory requirements of the models. High-powered languages such as C
and C++, the plummeting price of memory, and the increasing speed of
microprocessors all add up to make simulation an increasingly available design
tool. The use of simulation enables an engineer to find the strengths and
weaknesses of concepts such as RAID, avoiding the pitfalls before it's too
late.



Acknowledgments


The authors would like to recognize the research efforts of David A.
Patterson, Garth Gibson, and Randy H. Katz and UC Berkeley's department of
Electrical Engineering and Computer Science. It is through their efforts that
RAID technology is becoming a reality.


References


Hamming, R.W. "Error Detecting and Correcting Codes." The Bell System
Technical Journal (April, 1950).
Harker, J.M. "A Quarter Century of Disk File Innovation." IBM Journal of
Research and Development (September, 1981).
Patterson, D.A., G.A. Gibson, and R.H. Katz. "A Case for Redundant Arrays of
Inexpensive Disks (RAID)." ACM SIGMOD 88 (Chicago, 1988).

_THE FIVE LEVELS OF RAID_
by Mike Wiebel and Steve Johnson


[LISTING ONE]


 /* sled.c */

#include <stdio.h>
#include <assert.h>
#include <csimdefs.h>

#define HMSEC(x) ((TIME)((x))) /* 1.0e-05 second */
#define MSEC(x) ((TIME)(100.0 * (x))) /* milli second */
#define SEC(x) MSEC((1000.0 * (x))) /* second */
#define MINUTE(x) SEC((60.0 * (x))) /* minute */
#define HOUR(x) MINUTE((60.0 * (x)))

#define NUM_DISKS 1 /* num. disks in simulation */
#define NUM_FE 1L /* num. front end channels */
#define NUM_BE 1L /* num. back end channels */
#define SEEK_TIME 3.83 /* avg. num. of milliseconds spent seeking */
#define LATENCY 14.22 /* milliseconds required for one rotation */
#define NUM_CYLINDERS 1113 /* num. cylinders on disk */
#define TRACKS_PER_CYL 15 /* num. tracks per cylinder */

SS_PTR cu; /* CSIM control unit pointer */
SS_PTR disk_drive; /* CSIM disk drive pointer */
MS_PTR fe_bus; /* CSIM front end channel pointer */
MS_PTR be_bus; /* CSIM back end channel pointer */

static long track=0L; /* current track backing */
static int cylinder=0; /* current cylinder backing */
static long tot_tracks=0L; /* total tracks backed */

void backup_status(done)
int *done;
{

 /* determine if this disk drive is backed up. */

 track++;
 tot_tracks++;
 if (tot_tracks == NUM_DISKS * NUM_CYLINDERS * TRACKS_PER_CYL)

 *done = 1;
}

ECODE seek()
{
long num_fe; /* num. front end channels available */

 if(track == 1){ /* first track requires seek */
 XacAdvExponen(MSEC(SEEK_TIME));
 }
 if(track == TRACKS_PER_CYL){
 XacAdvance(MSEC(LATENCY));/* switch cylinder and rotate */
 track = 0;
 cylinder++;
 }

 /* if channel is available transmit data; else rotational delay */

 MAvail(fe_bus,&num_fe);
 while(!num_fe){
 XacAdvance(MSEC(LATENCY));
 MAvail(fe_bus,&num_fe);


 /* sled.c - 2 */
 }
 return(SUCCESS);
}

ECODE send_data()
{
double xfr_time;

 /* transfer data back to host */

 xfr_time = LATENCY;
 MSeize(fe_bus,1L);
 XacAdvance(MSEC(xfr_time));
 MRelease(fe_bus,1L);
 return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
int done=0;
int i;

 /* configure system and set up CSIM statistics */

 SServer(&cu,"control unit statistics");
 SServer(&disk_drive,"disk drive statistics");
 MServer(&fe_bus,NUM_FE,"front end channels");
 MServer(&be_bus,NUM_BE,"back end channels");

 /* set simulation time */

 SimWarmUp(0,0);

 SimRun(MINUTE(30),1);

 /* simulate backup */

 SSeize(disk_drive);
 SSeize(cu);
 MSeize(be_bus,1L);
 for(i=0; i<NUM_DISKS; i++){
 do{
 backup_status(&done);
 assert(seek() == SUCCESS);
 assert(send_data() == SUCCESS);
 }while(!done);
 done = 0;
 track = 0;
 }
 printf("Backup complete\n");
 SimPrint();
 SRelease(cu);
 MRelease(be_bus,1L);
 SRelease(disk_drive);
 XacTerminate();
}






[LISTING TWO]

 /* RAID level 1 */
#include <stdio.h>
#include <assert.h>
#include <csimdefs.h>

#define HMSEC(x) ((TIME)((x))) /* hund.milli second */
#define MSEC(x) ((TIME)(100.0 * (x))) /* milli second */
#define SEC(x) MSEC((1000.0 * (x))) /* seconds */
#define MINUTE(x) SEC((60.0 * (x))) /* minutes */
#define HOUR(x) MINUTE((60.0 * (x)))

#define NUM_DISKS 1 /* one disk; other disk is mirror */
#define NUM_FE 2L /* num. front end channels */
#define NUM_BE 2L /* num. back end channels */
#define NUM_CU 2L /* num. of control units */
#define SEEK_TIME 3.83 /* avg. num. of milliseconds spent seeking */
#define LATENCY 14.22 /* milliseconds required for one rotation */
#define NUM_CYLINDERS 2226
#define TRACKS_PER_CYL 15

MS_PTR cu; /* CSIM control unit pointer */
MS_PTR disk_drive; /* CSIM disk drive pointer */
MS_PTR fe_bus; /* CSIM front end channel pointer */
MS_PTR be_bus; /* CSIM back end channel pointer */
Q_PTR tsend;

static long track=0L;
static int curr_cylinder=0;


void backup_status(done)
int *done;
{
 /* determine if this disk drive is backed up. */

 track++;
 if (track == TRACKS_PER_CYL){
 track = 0;
 curr_cylinder++;
 }
 if (curr_cylinder == NUM_CYLINDERS * NUM_DISKS)
 *done = 1;
}

ECODE seek(cylinder_backing)
int *cylinder_backing;
{
long num_fe;

 if(curr_cylinder == 0 && track == 1){ /* first track requires seek */
 XacAdvExponen(MSEC(SEEK_TIME));
 }
 if(*cylinder_backing != curr_cylinder){
 XacAdvance(MSEC(LATENCY));
 *cylinder_backing = curr_cylinder;
 }
 return(SUCCESS);
}




 /* lvl1.c - 2 */

ECODE send_data()
{
double xfr_time;


 /* transfer data back to host */
 xfr_time = LATENCY;

 MSeize(fe_bus,1L);
 XacAdvance(MSEC(xfr_time));
 MRelease(fe_bus,1L);
 return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
XAC_PTR xp;
BOOLEAN orig;
static int done=0;
int cylinder_backing=0;
int i;


/* configure system and set up CSIM utilities */
 MServer(&disk_drive,2L,"Disk drive server statistics");
 MServer(&cu,NUM_CU,"Control unit server statistics");
 MServer(&fe_bus,NUM_FE,"front end channel statistics");
 MServer(&be_bus,NUM_BE,"back end channel statistics");
 Queue(&tsend,"Time to send data");
 /* set simulation time */
 SimWarmUp(0,0);
 SimRun(MINUTE(17),1);
/* simulate backup */
 XacSplit(&xp,&orig); /* allow each disk subsys to transfer */
 MSeize(cu,1L);
 MSeize(be_bus,1L);
 MSeize(disk_drive,1L);
 for(i=0; i<NUM_DISKS; i++){
 do{
 assert(seek(&cylinder_backing)==SUCCESS);
 QEnter(tsend);
 assert(send_data()==SUCCESS);
 QLeave(tsend);
 backup_status(&done);
 }while(!done);
 done = 0;
 track = 0;
 }
 SimPrint();
 MRelease(cu,1L);
 MRelease(be_bus,1L);
 MRelease(disk_drive,1L);
 XacTerminate();
}





[LISTING THREE]

 /* RAID level 3 */

#include <stdio.h>
#include <csimdefs.h>

#define HMSEC(x) ((TIME)((x))) /* 1.0e-05 second */
#define MSEC(x) ((TIME)(100.0 * (x))) /* milli second */
#define SEC(x) MSEC((1000.0 * (x))) /* second */
#define MINUTE(x) SEC((60.0 * (x))) /* minute */
#define HOUR(x) MINUTE((60.0 * (x)))

#define GRP_SIZE 4 /* no. data disks in parity group */
#define NUM_CHECK 1 /* no. check disks in group */
#define NUM_CHAN 2L /* no. of channels */
#define NUM_BUSS 5L /* no. busses */
#define SEEK_TIME 11.5 /* avg. num. of milliseconds spent seeking */
#define LATENCY 16.66 /* milliseconds required for one rotation */
#define TRACK_CAPACITY 40.0 /* capacity in KB */
#define XFR_RATE 10000.0 /* KB transferred per sec */
#define NUM_CYLINDERS 1900 /* no. useable cylinders on device */
#define TRACKS_PER_CYL 19 /* no. tracks per cylinder */

#define SWITCH_TIME 4 /* milliseconds */

SS_PTR raid_cu; /* CSIM RAID control unit pointer */
MS_PTR data_disk; /* CSIM array of data disks pointer */
SS_PTR check_disk; /* CSIM check disk pointer */
MS_PTR buss; /* CSIM array of busses pointer */
MS_PTR channel; /* CSIM array of channels pointer */


ECODE seek(track_tally,original,track_count)
long track_tally;
BOOLEAN original;
int *track_count;
{

 if (original){
 SSeize(check_disk);
 }
 else{
 MSeize(data_disk,1L);
 }
 if (!track_tally){
 XacAdvExponen(MSEC(SEEK_TIME));
 }
 if (!(*track_count) && track_tally){/* switch cylinders */
 XacAdvance(MSEC(SWITCH_TIME));
 }
 (*track_count)++;
 if (*track_count == TRACKS_PER_CYL)
 *track_count = 0;

 if (original){
 SRelease(check_disk);
 }
 else{
 MRelease(data_disk,1L);
 }
 return(SUCCESS);
}

 /* lvl3.c - 2 */

ECODE transfer()
{
double xfr_time;
XAC_PTR xp;
BOOLEAN orig;

 /* send data to control unit */

 MSeize(buss,1L);
 XacAdvance(MSEC(LATENCY));
 MRelease(buss,1L);

 /* send data to host */

 XacSplit(&xp,&orig);
 if (!orig){
 xfr_time = TRACK_CAPACITY/XFR_RATE;

 MSeize(channel,1L);
 XacAdvance(SEC(xfr_time));
 MRelease(channel,1L);
 XacTerminate();
 }
 return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
static int done=0; /* flag indicating completion */
int track_count=0;
long track_tally=0; /* count of tracks backed up */
long backup_goal; /* total number of tracks to back up */
int i; /* counter */
XAC_PTR xp; /* CSIM transaction pointer */
BOOLEAN original; /* flag indicating presence of original xac */

 /* configure system and set up CSIM */

 SServer(&raid_cu,"Control unit server statistics");
 SServer(&check_disk,"check disk statistics");
 MServer(&data_disk,GRP_SIZE,"data disk statistics");
 MServer(&buss,NUM_BUSS,"buss statistics");
 MServer(&channel,NUM_CHAN,"channel statistics");

 /* set simulation period */

 SimWarmUp(0,0);
 SimRun(MINUTE(30),1);

 for(i=0; i<GRP_SIZE; i++){ /* generate xac for each disk */
 XacSplit(&xp,&original);
 if (!original)
 break;
 }

 /* backup subsystem */


 /* lvl3.c - 3 */

 backup_goal = TRACKS_PER_CYL * NUM_CYLINDERS;
 while(!done){
 seek(track_tally,original,&track_count);
 transfer();
 track_tally++;
 if (track_tally == backup_goal){
 done = 1;
 SimPrint();
 }
 }
 XacTerminate();
}







[LISTING FOUR]

 /* RAID level 5 */

#include <stdio.h>
#include <assert.h>
#include <csimdefs.h>

#define HMSEC(x) ((TIME)((x))) /* 1.0e-05 second */
#define MSEC(x) ((TIME)(100.0 * (x))) /* milli second */
#define SEC(x) MSEC((1000.0 * (x))) /* second */
#define MINUTE(x) SEC((60.0 * (x))) /* minute */
#define HOUR(x) MINUTE((60.0 * (x)))

#define GRP_SIZE 50 /* no. data disks in parity group */
#define NUM_CHAN 2L /* no. of channels */
#define NUM_BUSS 50L /* no. busses */
#define SEEK_TIME 11.5 /* avg. num. of milliseconds spent seeking */
#define LATENCY 16.66 /* milliseconds required for one rotation */
#define TRACK_CAPACITY 40.0 /* capacity in KB */
#define XFR_RATE 10000.0 /* KB transferred per sec */
#define NUM_CYLINDERS 1900 /* no. useable cylinders on device */
#define TRACKS_PER_CYL 19 /* no. tracks per cylinder */
#define SWITCH_TIME 4 /* milliseconds */
#define BUF_SZ 50 /* number of tracks buffer will hold */

SS_PTR raid_cu; /* CSIM RAID control unit pointer */
SAS_PTR data_disk; /* CSIM array of data disks pointer */
MS_PTR buss; /* CSIM array of busses pointer */
MS_PTR channel; /* CSIM array of channels pointer */
long buffer[BUF_SZ]; /* RAID control unit buffer */
long update[BUF_SZ]; /* RAID buffer lock */
static long last_buf=0L; /* last track written to buffer */

void buffer_search(track,bingo,slot)
long track;
int *bingo;
int *slot;
{
int i;

 *bingo = 0;
 for(i=0; i<BUF_SZ; i++)
 if(buffer[i] == track){
 *bingo = 1;
 *slot = i;
 }
}

ECODE buf_get(slot)
int slot;
{
double xfer_time;
XAC_PTR xp;
BOOLEAN orig;


 xfer_time = TRACK_CAPACITY / (double)XFR_RATE;
 XacSplit(&xp,&orig);
 MSeize(channel,1L);
 XacAdvance(SEC(xfer_time/2.0));
 MRelease(channel,1L);


 /* lvl5.c - 2 */
 if(!orig){
 XacTerminate();
 }
 buffer[slot] = -1; /* mark slot as writeable */
 return(SUCCESS);
}

int find_track(track)
long track;
{
int number;

 number = track;
 while(number >= GRP_SIZE)
 number -= GRP_SIZE;
 return(number);
}

ECODE track_get()
{

 MSeize(buss,1L);
 MSeize(channel,1L);
 XacAdvance(MSEC(LATENCY));
 XacAdvance(MSEC(1));
 MRelease(channel,1L);
 MRelease(buss,1L);
 return(SUCCESS);
}

ECODE update_buf(track)
long track;
{
XAC_PTR xp;
BOOLEAN orig;
static int i;
static int j;
int slot;
long next;
int d_num;

 j = 0;
 for(i=0; i<BUF_SZ; i++){
 XacSplit(&xp,&orig);
 if(!orig) break;
 }
 if(!orig){
 slot = j++;
 if(buffer[slot] < track && update[slot] != 1){
 update[slot] = 1;
 next = last_buf++;

 d_num = find_track(next);
 SASeize(data_disk,d_num);
 XacAdvance(MSEC(LATENCY));
 SARelease(data_disk,d_num);
 buffer[slot] = next;
 update[slot] = 0;
 }

 /* lvl5.c - 3 */

 XacTerminate();
 }
 return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
XAC_PTR xp; /* CSIM transaction pointer */
BOOLEAN original; /* flag indicating presence of original xac */
static int done=0;
static long track=0L;
int disk_num;
int i;
int slot;
int bingo;
BOOLEAN avail;

 /* configure system and set up CSIM */

 SServer(&raid_cu,"Control unit buffer statistics");
 SArray(&data_disk,GRP_SIZE,"data disk statistics");
 MServer(&buss,NUM_BUSS,"buss statistics");
 MServer(&channel,NUM_CHAN,"channel statistics");

 for(i=0; i<BUF_SZ; i++){ /* mark buffer as empty */
 buffer[i] = -1;
 update[i] = 0;
 }

 /* set simulation period */

 SimWarmUp(0,0);
 SimRun(MINUTE(30),1);

 while(!done){
 XacSplit(&xp,&original);
 if(original){
 buffer_search(track,&bingo,&slot);
 if(bingo){
 assert(buf_get(slot) == SUCCESS);
 }
 else{
 disk_num = find_track(track);
 SAAvail(data_disk,disk_num,&avail);
 SASeize(data_disk,disk_num);
 buffer_search(track,&bingo,&slot);
 if(bingo){

 assert(buf_get(slot) == SUCCESS);
 }
 else{
 assert(track_get() == SUCCESS);
 }
 SARelease(data_disk,disk_num);
 }


 /* lvl5.c - 3 */

 if (track == GRP_SIZE * TRACKS_PER_CYL * NUM_CYLINDERS)
 done = 1;
 track++;
 }
 else{ /* maintain buffer */
 assert(update_buf(track) == SUCCESS);
 XacTerminate();
 }
 }
 printf("Backup complete\n");
 SimPrint();
}







































January, 1992
WRITABLE INSTRUCTION SET COMPUTERS


CISC + RISC = WISC




Jack J. Woehr


Jack is a senior project manager at Vesta Technology Inc., in Wheat Ridge,
Colo. Jack can be reached as jax@well.UUCP, as JAX on GEnie, or as the Sysop
of the RCFB BBS, 303-278-0364.


Writable Instruction Set Computers, or "WISC" for short, are devices that
blend the best of CISC and RISC into a single simple architecture. Because of
their simplicity, WISC devices allow the programmer to tinker with the
internals of the instruction set. In this article, I examine WISC
Technologies' (Box 429, La Honda, CA 94020) CPU/16 processor and put it
through some of its paces by writing the microcode for an oddball instruction
or two.


Dual-Stack WISC


The CPU/16 is a dual-stack word-addressing central processing unit designed in
discrete logic. At power up, the entire CPU/16 microcode store is both empty
and writable. The CPU/16 is dependent on the host computer (nominally a PC or
AT, though other interfaces could be devised) for the powerup load of its
microcode and program. The microcode and program that come with the CPU/16
implement MVP-Forth/16 and contain a number of benchmark examples that
illustrate the speed attainable by application-specific instruction sets. Even
with the entire Forth system and special benchmark instructions, about half
the CPU/16 microcode store is unused and thus available to the ambitious
microcode composer.
The microassembler, microcode, and cross-assembler that come with the CPU/16
are provided both compiled and in MVP-Forth source code form. The WISC
programmer is thus confronted with the paradox of one of the purest Forth
programs ever written, coupled to an old-fashioned, file-less BLOCK-only Forth
development system. No matter: As indicated, the source is one of the classics
of Forth, sparsely commented and eminently comprehensible. It takes about four
hours reading to grasp the essentials of WISC development.
The CPU/16 has two stacks, DS and RS, each with its addressable stack pointer,
DP and RP. The top item on each stack may be an input to the ALU. In addition,
DHI serves as a top-of-data-stack cache and as input B to the ALU, and is
paired with DLO for shifts. DLO also serves as a sort of scratch register.
RAM is addressed using the program counter, PC, as a RAM pointer. For this
reason, the program counter value is automatically latched at the beginning of
every microcode instruction into a register called PCSAVE, so that the program
counter may later be restored if it is used as a RAM pointer in the course of
execution of an instruction.


Warren Abstract Machine "deref"


Much as Forth can be expressed as a virtual machine design whose silicon
embodiment has been attempted in projects such as the Novix, Harris RTX,
FRISC-32 (SC32), and the CPU/16 itself, Prolog, too, has its virtual machine
architecture (known as the "Warren Abstract Machine" after David Warren, one
of the original proponents of such a concept).
Using Hassan Ait-Kaci's Warren's Abstract Machine (MIT Press, 1991) as my
guide, Listing One (page 89) implements the WAM "deref" procedure as a single
instruction, WAM-DEREF.
The Warren Abstract Machine "deref" procedure is described by Mr. Ait-Kaci as
that "...which, when applied to a store address, follows a possible reference
chain until it reaches either an unbound REF cell or a non-REF cell, the
address of which it returns." In other words, the first in a series of Prolog
variables, each of which points to the next, may be resolved to a final
unbound variable (that is, a variable which points to itself) or to a final
nonvariable (an object tagged otherwise than as a variable).
WAM-DEREF assumes that references are stored in the order [tag],[address].
WAM-DEREF first checks that the object whose address is on top of the data
stack does not point to itself. If it does, that same address will be
returned. If the object points to a different object, WAM-DEREF will get a
copy of the current tag and proceed to that distant reference.
If the tag of the new object does not match the previous object's tag, the new
object's address is returned and WAM-DEREF exits.
If the tag of the previous object is identical to the tag of the new object,
then the new object is examined to see if it points to itself. If so, its
address is returned and WAM-DEREF exits. If the new object points to some
other object, the program counter is reloaded without increment so that the
invocation of the WAM-DEREF instruction repeats, but with the latest address
in the top of stack cache.
WAM-DEREF makes no assumption about the tag's nature. I tested this
instruction by creating the following structures:
CREATE FOO 0,0, CREATE WOOF 1, FOO, CREATE ZOTZ 1, WOOF,
at which point ZOTZ WAM-DEREF returned the address of FOO. Changing WOOF to
point to itself via WOOF DUP 1+! on the word-addressing CPU/16 causes ZOTZ
WAM-DEREF to return the address of WOOF, as does changing WOOF's tag to any
number other than 1.


Multiple-page Microcode


The WISC microcode store consists of 256 pages of eight 32-bit
microinstructions. Though absolute branches (for example, JMP=110) and
conditional branches (that is, JMP=00E) are possible within a microcode page,
and the microcode page counter can be incremented to the next page by a
microinstruction, there is no way to decrement the microcode page counter.
This is the reason for the trick employed in WAM-DEREF, which generates a
repetitive dereference by reloading the program counter with its previous,
unincremented value.


Silicon in the Wind


In our dynamic profession, computer concepts appear, disappear, and reappear.
An example is the analog computer, which to a large extent disappeared two
decades ago. Yet the hardwired neural net, that precocious brat of the '80s,
is an analog computer. Who can tell when the time will come for the writable
instruction set computer to have its day in the sun?

_WRITABLE INSTRUCTION SET COMPUTERS_
by Jack J. Woehr


[LISTING ONE]


\ DEREF a la Warren Abstract Machine

DECIMAL
166 OPCODE: WAM-DEREF ( addr1 -- addr2)

0 :: SOURCE=ALU ALU=B DEST=PC ;; \ RAM pointer := TOS
1 :: INC[PC] ;; \ RAM pointer++
2 :: SOURCE=RAM ALU=AxnorB ;; \ @RAM pointer == TOS ??
3 :: SOURCE=RAM ALU=AxnorB ;; \ test takes 2 cycles
4 :: SOURCE=RAM ALU=A DEST=DLO INC[MPC] JMP=11E ;;
\ Jump to 6 if not equal, 7 if equal. Note that the microprogram counter is
\ incremented in this instruction. The next microinstruction will branch to
\ the next page. Note also that this microinstruction saved the address pointer
\ in DLO for later use.

6 :: SOURCE=ALU ALU=B DEST=PC JMP=000 ;;
\ Re-load RAM pointer with original address, we are continuing
\ via 0 on the next page, since we have run out of space here.

7 :: SOURCE=PCSAVE ALU=A+1 DEST=PC JMP=110 ;;
\ Reload Program Counter, we are leaving via 6 on the next page.

167 CURRENT-PAGE !

0 :: SOURCE=RAM ALU=A DEST=DHI ;; \ DHI := tag
1 :: SOURCE=DLO ALU=A DEST=PC ;; \ RAM pointer := next addr
2 :: SOURCE=RAM ALU=AxnorB ;; \ Compare tags
3 :: SOURCE=RAM ALU=AxnorB ;; \ Comparison takes two cycles
4 :: SOURCE=DLO ALU=A DEST=DHI JMP=11E ;; ( Jump = 6+boolean)
\ Jump to 6 if tags were equal, to 7 if different. This microinstruction loads
\ DHI (top of stack) with the current address under examination. Therefore, if
\ this instruction is re-entered, it will have same entry conditions as
\ previously, but starting at the next reference in the chain.

\ This is the exit pointed to by both microinstruction 7 on previous microcode
\ page, and by the conditional branch in instruction 4 on this page.
6 :: SOURCE=PCSAVE ALU=A+1 DEST=PC INC[MPC] JMP=101 ;;

\ Our exit is long and tortuous! We are exiting via 0 on the next page.
5 :: JMP=000 ;;
\ This microinstruction is the exit for both 6 and 7.

\ This is where "different tags" takes us ... we "fool" the CPU/16
\ into re-executing with original PC value.
7 :: SOURCE=PCSAVE ALU=A DEST=PC INC[MPC] JMP=101 ;;
\ We are looping by reloading Program Counter with the same address
\ as it contained at entry. But again we are out of room on this
\ microcode page, so we must finish on the next microcode page.

168 CURRENT-PAGE ! \ Increment microcode page under compilation.

0 :: DECODE ;; \ Latch new program counter value.
1 :: END ;; \ Exit to next instruction.

;;END



































































January, 1992
PROGRAMMING THE 29050


Taking the risk out of RISC




David L. Moore


Dave wrote the code generator and optimizer for YARC Systems' 29000 Fortran
compiler. He is the author of FTL Modula-2 for Z80, MS-DOS, and 68000 systems.
Dave can be contacted as DaMoore on BIX, or as davem@yarc.UUCP.


Reduced Instruction Set Computers (RISC) give consistently higher performance
than Complex Instruction Set Computers (CISC). In fact, the current generation
of RISC microprocessors is typically three times more powerful than the 80486
and 68040, the fastest CISC chips presently available.
Although RISC processors are used in the current generation of workstations
(where CISC machines are too slow), the greatest success RISC has enjoyed, at
least in terms of actual numbers of processors shipped, has been in embedded
systems applications. In these applications, the processor is used as part of
some machine--a laser printer or optical character recognition system--rather
than as a general-purpose computer.
Some embedded systems designers continue to use CISC because they believe RISC
processors are difficult to program. As the chips have become more
sophisticated, and the supporting tools more robust, these objections have
been overcome. Many projects that would have been coded entirely in assembler
for earlier CISC processors can now be written in a high-level language for
the faster RISC chips. Assembler coding is then left to the final tuning phase
of a project, if used at all. And with many RISC processors, assembly language
programming isn't that difficult anyway. In fact, because of flat address
space and more registers, RISC assembly programming is much easier than
programming on the 8086 family.
In this article, I concentrate on the AM29050, the most powerful member of the
Advanced Micro Devices' 29000 family. The AM29050 implements floating-point
operations on the chip using conventional instructions. In the earlier members
of the family, floating point is performed either in software, or by the
AM29027 coprocessor. The single-chip AM29050, however, provides much higher
performance than the 29000 with attached coprocessor; better still, the 29050
does not require special programming.


The AM29050 Instruction Set


The AM29050 is a three-address machine. Arithmetic and logic instructions
contain two operand fields and a result field. One of the operands can be a
literal value in the range 0-255 as in Example 1(a). The first operand is the
destination field. The first two instructions are straightforward adds. The
third instruction ("subtract reverse") subtracts gr97 from 2 and puts the
result in gr99. gr96 through gr99 are examples of global registers, which I'll
discuss shortly.
Example 1: (a) One of the operands can be a literal value in the range 0 to
255; (b) loads and stores always require the address of the load or store be
contained in a register; (c) commonly used values for cases in which the first
operand is 0, and the second controls the type of memory access.

 (a)
 [1] add gr96,gr97,gr98
 [2] add gr97,gr97,1
 [3] subr gr99,gr97,2

 (b)
 [1] load 0,0,gr96,gr97
 [2] store 0,0,gr96,gr98

 (c)
 [1] load 0,0,gr96,gr97 ;load full 32 bit word
 [2] load 0,0x01,gr96,gr97 ;load byte, zero fill
 [3] load 0,0x11,gr96,gr97 ;load byte, sign extend
 [4] load 0,0x02,gr96,gr97 ;load half word, zero fill
 [5] load 0,0x12,gr96,gr97 ;load half word, sign extended

Loads and stores always require that the address of the load or store be
contained in a register; see Example 1(b). The first line loads the value
stored at the address given by gr97 into gr96. The second line stores this
value into the address given by gr98.
In addition to the source and destination register, the load and store
instructions have 8 bits of additional information. This is specified by the
first two operands of the instruction. The first operand specifies a 1-bit
value, while the second specifies a 7-bit value.
The first operand is normally 0. It is set to 1 to perform a load or store to
an attached coprocessor, such as the now obsolete AM29027, instead of to
memory. All communication with the AM29027 was carried out in this way. When
talking to the coprocessor, the second (7-bit) field was a coprocessor
command. I'll not go into these here because the AM29050 doesn't need an
attached floating-point coprocessor. The interface, however, is still
supported.
When the first operand is 0, the second controls the type of memory access.
The commonly used values are shown in Example 1(c). (Note that the AM29050
assembler adopts the C notation "0xnnnn" for hexadecimal numbers.)
CISC assembly language programmers will find the operation of the AM29050's
jump instructions surprising. As with most RISC chips, each jump instruction
has an associated "delay instruction." The delay instruction is placed
immediately after the jump. It is executed while the jump is being processed.
When first programming the chip, it is easy to forget to code a delay
instruction for every jump. Even with experience, the obligation remains to do
something useful with this instruction rather than simply coding a noop. This
is one of the few genuine difficulties in programming the AM29050.
As an example, Example 2(a) shows a loop that adds together an array of 100
numbers on an 8086 processor. On the AM29000, this loop translates to the code
in Example 2(b). Lines [1a] through [1d] declare new names for some global
registers. This facility makes it easy to document the contents of registers
and is essential given the large number of registers on the AM29050. Lines [2]
through [5] correspond to lines [1] through [3] in the 8086 code. Notice the
loop counter (line [3]) starts at two less than the loop count. This is because
the jmpfdec instruction at line [8] tests the counter for a negative value
before performing the decrement. Line [9] is the delay instruction for the
jmpfdec.
Example 2: (a) A loop that adds together an array of 100 numbers on an 8086
processor; (b) the loop shown in (a) translated to the code on an AM29000.

 (a)
 [1] xor ax,ax
 [2] mov cx,100
 [3] lea bx,array
 [4] addloop: add ax, word ptr [bx]
 [5] add bx,2

 [6] loop addloop

 (b)
 [1a] .reg counter,gr96
 [1b] .reg addr,gr97
 [1c] .reg sum,gr98
 [1d] .reg temp,gr99
 [2] const sum,0
 [3] const counter,100-2
 [4] const addr,array
 [5] consth addr,array
 [6] addloop:load 0,0,temp,addr
 [7] add addr,addr,4
 [8] jmpfdec counter,addloop
 [9] add sum,sum,temp

This code runs very quickly on the AM29050 because we are not using the value
loaded at line [6] until line [9], three cycles after the load started. There
is no delay waiting for the value to be loaded. On the 29050, this loop takes
four clock cycles each time around. On the 80486, the loop takes 11 clock
cycles if the data value is in cache and around 16 if it is not.


Registers Galore!


Because of the reduced instruction set, a RISC chip requires less space for
microcode, or for hardwired logic. Instead, this space can be used for
registers. The AM29000 range has over 150 general-purpose registers, all 32
bits wide.
On a CISC machine, there is a fixed number of registers, each with a unique
name. The same few registers are used by all procedures, so registers have to
be saved and restored across procedure calls.
On the AM29050, there are two sets of registers. The global registers are
fixed, just as on a CISC machine. gr96 always refers to a particular 32-bit
register in the processor: global register 96. The local registers are
dynamic. Each procedure allocates its own set of local registers. The hardware
register corresponding to a given local register changes from procedure to
procedure. A procedure only allocates as many local registers as it requires,
up to a maximum of 126. Within the procedure, these registers are referenced
as lr0, lr1, and so on.
The AM29050 processor contains 128 hardware registers for local registers.
When deeply nested procedures are called, the total number of registers
allocated by all the procedures may exceed this number. When this happens, the
oldest registers are spilled to a stack in memory. Like most microprocessor
stacks, this stack grows downwards in memory.
Register filling and spilling is "lazy": A spilled register is not restored to
the hardware until it is needed. This results in a pool of unused local
registers in the hardware. Whenever a procedure allocates no more registers
than exist in the free pool, no spill or fill is required for that procedure.
As a result, the overhead of saving and restoring registers across procedure
calls is often completely avoided.


What's Left Out?


The original RISC processors implemented only instructions that could be
executed in a single cycle. More complex instructions had to be implemented as
procedure calls or software interrupts. Almost all of the usual logical and
arithmetic instructions that you would expect to find on a CISC machine are
present and implemented in hardware on the AM29050, rather than as software
traps.
The AM29050 instruction set is "reduced" from a CISC instruction set in only
three ways. One of these we have already seen: The processor can only perform
data manipulation on registers; arithmetic and logical operations cannot
contain implicit loads and stores.
The second reduction is that all instructions are exactly 4 bytes in length.
This restriction is noticed when loading constants. A general constant load
requires two instructions, as in Example 3. The first instruction sets the
bottom half of global register 96 to 0x5678 and clears the top half to 0. The
second instruction sets the top half to 0x1234 without altering the bottom
half of the register.
Example 3: A general constant load requires two instructions.

 [1] const gr96,0x12345678
 [2] consth gr96,0x12345678

Clearly, loads of constants with 0s in the top 16 bits only require the first
instruction. In addition, there is a constn instruction which loads a constant
with all 1s in the top 16 bits.


Conquering Divide


The final reduction is that, on the AM29050, there is no hardware integer
divide instruction. The divide instruction causes a trap to a software routine
which requires about 40 machine cycles. This is still comparable to an 80486,
which typically takes 43 cycles, but there are a couple of faster ways to do
integer divides.
The AM29050 does support floating-point divide in hardware. In addition, the
AM29050 floating point supports denormalized reals in hardware. A denormalized
real is a real number with an exponent of 0. We can use this to divide
positive integers; see Example 4(a). This will do an integer division between
two positive integers smaller than about eight million (23 significant bits)
in 14 cycles.
Example 4: (a) Using a denormalized real to divide positive integers; (b)
dividing a positive integer by 3.

 (a)
 [1] fdiv gr96,gr97,gr98
 [2] convert gr96,gr96,unsigned,truncate,single,single

 (b)
 [1] const gr98,0x55555556
 [2] consth gr98,0x55555556
 [3] multmu gr96,gr97,gr98


Alternatively, if the divisor is a known constant, we can implement the divide
of a positive value as a multiply! We first divide 0x100000000 by the
constant, then use this value to do a multmu, which produces the higher 32
bits of a 64-bit unsigned multiply.
For example, to divide a positive integer by 3, we calculate 0x100000000
/3=0x55555556, and the code comes to resemble Example 4(b). This divide takes
five cycles! (I first saw this trick in the Pascal compiler for the CDC 6600,
written by Urs Ammann.)


Fast Floating Point


The floating-point instructions take several cycles to complete, but the
floating-point units are pipelined so that, in many cases, a new operation can
be started every cycle. (See Table 1.) You don't have to wait for one
operation to complete before starting the next.
Table 1: Times for some slow operations

 Operation Time Issue every
 ----------------------------
 MULTIPLY 3 1
 FADD 3 1
 FMUL 3 1
 FDIV 11 10
 DADD 3 1
 DMUL 6 4
 DDIV 18 17

 (All times in cycles)

The pipelining means that, with a 40-MHz clock, you can perform 40 Megaflops
on a single-chip computer! In fact, the actual limit is twice this because two
instructions, FMSM and FMAC, start a combined multiply and add, and a new one
can be started every cycle. Hence, the peak rate is 80 Megaflops!
To get the maximum floating-point performance, you need to carefully consider
the order in which operations are to be performed. Evaluating a polynomial is
a good example.
The traditional way to evaluate a polynomial is to use Horner's rule. For
example, the polynomial x^4-10x^3 +35x^2-50x+24 would be evaluated as
(((x-10)*x+35)*x-50)*x+24. As each step uses the result of the previous step,
no advantage can be taken of the pipelining. It will take 21 cycles to
evaluate this polynomial in single precision using Horner's rule. If, instead,
you find the roots of the polynomial, you can evaluate the polynomial as
((x-1)*(x-2))*((x-3)*(x-4)). The subtractions can all be done in parallel so
that all four subtracts are complete after six cycles, even though each
subtract requires three cycles. The first two multiplies are then executed in
parallel. Only the final multiply must be executed alone. In this way, you can
evaluate the polynomial in 12 cycles.


Using Burst Mode


Floating-point calculations on the AM29050 are so fast that the limit on
calculation speed is often not the floating-point performance of the chip, but
the speed of main memory.
Consider taking a Euclidean norm, for example. This simply squares the
elements of an array and adds the squares together. In C, this would look like
Example 5(a). This piece of code requires one memory load per two
floating-point operations. In AM29050 code, in its simplest form, the loop
would look like that in Example 5(b). By convention, functions always pass
parameters starting in lr2 and return results in gr96.
Example 5: (a) Squaring the elements of an array and adding the squares
together in C; (b) doing the same thing in AM29050 code.

 (a)
 float Euc_Norm(float x[], int len)
 {
 int i;
 float a = 0.0;
 for (i = 0; i < len; i++) a += x[i]*x[i];
 return a;
 }
 (b)
 [1] sub lr3,lr3,2 ;prepare counter for jmpfdec
 [2] const gr96,0
 [3] mtacc gr96,single_precision,0 ;clear acc 0
 [4] loop: load 0,0,gr97,lr2
 [5] add lr2,lr2,4
 [7] jmpfdec lr3,loop
 [8] fmac single_precision,0,gr97,gr97
 [9] jmpi lr0
 [10] mfacc gr96,single_precision,0 ;result from acc 0

The fmac instruction at line [8] multiplies its arguments and adds the result
to one of four accumulators. These accumulators are special-purpose registers,
not part of the global or local registers. The mtacc instruction (move to
accumulator) at line [3] moves the value of gr96 (0) into accumulator zero.
The mfacc instruction (move from accumulator) at line [10] retrieves the
result. It is the delay instruction for the jump indirect instruction at line
[9].
Each iteration of this loop requires four cycles. Each fmac requires six
cycles but can be started every three cycles. The accumulate for one loop
is finishing while the multiply for the next loop is being performed.
At 40 MHz, you do this loop ten million times per second, and each iteration
does two floating-point operations, so this simple loop gives us 20 Megaflops.
Burst mode allows you to access memory faster than you could by accessing
memory one location at a time. The memory in a memory chip is organized into
rows and columns. When a normal access is made, the chip must first select the
row for the bit to be read, and then select the column.
In burst mode, the processor takes advantage of static-column mode, in which
the row does not change from one access to the next. The row-select step is
avoided, resulting in a faster read access. The three cycles normally
required to load a word can be reduced to two.
Burst mode is always used to load instructions into the processor. This is the
only way the memory system can keep up with the instruction execution rate
required by the processor. Extra hardware, external to the AM29050, is used on
the instruction bus so a word of instructions can be loaded every cycle,
rather than every other cycle. This special hardware could be duplicated on
the data bus to produce single-cycle burst data loads, but this adds cost that
few users will find justified. In this discussion, I'll assume this hardware
is not present.

To use the burst mode, we must use a special instruction, loadm, that loads
multiple registers. Using this instruction, our Euclidean norm looks like
Example 6. For simplicity, I'm assuming the number of elements is a multiple
of four. I've accumulated to all four special accumulators and then added the
separate totals at the end. This allows all the fmac instructions to run in
parallel.
Example 6: Using loadm to exploit burst mode

 [1] srl lr3,lr3,2 ;divide count by 4
 [2] sub lr3,lr3,2 ;prepare counter for jmpfdec
 [3] const gr96,0
 [4] mtacc gr96,single_precision,0 ;clear accumulators
 [5] mtacc gr96,single_precision,1
 [6] mtacc gr96,single_precision,2
 [7] mtacc gr96,single_precision,3
 [8] const gr96,3 ;words per loadm, less 1
 [9] loop: mtsr CR,gr96 ;set load count special register
 [10] loadm 0,0,gr97,lr2 ;load 4 words
 [11] add lr2,lr2,16
 [12] fmac single_precision,0,gr97,gr97
 [13] fmac single_precision,1,gr98,gr98
 [14] fmac single_precision,2,gr99,gr99
 [15] jmpfdec lr3,loop
 [16] fmac single_precision,3,gr100,gr100
 [17] mfacc gr96,single_precision,0 ;combine acc values
 [18] mfacc gr97,single_precision,1
 [19] fadd gr96,gr96,gr97
 [20] mfacc gr98,single_precision,2
 [21] mfacc gr99,single_precision,3
 [22] fadd gr96,gr96,gr98
 [23] jmpi lr0
 [24] fadd gr96,gr96,gr99

This rather complex loop takes ten cycles per iteration, performs eight
floating-point operations, and equates to 32 Megaflops at 40 MHz. Using burst
mode, you can improve the performance of the routine by 60 percent. With
special hardware to load a word every cycle, this would increase to 40
Megaflops.


How Hard is it to Program?


You've seen some optimizations that can be performed to extract very high
performance out of the AM29050. Most of the time, however, straightforward
code will run adequately. The regularity of the instruction set and the huge
number of available registers make it relatively simple to produce such code.
These attributes also enable compilers to produce fast code, so the
performance penalty for using a high-level language is fairly small. Any
programmer experienced in assembly language programming on another processor
will find little difficulty in turning out code for this chip.


More Speed!


There is constant competition between microprocessor developers to produce
more powerful chips. The next generation of CISC chips will seek to execute
one instruction per cycle, just as RISC chips do now. To do this, they will
have to be organized internally as data-flow processors, in which
instructions are triggered by the arrival of their operands. This organization
is very different from the organization of the current generation of
microprocessors, so designing these chips will require a major effort.
Superscalar architectures, the next generation of RISC chips, will execute
several instructions per cycle. RISC architecture lends itself to superscalar
execution because all the instructions are the same size. On a CISC machine,
you have to decode the current instruction before you can even find the
following instruction.
Because of their complexity and novelty, it is likely to be at least a couple
of years before we see the next generation of CISC chips in real machines.
These chips will bring CISC performance up to the level currently enjoyed by
users of RISC processors. By then, however, RISC chips can be expected to have
advanced to yet higher levels of performance.
_PROGRAMMING THE 29050_
by David L. Moore
























































January, 1992
REEXAMINING B-TREES


Free-at-empty is better than merge-at-half




Ted Johnson and Dennis Shasha


Ted is an assistant professor at the University of Florida. Dennis is a
professor at New York University and the author of a popular puzzle book, The
Puzzling Adventures of Dr. Ecco, and of an upcoming book on database
performance tuning. They can be reached via Internet at shasha@cs.nyu.edu or
ted@cis.ufl.edu.


The versatility of B-trees is the reason they are ubiquitous in database
programs from mainframe packages (such as IBM's VSAM) to PC products (such as
dBase and its competitors, or the database facility in OS/2 Extended Edition).
Since their invention by Rudolf Bayer and Edward McCreight in 1972, there have
been any number of
variations in both data structure and algorithms. Nevertheless, there is
always room for improvement. This article first briefly reviews the B-tree
concept, and then summarizes our investigation into a simpler, more efficient
method for managing B-trees that grow on the average.


A Quick B-tree Rehash


A B-tree is a data structure that allows you to store and retrieve a set of
key-value pairs. For example, the keys may be social security numbers and the
values may be names or addresses. Normally, the values take up more space than
the keys.
Unlike a binary tree, a B-tree is a balanced tree structure. Every leaf is at
the same distance from the root. The classic B-tree structure stores some
key-value pairs in the interior nodes of the structure; a B+ tree is a
variation that places all key-value pairs at the leaves. B-trees are now
almost always used as B+ trees. The main findings in this article apply to
both kinds of structures.
B-trees are useful for two kinds of searches: exact searches (for example,
find all information associated with employee 10) and range searches (for
example, find all information associated with employees whose IDs are between
35 and 43). The reason is that the key-value pairs are always kept in sorted
order. By contrast, if you use hashing as an access method rather than
B-trees, you get faster exact searches--but cannot easily conduct range
searches.
The interior (nonleaf) nodes of a B+ tree do not contain key-value data, but
rather consist of navigational information: pairs of so-called separators and
pointers to nodes. Specifically, an interior node consists of the sequence
S[1] P[1] S[2] P[2] ... S[n] P[n], where the S[i]s are separators sorted in
ascending order and the P[i]s are pointers to child nodes.
The maximum number of children that an interior node may have is called its
"fanout." On big mainframe B-trees, where a node may span an entire track, its
fanout can be over a thousand. A leaf node usually has fewer elements than an
interior node because values are usually bigger than pointers, and the keys
are at least as big as the separators. Normally, the key-value pairs at the
leaf nodes are kept in sorted order, but some researchers advocate
implementing a hash table at the leaves.
The principal advantage of B-trees is that they offer consistently fast access
times for both exact and range searches. The consistent access times are due
to the balanced tree. All searches require traversing an equal number of
blocks from the root down to the leaves. Even for databases of hundreds of
megabytes, the search path is limited to a small number of blocks. Combine the
B-tree structure with a disk cache for frequently accessed blocks and you get
an access method that's hard to beat.
A principal disadvantage of B-trees is the storage overhead consumed by the
interior nodes of the tree. Another is the programming complexity involved in
managing this balanced data structure efficiently.


Accessing B-trees


Given a key k, searching a B+ tree for the associated value is easy. First,
start at the root node of the tree. At each node, find the last separator "s"
such that s<=k, and follow the pointer to the right of s. At the leaf, simply
see whether there is a record associated with key k. (With a classical B-tree,
you may find the data earlier, at the interior node rather than the leaf,
though this will rarely happen if the fanout is large.)
Modifying a B+ tree works as follows: To insert a key-value pair, search for
the key you want to insert. If already present, then replace the existing
value with the new value. Otherwise, add the key-value pair into the leaf node
(assuming a B+ tree)--a successful insert.
Sometimes, an insert may encounter a leaf that is already full. In that case,
the insert must "split" the node into two nodes. The left node takes the lower
half of the elements, and the right node takes the upper half. Then the
algorithm must insert a separator-pointer pair into the parent. This may cause
the parent to split. In the worst case, the split will propagate recursively
through the interior nodes all the way to the root of the tree. This would
then add a new level to the balanced tree structure.
There is not much one can do about splits--if a node is full, one cannot put
more data in it without destroying old data. However, the analogous operation
for deletes, the "merge," allows more options.


The Merge Operation


One of the trickier tasks in programming a B-tree is what to do with a node
when it falls below half full. Rudolf Bayer decreed that nodes should be
merged with their neighbors when this occurs; this approach is called
"merge-at-half." Other researchers have proposed many other possibilities, but
the chief competitor is "free-at-empty"--a much simpler approach in which you
wait until the node is empty and then just free it and adjust its parent.
Intuitively, the choice seems to be a programming effort vs. space utilization
trade-off. Virtue--one might think--would consist of merging way before the
node becomes empty. But virtue here is considered harmful.
As our research has discovered, it turns out that the merge-at-half strategy
is harder to implement, detrimental to performance, and gives only a
negligible gain in space utilization compared with the free-at-empty
strategy--provided the B-tree is growing on the average.


History and Evidence


Space utilization holds many surprises for the unwary. To test your wariness,
see how you would answer the following questions:
Suppose you are doing only inserts into a B-tree (or B+ tree). Do you think
the distribution of your data will affect the average space utilization?
Because (normal) splitting results in transforming a full node into two
half-full nodes, all nodes (except the root) are at least half full in a pure
insert B-tree. Does this mean that the average utilization is 75 percent in a
pure insert B-tree?
The answer to the first question is no. What matters is that every order of
inserts is equally likely. For example, it does not matter if many employees
have social security numbers falling between 123-45-6789 and 123-46-6789, and
very few between 000-00-0000 and 023-45-6789. On the other hand, if the
inserts into the B-tree occur in sorted order, utilization will suffer unless
one uses a heuristic.
Why will it suffer? Because every node will be half full as a result of the
normal splitting algorithm and will not receive any more values after it is
split. The heuristic that is often used is to detect whether the split
occurred because of an insert of the highest key-value pair or the lowest. In
either case, change the splitting algorithm by distributing the key-value
pairs, as follows: three-fourths to the existing node and one-fourth to the
new node.
The answer to the second question is also negative. Andrew Yao showed that the
average utilization is about 69 percent. The "intuitive" reason is that a full
node is replaced by two half-full nodes, reducing the utilization to below 75
percent.



Our Results


We found another trap for the unwary. Suppose successful inserts and deletes
are the only modifications. Almost all applications insert more than delete,
so we are most interested in that case.
At 50-percent deletes, free-at-empty gives utilization below 40 percent,
whereas merge-at-half gives 60 percent. One point for virtue! However, when
deletes comprise at most 47 percent of all modifications, the space
utilization difference between free-at-empty and merge-at-half nearly
disappears. In fact, the utilization is close to the pure insert value for
both methods. The bigger the nodes are, the sharper the "knee" of this curve
becomes. See Figure 1.
What's more, merge-at-half requires significantly more restructuring activity
than free-at-empty, thus hurting performance. See Figure 2.
There is actually another benefit to free-at-empty in those high-performance
online transaction applications that allow concurrent accesses to B-trees. By
concurrent accesses, we mean that many inserts/deletes/searches may occur
during the same time interval.
Many locking schemes are used in such cases, but all require exclusive locks
on all nodes that are changed. Because merge-at-half requires locks on more
nodes than free-at-empty, and because the locks are held for longer times,
merge-at-half reduces the possible amount of concurrency and can therefore
hurt performance.
The moral of this story is simple and sweet: When programming B-trees, do it
the easy way!


References


Bayer, R. and E. McCreight. "Organization and Maintenance of Large Ordered
Indexes." Acta Informatica 1, 1972.
Johnson, T. and D. Shasha. "Utilization of B-trees with Inserts, Deletes, and
Modifies." ACM Symposium on Principles of Database Systems, 1989.
Shasha, D. and N. Goodman. "Concurrent Search Structure Algorithms." ACM
Transactions on Database Systems, (March, 1988).
Yao, Andrew. "On Random 2-3 Trees." Acta Informatica 9, 1978.











































January, 1992
 MULTIPLE MICROCONTROLLERS IN AN EMBEDDED SYSTEM


A case study in system architecture and embedded hardware design




Christopher Rosebrugh and Eng-Kee Kwang


Eng-Kee Kwang and Christopher Rosebrugh are cofounders of PI Systems
Corporation. Eng-Kee is vice president of software development, and Chris is
vice president of engineering. They can be reached at 10300 SW Greenburg Rd.,
Suite 500, Portland, OR 97223.


In the article "Linking User Interface and Database Objects" (DDJ, December
1991), we described the software architecture of the Infolio, a portable
data-collection tablet that uses a pen as the input device. Here, we examine
the Infolio's hardware architecture--and the trials and tribulations we
encountered during the design and development stages--from an embedded system
perspective. Why this perspective? Because one of the unique aspects of the
Infolio is that it's built around microcontrollers typically used with
embedded systems, not the microprocessors commonly used in portable computers.
Our focus in this article is on hardware design constraints and the decisions
we made concerning them. We hope this information will be useful to others
designing and developing embedded systems.


Hardware Design Constraints


Because we designed the Infolio to be a lightweight, low-power, low-cost,
portable pen-based database computer, and because we had an aggressive
product development schedule, we were faced with a number of hardware
constraints:
Keep the architecture simple. Use a well-understood core processor supporting
a flat memory model to minimize software development constraints. The
compressed product schedule drove this requirement.
Create a modular design to support turning off/on various sections as required
by the software. The low-power consumption constraint drove this requirement.
Minimize logic components. Board space would be needed for power management
buffers. The cost and low-power consumption constraints drove this
requirement.
Offload the graphics and pen tasks from the core processor to achieve higher
system throughput. The fact that the Infolio would essentially be a database
machine drove the need to have raw processing power available for searches,
queries, and analyses.
Minimize on-board memory. Use PCMCIA memory cards to hold system code and
stack/heap space. Goals for manufacturability (eliminating fine pitch
packages, such as those used for SRAM, from the board) and upgradability (both
code and system memory capacity) drove this requirement.


Hardware Architecture


Given the high-level design constraints, we developed a hardware architecture
that makes extensive use of processor and memory card technology. Figure 1
shows a block-level representation of the Infolio hardware.
The Infolio makes use of multiple processors: the Motorola MC68331 as the core
processor, the Motorola MC68HC05C4 as the power management processor, the
Hitachi 63484 to handle graphics, and the Intel 80C51 to control the
digitizer.
The most interesting processor in the system is the MC68331. Initially
positioned by Motorola as an embedded controller, the 331 is basically a 68020
with a 16-bit external bus and a lot of what would traditionally be system
"glue," or peripheral logic, integrated onto the die.
One reason we were drawn to the 331 was its 68020-like core, the CPU32. The
CPU32 core is a silicon subcomponent that Motorola incorporates into a range
of microprocessors. The 68020 architecture, of course, is well-known among
engineers with UNIX workstation experience. Our own background is primarily
UNIX and Macintosh, so we expected to experience little or no learning curve.
Another reason we chose the 331 is because it is a fully static part. The
clock speed can be dynamically adjusted from 0 Hz (low-power stop mode) to 16
MHz, depending on the system activity.
A third reason is that the 331 is based on the IMB (Inter-Module Bus) concept.
Peripherals are integrated onto the die with the CPU32, interfacing via an
internal memory-mapped bus. The 68331 includes a SIM (System Integration
Module) with ten chip selects, a periodic interrupt timer, a software watchdog
timer, and system clock controls. Additionally, a Queued Serial Module is
included which provides both a 2-wire asynchronous Serial Communications
Interface and a 16-word-deep Serial Peripheral Interface. A General Purpose
Timer module is available which provides input capture/output compare and
pulse-width modulation functionality.
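Because IMB peripherals are memory mapped, driver code for them reads like ordinary pointer access. As a hedged illustration (the register offset and the simulated base pointer are ours, not Infolio code), servicing the 68331's software watchdog, which expects a 0x55/0xAA write sequence, might be sketched like this:

```c
#include <stdint.h>

/* Illustrative offset of the software watchdog service register within
   the SIM's register block; real code would use Motorola's header. */
enum { SIM_SWSR = 0x27 };

/* Service the software watchdog by writing the 0x55/0xAA sequence.
   The module base is passed in so the routine can be exercised against
   a simulated register block on a host. */
void watchdog_service(volatile uint8_t *sim_base)
{
    sim_base[SIM_SWSR] = 0x55;
    sim_base[SIM_SWSR] = 0xAA;
}
```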
Yet another reason for selecting the 331 was its background mode, which lets
an integrated development system and/or debugger communicate with the part
through serial communication over multiplexed device pins. Finally, the 331
is relatively inexpensive--under $25 in volume
quantities--and provides up to five times the processing power of an Intel
80286.
The 8-bit MC68HC05 microcontroller controls the hardware side of system boot,
manages most of the system interrupts, and processes pen information before
passing it on to the main processor. This device is responsible for
controlling the power to the various modules of the hardware, thus controlling
the five different power states of the system: awake, relaxed, napping,
sleeping, and off.
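A minimal sketch of how such a five-state policy might be expressed in C (the step-on-idle transition rule is our assumption for illustration, not PI's actual power-management algorithm):

```c
/* The five power states, ordered from most to least power hungry. */
typedef enum {
    PWR_AWAKE, PWR_RELAXED, PWR_NAPPING, PWR_SLEEPING, PWR_OFF
} pwr_state;

/* Hypothetical policy: each tick of inactivity moves the system one
   state deeper; any pen or key activity returns it to AWAKE. */
pwr_state pwr_next(pwr_state s, int activity)
{
    if (activity)
        return PWR_AWAKE;
    return s < PWR_OFF ? (pwr_state)(s + 1) : PWR_OFF;
}
```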
Both the Hitachi 63484 and the Intel 80C51 have very specific tasks. The 63484
was chosen because it provides reasonably high-level graphics commands (lines,
polylines, circles, fills, bit-blts, and so on) at a relatively low cost. We
also chose this processor because of our schedule pressures -- it eliminated
the need to write low-level graphics routines; it also helped reduce our
system code size (which we limit to 1 Mbyte). As Figure 1 shows, graphics
memory is completely controlled by the 63484 and is not available as a system
resource. (Incidentally, notice that the graphics memory is the only memory on
the motherboard.)
The video memory is partitioned into six frame buffers split into two layers
having three buffers each. This arrangement is shown in Figure 2. The first
layer (the "paper") contains the screen image, and the second layer (the
"acetate") displays the pen's "ink." Each layer has only one active buffer at
a time; typically, the inactive buffers are used to cache fonts and icons that
can be "bit-blitted" to the active paper layer for higher graphics
performance. The screen image the viewer sees results from logically ORing the
paper and acetate layers' active buffers. All images use 1 bit per pixel; the
video subsystem is currently a monochrome design. We've placed all
device-dependent code, including the 63484 graphics driver, in the
microkernel. This minimizes the impact on applications if the Infolio software
system were ported to another architecture.
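The OR compositing of the two layers can be sketched in C. On the Infolio the 63484 performs this in the video subsystem; compose_frame() below is purely illustrative, operating on 1-bit-per-pixel buffers packed eight pixels to the byte:

```c
#include <stddef.h>
#include <stdint.h>

/* The visible frame is the bitwise OR of the active paper buffer and
   the active acetate ("ink") buffer, 1 bit per pixel. */
void compose_frame(const uint8_t *paper, const uint8_t *acetate,
                   uint8_t *screen, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        screen[i] = paper[i] | acetate[i];
}
```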
The fourth processor in the system, the Intel 80C51 microcontroller, came to
us as part of a technology licensing agreement with our digitizer supplier,
CalComp. We do no programming of the part ourselves; we simply submit
system-level requirements to the digitizer. The '51 scans the gridlines of the
digitizer to detect pen position, tilt, pressure, and height above the screen
surface.


Memory and Storage Strategies


Other interesting aspects of the architecture are the memory map and the
memory card-based storage mechanisms (see Figure 3). The 68331 has only 24 address
lines, so the memory space is limited to 16 Mbytes. But because the processor
provides lines which distinguish code access from data access, we essentially
have 32 Mbytes of memory space. (Note that both Infolio system and application
software run in supervisor space. As described last month, the software kernel
is a cooperative multitasking, nonprotected system.)
Infolio has no magnetic storage media. The decision to develop a completely
solid-state system grew out of the requirements for long battery life and high
data integrity in a field environment. Users may freely insert and remove
memory cards--while the machine is running--without fear of losing data. The
system has three PCMCIA (Release 2.0) memory-only card slots. While one slot
holds a mixed memory card containing system ROM and RAM (1 Mbyte of each), the
other two slots hold cards of up to 8 Mbytes each. (Card capacities will grow
as semiconductor vendors introduce higher-density ICs.) Additionally, there is
a virtual card which exists as a partition of system RAM. These cards may be
used both for user data storage and third-party application storage. As
discussed last month, all data is stored as hierarchical collections of
objects--including application objects--and objects may be freely moved,
copied, cloned, and linked between memory cards.
The memory card address space can be viewed as an 8-Mbyte block into which a
card is mapped when activated by the system. The system has a notion of an
active "data" card and an active "code" card. As the database is manipulated
by an application, links are typically encountered which, when traversed, may
cause the system to map, or swap, another card into the 8-Mbyte data space.
Likewise, as applications execute from memory cards, function calls may jump
to and from code residing on various cards, causing the system to map the
appropriate card into the 8-Mbyte code space.
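A rough model of the active-card scheme described above (the obj_link layout, map_card(), and all names here are our invention for illustration; the real system programs the 68331's chip selects to do the mapping):

```c
#include <stdint.h>

/* A stored link names a card and a 0-relative offset on that card. */
typedef struct { int card; uint32_t offset; } obj_link;

int active_data_card = -1;  /* card currently in the 8-Mbyte window */
int map_count = 0;          /* how many card swaps we performed */

static void map_card(int card)
{
    active_data_card = card;  /* real hardware reprograms chip selects */
    map_count++;
}

/* Traversing a link may fault the named card into the data window;
   the result is an absolute address inside that window. */
uint32_t traverse(obj_link l, uint32_t window_base)
{
    if (l.card != active_data_card)
        map_card(l.card);
    return window_base + l.offset;
}
```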
Memory cards have access times ranging from less than 100 ns to 250 ns, so
execution of code and access to user data is quite fast when compared to disk
access and load times. Therefore, although the system may be running at a
clock speed of only 8 or 13 MHz, user interaction occurs in real time.


Software Development Tools


Early in the product definition stage of the Infolio we decided that, to
simplify the design, we would not support a development environment on the
machine itself. Therefore, all application development is done on a host
system (Sun SPARCstation or an IBM-compatible PC), then downloaded to the
Infolio. This frees the system from the complex requirements of native
development, and allows developers to work with familiar compilers and
debuggers. In our case, we use the 68332 cross compiler from Intermetrics.

However, we still had to bootstrap our own software development effort. This
was achieved in two ways. As discussed last month, the complete Infolio
software system executes on the SPARCstation and the PC. By creating an
emulation environment on these machines, we were able to develop most of the
system without the hardware even being present. About five percent of the
system software (50 Kbytes) is hardware specific and is written and maintained
for each of our platforms: Infolio, SPARCstation (under OpenWindows), and the
PC (under Microsoft Windows).
The second part of our bootstrap strategy was to use the Hewlett-Packard
HP64749 In-Circuit Emulator (ICE), combined with Intermetrics XDB symbolic
cross debugger to develop and debug the Infolio-specific code as well as any
problems with the high-level code. For the first six months of development, we
used the Intermetrics' ITools MC68332 compiler and assembler (on the
SPARCstation) to cross compile/assemble our ANSI C code into a hex file, which
we then downloaded through XDB into the HP64749 emulator memory.
Although this development environment was generally useful and beneficial, we
encountered problems which caused significant delays. The main problem was
that we originally wanted to move system ROM from address 0 in the memory map
to address 0x800000 after system boot. This would allow us to map the memory
cards at address 0, hence performing no address translation on pointers (or
links) stored on the cards.
Because one of our design requirements was to have the data on a card be
independent of where the card sits in memory space, we manipulate card
addresses as if they are relative to 0. But because we were unable (even with
substantial help from both HP and Intermetrics) to change the memory map and
still use the ICE, we had to make hardware changes to move the card memory to
the top half of the 16-Mbyte space, and we had to make software changes to
add/subtract the card's base address when accessing card data. A second
problem was our inability to shift the MC68331 System Integration Module's
memory map from its default position at the top 2 Kbytes of the 16-Mbyte space
to its alternate position at the top 2 Kbytes of the 8-Mbyte space. This
proved to be a problem only when interacting with the emulator, however, and
our production system uses the alternate location.
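The card-relative addressing workaround can be sketched as a pair of translation helpers. CARD_BASE here is illustrative (in the revised design the cards sat in the top half of the 16-Mbyte space); the point is that pointers stored on a card stay 0-relative so the card remains position independent:

```c
#include <stdint.h>

#define CARD_BASE 0x800000u  /* illustrative base of card memory */

/* Add the base when following a link read from a card... */
uint32_t card_to_cpu(uint32_t rel)  { return rel + CARD_BASE; }

/* ...and subtract it before a pointer is stored back to the card. */
uint32_t cpu_to_card(uint32_t abs_addr) { return abs_addr - CARD_BASE; }
```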
A third problem was a fairly slow turnaround time for software changes. We
were using a mixed environment (HP emulator, Sun workstation, and Intermetrics
tools), so the only viable path at the time for downloading code was via XDB
through the Sun's serial port to the emulator at 19.2 Kbaud. Because of the
protocol used, it took nearly an hour to download 512 Kbytes of code.
The last problem was that as our system code grew past 512 Kbytes, we could no
longer use the in-circuit emulator's emulation memory. This actually proved to
be the catalyst for a major hardware change. Originally, we had 1 Mbyte of ROM
and 512 Kbytes of SRAM on the motherboard to hold the system code and data.
After putting together early prototypes of the Infolio, we found that the
small size and intricate design made it difficult to continually take the
units apart to update the system ROM. This fact, combined with the inability
to use emulation memory (due to size limitations) led us to take all memory
off the motherboard and use a PCMCIA card to hold the system code and data.
Since moving the system code to a memory card, several problems have been
eliminated. A major one, update speed, is much improved because we can now
cross compile to a binary file and download that file through the Infolio's
serial port directly onto a memory card. That card is then moved into the
system slot for use. A 1-Mbyte serial download now takes less than ten minutes
at 38.4 Kbaud. We also are able to run the ICE with this configuration by not
using the emulation memory and allowing the ICE to write to system "ROM" in
order to insert software breakpoints to support XDB. We use 2-Mbyte SRAM cards
for development, with the lower half holding system code and the upper half
holding the stack, system heap, and the virtual data card.
Needless to say, we couldn't expect third-party developers to go through the
process just described; application development tools would obviously have to
be provided. However, because the Infolio is viewed as an embedded system, the
development environment threatened to be especially tricky to create.
Consequently, we created a development kit to support application development
for the Infolio. As shown in Figure 4, developers can employ the user
interface layout editor and/or the database structure editor to define the
application-specific data structures and their user presentation. These tools
produce code templates to which the developer can add specialized task
handlers. Custom event handlers can be attached to user interface objects if
there is a need to respond immediately to an event. For example, the
application might do special processing on a value in a database object when
the user taps the pen on a particular check-box object. Developers can also
create applications without the graphical editing tools by using the
Application Programming Interface directly.
On the development host (SPARCstation or PC), developers can compile and debug
their application code using familiar tools (on the PC, Borland's C++ in ANSI
C mode, and Turbo Debugger, for example). The developer's code is then linked
to a library we provide, which is essentially the complete Infolio system,
resulting in a single large executable that can be debugged on the host. Note
that, on the Infolio target, applications are objects that exist in a fully
late-bound flexible form.
To debug the application code on the target, we've developed a symbolic cross
debugger which operates through the MC68331's background mode. A special
connector joins the host to the Infolio via the expansion port. The symbolic
cross debugger communicates with the target to set/get variable values, get
stack information, set software breakpoints, and control execution.


Conclusions


There were four keys to developing the system to our design requirements and
time constraints. As discussed last month, our object-oriented architecture
made maximum use of code reusability and standardized communication and so
helped keep the system code compact and made it quicker to develop. Next,
Motorola's 68331 provided a simple, well-known architecture to work with, and
its on-chip integration reduced board real estate and power demands. Using
intelligent coprocessors also reduced the chip count and code development
effort. Thirdly, the software emulation of the system let us develop code
concurrently with the system hardware, and also allowed a more comfortable and
productive application development environment. Finally, the use of in-circuit
emulation and cross debugging at the source-code level expedited the process
of developing Infolio-specific code and of debugging early problems with the
high-level code.




January, 1992
YOUR OWN DISK DUPLICATION PROGRAM


Read and write disks using 386 protected mode


 This article contains the following executables: CB386.ARC


Al Williams


Al is a freelance writer and a consultant on the Space Station Freedom
project. His book, DOS 5: A Developer's Guide, is available from M&T Books. He
can be reached at 310 Ivy Glen Court, League City, TX 77573.


DOS extenders are playing an increasing role in PC software development. Upon
hearing that Intel's 386/486 C CodeBuilder includes a royalty-free DOS
extender which supports DPMI, I was naturally curious. So I decided it was
time for CodeBuilder's first check-up and proceeded to place it on the
examining table.
To test this package, I developed a disk duplication system. The program has a
pleasing user interface, and is a practical, useful example. It stores images
of floppy disks entirely in memory -- simple for a DOS-extended program;
difficult or impossible for normal programs. I wanted to test CodeBuilder on
some points I think are important including Microsoft C compatibility,
interrupt handling, physical memory addressing, making DOS/BIOS calls, and the
speed penalty for making such calls.


Vital Signs


The CodeBuilder package is a complete development system. It includes a C
compiler, a debugger, a linker, a librarian, and a make program. Unlike many
DOS extenders, it doesn't have many 386 bells and whistles -- it looks much
like Microsoft C, except for the 32-bit integers and nonsegmented pointers.
The debugger won't impress many CodeView or Turbo Debugger users, but it is
serviceable.
Because the compiler is Microsoft compatible, Intel supplies little
documentation for it and almost none for the libraries. Instead, Intel
recommends the Microsoft documentation.
One important feature of CodeBuilder is the cost of distributing executables
that you generate -- it's free! Unlike many other DOS extenders, Intel allows
you to distribute your executables (with the DOS extender built in) at no
charge. This will be a decisive factor in many purchasing decisions. However,
I expect Intel has set a trend other vendors will follow.


The Project


CopyBuilder 386 (the disk duplication system) requires a lot of memory,
directly accesses the screen, makes many DOS and BIOS calls, handles
interrupts, and uses a display library I developed under Microsoft C. The
program operates on a disk image in memory. The image can be read from a
diskette or a disk file. CopyBuilder can write the image to either another
diskette or a disk file. The disk files include a title and a checksum.
CopyBuilder includes a simple character-mode interface and help system. When
CopyBuilder prompts you for a file name, you can enter the file directly. If
you enter a name that ends in a colon or a backslash or a filename that
contains wildcards, CopyBuilder presents a menu of matching files. You can
select the file from this menu. The default file is always "*.*". The menu
will never contain more than 120 files; but if the directory has too many
files, you can narrow the search by using wildcards.
The source code for CopyBuilder resides in several modules, some of which are
not included here due to space considerations. (See "Source Code Availability"
on page 3 for details on obtaining the complete system.) CB386.H (Listing One,
page 82) and CB386.C (Listing Two, page 82) contain the main code. FILEIO.C
(Listing Three, page 84) and DISKIO.C (not included) handle file and disk I/O,
respectively. REBOOT.C (Listing Four, page 86) provides code to disable
CONTROL-ALT-DEL. DISPLAY.C (Listing Five, page 86), HISTO.C, and DIRPICK.C
(not included) make up a simple display library developed for Microsoft C.


CodeBuilder In Action


Only two routines, read_disk() (in DISKIO.C) and loadit() (Listing
Three), allocate memory using malloc(). Both routines read in new images.
Because sector_read() and sector_write() use BIOS interrupt 13H for disk I/O,
they must allocate a special memory buffer below the 1-Mbyte line (dosbuf).
CodeBuilder automatically copies extended memory data to a special buffer
below 1 Mbyte for many DOS calls. However, for the BIOS disk services, it does
not. You are responsible for placing data buffers where the BIOS can access
them. When writing, you must first copy the data to the special buffer. When
reading, you must copy the data out of this buffer when the BIOS call
completes.
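The bounce-buffer discipline for BIOS writes can be sketched as follows. Here bios_write_sectors() is a host-side stub standing in for the real interrupt 13H call (the names and the fake-disk array are our invention, not CB386 code):

```c
#include <string.h>

#define SECTOR 512

/* Stub standing in for INT 13H: "writes" low-memory data to a fake
   disk so the copy discipline can be tested on a host. Returns 0 for
   success, mimicking the BIOS status convention. */
char fake_disk[SECTOR * 4];
int bios_write_sectors(void *lowbuf, unsigned count)
{
    memcpy(fake_disk, lowbuf, (size_t)count * SECTOR);
    return 0;
}

/* Copy extended-memory data into the below-1-Mbyte buffer first; the
   BIOS only ever sees the low buffer. */
int write_via_dosbuf(char *dosbuf, const char *image, unsigned count)
{
    memcpy(dosbuf, image, (size_t)count * SECTOR);
    return bios_write_sectors(dosbuf, count);
}
```

A read would run the same steps in reverse: issue the BIOS call against the low buffer, then copy the data up into the image.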
CodeBuilder allows programs to allocate this type of buffer using the standard
DOS memory allocation call (interrupt 21H function 48H). Unlike the normal
DOS call, CodeBuilder expects the number of bytes (not paragraphs) in the EBX
register. If EAX is equal to 0x00004800, CodeBuilder allocates memory below 1
Mbyte. If EAX is equal to 0x80004800, CodeBuilder allocates extended memory.
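A sketch of setting up that calling convention (the regs386 struct is our stand-in for illustration; a real program would hand these values to an int86()-style call rather than build a bare struct):

```c
#include <stdint.h>

typedef struct { uint32_t eax, ebx; } regs386;

/* Build the register values for CodeBuilder's extended function 48H:
   byte count (not paragraphs) in EBX, and bit 31 of EAX selects
   extended memory over below-1-Mbyte memory. */
regs386 make_alloc_request(uint32_t nbytes, int extended)
{
    regs386 r;
    r.eax = extended ? 0x80004800u : 0x00004800u;
    r.ebx = nbytes;  /* bytes, not 16-byte paragraphs */
    return r;
}
```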
CodeBuilder normally aligns structure members on a 4-byte boundary to improve
performance. This is a good idea, unless the structure maps to an external
entity. In CB386.H, the _bpb structure contains the BIOS Parameter Block (BPB)
from the target disk. The structure must mirror the actual BPB on the disk.
CB386 uses #pragma align(_bpb=1) to force CodeBuilder to place each structure
member on the next available byte.


Larger than Life DOS Calls


Notice in FILEIO.C (Listing Three) that CB386 calls fread() and fwrite() with
32-bit integer values for the size argument. For instance, when reading a
1.44-Mbyte floppy disk, fread() must read 1.44 Mbytes of data. Of course,
CodeBuilder internally breaks the request into multiple DOS calls, each
handling smaller pieces of data. The buffer that CodeBuilder uses for DOS I/O
is set by the /xiobuf option. For example, /xiobuf=40K, which is the default,
reserves 40K of memory for DOS communications.


Accessing Video RAM


DISPLAY.C mainly uses video BIOS calls to manipulate the screen. Two
functions, vidsave() and vidrestore(), directly access the screen memory. This
is relatively simple because CodeBuilder uses a flat addressing scheme. For a
color display, the buffer is at location B800:0000. (The vidmode() function
ensures page zero is active.) CodeBuilder maps memory below 1 Mbyte so that
this address can be easily converted to a linear address. Simply shift the
segment value to the left four places and add the offset. For example,
char *vptr=(char *)0xB8000; will point vptr to the beginning of screen memory.
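The conversion is a one-liner, shown here as a helper function:

```c
#include <stdint.h>

/* Real-mode address to linear: segment shifted left four bits plus the
   offset. Under CodeBuilder's flat mapping of memory below 1 Mbyte,
   the result is directly usable as a pointer value. */
uint32_t seg_off_to_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```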
CB386 uses the standard signal() and _harderr() functions to trap break
interrupts and critical errors. The signal() method is usually unsatisfactory
for programs such as CB386 because it allows ^C to appear on the screen when a
break occurs. However, the CB386 break handler (breakfunc() in DISPLAY.C) does a
longjmp() to reinitialize the main menu. This has the handy side effect of
redrawing the screen -- the ^C doesn't last long enough to be a concern.


Interrupts



The interrupt 9 function (ourint9() in REBOOT.C) contains code to prevent the
user from rebooting the machine with CONTROL-ALT-DELETE. It does this by
hooking the keyboard interrupt (INT 9). If it detects the DELETE key, it looks
at the BIOS keyboard status word. If the CONTROL and ALT keys are down, the
interrupt handler consumes the DELETE key--the BIOS never sees it. In all
other cases, the BIOS receives the keystroke unaltered.
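Since REBOOT.C isn't reprinted in full here, the filtering decision can be sketched as a pure function. The scan code and shift-status bits are the standard PC BIOS values (status byte at 0040:0017, bit 2 = CTRL, bit 3 = ALT); passing them in as parameters is our restructuring so the logic can be tested on a host:

```c
#include <stdint.h>

#define KB_CTRL  0x04  /* CTRL-down bit in BIOS shift status */
#define KB_ALT   0x08  /* ALT-down bit in BIOS shift status */
#define SCAN_DEL 0x53  /* make code for the DELETE key */

/* Returns 1 if the keystroke should be consumed (reboot suppressed):
   DELETE arriving while both CTRL and ALT are held down. */
int swallow_key(uint8_t scancode, uint8_t shift_status)
{
    return scancode == SCAN_DEL &&
           (shift_status & (KB_CTRL | KB_ALT)) == (KB_CTRL | KB_ALT);
}
```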
Handling interrupts in a DOS extender is usually very different from handling
them under regular DOS, and CodeBuilder's approach is unusual even for a DOS
extender. Programs declare interrupt functions via a pragma. In REBOOT.C
you can see the statement #pragma interrupt(ourint9). Further down in the
code, the actual definition of ourint9() appears. Notice that the registers
are not pushed on the stack, as in Microsoft C. You must call the special
CodeBuilder function, _get_stk_frame(), to get a pointer to an _XSTACK
structure. The CodeBuilder header file STK.H declares the _XSTACK structure,
which is sketchily documented. It contains the interrupt registers, and other
related fields.
Normally, CodeBuilder processes interrupts in protected mode, then reissues
them in V86 mode for DOS. The opts field in _XSTACK can alter this behavior.
(The CodeBuilder documentation incorrectly calls this field stk_opts.) By
placing different values in this field, a program can prevent the V86
interrupt or abort the program. In ourint9(), for example,
frame->opts=_STK_NOINT causes CodeBuilder not to reissue the keyboard interrupt.
This prevents the BIOS from gaining control when CB386 detects the
CONTROL-ALT-DELETE keys.
Virtual memory can cause special problems for Interrupt Service Routines
(ISRs). If an ISR is swapped out when its interrupt occurs, it must be swapped
back in before the interrupt can proceed. At best, this delays the interrupt,
holding up the entire system. As a worst case, imagine that the ISR controls
the disk interrupt. When the ISR must be swapped in, CodeBuilder tries to read
it from disk. CodeBuilder reissues the disk interrupt, which triggers another
request to swap in the ISR. This is a deadlock situation, and must be
prevented.
The lockint9() function in REBOOT.C uses the DPMI functions to lock the
interrupt 9 handler in memory. The _dpmi_lockregion() function ensures a block
of memory will remain in RAM. CodeBuilder may swap unlocked blocks to disk.
Locks always occur on 4-Kbyte pages, so we assume ourint9() won't be larger
than 4K and sidestep the issue of computing the length of the function.
The same problem arises when ISRs use global or static data -- you also must
lock the data in memory. Intel suggests examining the maps generated by the
linker to find the address ranges to lock. Of course, when you recompile, you
must reexamine the maps, alter the code, and compile again -- a big time
waster. Because lockint9() doesn't use global or static data, this problem
doesn't appear.
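Because locks round to 4-Kbyte pages, it helps to see exactly how an address range expands before handing it to _dpmi_lockregion(). This helper is our illustration, not part of CodeBuilder:

```c
#include <stdint.h>

#define PAGE 4096u

/* Expand [addr, addr+len) outward to whole 4-Kbyte pages, yielding the
   page-aligned start and byte count that a lock would actually cover. */
void lock_range(uint32_t addr, uint32_t len,
                uint32_t *start, uint32_t *nbytes)
{
    *start  = addr & ~(PAGE - 1);                         /* round down */
    *nbytes = ((addr + len + PAGE - 1) & ~(PAGE - 1)) - *start;
}
```

Note how a tiny region straddling a page boundary locks two full pages, which is why assuming ourint9() fits in 4K is only a convenience, not a guarantee.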


Environment Problems


One final note about CB386. The DOS menu command uses the standard putenv()
function to change the DOS shell's prompt. In earlier versions of CodeBuilder,
putenv() did not modify the environment such that child processes got the new
values. Intel's latest versions handle this code correctly. If you have the
older version, there is no harm -- but the prompt will remain unchanged.


Analysis


The CodeBuilder compiler is very similar to the Microsoft C compiler. However,
certain features (ASM, interrupt handling, and so on) are missing or very
different, so don't expect to port complex code effortlessly. Intel and
Microsoft also implement a completely different set of pragmas. The graphics
functions only support EGA and VGA monitors. Also, the presentation graphics
library (in Microsoft C 6.0) is not present.
It appears Intel chose to make the compiler as simple to use as possible.
CodeBuilder is a 32-bit C compiler, with a simple DOS extender built in to
support it. This simplicity means you can learn CodeBuilder in short order,
but prevents you from taking advantage of many 386 features that other
extenders allow you to use. Of course, many of these features are not
available under a DPMI host, anyway.
Because CodeBuilder maps DOS memory at its physical address, accessing the
screen or other physical memory couldn't be much easier. Flat 32-bit pointers
also simplify programming. Of course, with any flat model, you lose many
benefits of memory protection -- you can't expect the DOS extender to catch
null pointer references, for example.
Making DOS and BIOS calls with CodeBuilder was straightforward. It would be
better if CodeBuilder did not require CB386 to manage a DOS buffer for its
BIOS calls. CodeBuilder lifts this restriction on most other calls, however.
Also, as a simple benchmark of CodeBuilder, I wrote a program, TIMING.C
(Listing Six, page 88), that will compile in real or protected mode. It
exercises the VGA and writes large files to disk. The results, along with
those for Microsoft C and Phar Lap's 286 DOS Extender, are in Table 1.
CodeBuilder isn't
too fast with the graphics operations and pays a steep penalty for running DOS
and BIOS calls in V86 mode. High-performance programs will want to minimize
their use of DOS and BIOS calls.
Table 1: Results from TIMING.C

                                       Graphics   File
  -----------------------------------------------------
  Real mode                                19      12
  286 DOS Extender (default options)       22      24
  286 DOS Extender (-XFER 32 option)       22      14
  CodeBuilder (default options)            36      62
  CodeBuilder (/xiobuf=64K option)         36      62



Final Notes


I personally found the documentation and the debugger disappointing. Still,
CodeBuilder is a fairly new product -- it should improve over time. The
compiler seems solid, and the libraries are complete and mostly bug free.
CodeBuilder's simplicity will appeal to those with straightforward code to
port or portable code to write. Certainly, its royalty-free distribution
license will appeal to everyone. Protected-mode programming veterans may find
CodeBuilder lean on 386-specific features. Yet, as you can see, it is capable
of creating substantial applications.


Bibliography


Intel Corp. Intel 386/486 C Code Builder Kit Reference Manual, Santa Clara,
Calif.: Intel Corp., 1991.
Intel Corp., et al., DOS Protected Mode Interface (DPMI) Specification. Santa
Clara, Calif.: Intel Corp., 1990.
Microsoft Corp. Microsoft C Run-Time Library Reference, Redmond, Wash.:
Microsoft Press, 1990.
Williams, Al. DOS 5: A Developer's Guide, Redwood City, Calif.: M&T Press,
1991.



Products Mentioned


Intel 386/486 C Code Builder Kit
Intel 5200 N.E. Elam Young Parkway Hillsboro, OR 97124-5961 503-696-8080 $695

_YOUR OWN DISK DUPLICATOR PROGRAM_
by Al Williams


[LISTING ONE]

/****************************************************************
 * CB386.H - include file for CopyBuilder 386 *
 * See makefile for compile directives -- Al Williams *
 ****************************************************************/
#define NRTRIES 3 /* number of times to retry disk ops */

/* force codebuilder to not align */
#pragma align(_bpb=1)
/* structure of disk BPB */
extern struct _bpb
 {
 unsigned char jump[3];
 char oemname[8];
 unsigned short bytespersec;
 unsigned char secperclust;
 unsigned short ressectors;
 unsigned char nrfats;
 unsigned short rootsize;
 unsigned short nrsectors;
 char media;
 unsigned short fatsectors;
 unsigned short secpertrack;
 unsigned short nrheads;
 unsigned int hiddensecs;
 unsigned int hugesectors;
 unsigned char physdrive;
 char notused;
 unsigned char signature; /* should be 0x29 */
 unsigned int serno;
 char label[11];
 char type[8];
 char pad[512-60]; /* rest of 512 byte sector */
 } bpb;
/* various globals */
extern int driveno;
extern unsigned disksize;
extern unsigned sectorct;
/* disk image buffer */
extern unsigned char *diskbuf;
/* DOS buffer used to communicate with BIOS */
extern char *dosbuf;
/* set by critical errors */
extern int critical_err;
/* information on buffer */
extern struct _bufinfo
 {

 char title[65];
 unsigned size;
 unsigned short copies;
 char source[13];
/* checksums (stored and computed) */
 unsigned short csum, ccsum;
 short dirty;
 } bufinfo;
extern void (*slbreak)(); /* place to hook break handler */
/* holds disk format command */
extern char fmtcmd[];
/* additional break handler */
extern void (*when_break)();
/* prototypes for break handlers */
void load_break();
void save_break();
/* general prototypes */
int sector_read(int head, int track, int sector, int drive,
 unsigned char *buf,unsigned count);
int sector_write(int head, int track, int sector, int drive,
 unsigned char *buf,unsigned count);
/* disable reboot */
void noreboot(void);
void okreboot(void);





[LISTING TWO]


/****************************************************************
 * CB386.C - main file for CopyBuilder 386 *
 * See makefile for compile directives -- Al Williams *
 ****************************************************************/
#include <stdio.h>
#include <dos.h>
#include <ctype.h>
#include "cb386.h"
#include "display.h"

/* current drive number */
int driveno;
/* current bios parameter block */
struct _bpb bpb;
/* size of disk */
unsigned disksize;
/* information about buffer */
struct _bufinfo bufinfo;
/* number of sectors */
unsigned sectorct;
/* storage for disk image */
unsigned char *diskbuf;
/* set when a critical error occurs */
int critical_err;
/* DOS buffer for disk BIOS reads and writes */
char *dosbuf;
/* text of format command */

char fmtcmd[129];
/* critical error messages */
static char *cmsgs[]=
 {
 "Write protect",
 "Unknown unit",
 "Drive not ready",
 "Unknown command",
 "CRC error",
 "Bad Request",
 "Seek error",
 "Unknown media",
 "Sector not found",
 "Out of paper",
 "Write error",
 "Read error",
 "General failure",
 "Unknown",
 "Unknown",
 "Invalid change"
 };
/* Critical error handler */
void cerror(int dev,int code)
 {
 char msg[81];
 int choice;
 critical_err=code&=0xFF;
 sprintf(msg,"Critical error: %s (Retry, Fail, Ignore)", cmsgs[code]);
 choice=prompt(msg,"RIF",ERRCOLOR);
 if (choice==0) choice=_HARDERR_RETRY;
 else if (choice==1) choice=_HARDERR_IGNORE;
 else choice=_HARDERR_FAIL; /* F or ESC */
 _hardresume(choice);
 }
/* main program */
main(int argc,char *argv[])
 {
/* set up critical error handler */
 _harderr(cerror);
 if (argc>2 || (argc==2 && (argv[1][0]=='?' || argv[1][1]=='?')))
 error("CopyBuilder 386 by Al Williams\n"
 "A diskette duplication system\n"
 "Usage: CB386 [drive]\n");
/* save current drive/directory */
 cdsave();
/* set up video mode and detect monitor type */
 vidmode();
/* if mono monitor, set up neutral colors */
 if (mono)
 {
 TEXTCOLOR=7;
 SOCOLOR=0x70;
 ERRCOLOR=1;
 HELPCOLOR=7;
 }
/* reset title, etc. */
 strcpy(bufinfo.title,"<EMPTY>");

 strcpy(bufinfo.source,"N/A");
 strcpy(fmtcmd,"format");
 noreboot();
/* if user asked for a disk on the command line, read it */
 if (argc==2)
 {
/* show a display */
 disp();
/* read the disk */
 read_disk(*argv[1]);
 }
/* goto main menu */
 menu();
 }
/* Non-interactive error routine */
error(char *s)
 {
 printf("\n%s\n",s);
 cdrestore();
 exit(1);
 }
/* compute checksum
 add sixteen bit words and wrap carry around */
checksum()
 {
 unsigned char *bufp=diskbuf;
 int i=bufinfo.size;
 unsigned short cksum=0;
 unsigned cksum1;
 while (i--)
 {
 cksum1=cksum;
 cksum1+=*bufp++;
 cksum=cksum1&0xFFFF;
 if (cksum1&0xFFFF0000)
 cksum++;
 }
 bufinfo.ccsum=cksum;
 return cksum;
 }
/* Format a disk in the indicated drive */
format(int driveno)
 {
 char fcmd[80];
 int stat;
/* build command line */
 sprintf(fcmd,"%s %c:",fmtcmd,driveno+'A');
/* save video */
 vidsave();
/* execute command */
 stat=system(fcmd);
 vidrestore();
 if (stat==-1) advise("Unable to execute format program");
 }
/* turn the blinking -wait- on at bottom of screen */
wait_on()
 {
 int ocolor=color;
 color=SOCOLOR|0x80; /* make it blink */

 goxy(0,24);
 clreol();
 goxy(37,24);
 printfc("-WAIT-");
 curshide();
 color=ocolor;
 }
/* turn the blinking -wait- off at bottom of screen */
wait_off()
 {
 goxy(0,24);
 clreol();
 }







[LISTING THREE]

/****************************************************************
 * FILEIO.C File oriented I/O routines for CB386 *
 * See makefile for compile directives -- Al Williams *
 ****************************************************************/
#include <stdio.h>
#include <malloc.h>
#include <errno.h>
#include "cb386.h"
#include "display.h"

void (*slbreak)(); /* place to hook break handler */

/* ^C handler when saving -- used in DISKIO.C too */
void save_break()
 {
 advise("\aOperation aborted");
 if (slbreak) slbreak();
 when_break=slbreak;
 }
/* ^C handler when loading -- used in DISKIO.C too */
void load_break()
 {
 cleanup();
 when_break=slbreak;
 }
/* Clean up an aborted load */
cleanup()
 {
 if (diskbuf) free(diskbuf);
 diskbuf=NULL;
 memset(&bufinfo,0,sizeof(bufinfo));
 strcpy(bufinfo.title,"<EMPTY>");
 strcpy(bufinfo.source,"N/A");
 }
/* save disk image to a file */
m_save()
 {

 slbreak=when_break;
 when_break=save_break;
 saveit();
 when_break=slbreak;
 slbreak=NULL;
 }
/* load disk image to a file */
m_load()
 {
 slbreak=when_break;
 when_break=load_break;
 loadit();
 when_break=slbreak;
 slbreak=NULL;
 }
/* actual routine to save data */
saveit()
 {
 char fn[66];
 FILE *f=NULL;
 int cnt1;
/* if no data to save, forget it */
 if (!diskbuf)
 {
 advise("No disk image in memory");
 return;
 }
/* get a file name */
 if (!getfilen(fn,65)) return;
 critical_err=0;
/* check if the file already exists */
 if (!access(fn,0))
 {
 if (prompt("\aFile exists. Overwrite? (Y/N)"
 ,"NY",TEXTCOLOR)<=0)
 return;
 }
/* if checking for the file caused a critical error,
 don't even try to open the file */
 if (!critical_err) f=fopen(fn,"wb");

/* if the file can't be opened (or wasn't because of
 a critical error, forget it */
 if (!f)
 {
 advise("Can't open file for writing");
 return;
 }
/* Get a title */
 if (ask("Title: ",NULL,TEXTCOLOR,64,fn,NULL)==-1) return;
/* only change title if new one was entered */
 if (*fn) strcpy(bufinfo.title,fn);
/* Turn the wait indicator on */
 wait_on();
 /* set dirty bit in image. copies/source/csum are meaningless */
 bufinfo.dirty=1;
/* write signature */
 putw('C386',f);
/* write size of bufinfo */

 putw(sizeof(struct _bufinfo),f);
/* write bufinfo */
 fwrite(&bufinfo,1,sizeof(struct _bufinfo),f);
/* write entire buffer in one swoop */
 cnt1=fwrite(diskbuf,1,bufinfo.size,f);
/* one is deliberate here -- we always want to fclose */
 if (ferror(f)|fclose(f)|(cnt1!=bufinfo.size))
 advise("\aError writing to file");
 else
 bufinfo.dirty=0;
 wait_off();
 }
/* routine to actually load file to disk image */
loadit()
 {
 char fn[66];
 FILE *f;
 int cnt,cnt1;
/* if buffer is full, confirm */
 if (!dirtyquery()) return;
/* get file name */
 if (!getfilen(fn,65)) return;
/* open it */
 f=fopen(fn,"rb");
/* can't open it, forget it */
 if (!f)
 {
 advise("Can't open file");
 return;
 }
/* Turn wait indicator on */
 wait_on();
/* check file type */
 cnt=getw(f);
 if (cnt!='C386')
 {
 advise("Not a C386 image file");
 return;
 }
/* get size of bufinfo structure */
 cnt=getw(f);
/* new versions of C386 can't make bufinfo smaller, only larger
 at the end */
 fread(&bufinfo,1,cnt,f);
/* kill old image */
 if (diskbuf) free(diskbuf);
/* allocate new image based on size */
 diskbuf=malloc(bufinfo.size);
 if (!diskbuf)
 {
 advise("Insufficient memory");
 cleanup();
 fclose(f);
 return;
 }
/* read it all in */
 cnt1=fread(diskbuf,1,bufinfo.size,f);
/* one is deliberate here also (see saveit(), above) */
 if (ferror(f)|fclose(f)|(cnt1!=bufinfo.size))

 {
 advise("Error reading file");
 cleanup();
 return;
 }
 checksum();
 strcpy(bufinfo.source,fn);
 bufinfo.copies=0;
 wait_off();
 }









[LISTING FOUR]

/****************************************************************
 * REBOOT.C - Disable ^ALT-DEL for CodeBuilder programs *
 * Al Williams -- August 1991 *
 ****************************************************************/
#include <stdio.h>
#include <i32.h>
#include <stk.h>
#include <dos.h>
#include <conio.h>

/* When running DOS we replace the keyboard interrupt
 (INT 9) */
/* old INT 9 handler */
static void (*oldint9)();
#pragma interrupt(ourint9)
/* replacement interrupt handler */
static void ourint9()
 {
 int code,temp;
/* pointer to CodeBuilder stack frames */
 _XSTACK *frame=_get_stk_frame();
/* pointer to BIOS shift status byte */
 unsigned char *shift_status=(unsigned char *)0x417;
/* read keyboard */
 code=inp(0x60);
/* DEL is scan code 0x53 -- if *shift_status&0xc==0xc then shift
 and Alt are down */
 if (code!=0x53||(*shift_status&0xc)!=0xc) _chain_intr(oldint9);
/* will not allow ^ALT-DEL */
/* consume key from keyboard */
 temp=inp(0x61);
 outp(0x61,temp|0x80);
 outp(0x61,temp);
 outp(0x20,0x20);
/* Tell CodeBuilder not to reissue interrupt */
 frame->opts=_STK_NOINT;
 return;
 }

/* This function locks the page ourint9 is on so it can't be swapped out.
 4K is the minimum size, and surely ourint9 isn't that big.... */
static lockint9(int flag)
 {
 static lock=0;
 if (flag!=lock)
 {
 lock=flag;
 if (flag)
 _dpmi_lockregion(ourint9,4096);
 else
 _dpmi_unlockregion(ourint9,4096);
 }
 }
/********************** External interfaces ******************/
/* Disable ^ALT-DEL */
void noreboot()
 {
 lockint9(1);
 oldint9=_dos_getvect(9);
 _dos_setvect(9,ourint9);
 }
/* Enable ^ALT-DEL */
void okreboot()
 {
 lockint9(0);
 _dos_setvect(9,oldint9);
 }








[LISTING FIVE]

/****************************************************************
 * DISPLAY.C general purpose Microsoft C text display library *
 * See also DISPLAY.H DIRPICK.C, and HISTO.C -- Al Williams *
 ****************************************************************/
#include <stdio.h>
#include <dos.h>
#include <string.h>
#include <stdarg.h>
#include "display.h"

/* global variable sets color of output */
int color=7;
/* primary colors for color monitor (see display.h) */
int colors[4]={ 0x1e, 0x70, 0x1c, 0x7e };
/* 1=mono monitor, 0=color, -1=unknown */
int mono=-1;
/* set video mode and detect mono monitor
 this should always be called first */
void vidmode()
 {
 union REGS r;

 if (mono<0)
 {
 r.h.ah=0xf;
 int86(0x10,&r,&r);
 mono=r.h.al==7;
 }
 r.x.ax=mono?7:3;
 int86(0x10,&r,&r);
 r.x.ax=0x500;
 int86(0x10,&r,&r);
 }
/* goto point x,y (from 0-79 and 0-24) */
void goxy(int x,int y)
 {
 union REGS r;
 r.h.ah=2;
 r.h.dh=y;
 r.h.dl=x;
 r.h.bh=0;
 int86(0x10,&r,&r);
 }
/* clear screen region */
void clears(int x0, int y0,int x1,int y1)
 {
 union REGS r;
 r.x.ax=0x600;
 r.h.bh=color;
 r.h.ch=y0;
 r.h.cl=x0;
 r.h.dh=y1;
 r.h.dl=x1;
 int86(0x10,&r,&r);
 goxy(0,0);
 }
/* get x,y position */
void getxy(int *x,int *y)
 {
 union REGS r;
 r.h.ah=3;
 r.h.bh=0;
 int86(0x10,&r,&r);
 *x=r.h.dl;
 *y=r.h.dh;
 }
/* write count characters
 -- handle \r, \n, backspace, and \a
 updates the cursor if count==1 (w/o line wrap)
 otherwise, the cursor doesn't move */
void writecc(int c,int count)
 {
 union REGS r;
/* PS/2 BIOS tries to print 0 characters! */
 if (count<=0) return;
/* if bell character... */
 if (c=='\a')
 {
/* use function 0eH to do count bells */
 while (count--)
 {

 r.x.ax=0xe00|'\a';
 r.x.bx=0;
 int86(0x10,&r,&r);
 }
 return;
 }
/* if regular character (not \n or \r or bs) */
 if (c!='\n'&&c!='\r'&&c!=8)
 {
/* print regular character */
 r.h.ah=9;
 r.h.al=c;
 r.h.bh=0;
 r.h.bl=color;
 r.x.cx=count;
 int86(0x10,&r,&r);
/* if count isn't 1 return else do cursor update
 NOTE: \n \r always update cursor */
 if (count!=1) return;
 }
/* get cursor position */
 r.h.ah=3;
 r.h.bh=0;
 int86(0x10,&r,&r);
/* if \r, zero x coordinate
 Note that 100 \r's is the same as 1 */
 if (c=='\r')
 r.h.dl=0;
/* if \n, increment y coordinate by count */
 else if (c=='\n')
 r.h.dh+=count;
/* if backspace back up by count or to start of line */
 else if (c==8)
 r.h.dl-=r.h.dl>count?count:r.h.dl;
 else
/* bump x coordinate. Assume it won't wrap over */
 r.h.dl++;
 r.h.ah=2;
 int86(0x10,&r,&r);
 }
/* write a string using writec, a writecc macro
 (see display.h) */
void writes(char *s)
 {
 while (*s) writec(*s++);
 }
/* printf using writecc max length 99 */
int printfc(char *fmt,...)
 {
 int rc;
 char outbuf[100];
 va_list aptr;
 va_start(aptr,fmt);
 rc=vsprintf(outbuf,fmt,aptr);
 writes(outbuf);
 return rc;
 }
/* prompt for single key @ coordinates x,y
 use str as prompt, resp is valid keys, pcolor

 is the color to use (0 for same color). Alpha characters
 in resp should be upper case. If resp is "" then all
 characters are valid. If resp is NULL then any alpha
 character is valid.
 returns:
 -1 if ESC pressed
 index of character if resp is valid
 character if resp is NULL or ""
 */
int prompt_at(int x, int y, char *str,char *resp,int pcolor)
 {
 int ocolor,c;
 char *index;
 goxy(x,y);
 ocolor=color;
 if (pcolor) color=pcolor;
/* clear to end of line */
 clreol();
 writes(str);
 while (1)
 {
/* get key */
 c=getch();
 if (!c)
 {
/* ignore extended keys */
 getch();
 continue;
 }
/* if esc quit */
 if (c==27) break;
/* shift upper */
 c=toupper(c);
/* if resp in not null, check it */
 if (resp&&(index=strchr(resp,c))) break;
/* if resp is null, check for alpha */
 if (resp==NULL&&isalpha(c)) break;
/* if resp=="" then anything is OK */
 if (resp&&!*resp) break;
 }
 color=ocolor;
 goxy(x,y);
 clreol();
 curshide();
 return c==27?-1:resp&&*resp?index-resp:c;
 }
/* prompt for input @x,y. Prompt with promptstr valid is a string of valid
 input characters (if NULL, all characters are OK., clr is the color
 (0 for same), len is the input length, buf is the buffer (should be at
 least len+1 long, and help is an optional help string (use NULL for the
 default help). returns: -1 if ESC # of characters input otherwise
 You can use the backspace key to edit entries */
int ask_at(int x, int y, char *promptstr,char *valid,
 int clr,int len,char *buf,char *help)
 {
 int count=0,c,ocolor=color;
 char *bp=buf;
/* clear buffer */
 memset(buf,0,len+1);

/* set color, goto input line, and clear it */
 if (clr) color=clr;
 goxy(x,y);
 clreol();
/* write prompt */
 writes(promptstr);
/* main loop */
 while (1)
 {
/* get a character. Extended keys are <0 */
 c=getch();
 if (!c) c=-getch();
/* handle backspace */
 if (c==8)
 {
 if (bp!=buf)
 {
 bp--;
 writec(8);
 writec(' ');
 writec(8);
 *bp='\0';
 count--;
 }
 continue;
 }
/* Escape or enter ends input */
 if (c=='\r'||c==27)
 {
/* restore color */
 color=ocolor;
/* clear line */
 goxy(x,y);
 clreol();
 curshide();
/* return */
 return c==27?-1:count;
 }
/* If F1 give help */
 if (c==-59)
 {
 vidsave();
 prompt(help?help:
 " Use <ENTER> to accept, <ESC> to quit, <Backspace>"
 " to correct.","",SOCOLOR);
 vidrestore();
 continue;
 }
/* ignore other extended keys (c<0) or regular keys if
 at input limit (count==len) */
 if (count==len||c<0) continue;
/* if not valid character, ignore */
 if (valid&&!strchr(valid,c)) continue;
/* echo input character */
 writec(c);
/* store in buffer */
 *bp++=c;
/* update count */
 count++;

 }
 }
/* routines to save and restore the video context */
/* places to save things */
static char vbuf[4096];
static int save_xy,save_color;
/* save video */
void vidsave()
 {
 union REGS r;
 save_color=color;
 r.h.ah=3;
 r.h.bh=0;
 int86(0x10,&r,&r);
 save_xy=r.x.dx;
 memcpy((void *)vbuf,(void *)(mono?0xb0000:0xb8000),4096);
 }
/* restore video */
void vidrestore()
 {
 union REGS r;
 memcpy((void *)(mono?0xb0000:0xb8000),(void *)vbuf,4096);
 color=save_color;
 r.h.ah=2;
 r.h.bh=0;
 r.x.dx=save_xy;
 int86(0x10,&r,&r);
 }







[LISTING SIX]

/******************************************************************
 * TIMING.C - simple non-rigorous benchmark for Phar Lap's *
 * 286 DOS Extender & Intel 386 CodeBuilder *
 * Compile with: ICC timing.c graphics.lib (386 protected mode) *
 * OR: *
 * CL -AL -Lp -G2 -Ox timing.c graphp.obj llibpe.lib graphics.lib *
 * (286 protected mode) *
 * OR: *
 * CL -AL -G2 -Ox timing.c graphics.lib *
 * (real mode) *
 ******************************************************************/
#include <stdio.h>
#include <graph.h>
#include <time.h>

#define time_mark time_it(0)
#define time_done time_it(1)

main()
 {
 printf("Timing graphics operations\n");
 time_mark;

 gtest();
 time_done;
 printf("Timing file operations\n");
 time_mark;
 ftest();
 time_done;
 exit(0);
 }
/* Function to mark times */
int time_it(int flag)
 {
 static clock_t sttime;
 unsigned s;
 if (!flag)
 {
 sttime=clock();
 }
 else
 {
 s=(clock()-sttime)/CLK_TCK;
 printf("Elapsed time: %d seconds\n",s);
 }
 return 0;
 }
/* Graphics test -- must have VGA */
int gtest()
 {
 int i,x,y;
 _setvideomode(_MRES256COLOR);
 for (i=1;i<11;i++)
 {
 _setcolor(i);
 for (y=0;y<199;y++)
 for (x=0;x<319;x++)
 _setpixel(x,y);
 }
 _setvideomode(_DEFAULTMODE);
 return 0;
 }
/* File test -- assumes 320K free on current drive */
char filedata[64000];
int ftest()
 {
 FILE *tfile;
 int i,j;
 for (j=0;j<10;j++)
 {
 tfile=fopen("~~TIMING.~@~","w");
 if (!tfile)
 {
 perror("TIMING");
 exit(1);
 }
 for (i=0;i<5;i++)
 fwrite(filedata,sizeof(filedata),1,tfile);
 if (fclose(tfile))
 {
 perror("TIMING");
 }

 unlink("~~TIMING.~@~");
 }
 return 0;
 }


January, 1992
UNTANGLING SMARTDRIVE


Effective disk caching




Geoff Chappell


Geoff is a mathematician with a special interest in neural nets. He can be
contacted at International Hall, Brunswick Square, London WC1N 1AS, England or
on Internet at uunet!cix.compulink.co.uk!geoffc.


For some time now, Microsoft has included SMARTDRV.SYS, a device driver for
the SMARTDrive disk cache, with its language compilers, Windows, and (more
recently) DOS itself. This article explores SMARTDrive's inner workings so
that you can use the cache effectively (or decide not to use it at all).
Additionally, I've included a program that demonstrates how SMARTDrive can be
queried or reconfigured using DOS's device driver I/O Control (IOCTL)
interface, as a starting point for programmers who want to cooperate with
SMARTDrive or claim its memory for themselves.
As of version 3.13, SMARTDRV.SYS accepts two numerical parameters and five
switches on its command line in the CONFIG.SYS file. I'll begin by sketching
the general features of a SMARTDRV.SYS device already loaded into memory on a
typical machine.
The cache is created in extended or expanded memory, in a single allocation,
the size of which is tailored to a multiple of 16 Kbytes for ease of memory
management. SMARTDrive refuses to load unless at least 128 Kbytes of extended
memory are available, although it will proceed with a smaller allocation if
the user insists on a smaller cache size. For the most part, SMARTDrive is
indifferent to the type of memory used for the cache. This article assumes use
of extended memory, but bear in mind that the cache will be allocated out of
expanded memory if the /a switch is supplied on the device driver's command
line.
The device driver itself stays in conventional memory (or upper memory, if
loaded high), where it consumes some 13 Kbytes or so. Of this, approximately
the first 4 Kbytes is code and data for the program. The bulk of the driver's
space is taken by an intermediate buffer as big as the biggest track on any
hard disk. SMARTDrive deals with data from disk in terms of this track size
and maintains the intermediate buffer so it may fill requests which do not
cover whole tracks. On a PC with hard disks that are formatted to 17 sectors
per track, this buffer will be a little more than 8 Kbytes. Following the
intermediate buffer are the Cache Control Blocks (CCBs)--a double-linked list
of 14-byte structures, one for each track the cache has been configured to
hold. The cache in extended memory is nothing but a set of track buffers laid
contiguously. All the information describing a given buffer's contents is held
in the corresponding CCB.
Figure 1 presents the memory layout, while Table 1 details the Control Block
structure. Note that although the CCBs are ordered physically in one-to-one
correspondence with the track buffers, the logical links between CCBs are
maintained by the caching algorithm in such a way that CCBs for less recently
used tracks are further down the list. The double linkage speeds retrieval of
recently used tracks and avoids wasting time when searching for a spare buffer
to load with a new track.
Table 1: The Control Block structure

 Offset Size Description
 ------------------------------------------------------------------------

 00h word Address of CCB for next most recently used track (or FFFFh)
 02h word Address of CCB for next track used more recently (or FFFFh)

 04h dword Offset into cache of corresponding track buffer

 08h word Flags -- track buffer's contents are:
 xxxx xxxx xxxx xxx1 Invalid (empty)
 xxxx xxxx xxxx xx1x Dirty
 xxxx xxxx xxxx x1xx Nondiscardable

 0Ah word Head and drive for track--
 dx parameter for int 13h but with bit 80h stripped
 0Ch word Cylinder for track--
 cx parameter for int 13h but with sector field cleared

*In a Control Block marked empty, the other flag bits and disk parameters are
meaningless.

*A nondiscardable track will not be considered for replacement during a search
for an old track to make way for a new one, but will be discarded if the whole
cache is invalidated.
Caching is implemented by intercepting int 13h, the software interrupt used
for BIOS-level disk services. SMARTDrive is interested only in fixed disks,
whose existence and characteristics it determines during initialization by
querying int 13h function 08h. Support is provided for up to 16 physical hard
disks. (Compare this with the default block device driver in IO.SYS, which,
until DOS 5.0, recognized no more than two.)
From the outset, it must be understood that SMARTDrive assumes a normal
register convention for int 13h calls and is therefore incompatible with a
variety of systems for accommodating disks with more than 1024 cylinders.
Recent versions of SMARTDRV.SYS search for partitioning schemes known to
indicate problems. In some cases the difficulty can be overcome, so command
line switches are provided to direct SMARTDrive to skip this checking and
proceed with its installation on the assumption that the user has obtained a
remedy for any incompatibility problems. Briefly, the /p switch defeats all
checks and the recently added /y directs SMARTDrive not to pursue certain
schemes involving extended partitioning.
SMARTDrive also hooks int 19h, as must all programs which seek both to control
non-DOS interrupts and be considered well-written. The point is that int 19h
reloads the operating system without reinitializing the ROM BIOS; interrupt
vectors that will not be reset by the newly reloaded DOS must therefore be
restored before the BIOS receives the int 19h direction.
Handlers are provided in SMARTDrive's code for two other interrupt vectors,
but have so far been left unactivated in SMARTDRV.SYS. This is because
SMARTDrive is a write-through cache. Therefore, it need not intercept the
timer interrupt (int 1Ch) to ensure that dirty tracks in the cache get flushed
to disk regularly, nor trap attempts to reboot the computer via Ctrl-Alt-Del,
although a handler for the int 09h keyboard interrupt is waiting in the wings.
To complete the general picture, notice that SMARTDrive provides an interface
for querying its configuration or changing the behavior of its cache. This
interface is not implemented via software interrupt, but through a service DOS
provides for communicating with device drivers that is largely unfamiliar to
DOS programmers. Before constructive use of this interface can be
demonstrated, however, we must first discuss the methods SMARTDrive employs to
manage the cache.


The Cache Algorithm


Only int 13h functions 02h (read) and 03h (write) lead to nontrivial
processing. Some functions are passed transparently, but many cause all tracks
in the cache to be discarded--a fresh start, needed for instance when a disk
is formatted, but also triggered by a write request with implausible
parameters or the use of an unknown int 13h function. In general, any
situation (particularly, any error) deemed capable of compromising the
integrity of even one cached track is dealt with by invalidating the whole
cache.
Consider the chain of events following the interception of a request to read
some number of sectors from a hard disk. After establishing that the
parameters describing the request make sense, SMARTDrive adds the number of
sectors involved to a tally it keeps of sectors read during the session. It
then decomposes the request into three pieces: a partial track at the
beginning, a body of whole tracks, and a partial track at the end.
These components are processed in the order given, but it is easiest to
consider the body of whole tracks first. Each track in turn is sought in the
cache. If present, each may be copied immediately to the appropriate location
in the int 13h caller's buffer, presenting a considerable gain over a disk
access. To enable SMARTDrive to provide an estimator of its success, a count
of sectors retrieved in such "cache hits" is kept for comparison with the
total number of sectors read. The CCB for the track is then moved to first
place in the list (as also happens whenever a new track is entered in the
cache), both to increase the speed with which the track may be found again and
to decrease the chance of its being discarded from the cache, for whenever
SMARTDrive needs to enter a new track it goes to the end of the list to begin
its search.
If a track is not in the cache, it will of course have to be read from disk.
But SMARTDrive does not process such tracks individually. Instead, it builds a
set of contiguous tracks which may be read straight to the int 13h caller's
buffer in one block. After reading the tracks from disk, SMARTDrive copies
them one by one to the cache. Ordinarily, this is a simple matter of finding
either an empty track buffer or the one used least recently (and therefore
most eligible for replacement). However, tracks in the cache may be marked
nondiscardable, leaving open the possibility that no space exists in the cache
for a new track.
SMARTDrive indulges a certain defeatism in this situation and also in response
to an error when reading from disk: Although it doesn't discard the cache, it
abandons the current request, passing it along the int 13h chain, even though
it may have already filled the bulk of the request successfully.

Partial tracks present a special difficulty, for if the track does not exist
in the cache and must therefore be read from disk, where should it be loaded?
It cannot be read directly into the int 13h caller's buffer and must therefore
be loaded into an intermediate buffer in SMARTDrive's own memory. From this
buffer, the relevant sectors may be copied to their destination and the whole
track may be moved to the cache.
The special treatment necessary for partial tracks can be turned to advantage.
First, SMARTDrive remembers the disk parameters for any track it loads into
the intermediate buffer to fill a request for a partial track, and it is
therefore possible that the next time sectors are sought in a partial request,
they will not only lie somewhere in the cache, but still be in the
intermediate buffer. This situation can be dealt with very quickly indeed,
because it avoids even the small time delay involved in moving data from
extended memory, and is regarded as sufficiently special that the statistics
maintained by SMARTDrive record these "buffer hits" separately.
Second, the intermediate buffer may be used indirectly to help systems such as
Windows support address paging when running programs in protected mode. Under
these systems, it may be impossible to use the BIOS int 13h services to read
from or write to memory the linear and physical addresses of which differ.
Since version 3.11, SMARTDrive's initialization has included verification that
the first sector of each hard disk could in fact be read using int 13h,
although this may be overridden with the /u switch. The intermediate buffer is
clearly intended to have matching linear and physical addresses. As such, it
would be extravagant for DOS extenders to create another buffer, so SMARTDrive
provides a facility for having all the disk read/write activity it intercepts
pass through its intermediate buffer. This double-buffering, as Microsoft
calls it, usually works in tandem with the Virtual DMA interface. This is
implemented as protected-mode interrupt 4Bh but indicates its activity to real
mode and virtual-8086 mode programs by setting a flag in the BIOS data area.
By monitoring the flag, SMARTDrive supports double-buffering when needed.
Other schemes exist for overcoming DMA problems, so command-line switches are
provided both to disable double-buffering (/b-) or to force it to be on all
the time (/b+).
Having covered SMARTDrive's interception of requests to read data from disk,
it remains only to elaborate the meaning of "write-through." In present
versions, SMARTDrive does not retain in memory tracks not sent to disk. On
receipt of a write request, SMARTDrive passes it along the chain immediately,
catching the return so that any tracks which were in the cache may be updated.
Any form of error, be it with the disk operation or with the transfer to
extended memory, causes SMARTDrive to invalidate the whole cache.


Speaking Terms


As noted earlier, the interface provided for interrogating SMARTDrive or
modifying its behavior has a style which may seem foreign to many DOS
programmers. It is commonplace for resident programs to be coded as device
drivers to get them into memory as early as possible, but most such programs
follow the familiar method of hooking a software interrupt on which to support
their communication with other programs. Few programs take advantage of the
I/O Control interface which DOS provides in parallel to normal read and write
functions, despite the opportunity offered to avoid interrupt conflicts.
Character device drivers are known to DOS by name and as such may be opened
with the usual DOS functions or high-level language equivalents. Ordinarily,
device drivers are opened to transfer data to or from a physical device such
as a video screen or printer, though in these cases the relevant drivers, CON
and PRN, will usually have been opened as predefined handles. IOCTL is
provided as a means of communicating with the driver rather than the physical
device behind it, most especially for controlling the way in which the driver
regards both the device and the data being passed to and fro.
Reading and writing both require specifying the handle for the device, the
number of bytes to transfer, and the whereabouts of the data or buffer. IOCTL
operates the same way, so similarly, in fact, that the file IOCTL.C presented in
Listing One (page 90) contains functions with exactly the same prototypes as
the Microsoft C library functions _dos_read() and _dos_write().
By no means is it obligatory for a device driver to acknowledge an IOCTL
call--indeed, DOS does not actually attempt the communication without first
inspecting the attribute word in the device driver's header to establish that
IOCTL is supported. Given that a known device driver name has to be supplied
in the first place, this makes IOCTL communication much more secure than
issuing a software interrupt with only a vague idea of what might be at the
receiving end. With IOCTL, problems are reported by standard DOS error codes.
Note, though, that just as the details of an interrupt-based API vary from one
to another, so too is the interpretation of IOCTL data a different matter for
each different driver.
In SMARTDrive's case, the name to use is SMARTAAR, and the structures it
understands for the IOCTL read and write functions are described in Tables 2
and 3, respectively. An IOCTL read function, properly executed, should fill a
44-byte buffer with information on SMARTDrive's performance and configuration.
For the IOCTL write function, the first byte in the data packet is interpreted
as a command code, to be followed by extra data if the particular command
requires it.
Table 2: Data structure for SMARTDrive IOCTL Read

 Offset Size Description
 -------------------------------------------------------------------------

 00h byte Unused, except that it may be changed by IOCTL Write
 function 04h subfunctions 00h and 01h
 01h byte Unused, except that it may be changed by IOCTL Write
 function 04h subfunctions 02h and 03h
 02h byte 00h if the cache has been deactivated, 01h if active
 03h byte 01h if cache is in extended memory, 02h if in expanded
 memory
 04h word Number of timer ticks between flushes -- defaults to 1
 minute but is ineffective in current versions because
 the int 1Ch handler is not installed
 06h byte 00h normally, but 01h if the cache contains tracks marked
 nondiscardable
 07h byte 00h normally, but 01h to ensure that the cache be flushed
 when int 19h is received
 08h byte Unused, except that it may be changed by IOCTL Write
 function 0Ah subfunctions 00h and 01h
 09h byte 00h if Virtual DMA buffering is disabled, 01h if Virtual
 DMA buffering is forced, 02h if the need for Virtual DMA
 buffering is determined dynamically
 0Ah dword Address of the handler which is immediately below SMARTDRV
 in the int 13h chain
 0Eh word SMARTDRV version number (minor version in the low byte)
 10h 2 bytes Unused
 12h 3 words These are values maintained for the number of sectors
 attempted to be read, the number found in the cache and
 the number found in the intermediate buffer--only ratios
 should be regarded as meaningful, because all three
 values are halved whenever one of them is about to
 overflow
 18h 2 bytes Statistical information in the form of percentage ratios
 for cache hits and buffer hits respectively--these are
 maintained over the whole of the session and are not
 cleared by resetting the cache
 1Ah word Number of tracks the cache can hold
 1Ch word Number of valid tracks in the cache
 1Eh word Number of nondiscardable tracks in the cache
 20h word Number of dirty tracks in the cache
 22h word Current cache size (in multiples of 16Kbytes)
 24h word Maximum cache size (in multiples of 16Kbytes)
 26h word Minimum cache size (in multiples of 16Kbytes)

 28h dword Address of a flag for locking the cache--00h by default,
 but set to 01h for global lock (so that a new track may
 be entered in the cache only if an empty track buffer
 exists, not by displacing an existing track, whether
 discardable or not).


*The transfer count will be truncated to 0, 28h, or 2Ch, as applicable.
Table 3: SMARTDrive IOCTL Write Commands

 Command Data Description
 -------------------------------------------------------------------------

 00h -- Flush all dirty tracks to disk
 01h -- Flush and reset
 02h -- Deactivate cache (flush, reset, and disable)
 03h -- Activate cache
 04h 00h or 01h Set an otherwise unused flag to 00h or 01h,
 respectively
 04h 02h or 03h Set (a second) otherwise unused flag to 00h or 01h,
 respectively, but also flush the cache
 05h word Set the number of timer ticks between flushes to
 the value supplied (but note that current versions
 of SMARTDRV do not proceed with the installation
 of the int 1Ch handler, thereby rendering this
 function ineffective).
 06h -- Flush cache and mark current contents as
 nondiscardable
 07h -- Mark all cached tracks as discardable
 08h 00h or 01h Disable or enable respectively the facility for
 flushing the cache on int 19h
 09h -- Unused (genuinely!)
 0Ah 00h or 01h Set an otherwise unused flag to 00h or 01h,
 respectively
 0Bh word Reduce cache by the designated multiple of
 16Kbytes-- inability to resize the cache's memory
 allocation produces the General Failure error, as
 do attempts to reduce the cache by more than its
 current size
 0Ch word Increase cache by the designated multiple of
 16Kbytes--inability to resize the cache's memory
 allocation produces the General Failure error, but
 attempts to increase the cache to more than its
 maximum configured size are simply truncated
 0Dh dword Thread the given address into the int 13h chain
 below SMARTDRV (note that a pointer to the int 13h
 handler currently below SMARTDRV is returned in the
 IOCTL Read structure)


The data packet contains a command number as its first byte. For some
commands, extra data is required, as indicated. In all such cases, supplying
too little data is signaled by a returned transfer count of 0 bytes.
The commands which resize the cache may be especially attractive to those who
need more extended memory and would like to recover the memory that
SMARTDrive uses for the disk cache, as Windows itself does. After opening
SMARTAAR, an
IOCTL read will return the current cache size, the minimum to which it may be
reduced, and the maximum to which it may be expanded. Units of measurement are
multiples of 16 Kbytes. The resizing is conducted by supplying an increment or
decrement from the current size, not an absolute size. Reducing the cache is a
simple matter of shrinking the extended memory allocation and removing from
the linked list all CCBs corresponding to excised track buffers.
Increasing the cache size is less simple, and in fact the code responsible
for this contains a bug which, when triggered, will overwrite 64 Kbytes of
important memory. The problem arises only when the cache size has already
been reduced to nothing. In this case, the list of CCBs must be rebuilt from
scratch. The first CCB must provide the end-of-list markers and is therefore
constructed outside the loop which builds the others. If, however, the new
cache size will accommodate only one track, as happens if the new cache size
is 16 Kbytes (and conceivably 32 Kbytes if a disk exists with more than 32
sectors per track), then no more CCBs need be built. Unfortunately, the
assembly language loop instruction treats an initial count of 0 as 65,536.
Listings Two, Three, and Four (page 90) present the C code for a program which
I hope is commented sufficiently well that you can adapt it for your own use.
Called without command-line parameters, it displays a selection of information
reported by the SMARTDrive IOCTL read function. Its main purpose is to show
you how to resize the cache, which it does repeatedly in order to determine
the optimum cache size for a given task. Hopefully, by now you are
sufficiently familiar with SMARTDrive to deduce in which environment it will
be most effective.


_UNTANGLING SMARTDRIVE_
by Geoff Chappell



[LISTING ONE]

/* ioctl.c - functions to support IOCTL to & from devices under DOS. All
 * functions return 0 if successful, having stored an integer at an address
 * provided as the last argument; failure is indicated by returning a non-0
 * DOS error code. */
unsigned _cdecl _dos_gethandlestatus (int handle, unsigned *status)
{
 _asm {
 mov ax,4400h
 mov bx,handle
 int 21h
 jc done
 mov bx,status
 mov [bx],ax
 xor ax,ax
 done:
 }
}
unsigned _cdecl _dos_ioctlread (int handle, void _far *buffer,
 unsigned count, unsigned *numread)
{
 _asm {
 push ds
 mov ax,4402h
 mov bx,handle
 mov cx,count
 lds dx,buffer
 int 21h
 pop ds
 jc done
 mov bx,numread
 mov [bx],ax
 xor ax,ax
 done:
 }
}
unsigned _cdecl _dos_ioctlwrite (int handle, void _far *buffer,
 unsigned count, unsigned *numwrt)
{
 _asm {
 push ds
 mov ax,4403h
 mov bx,handle
 mov cx,count
 lds dx,buffer
 int 21h
 pop ds
 jc done
 mov bx,numwrt
 mov [bx],ax
 xor ax,ax
 done:
 }
}








[LISTING TWO]

/*** ioctl.h-function prototypes for IOCTL to & from devices under DOS ***/
unsigned _dos_gethandlestatus (int, unsigned *);
unsigned _dos_ioctlread (int, void _far *, unsigned, unsigned *);
unsigned _dos_ioctlwrite (int, void _far *, unsigned, unsigned *);







[LISTING THREE]

/* smartdrv.h - structures and definitions relating to smartdrv.sys */
#pragma pack (1)

/* The data packet returned by performing an IOCTL Read from SMARTDrive */
struct SD_READ {
 char unused_1;
 char unused_2;
 char IsActive;
 char MemoryType;
 unsigned Ticks;
 char IsLocked;
 char FlushOnReboot;
 char unused_3;
 char DoubleBuffer;
 void _far *OrgInt13;
 char MinorVersion;
 char MajorVersion;
 char unused_4;
 char unused_5;
 unsigned SectorsRead;
 unsigned SectorsHit;
 unsigned SectorsBuffered;
 char HitRatio;
 char BufferRatio;
 unsigned TracksInCache;
 unsigned CurrentTracks;
 unsigned LockedTracks;
 unsigned DirtyTracks;
 unsigned CurrentSize;
 unsigned ConfiguredSize;
 unsigned MinimumSize;
 char _far *GlobalLockFlag;
};
struct SD_WRITE {
 char command;
 union {
 char subcommand;
 int size;
 char _far *address;
 };
};

#pragma pack ()







[LISTING FOUR]

/* smartchk.c - main source file for program smartchk.exe. Compile under
 * small or tiny memory models in Microsoft C 6.00 and link with ioctl.obj */
#include <bios.h>
#include <dos.h>
#include <fcntl.h>
#include <process.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ioctl.h"
#include "smartdrv.h"
#define AND &&
#define NOT !
#define OR ||

/**** Function prototypes ***/
void get_configuration (void);
void show_configuration (void);
char ** get_range (int, char **);
unsigned is_str_zero (char *);
void format_command (char *, char **);
void run_test (char *);
int yes_or_no (void);
int convert_to_seconds (long, long *, int *);
void resize_cache (unsigned, unsigned);
void reset_cache (void);
void increment_cache (void);

void set_traps (void);
void quit (char *);
void cleanup (void);
void _far CtrlC_trap (void);
int _far critfail (void);

/*** Data ***/
/* The string describing the program's syntax */
const char syntax [] = "\
\nGathers information about the SMARTDrive disk cache.\
\n\
\nSMARTCHK [/min [/max [/inc]]] command [arguments]\
\n\
\n times the execution of the designated command,\
\n using different sizes for the SMARTDrive cache\
\n\
\n min, max and inc should be multiples of 16KB\
\n\
\nType SMARTCHK without parameters to display information\
\nabout SMARTDrive's configuration and performance.";
/* Structures for SMARTDrive IOCTL */

struct SD_READ sd_read;
struct SD_WRITE sd_write;
/* Various pieces of data which must be shared between functions */
char terminating = 0;
char sd_cache_changed = 0;
unsigned sd_cache_size;
int sd_handle = -1;
unsigned min;
unsigned max;
unsigned inc;

/*** Code ***/
int main (register int argc, register char *argv [])
{
 char buffer [128];
 /* Install clean up routines and exception handlers to ensure termination. */
 set_traps ();
 /* Verify that SMARTDrive has been installed and perform an IOCTL Read to
 obtain configuration data. Error in this operation is fatal to program. */
 get_configuration ();
 /* If no argument has been supplied on the command line, then describe the
 configuration & report SMARTDrive's statistical estimates of its performance.
 If the only argument on the command line is "/?", then display a help
 message. The most complicated option involves executing and timing a command
 repeatedly, using different sizes for the SMARTDrive cache. Up to 3 command
 line arguments may specify range of cache size to use in test-all arguments
 after these are presumed to be part of the command and are formatted
 into a buffer for passing to the system () function. */
 argv ++;
 argc --;
 if (NOT argc) show_configuration ();
 else if (argc == 1 AND **argv == '/' AND *(*argv + 1) == '?'
 AND *(*argv + 2) == '\0') printf (syntax);
 else {
 argv = get_range (argc, argv);
 format_command (buffer, argv);
 run_test (buffer);
 }
 exit (0);
}
void get_configuration (void)
{
 register unsigned exitcode = 0;
 unsigned status;
 unsigned count;
 /* If smartdrv.sys has been loaded, it may be found in memory as a device
 driver with name SMARTAAR. Use _dos_open (), which is simply a front-end to
 the int 21h function 3Dh. */
 exitcode = _dos_open ("SMARTAAR", O_RDWR, &sd_handle);
 /* On success, call int 21h function 4400h (not supported in MS-C library)
 to verify that the handle corresponds to a character device driver capable
 of IOCTL. DOS returns a 16-bit flag whose low byte it takes from the System
 File Table and high byte from the device driver attribute word. Status bit
 for a device is masked by 0x0080 and for IOCTL support by 0x4000 - both
 bits must be set. */
 if (NOT exitcode) {
 exitcode = _dos_gethandlestatus (sd_handle, &status);
 if (NOT exitcode) {
 if ((status & 0x4080) != 0x4080) exitcode = 0xFFFF;

 }
 }
 /* Any error encountered so far can be explained by a common message. Note
 that all errors occurring in this function are fatal to the program. */
 if (exitcode) quit ("cannot open SMARTDrive device");
 /* Perform IOCTL Read to get configuration info about SMARTDrive. Not only
 should read be successful but should return as many bytes as requested. */
 exitcode = _dos_ioctlread (sd_handle, &sd_read, sizeof (sd_read), &count);
 if (exitcode OR count != sizeof (sd_read))
 quit ("cannot read data from SMARTDrive device");
}
void show_configuration (void)
{
 printf ("\nSMARTDrive Version %u.%02u has been configured to use %uKB",
 sd_read.MajorVersion, sd_read.MinorVersion,
 sd_read.ConfiguredSize << 4);
 printf ("\nof %s memory with %uKB set as the minimum size.",
 (sd_read.MemoryType == 1 ? "extended" : "expanded"),
 sd_read.MinimumSize << 4);
 printf ("\n\nIts present capacity is %uKB, corresponding to %u tracks.",
 sd_read.CurrentSize << 4, sd_read.TracksInCache);
 printf ("\nOf these, %u tracks are in use.", sd_read.CurrentTracks);
 printf ("\n\nDuring this session, %u%% of sector reads have been filled",
 sd_read.HitRatio);
 printf ("\nfrom the cache and %u%% from the intermediate buffer.",
 sd_read.BufferRatio);
}
char ** get_range (int argc, register char *argv [])
{
 register unsigned temp;
 /* Getting numerical arguments for the testing range is a little tedious,
 but unavoidable if program is to be anything other than a pointless toy.
 This function returns a value for the argv variable, advanced past any
 arguments that specify range. Begin by setting default values for range.
 Change these only after establishing validity of command line value. */
 min = sd_read.MinimumSize;
 max = sd_read.ConfiguredSize;
 inc = 2;
 if (**argv == '/') {
 temp = atoi (*argv + 1) >> 4;
 if (temp == 0 OR temp < min OR temp >= max) {
 if (NOT is_str_zero (*argv + 1)) quit ("invalid minimum size");
 }
 min = temp;
 argv ++;
 argc --;
 if (argc AND **argv == '/') {
 temp = atoi (*argv + 1) >> 4;
 if (temp <= min OR temp > max) quit ("invalid maximum size");
 max = temp;
 argv ++;
 argc --;
 if (argc AND **argv == '/') {
 temp = atoi (*argv + 1);
 if (NOT temp OR temp & 15 OR (temp >> 4 > max - min))
 quit ("invalid increment");
 inc = temp >> 4;
 argv ++;
 argc --;

 if (min == 0 AND inc == 1)
 quit ("parameters rejected to avoid SMARTDrive bug!");
 }
 }
 }
 /* Name of a command to execute is mandatory. If no arguments remain,
 processing can't continue. Otherwise, return adjusted value for argv. */
 if (NOT argc) quit ("program name not supplied");
 return (argv);
}
/* The following simple function returns a TRUE value iff the string at
address ptr is composed entirely of the character '0'. */
unsigned is_str_zero (register char *ptr)
{
 unsigned ch;
 while ((ch = *ptr ++) AND (ch == '0')) {
 }
 return (ch ? 0 : 1);
}
/* Given an array of pointers to character strings, the following function
concatenates all strings, separating them with spaces and copying them to
specified buffer. Its role is to piece together a command string for system ()
function. Note that it could be developed to strip double-quotes. */
void format_command (register char *buffer, char *argv [])
{
 register char *ptr;
 while (ptr = *argv ++) {
 while (*buffer ++ = *ptr ++) {
 }
 *(buffer - 1) = ' ';
 }
 *buffer = '\0';
}
void run_test (char *command_string)
{
 long start, end;
 long seconds;
 unsigned hundredths;
 /* Shrink the cache to the range's minimum. */
 resize_cache (min, sd_read.CurrentSize);
 sd_cache_size = min;
 sd_cache_changed = -1;
 for (;;) {
 /* Before executing the command, flush the disk cache and discard its
 contents. Each test is therefore started under the same conditions (at
 least with respect to SMARTDrive, but note that test is not entirely
 fair, because data from disk is also held in DOS BUFFERS system). */
 reset_cache ();
 /* Use BIOS functions to get the system clock count, in spite of slight
 problem with the "midnight" flag. Calling MS-C library functions to obtain
 the time and convert the difference to seconds brings in floating point
 arithmetic & a great increase in code size if using an emulator library. */
 _bios_timeofday (_TIME_GETCLOCK, &start);
 system (command_string);
 _bios_timeofday (_TIME_GETCLOCK, &end);
 convert_to_seconds (end - start, &seconds, &hundredths);
 /* Report the cache size and execution time of the test program,
 then give the user a chance to leave the testing cycle. */
 printf ("\n\nExecution time with %uKB disk cache was %lu.%02u seconds",

 sd_cache_size << 4, seconds, hundredths);
 printf ("\nContinue (y/n)? ");
 if (NOT yes_or_no ()) break;
 printf ("\n");
 /* Increase the cache size for the next round of testing. */
 if (sd_cache_size + inc > max) break;
 increment_cache ();
 sd_cache_size += inc;
 }
}
/* The following is a simple function which waits at stdin, returning
a true value on receipt of 'y' or a false value for 'n'. */
int yes_or_no (void)
{
 unsigned ch;
 for (;;) {
 ch = getch ();
 if (NOT ch) getch ();
 else if (ch == 'N' OR ch == 'n') return (0);
 else if (ch == 'Y' OR ch == 'y') return (1);
 }
}
int convert_to_seconds (long clocks, long *seconds, int *hundredths)
{
/* Clock ticks at approximately 18.2/second, most easily rounded to 91 / 5. */
 *seconds = (clocks * 5) / 91;
 *hundredths = (((clocks * 5) % 91) * 100) / 91;
}
/* === Cache manipulation === */
/* Next three functions perform IOCTL Write operations to change SMARTDrive
cache. In this particular program, error-reporting simplified by regarding all
errors as fatal. */
void resize_cache (unsigned new, unsigned old)
{
 unsigned exitcode, count;
 sd_write.command = 0x0B;
 sd_write.size = old - new;
 if (sd_write.size == 0) return;
 if (sd_write.size < 0) {
 sd_write.size = - sd_write.size;
 sd_write.command = 0x0C;
 }
 exitcode = _dos_ioctlwrite (sd_handle, &sd_write, 3, &count);
 if (exitcode OR count != 3) quit ("cannot resize SMARTDrive cache");
}
void reset_cache (void)
{
 unsigned exitcode, count;
 sd_write.command = 0x01;
 exitcode = _dos_ioctlwrite (sd_handle, &sd_write, 1, &count);
 if (exitcode OR count != 1) quit ("cannot flush SMARTDrive cache");
}
void increment_cache (void)
{
 unsigned exitcode, count;
 sd_write.command = 0x0C;
 sd_write.size = inc;
 exitcode = _dos_ioctlwrite (sd_handle, &sd_write, 3, &count);
 if (exitcode OR count != 3) quit ("cannot increase SMARTDrive cache");

}
/* === Code relating to termination - error messages and cleanup === */
void set_traps (void)
{
 /* Direct C run-time to call a cleanup routine before it exits program. */
 atexit (cleanup);
 /* Trap Ctrl-C to ensure proper cleanup rather than let DOS terminate
 program pre-emptively. Critical errors (which might also cause premature
 termination) may be failed automatically. An advantage to using _harderr ()
 is that it allows compilation under tiny memory model. Were error handler
 installed directly, it would have to be declared using the _interrupt
 keyword, which is (perhaps surprisingly) incompatible with tiny memory
 model. Even so, coercion to far addresses will generate unwanted segment
 references in the tiny memory model without some additional manipulation.*/
 #define FAR_FUNCTION (void (_far *)())
 #define FAR_INTERRUPT (void (_interrupt _far *)())
 #define CODE_SEG (void _based (_segname ("_CODE")) *)
 _dos_setvect (0x23, FAR_INTERRUPT CODE_SEG CtrlC_trap);
 _harderr (FAR_FUNCTION CODE_SEG critfail);
}
void cleanup (void)
{
 terminating = 1;
 printf ("\n");
 if (sd_handle != -1) {
 if (sd_cache_changed) {
 sd_cache_changed = 0;
 reset_cache ();
 resize_cache (sd_read.CurrentSize, sd_cache_size);
 }
 _dos_close (sd_handle);
 sd_handle = -1;
 }
}
void _far CtrlC_trap (void)
{
 if (NOT terminating) {
 terminating ++;
 quit ("terminated by user");
 }
}
int _far critfail (void)
{
 return (_HARDERR_FAIL);
}
void quit (char *errmsg)
{
 printf ("\nUnable to continue - \n%s", errmsg);
 exit (0xFF);
}



January, 1992
PROGRAMMING PARADIGMS


Still Running Light




Michael Swaine


Not many people got in on the microcomputer revolution as early as Tom
Pittman. His first computer, on which he ran a business, used an Intel 4004
processor, back before Intel discovered the numeral 8. Tom picked up this
magazine in 1976 when its subtitle was Running Light Without Overbyte and its
sole mission was to show programmers how to implement something called "Tiny
Basic" on the microcomputers just arriving on the scene. Tom wrote one of the
first implementations of Tiny Basic. He's also a sought-after embedded system
developer. Recent vice-chair of the US delegation to the ISO working group on
Modula-2. Ex-university lecturer in computer science. President, janitor, and
sole proprietor of Itty Bitty Computers. And the author of a product that he
expects us to believe can turn HyperCard into a serious software development
environment and HyperTalk into a serious competitor for C as a Macintosh
programming language. He's serious.
DDJ: Tom, for someone with a personal stake in HyperCard's survival, you've
been saying some unnerving things on-line about the future of
HyperCard--unnerving to HyperCard fanatics, anyway.
TP: Well, yeah. Apple doesn't know what HyperCard is, so they're trying to
make it something that it isn't.
DDJ: You've actually laid out a timetable for HyperCard's demise.
TP: Oh, you saw that? Well, it's kind of a guess, but I figured last year when
Apple was transferring it to Claris, that's going to kill HyperCard. At that
time I wasn't real sure about the timetable, but it's a little more clear in
my head now. The first step is that Claris gets control, and then that Claris
gets ownership control, which means that they can cut off the bundled
HyperCard. I expected that to happen already, but I think it's going to happen
in the next year. Claris is going to say, we can't make money selling a
product that Apple is giving away free. So sometime in the next year you can
expect Apple to stop bundling the same HyperCard that Claris sells.
DDJ: They tried to do that already.
TP: They tried to do that, but with the hue and cry, they let people know how
to get around it. I don't know exactly where the decision was made. But
Apple--or Claris--didn't expect, I think, the negative reaction.
DDJ: What happens next?
TP: Sales will fall off, and Claris will discover that it's not making money
like they thought. So they'll raise the price.
DDJ: And when will HyperCard finally give up the ghost?
TP: I would guess that two to three years after it stops being free in the box
and a year or two after they raise the price, Claris is going to discontinue
it because it won't be a profitable product.
DDJ: What does profitable mean?
TP: It won't make money like MacWrite, like a spreadsheet, like MacDraw. And
they'll decide it's not worth the effort to support it.
DDJ: The first steps have already taken place, of course. Is there an
empirical basis for the rest of your conclusions?
TP: Well, you can hear the gears grinding in there, trying to get Apple to
stop bundling it. Things like hiding the scripting buttons. I think that part
is almost surely going to get more rigid. I think they're going to do their
very darndest to make sure that the HyperCard that Apple is bundling does not
have scripting capability. At all. Whether they do a runtime version or stop
bundling it entirely. I don't think they'll stop bundling it entirely.
DDJ: Doesn't your talking about this sort of undermine your own business?
TP: Yeah. I guess it was a year ago, I went out on a limb and said that I was
out to make HyperTalk the programming language of choice for Macintosh
programming. I'm kind of on the way to trying to do that, but it doesn't have
the rosy, profitable future that I once thought.
DDJ: I'm a CompileIt fan. When I saw the first version I thought it was a neat
idea, a compiler for HyperTalk. Very slow, though.
TP: It was awful.
DDJ: But the second version, which I recently reviewed for another magazine,
seems to have the depth and power of a real programming language, and yet
anyone who knows a little bit about HyperTalk can get into it. It seems to me
you've taken some of the jaggies out of that learning curve. I'm using it
regularly now to develop External Commands (XCMDs).
TP: My goal in life, ever since 1975, was empowerment. I like Albrecht's
phrase, "computer power to the people." I guess I'm a bit of a populist, and
I've never really lost that. I would like to give computer power to the common
person. I always saw myself as that even when I was in high school ten years
earlier.
DDJ: You weren't a computer kid?
TP: When I was in high school, I was going to build my own computer. I
couldn't afford to buy one, so I was going to build one. I accumulated 200
dual triodes and dozens of filament transformers. I had no idea how to build a
computer. But when the microprocessor came out in 1972, I jumped on it. I had,
in 1972, a microprocessor on my desk. A computer. By 1975 I had a disk
operating system and not exactly a WYSIWYG screen editor, but certainly not
TECO, which was the competition.
DDJ: Your own editor?
TP: My own editor. It was kind of WYSIWYG in the sense that I could see the
fragment of the file that it was looking at. That was the editor that I used
when I maintained the Homebrew Computer Club mailing list.
DDJ: On your 1972-vintage microcomputer.
TP: Lee Felsenstein would get up at the Homebrew meetings and ask, how many of
you have this type of computer, and how many have this type. And then just for
laughs he would ask, and how many run 4004. Because that's what I was running.
Most people there were using paper tape on 8080s and I had a disk operating
system on a 4004. But I could do the Homebrew mailing list because I could
sort it on my computer.
DDJ: So far as I know, there was no commercial 4004 computer. Where did you
get this thing?
TP: When the microcomputer first came out, I signed up for a free seminar on
programming it. That was in April of '72.
DDJ: At Intel?
TP: At Intel, yeah. I asked, do you have anything that runs resident, and they
said no. I said, if somebody wrote one, would you buy it? They said, come up
and talk with us afterward. So after the seminar I went up there and they
said, how about if we trade you a computer for it? And I thought, you're in!
So I wrote a resident assembler. Unfortunately I only had 1K; four 256-byte
erasable ROMs. But I succeeded, and they continued to use it for several
years.
DDJ: Tell me how you came to write your Tiny Basic.
TP: Bob Albrecht gave a talk on Basic at a computer conference. I went up
afterward, and he pointed me to the Homebrew Computer Club. I was at that
first meeting in Gordon French's garage. Then when the "Build Your Own Basic"
article came out, I think it was written by Dennis Allison, I really grooved
on that. Because I understood writing interpreters, I had done that, but for
the first time I understood how a parser worked. So I wrote an underlying
interpreter in assembly language for that metalanguage. I think I was the only
one who took that approach, but I wasn't the first one [to write a Tiny
Basic].
DDJ: Arnold and Whipple, down in Texas, may have that honor. Of course there
was that Harvard dropout down in Albuquerque who was heavily into Basic
interpreters then, too.
TP: The Microsoft Basic paper tape.
DDJ: The Homebrew Club sort of unilaterally lowered the price of Microsoft
Basic.
TP: Everybody was whining about the $150 price. So I got up and said, if it
was $5, would you buy it?
DDJ: Copies were made. Bill got mad.
TP: Actually, those copies were legal. Bill Gates had forgotten to include a
copyright notice, and under the copyright law at that time, they were not
copyrighted. Bill Gates never made that mistake again. But those copies were
perfectly legal. Immoral perhaps, but perfectly legal. They also made
Microsoft a software giant.
DDJ: Tiny Basic didn't do quite as well for you.
TP: I announced that I had it, and got no response. I was very disappointed.
But BYTE magazine ran a little notice about it in the back of the magazine,
and the first weekend after the issue came out, I got 50 calls. And for
several years, while it never supported me, it made a substantial contribution
to my income.
DDJ: At $5 a copy?
TP: At $5 a copy.
DDJ: I know you went back to school to study computer science and taught it
for a while at Kansas State. Where was it that the Mac came into the picture?
TP: I needed all these symbols in my dissertation and just about then the
Macintosh came out, and the fonts were all software. I said, that's the tool I
need. My thesis advisor lent me his. He never got it back.
DDJ: Did it have any effect on your programming?

TP: I had always written programs on paper. The Mac was the first time I
actually wrote software online.
DDJ: What was your first Macintosh program?
TP: AutoBlank. The screen blanker. I needed a screen blanker and I don't wear
a watch, so I said, why not bounce an analog clock around on the screen?
DDJ: You wrote it in assembly language?
TP: I had a Tiny Pascal compiler that I wrote on a project before the Mac came
out, for a 68000 machine. The project never went anywhere but I had this Tiny
Pascal compiler, so I adapted it to produce Macintosh code. I had the first
resident Pascal compiler in the world for the Macintosh. I never sold it, but
that's what I wrote AutoBlank with.
DDJ: Why did you decide to write CompileIt?
TP: Mac World ran this SuperStacks contest. I thought, HyperCard, that's a
programming environment. Bill Atkinson understood what it was, but Apple was
not willing to admit that was what it was. So I said, let's do a stack that
proves that this is a programming language. Let's do Tiny Basic in HyperTalk.
Then I said, wait a minute, that's silly. Why don't we compile HyperTalk? So
that's where the idea came from. I worked furiously on it from the time the
contest was announced to the submission deadline, and I submitted it. It was
working imperfectly, but it was working. And the philistines in San Francisco
didn't understand it at all. I didn't win. I was really disappointed. I'd put
so much work into it. So I called up Heizer Software, and I've worked with
Brian Molyneaux since summer '88 to turn this thing into a product.
DDJ: I had intended to ask you about programming languages and environments.
It sounds like you use assembly language and HyperCard. Anything else?
TP: I got started in Pascal when I got the UCSD P-system for a Z80 computer I
got in exchange for some work I did for Zilog. I was always doing assembly
language for the target code to make money, because that was what I was good
at, writing high-performance assembly language code.
DDJ: What are you using now?
TP: Professionally now I'm using exclusively Modula-2, which is a pretty good
language. The standards people are turning it into Ada. What they're doing is
a travesty of Modula-2. I've gotten into that. So when I program for the Mac
it's HyperCard, and otherwise it's Modula-2.
DDJ: Never C?
TP: C is interesting. It's kind of a high-level assembler for the PDP-11. You
can see that in the operators; they're all PDP-11 operators. When you get to
machines that are different from the PDP-11, it gets harder and harder to
write a good C compiler. But because it's such a low-level language,
programmers think they can write better code with it. So compiler writers are
doing hand-stands and jumping through hoops to try and get C code efficiency
up. It's interesting, because in every benchmark I've ever seen comparing C
and Pascal, Pascal wins.
DDJ: Always?
TP: Always. And every time, C programmers express disbelief or surprise at the
fact. The problem is that C is such a low-level language that programmers are
doing these hacks to give them superefficient code, and the compiler has to
undo all those optimizations to do what it really should do in that processor.
Whereas in a high-level language like Pascal or Modula-2, the compiler can
start with the more abstract representation and get to the efficient code much
more quickly. So I ask, why should I use a brain-dead language like C?
DDJ: C isn't an altogether stationary target; it is evolving. Is C++ a move in
a direction you like?
TP: Yeah. Well, eventually C programmers are going to recognize the failings
of C. C++ has more type management in it, and ANSI C has gone that way. And
these variant implementations of C like Neuron C and the Echelon system are
much more strongly typed than vanilla C. Pascal is awful. It's rigid and you
have to think about your problem a lot harder in order to get it to compile
right. And people don't want to do it; they'd rather just hack the brush,
produce code, and then find the errors at debugging time. That ad for Symantec
C said, "Make errors faster." That's how they think! I'd rather not make
errors. And if it takes me longer to think about my program so that I can
compile it with fewer errors, I will actually get done faster. It's not a
programming preference issue, it's actually a productivity and quality issue.
DDJ: Spoken like someone who spent years writing programs with a pencil.
TP: Having the ability to look at how your code is running is extremely
productive and important. But it should never substitute for thinking about
your problem. I think one of the biggest advantages of object-oriented
programming is that it's so much harder to do the same thing that you have to
think a lot harder about the whole program before you do it. Now if
object-oriented programming could produce efficient code as well, I'd be for
it.
DDJ: Maybe you should tell me what kind of environment you have created in
CompileIt.
TP: In 1988 when I designed it, it was a toy. But I have turned it into the
programming environment I would want as a programmer. I've been writing
programs for 25 years, so I know what I like. I like the Lightspeed Pascal
environment, and I've imported as many of its ideas as I can into CompileIt.
DDJ: But no editor?
TP: An integrated editor is nice, but it's hard to get one that meets
everybody's preferences. I use HyperCard for formatting. You set the script of
button X to field 1, put the script of button X into field 1, and that formats
it. Now, debugging is one of the most powerful things to come out of
microcomputer software design, and I [wanted] a source-level debugger. It
took me a year to write and debug that, along with compiling the compiler,
which I did at the same time. I was having a terrible time getting it to work,
and then I realized, this isn't Pascal or Modula-2 in programming style, it's
C. I hope I never have to do anything that large again in that language. It's
great for doing little things. I saw an interview with one of the designers of
C that said C was not intended for large projects. Absolutely right.
DDJ: But you do use HyperTalk for your Mac programming?
TP: When I have little throwaway programs, I write them in HyperCard. When
something is too slow, I compile it, but HyperCard by itself is a great tool
for little personal programming projects, when you need to do a quick hack
job, reformat this text file, rearrange the data, or do a quick graph. I did a
project with reports and frequency graphs and data simulations and error
analysis, all in HyperCard. When a client wants a new interface, I mock it up
in HyperCard.
DDJ: In CompileIt, you've really opened up the ROM Toolbox.
TP: It was free. It was easy to put in. CompileIt may be the first compiler to
produce inline code for all of them, not just the Pascal-glued ones.
DDJ: What's next?
TP: About a year ago, as I said, I was working to make HyperTalk a language of
choice for programmers, an alternative to C. It's already there in terms of
XCMDs; C offers nothing over HyperTalk in doing XCMDs, in my opinion. I've
worked hard to make that true. But what about other kinds of things? I got to
thinking that it should be possible to do entire stand-alone double-clickable
applications in HyperTalk.
DDJ: I'll be interested in seeing that.





January, 1992
C PROGRAMMING


D-Flat Edit Boxes


 This article contains the following executables: DFLAT9.ARC DF9TXT.ARC


Al Stevens


Last month I described the D-Flat TEXTBOX window class, the base class for
derived window classes that display text. The TEXTBOX class does not concern
itself with manipulation or meaning of the text that it stores. Rather, it
just provides the means to store the text, display it, and scroll and page
through it. Other window classes add functionality to the text, and they use
the base TEXTBOX class to support the low-level text management processes.
This month we discuss one such window class, the EDITBOX class.


The D-Flat EDITBOX Class


The EDITBOX class adds text editor functions to the TEXTBOX class. An EDITBOX
supports the user's entry and modification of text by processing keystrokes
and commands for text editing. Listing One, page 132, is editbox.c, the source
file that implements the EDITBOX window class. The program consists of a
window-processing module, named EditBoxProc, and several other functions that
process the D-Flat messages that EditBoxProc intercepts.
The CreateWindowMsg function processes the CREATE_WINDOW message, and it is
the first function you see in the source file. Most of the functions are named
similarly and have comments that identify the message that they process. Some
functions process subsets of the COMMAND message, and they have names and
comments that associate the function with the COMMAND. I'll address each
message and command and trust that you will be able to find the correct
function that processes it.
The CREATE_WINDOW message sets initial values into the window's edit box
fields. It establishes the default maximum text length, and sets up an initial
empty buffer. The SETTEXT message allows the application program to specify a
buffer of text for the edit box. The message makes sure that the length of the
text does not exceed the maximum length specified for the edit box. Then it
passes the message to the window-processing module for the TEXTBOX class via
the BaseWndProc function.
The ADDTEXT message adds a line of text to a TEXTBOX window. When the window
is an EDITBOX class as well, the message first checks to make sure that the
length of the buffer with the new text added will not exceed the maximum text
length for the window. Then the program passes the message to the TEXTBOX
class's window-processing module.
After that, if the EDITBOX is a single-line editbox--typical of many data
entry fields on dialog boxes--the program sets the current column pointer to
the end of the text and marks a block that spans all of the text in the edit
box. This block supports the CUA convention where single line edit boxes are
marked when the user first tabs to the field. Then, if the user begins typing,
the block is deleted. If the user moves the cursor first, the text remains,
and the block mark is removed.
The GETTEXT message copies the text from the edit box's text buffer into the
space pointed to by the message sender's first parameter.
The SETTEXTLENGTH message sets the maximum text length for an edit box.
Applications, other parts of D-Flat, and the EDITBOX class itself send the
KEYBOARD_CURSOR message to move the cursor to a new position. Ultimately, the
message gets into
the code in message.c, which I discussed last July, to physically move the
cursor. First, however, the EDITBOX class needs to update its pointers that
specify where the cursor is with respect to the text buffer and the window.
Then the program must make sure that the cursor will be in view--that the
window has the focus, is visible, and that the part of it where the cursor is
being moved is on screen, within the borders of any ancestor windows, and not
overlapped by another window. If all of these conditions are true, the program
sends the SHOW_CURSOR message to the system. Otherwise, it sends the
HIDE_CURSOR message.
The SETFOCUS, PAINT, and MOVE messages pass themselves to the base window
class and then send the window a KEYBOARD_CURSOR message. This process assures
that the cursor is set properly when an EDITBOX window gets painted or moved
and that it gets turned on when the window comes into focus and turned off
when the window leaves the focus.
The SIZE message passes itself to the base class and then makes sure that the
size operation did not cause the cursor position to go beyond the new window
borders. If it did, the program adjusts the cursor position to stay within the
window.
The SCROLL, HORIZSCROLL, SCROLLPAGE, and HORIZSCROLLPAGE messages first pass
themselves to the base class to perform the scrolling or paging operation. The
TEXTBOX class maintains variables that indicate the line of text in the buffer
at the top of the window and the column of text in the left margin. These
variables change as the text scrolls and pages horizontally and vertically.
Scrolling is moving the text within the window one line or character position
at a time. Paging is moving it one window height or width at a time. The
EDITBOX class has to intercept the scrolling and paging messages to keep the
keyboard cursor within the boundaries of the window. If the window scrolls the
keyboard cursor position off the window, these intercepts will reposition the
cursor so that it stays in view.
The LEFT_BUTTON message moves the keyboard cursor to where the mouse cursor is
positioned. If the user positions the mouse where there is no text--beyond the
end of a line or below the last line in the buffer, for example--the program
puts the keyboard cursor as close as possible on a valid character position.
If the program is in the process of marking text, the LEFT_BUTTON message has
no effect unless the mouse cursor is in the border. If so, the program scrolls
the window in one of four directions depending on which border the mouse is
in. Then it extends the marked block by one line or column to reflect the
movement. This process allows the user to mark text blocks with the mouse
where the block size exceeds the size of the window. By dragging the mouse
into a border and holding the button down, the user scrolls the window and
continues to mark the block.
The EDITBOX class receives the MOUSE_MOVED message whenever the user moves the
mouse and its cursor is inside the edit box window. If the mouse button is
down, the user is marking text in the edit box. The program establishes the
position of the mouse when the user pressed the button as the anchor point of
the block. Now, as long as the user holds the button and moves the mouse, the
block will be defined in the range from the anchor point to the present mouse
position. This message sends the MOUSE_TRAVEL message to restrict the mouse's
movements to inside the edit box window. The window will keep the mouse
restricted that way until the user releases the button.
The BUTTON_RELEASED message arrives when the user releases the mouse button.
If the user was marking text with the mouse, then the program resets the
text-marking mode, releases the restriction on the mouse travel, and adjusts
the block markers so that the lower line and column define the beginning of
the block.
The KEYBOARD message starts in the KeyboardMsg function and breaks down into
several more function calls. The message itself begins by ignoring all keys if
the user is moving or sizing the window or holding down the Alt key. Those
keys and certain Ctrl-key combinations will be processed by window classes
higher in the base class hierarchy.
The DoMultiLines function takes care of initiating the marking of a block. If
the edit box has the MULTILINE attribute, and the user is holding down a shift
key while typing a key that moves the cursor, and keyboard marking has not
begun, the program puts the window into text marking mode and sets the current
keyboard cursor position as the anchor point for the marked block.
The DoScrolling function processes all keys that will scroll or page through
the text. Some of these keys can be processed by the base TEXTBOX class.
Others are processed as the result of a combination of the base class and
functions unique to the EDITBOX class. The Home, End, NextWord, PrevWord,
Upward, Downward, Forward, and Backward functions move the keyboard cursor and
the text pointers. If there is a marked block, and the window is not in the
text marking mode--which means the user released the shift key or the mouse
button--this function unmarks the block.
If the KeyboardMsg function sees that the user is marking a block with the
keyboard and the DoScrolling function reported that it did indeed process a
scrolling or paging key, the KeyboardMsg function calls ExtendBlock to extend
the marked block to the new keyboard cursor location.
If DoScrolling reports that the key is not a scrolling or paging key, then
KeyboardMsg calls DoKeyStroke to process the typed key. DoKeyStroke processes
the Rubout, Del, Tab, Shift+Tab, and Enter keys. It moves the cursor to the
left one position for the Rubout key and then drops into the code to process
the Del key, which calls the DelKey function. The DelKey function deletes the
character at the buffer position pointed to by the keyboard cursor.
DoKeyStroke calls the TabKey function to process the Tab and Shift+Tab keys.
For the tab key, the function sends KEYBOARD messages with either a space
character or the FWD character depending on the insert mode of the window. If
the edit box is single-line, the function passes the Tab or Shift+Tab
character to the window's parent. A dialog box parent will use this character
to move the focus out of the edit box control.
DoKeyStroke passes all other keys, which will be displayable values, to the
KeyTyped function. First, however, if the user types a key while a block is
marked, the program deletes the block to be replaced by the typed key. The
KeyTyped function inserts the key into the text buffer and performs the word
wrap operation when the keyboard cursor reaches the window's right margin.
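The word wrap idea can be illustrated apart from D-Flat. This is a minimal standalone sketch, not the KeyTyped code itself; the function name and the simple break-at-last-space policy are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

/* Minimal word-wrap sketch (hypothetical, not the D-Flat code):
   if a line has grown past the right margin, replace the last
   space at or before the margin with a newline so the trailing
   word wraps down to the next line. */
static void wrap_line(char *line, int margin)
{
    int i;
    int len = (int)strlen(line);
    if (len <= margin)
        return;                 /* line still fits; nothing to do */
    for (i = margin; i > 0; --i) {
        if (line[i] == ' ') {   /* break at the last space in view */
            line[i] = '\n';
            return;
        }
    }
}
```

A real editor would also rebuild its line pointers after the break, as the column describes the TEXTBOX class doing.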
The SHIFT_CHANGED message watches for the user to release the shift key while
marking a block of text with the keyboard. When that happens, the program
calls StopMarking to take the window out of text marking mode.
The COMMAND message processes commands to a window from other places. A window
typically receives a COMMAND message when the user executes a menu choice,
although nothing prevents any part of a program from sending a COMMAND
message. In fact, the EDITBOX class uses its own ID_DELETETEXT command to
delete blocks of text when the user does something outside the menu that would
cause text deletion.
The EDITBOX class processes five commands. All five are on the Edit menu in
the example Memopad application. The menu commands correspond to standard menu
items specified in the SAA/CUA specification. Whether or not your application
implements the menu in the same way, most edit boxes will need some or all of
these functions.
The ID_DELETETEXT command deletes a marked block of text. D-Flat maintains a
buffer of deleted text so that the user can undo the most recent deletion. The
ID_DELETETEXT command calls SaveDeletedText first to copy the marked block
into the undo buffer. Then it moves the text that follows the block up to the
beginning of the block and updates the window's text pointers.
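The save-then-close-the-gap sequence can be sketched on a plain character buffer. This is a simplified illustration, not the D-Flat implementation; the names `undo_buf` and `delete_block` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *undo_buf;          /* holds the most recent deletion */

/* Hypothetical sketch of deleting a marked block: copy the block
   [beg, end) into the undo buffer first, then move everything past
   the block down over it. */
static void delete_block(char *text, int beg, int end)
{
    int len = end - beg;
    free(undo_buf);
    undo_buf = malloc(len + 1);
    memcpy(undo_buf, text + beg, len);   /* save for a later undo */
    undo_buf[len] = '\0';
    /* close the gap, including the terminating '\0' */
    memmove(text + beg, text + end, strlen(text + end) + 1);
}
```

`memmove` is used rather than `memcpy` because the source and destination regions overlap.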
The ID_CLEAR command is similar to the ID_DELETETEXT command except that it
clears the text, leaving an open space: the program deletes the text on the
lines in the marked block but leaves the newline characters in place.
The ID_UNDO command inserts the contents of the deleted text buffer into the
active text buffer at the current cursor position.
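The undo insertion is the mirror image of the deletion: splice the saved text back in at the cursor offset. A minimal sketch, assuming the destination buffer has room; the name `insert_text` is illustrative, not the D-Flat API.

```c
#include <assert.h>
#include <string.h>

/* Sketch of undo: shift the tail of the buffer right to open a
   gap at the cursor offset, then copy the saved deletion into it.
   The caller must ensure text[] is large enough. */
static void insert_text(char *text, int at, const char *saved)
{
    int n = (int)strlen(saved);
    /* shift the tail (including '\0') right by n */
    memmove(text + at + n, text + at, strlen(text + at) + 1);
    memcpy(text + at, saved, n);
}
```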
The ID_PARAGRAPH command forms a new paragraph from the marked block if one
exists. If not, the command forms the paragraph from the current cursor
location to the end of the paragraph. Text in an edit box consists of lines
with terminating newline characters. A paragraph is a group of lines up to the
next blank line.
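The paragraph rule just described — lines terminated by newlines, with a paragraph running up to the next blank line — amounts to scanning for two consecutive newline characters. A sketch, with an assumed helper name:

```c
#include <assert.h>
#include <string.h>

/* Sketch of the paragraph rule: starting at offset 'from', a
   paragraph runs until the next blank line, i.e. two consecutive
   '\n' characters. Returns the offset one past the paragraph's
   final newline, or the end of the text if no blank line follows. */
static int paragraph_end(const char *text, int from)
{
    const char *p = strstr(text + from, "\n\n");
    return p ? (int)(p - text) + 1 : (int)strlen(text);
}
```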
The CLOSE_WINDOW message hides the cursor and frees the deleted text undo
buffer if one exists.


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library O of the DDJ Forum and on
M&T's OnLine. The source code is in a file named DFLATn.ARC. The n is a
version number. A second file, DFnTXT.ARC, includes the following:
A README.DOC file that describes the changes and how to build the software
A calendar of the issues and the D-Flat subjects that my column addresses
The D-Flat Help system database
Documentation for the D-Flat API

D-Flat compiles with Turbo C 2.0, Borland C++ 2.0, Microsoft C 6.0, and Watcom
C 8.0. There are makefiles for the TC, MSC, and Watcom compilers. There is an
example program, the MEMOPAD program, a multiple document notepad.
If you cannot use either online service, send me a formatted diskette -- 360K
or 720K -- and an addressed, stamped diskette mailer. Send it to me in care of
DDJ, 501 Galveston Drive, Redwood City, CA 94063. I'll send you the latest
copy of the library. The software is free, but if you care to, stick a dollar
bill in the mailer for the Brevard County Food Bank. They take care of
homeless and hungry families. We've collected about $500 so far from generous
D-Flat "careware" users. I took a pile of money over there today. They are
very grateful.
If you want to discuss D-Flat with me, use CompuServe. My CompuServe ID is
71101,1262, and I monitor DDJ Forum daily.


Cheap Editor


Fast, good, cheap. Pick any two. Remember that? Well, there's a word processor
that delivers all three. When I find a bargain, I want to share it. The VDE
word processor/text editor is a bargain. For companies it's cheap. For
individuals it's free. You can download it from Library 1 of the IBMAPP forum
on CompuServe. Its file name is VDE161.ZIP. I'm using VDE to write this
column. I have always wanted a word processor that devotes the entire screen
to the text, is configurable, programmable, fast, small, and inexpensive. VDE
is all of that. There is nothing that I want to do with text that VDE will not
do. It is small and fast because it does not include all the dings and toots
of those so-called full-featured behemoth word processors. VDE is not a
WYSIWYG desktop publisher. It does not do graphics. There is no integrated
spell checker or thesaurus. It is not a Windows app. So, just what is it? It
is simply the best DOS text-based word processor I have ever seen, and I've
tried most of them.
You can make VDE emulate the commands of several word processors. Its default
mode uses the old WordStar command keys. Many PC users, particularly the
old-timers, have those commands burned into their brains. WordStar was a
staple in the Wonder years. Even though I haven't used it for a long time, my
fingers leapt immediately to Ctrl-KD, Ctrl-QF, and all the other commands. You
forget that you liked them, that you ever knew them. They're comfortable, like
an old pair of slippers, a 1957 Tri-Pacer, or Uncle Jim's tobacco-reeking
leather Morris chair.
You don't pays any money but you still gets your choice--about what goes on
the screen, for example. I prefer a screen with nothing but text, but if you
like a ruler line and a status bar that tells you the file name, cursor
position, and other stuff, you can have them. You can have a screen border,
too. When the old peepers get tired, you can switch into a VGA 20-line mode.
For high-density text, you can have 28, 33, 40, 50, or 57 lines and 132
columns--your choice of colors, of course.
VDE will work with the file formats of WordStar, Word Perfect, MS Word,
XWrite, as well as with ASCII text. It supports a two-window split screen, and
you can have several files in memory at once. Everything it does, it does
fast. It has returned vitality to my old 4.77-MHz, 640K, one-diskette, T1000
laptop.
VDE's author is Eric Meyer. He wrote VDE in assembly language and maintains it
as freely distributed shareware that individuals may use without charge. You
can register for $30 if you want, and get support and the latest version.
Companies can get site licenses that range from $2.10 down to $1 per user,
depending on how many users there are. You can't beat those prices. If I were
in charge at Microsoft or Word Perfect, I'd pay Meyer a million bucks just to
get this thing off the market and out of the competition. In case they want to
take my advice, or in case you want to send for a registered copy, here's the
address: Eric Meyer, 3541 Smuggler Way, Boulder, CO 80303 USA, CompuServe:
[74415,1305].
What's all that got to do with C? Well, VDE includes a C-language
configuration package that does C indenting, so VDE is more than acceptable as
a C programmer's editor.


The Standard C Library


The Standard C Library (1992, Prentice Hall) by P. J. Plauger is a new book
from a member of the ANSI X3J11 committee. From its title you might expect it
to be a complete reference to the standard C functions as defined by ANSI, but
it is not. I am not sure who the audience is for this book, so I will let you
decide if it is for you.
The Standard C Library describes and publishes the source code for the header
files, macros, and functions defined in standard C. In effect, it is a
C-library implementation published in book form. You could use the library if
you were building a new C compiler, but there are hardly enough new C compiler
builders to justify the cost of publishing a book. Most readers already have a
C compiler, and their compiler already has a library, so the book does not
bring to the typical programmer some useful and heretofore unavailable piece
of code.
So what is this book all about? For starters, it is a good study in how to
implement a library. Also, the implementation has good examples of some of the
more arcane features of C. I found the book useful for understanding some of
the functions that the ANSI standard does not clearly describe. There are
notes of historical interest throughout the book which describe the rationale
behind some of the decisions made by the committee. You can often read between
the lines and guess where the squeaky wheels prevailed over common sense. Even
though he is an active member of the ANSI committee and contributes
significantly to its work, Dr. Plauger pulls no punches when he addresses the
deficiencies and failures of the standard.
The book is organized into chapters dedicated to the ANSI header files and the
macros and functions defined and declared in each. The chapter on locale.h
addresses the subject of writing C programs that you must port among
international operating environments. This chapter is the best treatment of
that subject that I have ever seen and is worth the price of the book by
itself. So is the chapter on stdarg.h, which addresses functions that can
accept a variable number of arguments. If you ever wondered how printf does
its thing, this book is for you. If you ever wondered why scanf is not the
answer to all your input prayers, you'll find out why here. There are places
where more explanations would help. For example, the book does not explain why
the implementation includes functions as well as masking macros in the header
files for many of the standard C functions. Some readers will not know that
older C programs often failed to include the header files, and because
standard C compilers must compile old programs, they must provide the
functions even when the macros are more efficient. The chapter on setjmp and
longjmp is the weakest with respect to its code, and the author admits as
much, telling you not to use the "grubby" code, which serves only to describe
what a more platform-dependent implementation should do.
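The stdarg.h machinery the book explains is easy to demonstrate in miniature. This toy is not from the book; it sums a count-prefixed argument list, where printf walks its arguments the same way but driven by the format string instead of a count.

```c
#include <assert.h>
#include <stdarg.h>

/* A toy variadic function in the style stdarg.h supports: sum a
   count-prefixed list of ints. va_start primes the walk from the
   last named parameter; va_arg fetches each value by type. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```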
My most serious criticism of the book is that there are too many typographical
errors. I can only guess that the book is a symptom of the rush-to-publish
syndrome from which most managing editors and all publishers suffer. It is an
author's responsibility to resist and counterbalance that pressure, and
Plauger failed in that effort. Most of the errors that I found would be caught
during a casual proofreading of the book, never mind the intense copy editing
that most works receive.
There are some errors in the code examples that accompany the text, further
indication of poor or no proofreading. I did not test the library code, so I
cannot comment on its quality except to say that the brace indenting and
placement style is not my favorite. You can get a diskette for $49.95 by using
an order form in the back of the book. That's twice the usual cost of a
companion diskette. The author claims to have validated the code with a
validation suite and compiled and executed it with a number of compilers from
UNIX, DOS, and other platforms. I believe that claim, but because of the
surplus of publishing errors in the text, I would not trust the printed code
enough to key it in. Get the diskette if you want the code. You should know,
however, that if you compile a program that uses the code, the silly terms of
the copyright require you to insert a string in your executable module that
gives credit to Plauger and Prentice-Hall.


Why I'm Glad My Name Isn't Pee Wee


If you've been watching the news, you've seen a recent item from my home
state, Florida. Some irate parents videotaped a couple making love in view of
the neighborhood and its children. The couple was arrested and have since been
on Donahue, in all the papers, and on the 6 o'clock national news. The
gentleman being prosecuted is a fellow Floridian named Al Stevens. I don't
know what he does when he isn't performing for the neighbors, but please be
advised that he DOES NOT WRITE THIS COLUMN!


The Ascent of Language


On the whole, I like DOS 5.0--now that I've got it properly installed. I
think, however, that once you've gone through the process, the experience can
add to the programming languages listed on your resume. The two new ones are
CONFIG.SYS and AUTOEXEC.BAT, which have now become slightly more difficult to
master than APL. Here are some of the keywords that DOS 5.0 adds to our
lexicon: high, loadhigh, himem, devicehigh, umb, hma, setver, fastopen,
wina20.386, temp, smartdrv, doskey, and emm386.
Judy makes kitchen samplers with poems and quotes reminiscent of her
Pennsylvania Dutch origins. I think I'll ask her to cross-stitch this one:
My patience is all,
The features are yet,
The opener the architecture,
The behinder I get.

_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]

/* ------------- editbox.c ------------ */
#include "dflat.h"

#define EditBufLen(wnd) (isMultiLine(wnd) ? EDITLEN : ENTRYLEN)
#define SetLinePointer(wnd, ln) (wnd->CurrLine = ln)
#define isWhite(c) ((c)==' ' || (c)=='\n')
/* ---------- local prototypes ----------- */
static void SaveDeletedText(WINDOW, char *, int);
static void Forward(WINDOW);
static void Backward(WINDOW);
static void End(WINDOW);
static void Home(WINDOW);
static void Downward(WINDOW);

static void Upward(WINDOW);
static void StickEnd(WINDOW);
static void NextWord(WINDOW);
static void PrevWord(WINDOW);
static void ResetEditBox(WINDOW);
static void ModTextPointers(WINDOW, int, int);
/* -------- local variables -------- */
static int KeyBoardMarking, ButtonDown;
static int TextMarking;
static int ButtonX, ButtonY;
static int PrevY = -1;
/* ----------- CREATE_WINDOW Message ---------- */
static int CreateWindowMsg(WINDOW wnd)
{
 int rtn = BaseWndProc(EDITBOX, wnd, CREATE_WINDOW, 0, 0);
 wnd->MaxTextLength = MAXTEXTLEN+1;
 wnd->textlen = EditBufLen(wnd);
 wnd->InsertMode = TRUE;
 ResetEditBox(wnd);
 return rtn;
}
/* ----------- SETTEXT Message ---------- */
static int SetTextMsg(WINDOW wnd, PARAM p1)
{
 int rtn = FALSE;
 if (strlen((char *)p1) <= wnd->MaxTextLength) {
 rtn = BaseWndProc(EDITBOX, wnd, SETTEXT, p1, 0);
 wnd->CurrLine = 0;
 }
 return rtn;
}
/* ----------- ADDTEXT Message ---------- */
static int AddTextMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int rtn = FALSE;
 if (strlen((char *)p1)+wnd->textlen <= wnd->MaxTextLength) {
 rtn = BaseWndProc(EDITBOX, wnd, ADDTEXT, p1, p2);
 if (rtn != FALSE) {
 if (!isMultiLine(wnd)) {
 wnd->CurrLine = 0;
 wnd->CurrCol = strlen((char *)p1);
 if (wnd->CurrCol >= ClientWidth(wnd)) {
 wnd->wleft = wnd->CurrCol-ClientWidth(wnd);
 wnd->CurrCol -= wnd->wleft;
 }
 wnd->BlkEndCol = wnd->CurrCol;
 SendMessage(wnd, KEYBOARD_CURSOR,
 WndCol, wnd->WndRow);
 }
 }
 }
 return rtn;
}
/* ----------- GETTEXT Message ---------- */
static int GetTextMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 char *cp1 = (char *)p1;
 char *cp2 = wnd->text;
 if (cp2 != NULL) {

 while (p2-- && *cp2 && *cp2 != '\n')
 *cp1++ = *cp2++;
 *cp1 = '\0';
 return TRUE;
 }
 return FALSE;
}
/* ----------- SETTEXTLENGTH Message ---------- */
static int SetTextLengthMsg(WINDOW wnd, unsigned int len)
{
 if (++len < MAXTEXTLEN) {
 wnd->MaxTextLength = len;
 if (len < wnd->textlen) {
 if ((wnd->text=realloc(wnd->text, len+2)) != NULL) {
 wnd->textlen = len;
 *((wnd->text)+len) = '\0';
 *((wnd->text)+len+1) = '\0';
 BuildTextPointers(wnd);
 }
 }
 return TRUE;
 }
 return FALSE;
}
/* ----------- KEYBOARD_CURSOR Message ---------- */
static int KeyboardCursorMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int rtn;
 wnd->CurrCol = (int)p1 + wnd->wleft;
 wnd->WndRow = (int)p2;
 wnd->CurrLine = (int)p2 + wnd->wtop;
 rtn = BaseWndProc(EDITBOX, wnd, KEYBOARD_CURSOR, p1, p2);
 if (wnd == inFocus && CharInView(wnd, (int)p1, (int)p2))
 SendMessage(NULL, SHOW_CURSOR, wnd->InsertMode, 0);
 else SendMessage(NULL, HIDE_CURSOR, 0, 0);
 return rtn;
}
/* ----------- SIZE Message ---------- */
int SizeMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int rtn = BaseWndProc(EDITBOX, wnd, SIZE, p1, p2);
 if (WndCol > ClientWidth(wnd)-1)
 wnd->CurrCol = ClientWidth(wnd)-1 + wnd->wleft;
 if (wnd->WndRow > ClientHeight(wnd)-1) {
 wnd->WndRow = ClientHeight(wnd)-1;
 SetLinePointer(wnd, wnd->WndRow+wnd->wtop);
 }
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 return rtn;
}
/* ----------- SCROLL Message ---------- */
static int ScrollMsg(WINDOW wnd, PARAM p1)
{
 int rtn = FALSE;
 if (isMultiLine(wnd)) {
 rtn = BaseWndProc(EDITBOX,wnd,SCROLL,p1,0);
 if (rtn != FALSE) {
 if (p1) {
 /* -------- scrolling up --------- */

 if (wnd->WndRow == 0) {
 wnd->CurrLine++;
 StickEnd(wnd);
 }
 else
 --wnd->WndRow;
 }
 else {
 /* -------- scrolling down --------- */
 if (wnd->WndRow == ClientHeight(wnd)-1) {
 if (wnd->CurrLine > 0)
 --wnd->CurrLine;
 StickEnd(wnd);
 }
 else
 wnd->WndRow++;
 }
 SendMessage(wnd,KEYBOARD_CURSOR,WndCol,wnd->WndRow);
 }
 }
 return rtn;
}
/* ----------- HORIZSCROLL Message ---------- */
static int HorizScrollMsg(WINDOW wnd, PARAM p1)
{
 int rtn = FALSE;
 char *currchar = CurrChar;
 if (!(p1 &&
 wnd->CurrCol == wnd->wleft && *currchar == '\n')) {
 rtn = BaseWndProc(EDITBOX, wnd, HORIZSCROLL, p1, 0);
 if (rtn != FALSE) {
 if (wnd->CurrCol < wnd->wleft)
 wnd->CurrCol++;
 else if (WndCol == ClientWidth(wnd))
 --wnd->CurrCol;
 SendMessage(wnd,KEYBOARD_CURSOR,WndCol,wnd->WndRow);
 }
 }
 return rtn;
}
/* ----------- SCROLLPAGE Message ---------- */
static int ScrollPageMsg(WINDOW wnd, PARAM p1)
{
 int rtn = FALSE;
 if (isMultiLine(wnd)) {
 rtn = BaseWndProc(EDITBOX, wnd, SCROLLPAGE, p1, 0);
 SetLinePointer(wnd, wnd->wtop+wnd->WndRow);
 StickEnd(wnd);
 SendMessage(wnd, KEYBOARD_CURSOR,WndCol, wnd->WndRow);
 }
 return rtn;
}
/* ----------- HORIZSCROLLPAGE Message ---------- */
static int HorizPageMsg(WINDOW wnd, PARAM p1)
{
 int rtn = BaseWndProc(EDITBOX, wnd, HORIZPAGE, p1, 0);
 if ((int) p1 == FALSE) {
 if (wnd->CurrCol > wnd->wleft+ClientWidth(wnd)-1)
 wnd->CurrCol = wnd->wleft+ClientWidth(wnd)-1;

 }
 else if (wnd->CurrCol < wnd->wleft)
 wnd->CurrCol = wnd->wleft;
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 return rtn;
}
/* ----- Extend the marked block to the new x,y position ---- */
static void ExtendBlock(WINDOW wnd, int x, int y)
{
 int bbl, bel;
 int ptop = min(wnd->BlkBegLine, wnd->BlkEndLine);
 int pbot = max(wnd->BlkBegLine, wnd->BlkEndLine);
 char *lp = TextLine(wnd, wnd->wtop+y);
 int len = (int) (strchr(lp, '\n') - lp);
 x = min(x, len-wnd->wleft);
 wnd->BlkEndCol = x+wnd->wleft;
 wnd->BlkEndLine = y+wnd->wtop;
 bbl = min(wnd->BlkBegLine, wnd->BlkEndLine);
 bel = max(wnd->BlkBegLine, wnd->BlkEndLine);
 while (ptop < bbl) {
 WriteTextLine(wnd, NULL, ptop, FALSE);
 ptop++;
 }
 for (y = bbl; y <= bel; y++)
 WriteTextLine(wnd, NULL, y, FALSE);
 while (pbot > bel) {
 WriteTextLine(wnd, NULL, pbot, FALSE);
 --pbot;
 }
}
/* ----------- LEFT_BUTTON Message ---------- */
static int LeftButtonMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int MouseX = (int) p1 - GetClientLeft(wnd);
 int MouseY = (int) p2 - GetClientTop(wnd);
 RECT rc = ClientRect(wnd);
 char *lp;
 int len;
 if (KeyBoardMarking)
 return TRUE;
 if (WindowMoving || WindowSizing)
 return FALSE;
 if (isMultiLine(wnd)) {
 if (TextMarking) {
 if (!InsideRect(p1, p2, rc)) {
 if ((int)p1 == GetLeft(wnd))
 if (SendMessage(wnd, HORIZSCROLL, 0, 0))
 ExtendBlock(wnd, MouseX-1, MouseY);
 if ((int)p1 == GetRight(wnd))
 if (SendMessage(wnd, HORIZSCROLL, TRUE, 0))
 ExtendBlock(wnd, MouseX+1, MouseY);
 if ((int)p2 == GetTop(wnd))
 if (SendMessage(wnd, SCROLL, FALSE, 0))
 ExtendBlock(wnd, MouseX, MouseY+1);
 if ((int)p2 == GetBottom(wnd))
 if (SendMessage(wnd, SCROLL, TRUE, 0))
 ExtendBlock(wnd, MouseX, MouseY-1);
 SendMessage(wnd, PAINT, 0, 0);
 }

 return TRUE;
 }
 if (!InsideRect(p1, p2, rc))
 return FALSE;
 if (TextBlockMarked(wnd)) {
 ClearTextBlock(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 if (wnd->wlines) {
 if (MouseY > wnd->wlines-1)
 return TRUE;
 lp = TextLine(wnd, MouseY+wnd->wtop);
 len = (int) (strchr(lp, '\n') - lp);
 MouseX = min(MouseX, len);
 if (MouseX < wnd->wleft) {
 MouseX = 0;
 SendMessage(wnd, KEYBOARD, HOME, 0);
 }
 ButtonDown = TRUE;
 ButtonX = MouseX;
 ButtonY = MouseY;
 }
 else
 MouseX = MouseY = 0;
 wnd->WndRow = MouseY;
 SetLinePointer(wnd, MouseY+wnd->wtop);
 }
 if (isMultiLine(wnd) ||
 (!TextBlockMarked(wnd)
 && MouseX+wnd->wleft < strlen(wnd->text)))
 wnd->CurrCol = MouseX+wnd->wleft;
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 return TRUE;
}
/* ----------- MOUSE_MOVED Message ---------- */
static int MouseMovedMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int MouseX = (int) p1 - GetClientLeft(wnd);
 int MouseY = (int) p2 - GetClientTop(wnd);
 RECT rc = ClientRect(wnd);
 if (!InsideRect(p1, p2, rc))
 return FALSE;
 if (MouseY > wnd->wlines-1)
 return FALSE;
 if (ButtonDown) {
 SetAnchor(wnd, ButtonX+wnd->wleft, ButtonY+wnd->wtop);
 TextMarking = TRUE;
 SendMessage(NULL,MOUSE_TRAVEL,
 (PARAM)&WindowRect(wnd),0);
 ButtonDown = FALSE;
 }
 if (TextMarking && !(WindowMoving || WindowSizing)) {
 ExtendBlock(wnd, MouseX, MouseY);
 return TRUE;
 }
 return FALSE;
}
static void StopMarking(WINDOW wnd)
{

 TextMarking = FALSE;
 if (wnd->BlkBegLine > wnd->BlkEndLine) {
 swap(wnd->BlkBegLine, wnd->BlkEndLine);
 swap(wnd->BlkBegCol, wnd->BlkEndCol);
 }
 if (wnd->BlkBegLine == wnd->BlkEndLine &&
 wnd->BlkBegCol > wnd->BlkEndCol)
 swap(wnd->BlkBegCol, wnd->BlkEndCol);
}
/* ----------- BUTTON_RELEASED Message ---------- */
static int ButtonReleasedMsg(WINDOW wnd)
{
 if (isMultiLine(wnd)) {
 ButtonDown = FALSE;
 if (TextMarking && !(WindowMoving || WindowSizing)) {
 /* release the mouse outside the edit box */
 SendMessage(NULL, MOUSE_TRAVEL, 0, 0);
 StopMarking(wnd);
 return TRUE;
 }
 else
 PrevY = -1;
 }
 return FALSE;
}
/* ---- Process text block keys for multiline text box ---- */
static void DoMultiLines(WINDOW wnd, int c, PARAM p2)
{
 if (isMultiLine(wnd)) {
 if ((int)p2 & (LEFTSHIFT | RIGHTSHIFT)) {
 int kx, ky;
 SendMessage(NULL, CURRENT_KEYBOARD_CURSOR,
 (PARAM) &kx, (PARAM) &ky);
 kx -= GetClientLeft(wnd);
 ky -= GetClientTop(wnd);
 switch (c) {
 case HOME:
 case END:
 case CTRL_HOME:
 case CTRL_END:
 case PGUP:
 case PGDN:
 case CTRL_PGUP:
 case CTRL_PGDN:
 case UP:
 case DN:
 case FWD:
 case BS:
 case CTRL_FWD:
 case CTRL_BS:
 if (!KeyBoardMarking) {
 if (TextBlockMarked(wnd)) {
 ClearTextBlock(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 KeyBoardMarking = TextMarking = TRUE;
 SetAnchor(wnd, kx+wnd->wleft,
 ky+wnd->wtop);
 }

 break;
 default:
 break;
 }
 }
 }
}
/* ---------- page/scroll keys ----------- */
static int DoScrolling(WINDOW wnd, int c, PARAM p2)
{
 switch (c) {
 case PGUP:
 case PGDN:
 if (isMultiLine(wnd))
 BaseWndProc(EDITBOX, wnd, KEYBOARD, c, p2);
 break;
 case CTRL_PGUP:
 case CTRL_PGDN:
 BaseWndProc(EDITBOX, wnd, KEYBOARD, c, p2);
 break;
 case HOME:
 Home(wnd);
 break;
 case END:
 End(wnd);
 break;
 case CTRL_FWD:
 NextWord(wnd);
 break;
 case CTRL_BS:
 PrevWord(wnd);
 break;
 case CTRL_HOME:
 if (isMultiLine(wnd)) {
 SendMessage(wnd, SCROLLDOC, TRUE, 0);
 wnd->CurrLine = 0;
 wnd->WndRow = 0;
 }
 Home(wnd);
 break;
 case CTRL_END:
 if (isMultiLine(wnd) && wnd->wlines > 0) {
 SendMessage(wnd, SCROLLDOC, FALSE, 0);
 SetLinePointer(wnd, wnd->wlines-1);
 wnd->WndRow =
 min(ClientHeight(wnd)-1, wnd->wlines-1);
 Home(wnd);
 }
 End(wnd);
 break;
 case UP:
 if (isMultiLine(wnd))
 Upward(wnd);
 break;
 case DN:
 if (isMultiLine(wnd))
 Downward(wnd);
 break;
 case FWD:
 Forward(wnd);
 break;
 case BS:
 Backward(wnd);
 break;
 default:
 return FALSE;
 }
 if (!KeyBoardMarking && TextBlockMarked(wnd)) {
 ClearTextBlock(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 return TRUE;
}
/* -------------- Del key ---------------- */
static int DelKey(WINDOW wnd)
{
 char *currchar = CurrChar;
 int repaint = *currchar == '\n';
 if (TextBlockMarked(wnd)) {
 SendMessage(wnd, COMMAND, ID_DELETETEXT, 0);
 SendMessage(wnd, PAINT, 0, 0);
 return TRUE;
 }
 if (*(currchar+1) == '\0')
 return TRUE;
 strcpy(currchar, currchar+1);
 if (repaint) {
 BuildTextPointers(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 else {
 ModTextPointers(wnd, wnd->CurrLine+1, -1);
 WriteTextLine(wnd, NULL, wnd->WndRow+wnd->wtop, FALSE);
 }
 wnd->TextChanged = TRUE;
 return FALSE;
}
/* ------------ Tab key ------------ */
static int TabKey(WINDOW wnd, PARAM p2)
{
 if (isMultiLine(wnd)) {
 int insmd = wnd->InsertMode;
 do {
 char *cc = CurrChar+1;
 if (!insmd && *cc == '\0')
 break;
 if (wnd->textlen == wnd->MaxTextLength)
 break;
 SendMessage(wnd,KEYBOARD,insmd ? ' ' : FWD,0);
 } while (wnd->CurrCol % cfg.Tabs);
 return TRUE;
 }
 PostMessage(GetParent(wnd), KEYBOARD, '\t', p2);
 return FALSE;
}
/* --------- All displayable typed keys ------------- */
static void KeyTyped(WINDOW wnd, int c)
{
 char *currchar = CurrChar;
 if ((c != '\n' && c < ' ') || (c & 0x1000))
 /* ---- not recognized by editor --- */
 return;
 if (!isMultiLine(wnd) && TextBlockMarked(wnd)) {
 ResetEditBox(wnd);
 currchar = CurrChar;
 }
 if (*currchar == '\0') {
 /* ---- typing at end of text ---- */
 if (currchar == wnd->text+wnd->MaxTextLength) {
 /* ---- typing at the end of maximum buffer ---- */
 beep();
 return;
 }
 /* --- insert a newline at end of text --- */
 *currchar = '\n';
 *(currchar+1) = '\0';
 BuildTextPointers(wnd);
 }
 /* --- displayable char or newline --- */
 if (c == '\n' || wnd->InsertMode || *currchar == '\n') {
 /* ------ inserting the keyed character ------ */
 if (wnd->text[wnd->textlen-1] != '\0') {
 /* --- the current text buffer is full --- */
 if (wnd->textlen == wnd->MaxTextLength) {
 /* --- text buffer is at maximum size --- */
 beep();
 return;
 }
 /* ---- increase the text buffer size ---- */
 wnd->textlen += GROWLENGTH;
 /* --- but not above maximum size --- */
 if (wnd->textlen > wnd->MaxTextLength)
 wnd->textlen = wnd->MaxTextLength;
 wnd->text = realloc(wnd->text, wnd->textlen+2);
 wnd->text[wnd->textlen-1] = '\0';
 currchar = CurrChar;
 }
 memmove(currchar+1, currchar, strlen(currchar)+1);
 ModTextPointers(wnd, wnd->CurrLine+1, 1);
 if (isMultiLine(wnd) && wnd->wlines > 1)
 wnd->textwidth = max(wnd->textwidth,
 (int) (TextLine(wnd, wnd->CurrLine+1)-
 TextLine(wnd, wnd->CurrLine)));
 else
 wnd->textwidth = max(wnd->textwidth,
 strlen(wnd->text));
 WriteTextLine(wnd, NULL,
 wnd->wtop+wnd->WndRow, FALSE);
 }
 /* ----- put the char in the buffer ----- */
 *currchar = c;
 wnd->TextChanged = TRUE;
 if (c == '\n') {
 wnd->wleft = 0;
 BuildTextPointers(wnd);
 End(wnd);

 Forward(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 return;
 }
 /* ---------- test end of window --------- */
 if (WndCol == ClientWidth(wnd)-1) {
 int dif;
 char *cp = currchar;
 while (*cp != ' ' && cp != TextLine(wnd, wnd->CurrLine))
 --cp;
 if (!isMultiLine(wnd) ||
 cp == TextLine(wnd, wnd->CurrLine) ||
 !wnd->WordWrapMode)
 SendMessage(wnd, HORIZSCROLL, TRUE, 0);
 else {
 dif = 0;
 if (c != ' ') {
 dif = (int) (currchar - cp);
 wnd->CurrCol -= dif;
 SendMessage(wnd, KEYBOARD, DEL, 0);
 --dif;
 }
 SendMessage(wnd, KEYBOARD, '\r', 0);
 currchar = CurrChar;
 wnd->CurrCol = dif;
 if (c == ' ')
 return;
 }
 }
 /* ------ display the character ------ */
 SetStandardColor(wnd);
 PutWindowChar(wnd, c, WndCol, wnd->WndRow);
 /* ----- advance the pointers ------ */
 wnd->CurrCol++;
}
/* ------------ screen changing key strokes ------------- */
static int DoKeyStroke(WINDOW wnd, int c, PARAM p2)
{
 switch (c) {
 case RUBOUT:
 Backward(wnd);
 case DEL:
 if (DelKey(wnd))
 return TRUE;
 break;
 case CTRL_FIVE: /* same as Shift+Tab */
 if (!((int)p2 & (LEFTSHIFT | RIGHTSHIFT)))
 break;
 case '\t':
 if (TabKey(wnd, p2))
 return TRUE;
 break;
 case '\r':
 if (!isMultiLine(wnd)) {
 PostMessage(GetParent(wnd), KEYBOARD, c, p2);
 break;
 }
 c = '\n';
 default:
 if (TextBlockMarked(wnd)) {
 SendMessage(wnd, COMMAND, ID_DELETETEXT, 0);
 SendMessage(wnd, PAINT, 0, 0);
 }
 KeyTyped(wnd, c);
 break;
 }
 return FALSE;
}
/* ----------- KEYBOARD Message ---------- */
static int KeyboardMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int c = (int) p1;
 if (WindowMoving || WindowSizing || ((int)p2 & ALTKEY))
 return FALSE;
 switch (c) {
 /* --- these keys get processed by lower classes --- */
 case ESC:
 case F1:
 case F2:
 case F3:
 case F4:
 case F5:
 case F6:
 case F7:
 case F8:
 case F9:
 case F10:
 case INS:
 case SHIFT_INS:
 case SHIFT_DEL:
 return FALSE;
 /* --- these keys get processed here --- */
 case CTRL_FWD:
 case CTRL_BS:
 case CTRL_HOME:
 case CTRL_END:
 case CTRL_PGUP:
 case CTRL_PGDN:
 break;
 default:
 /* other ctrl keys get processed by lower classes */
 if ((int)p2 & CTRLKEY)
 return FALSE;
 /* --- all other keys get processed here --- */
 break;
 }
 DoMultiLines(wnd, c, p2);
 if (DoScrolling(wnd, c, p2)) {
 if (KeyBoardMarking)
 ExtendBlock(wnd, WndCol, wnd->WndRow);
 }
 else if (!TestAttribute(wnd, READONLY)) {
 DoKeyStroke(wnd, c, p2);
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 }
 return TRUE;
}
/* ----------- SHIFT_CHANGED Message ---------- */

static void ShiftChangedMsg(WINDOW wnd, PARAM p1)
{
 if (!((int)p1 & (LEFTSHIFT | RIGHTSHIFT)) &&
 KeyBoardMarking) {
 StopMarking(wnd);
 KeyBoardMarking = FALSE;
 }
}
/* ----------- ID_DELETETEXT Command ---------- */
static void DeleteTextCmd(WINDOW wnd)
{
 if (TextBlockMarked(wnd)) {
 char *bbl=TextLine(wnd,wnd->BlkBegLine)+wnd->BlkBegCol;
 char *bel=TextLine(wnd,wnd->BlkEndLine)+wnd->BlkEndCol;
 int len = (int) (bel - bbl);
 SaveDeletedText(wnd, bbl, len);
 wnd->TextChanged = TRUE;
 strcpy(bbl, bel);
 wnd->CurrLine = TextLineNumber(wnd, bbl-wnd->BlkBegCol);
 wnd->CurrCol = wnd->BlkBegCol;
 wnd->WndRow = wnd->BlkBegLine - wnd->wtop;
 if (wnd->WndRow < 0) {
 wnd->wtop = wnd->BlkBegLine;
 wnd->WndRow = 0;
 }
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 ClearTextBlock(wnd);
 BuildTextPointers(wnd);
 }
}
/* ----------- ID_CLEAR Command ---------- */
static void ClearCmd(WINDOW wnd)
{
 if (TextBlockMarked(wnd)) {
 char *bbl=TextLine(wnd,wnd->BlkBegLine)+wnd->BlkBegCol;
 char *bel=TextLine(wnd,wnd->BlkEndLine)+wnd->BlkEndCol;
 int len = (int) (bel - bbl);
 SaveDeletedText(wnd, bbl, len);
 wnd->CurrLine = TextLineNumber(wnd, bbl);
 wnd->CurrCol = wnd->BlkBegCol;
 wnd->WndRow = wnd->BlkBegLine - wnd->wtop;
 if (wnd->WndRow < 0) {
 wnd->WndRow = 0;
 wnd->wtop = wnd->BlkBegLine;
 }
 /* ------ change all text lines in block to \n ----- */
 while (bbl < bel) {
 char *cp = strchr(bbl, '\n');
 if (cp > bel)
 cp = bel;
 strcpy(bbl, cp);
 bel -= (int) (cp - bbl);
 bbl++;
 }
 ClearTextBlock(wnd);
 BuildTextPointers(wnd);
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 SendMessage(wnd, PAINT, 0, 0);
 wnd->TextChanged = TRUE;

 }
}
/* ----------- ID_UNDO Command ---------- */
static void UndoCmd(WINDOW wnd)
{
 if (wnd->DeletedText != NULL) {
 PasteText(wnd, wnd->DeletedText, wnd->DeletedLength);
 free(wnd->DeletedText);
 wnd->DeletedText = NULL;
 wnd->DeletedLength = 0;
 SendMessage(wnd, PAINT, 0, 0);
 }
}
/* ----------- ID_PARAGRAPH Command ---------- */
static void ParagraphCmd(WINDOW wnd)
{
 int bc, ec, fl, el, Blocked;
 char *bl, *bbl, *bel, *bb;

 el = wnd->BlkEndLine;
 ec = wnd->BlkEndCol;
 if (!TextBlockMarked(wnd)) {
 Blocked = FALSE;
 /* ---- forming paragraph from cursor position --- */
 fl = wnd->wtop + wnd->WndRow;
 bbl = bel = bl = TextLine(wnd, wnd->CurrLine);
 if ((bc = wnd->CurrCol) >= ClientWidth(wnd))
 bc = 0;
 Home(wnd);
 /* ---- locate the end of the paragraph ---- */
 while (*bel) {
 int blank = TRUE;
 char *bll = bel;
 /* --- blank line marks end of paragraph --- */
 while (*bel && *bel != '\n') {
 if (*bel != ' ')
 blank = FALSE;
 bel++;
 }
 if (blank) {
 bel = bll;
 break;
 }
 if (*bel)
 bel++;
 }
 if (bel == bbl) {
 SendMessage(wnd, KEYBOARD, DN, 0);
 return;
 }
 if (*bel == '\0')
 --bel;
 if (*bel == '\n')
 --bel;
 }
 else {
 /* ---- forming paragraph from marked block --- */
 Blocked = TRUE;
 bbl = TextLine(wnd, wnd->BlkBegLine) + wnd->BlkBegCol;

 bel = TextLine(wnd, wnd->BlkEndLine) + wnd->BlkEndCol;
 fl = wnd->BlkBegLine;
 bc = wnd->CurrCol = wnd->BlkBegCol;
 wnd->CurrLine = fl;
 if (fl < wnd->wtop)
 wnd->wtop = fl;
 wnd->WndRow = fl - wnd->wtop;
 SendMessage(wnd, KEYBOARD, '\r', 0);
 el++, fl++;
 if (bc != 0) {
 SendMessage(wnd, KEYBOARD, '\r', 0);
 el++, fl ++;
 }
 bc = 0;
 bl = TextLine(wnd, fl);
 wnd->CurrLine = fl;
 bbl = bl + bc;
 bel = TextLine(wnd, el) + ec;
 }
 /* --- change all newlines in block to spaces --- */
 while (CurrChar < bel) {
 if (*CurrChar == '\n') {
 *CurrChar = ' ';
 wnd->CurrLine++;
 wnd->CurrCol = 0;
 }
 else
 wnd->CurrCol++;
 }
 /* ---- insert newlines at new margin boundaries ---- */
 bb = bbl;
 while (bbl < bel) {
 bbl++;
 if ((int)(bbl - bb) == ClientWidth(wnd)-1) {
 while (*bbl != ' ' && bbl > bb)
 --bbl;
 if (*bbl != ' ') {
 bbl = strchr(bbl, ' ');
 if (bbl == NULL || bbl >= bel)
 break;
 }
 *bbl = '\n';
 bb = bbl+1;
 }
 }
 ec = (int)(bel - bb);
 BuildTextPointers(wnd);
 if (Blocked) {
 /* ---- position cursor at end of new paragraph ---- */
 if (el < wnd->wtop ||
 wnd->wtop + ClientHeight(wnd) < el)
 wnd->wtop = el-ClientHeight(wnd);
 if (wnd->wtop < 0)
 wnd->wtop = 0;
 wnd->WndRow = el - wnd->wtop;
 wnd->CurrLine = el;
 wnd->CurrCol = ec;
 SendMessage(wnd, KEYBOARD, '\r', 0);
 SendMessage(wnd, KEYBOARD, '\r', 0);

 }
 else {
 /* --- put cursor back at beginning --- */
 wnd->CurrLine = TextLineNumber(wnd, bl);
 wnd->CurrCol = bc;
 if (fl < wnd->wtop)
 wnd->wtop = fl;
 wnd->WndRow = fl - wnd->wtop;
 }
 SendMessage(wnd, PAINT, 0, 0);
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 wnd->TextChanged = TRUE;
 BuildTextPointers(wnd);
}
/* ----------- COMMAND Message ---------- */
static int CommandMsg(WINDOW wnd, PARAM p1)
{
 switch ((int)p1) {
 case ID_DELETETEXT:
 DeleteTextCmd(wnd);
 return TRUE;
 case ID_CLEAR:
 ClearCmd(wnd);
 return TRUE;
 case ID_UNDO:
 UndoCmd(wnd);
 return TRUE;
 case ID_PARAGRAPH:
 ParagraphCmd(wnd);
 return TRUE;
 default:
 break;
 }
 return FALSE;
}
/* ---------- CLOSE_WINDOW Message ----------- */
static void CloseWindowMsg(WINDOW wnd)
{
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 if (wnd->DeletedText != NULL)
 free(wnd->DeletedText);
}
/* ------- Window processing module for EDITBOX class ------ */
int EditBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 switch (msg) {
 case CREATE_WINDOW:
 return CreateWindowMsg(wnd);
 case ADDTEXT:
 return AddTextMsg(wnd, p1, p2);
 case SETTEXT:
 return SetTextMsg(wnd, p1);
 case CLEARTEXT:
 ResetEditBox(wnd);
 break;
 case GETTEXT:
 return GetTextMsg(wnd, p1, p2);
 case SETTEXTLENGTH:
 return SetTextLengthMsg(wnd, (unsigned) p1);
 case KEYBOARD_CURSOR:
 return KeyboardCursorMsg(wnd, p1, p2);
 case SETFOCUS:
 case PAINT:
 case MOVE:
 rtn = BaseWndProc(EDITBOX, wnd, msg, p1, p2);
 SendMessage(wnd,KEYBOARD_CURSOR,WndCol,wnd->WndRow);
 return rtn;
 case SIZE:
 return SizeMsg(wnd, p1, p2);
 case SCROLL:
 return ScrollMsg(wnd, p1);
 case HORIZSCROLL:
 return HorizScrollMsg(wnd, p1);
 case SCROLLPAGE:
 return ScrollPageMsg(wnd, p1);
 case HORIZPAGE:
 return HorizPageMsg(wnd, p1);
 case LEFT_BUTTON:
 if (LeftButtonMsg(wnd, p1, p2))
 return TRUE;
 break;
 case MOUSE_MOVED:
 if (MouseMovedMsg(wnd, p1, p2))
 return TRUE;
 break;
 case BUTTON_RELEASED:
 if (ButtonReleasedMsg(wnd))
 return TRUE;
 break;
 case KEYBOARD:
 if (KeyboardMsg(wnd, p1, p2))
 return TRUE;
 break;
 case SHIFT_CHANGED:
 ShiftChangedMsg(wnd, p1);
 break;
 case COMMAND:
 if (CommandMsg(wnd, p1))
 return TRUE;
 break;
 case CLOSE_WINDOW:
 CloseWindowMsg(wnd);
 break;
 default:
 break;
 }
 return BaseWndProc(EDITBOX, wnd, msg, p1, p2);
}
/* ------ save deleted text for the Undo command ------ */
static void SaveDeletedText(WINDOW wnd, char *bbl, int len)
{
 wnd->DeletedLength = len;
 if ((wnd->DeletedText=realloc(wnd->DeletedText,len))!=NULL)
 memmove(wnd->DeletedText, bbl, len);
}
/* ---- cursor right key: right one character position ---- */
static void Forward(WINDOW wnd)
{
 char *cc = CurrChar+1;
 if (*cc == '\0')
 return;
 if (*CurrChar == '\n') {
 Home(wnd);
 Downward(wnd);
 }
 else {
 wnd->CurrCol++;
 if (WndCol == ClientWidth(wnd))
 SendMessage(wnd, HORIZSCROLL, TRUE, 0);
 }
}
/* ----- stick the moving cursor to the end of the line ---- */
static void StickEnd(WINDOW wnd)
{
 char *cp = TextLine(wnd, wnd->CurrLine);
 char *cp1 = strchr(cp, '\n');
 int len = cp1 ? (int) (cp1 - cp) : 0;
 wnd->CurrCol = min(len, wnd->CurrCol);
 if (wnd->wleft > wnd->CurrCol) {
 wnd->wleft = max(0, wnd->CurrCol - 4);
 SendMessage(wnd, PAINT, 0, 0);
 }
 else if (wnd->CurrCol-wnd->wleft >= ClientWidth(wnd)) {
 wnd->wleft = wnd->CurrCol - (ClientWidth(wnd)-1);
 SendMessage(wnd, PAINT, 0, 0);
 }
}
/* --------- cursor down key: down one line --------- */
static void Downward(WINDOW wnd)
{
 if (isMultiLine(wnd) &&
 wnd->WndRow+wnd->wtop+1 < wnd->wlines) {
 wnd->CurrLine++;
 if (wnd->WndRow == ClientHeight(wnd)-1)
 SendMessage(wnd, SCROLL, TRUE, 0);
 wnd->WndRow++;
 StickEnd(wnd);
 }
}
/* -------- cursor up key: up one line ------------ */
static void Upward(WINDOW wnd)
{
 if (isMultiLine(wnd) && wnd->CurrLine != 0) {
 if (wnd->CurrLine > 0)
 --wnd->CurrLine;
 if (wnd->WndRow == 0)
 SendMessage(wnd, SCROLL, FALSE, 0);
 --wnd->WndRow;
 StickEnd(wnd);
 }
}
/* ---- cursor left key: left one character position ---- */
static void Backward(WINDOW wnd)
{
 if (wnd->CurrCol) {
 if (wnd->CurrCol-- <= wnd->wleft)
 if (wnd->wleft != 0)
 SendMessage(wnd, HORIZSCROLL, FALSE, 0);
 }
 else if (isMultiLine(wnd) && wnd->CurrLine != 0) {
 Upward(wnd);
 End(wnd);
 }
}
/* -------- End key: to end of line ------- */
static void End(WINDOW wnd)
{
 while (*CurrChar && *CurrChar != '\n')
 ++wnd->CurrCol;
 if (WndCol >= ClientWidth(wnd)) {
 wnd->wleft = wnd->CurrCol - (ClientWidth(wnd)-1);
 SendMessage(wnd, PAINT, 0, 0);
 }
}
/* -------- Home key: to beginning of line ------- */
static void Home(WINDOW wnd)
{
 wnd->CurrCol = 0;
 if (wnd->wleft != 0) {
 wnd->wleft = 0;
 SendMessage(wnd, PAINT, 0, 0);
 }
}
/* -- Ctrl+cursor right key: to beginning of next word -- */
static void NextWord(WINDOW wnd)
{
 int savetop = wnd->wtop;
 int saveleft = wnd->wleft;
 ClearVisible(wnd);
 while (!isWhite(*CurrChar)) {
 char *cc = CurrChar+1;
 if (*cc == '\0')
 break;
 Forward(wnd);
 }
 while (isWhite(*CurrChar)) {
 char *cc = CurrChar+1;
 if (*cc == '\0')
 break;
 Forward(wnd);
 }
 SetVisible(wnd);
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 if (wnd->wtop != savetop || wnd->wleft != saveleft)
 SendMessage(wnd, PAINT, 0, 0);
}
/* -- Ctrl+cursor left key: to beginning of previous word -- */
static void PrevWord(WINDOW wnd)
{
 int savetop = wnd->wtop;
 int saveleft = wnd->wleft;
 ClearVisible(wnd);
 Backward(wnd);
 while (isWhite(*CurrChar)) {
 if (wnd->CurrLine == 0 && wnd->CurrCol == 0)
 break;
 Backward(wnd);
 }
 while (!isWhite(*CurrChar)) {
 if (wnd->CurrLine == 0 && wnd->CurrCol == 0)
 break;
 Backward(wnd);
 }
 if (isWhite(*CurrChar))
 Forward(wnd);
 SetVisible(wnd);
 if (wnd->wleft != saveleft)
 if (wnd->CurrCol >= saveleft)
 if (wnd->CurrCol - saveleft < ClientWidth(wnd))
 wnd->wleft = saveleft;
 SendMessage(wnd, KEYBOARD_CURSOR, WndCol, wnd->WndRow);
 if (wnd->wtop != savetop || wnd->wleft != saveleft)
 SendMessage(wnd, PAINT, 0, 0);
}
/* ----- reset the text attributes of an EDITBOX ------- */
static void ResetEditBox(WINDOW wnd)
{
 unsigned blen = EditBufLen(wnd)+2;
 wnd->text = realloc(wnd->text, blen);
 memset(wnd->text, 0, blen);
 wnd->wlines = 0;
 wnd->CurrLine = 0;
 wnd->CurrCol = 0;
 wnd->WndRow = 0;
 wnd->wleft = 0;
 wnd->wtop = 0;
 wnd->textwidth = 0;
 wnd->TextChanged = FALSE;
 ClearTextPointers(wnd);
 ClearTextBlock(wnd);
}
/* ----- modify text pointers from a specified position
 by a specified plus or minus amount ----- */
static void ModTextPointers(WINDOW wnd, int lineno, int var)
{
 while (lineno < wnd->wlines)
 *((wnd->TextPointers) + lineno++) += var;
}

January, 1992
STRUCTURED PROGRAMMING


Chewing the Wrapper




Jeff Duntemann KG7JF


It was 1971, and I was a college sophomore at a beer bust put on by a
fraternity hungry enough for pledges to admit anyone. I was dressed in a
bright yellow sweater and bright purple bell-bottoms, trying very hard to grow
my hair without realizing the ultimate futility of the effort. (Can you
picture me with shoulder-length hair? Sigh. I can't either.)
As often happens at parties, an impassioned discussion between two people
began to attract a crowd, and before long a considerable fraction of the
party was watching me debate some half-sloshed prelaw type on the merits of
bringing the United Nations into the Vietnam conflict. Or maybe it was the
moral imperative of passing the E.R.A. I forget--because all the while I was
half-watching a gorgeous young woman who was hanging on my every word,
following my discourse with this look of unbelieving awe on her face.
Shall we say this was not an everyday occurrence, and her interest inspired me
to even greater heights of eloquence. Was it my sweater? My sideburns? Or
could it be that at least one girl in this five-and-dime college appreciated
the power of brains over biceps?
The prelaw slurred some minor insult at me and slunk away, defeated. The crowd
wandered off--but she hung on, eyes like sapphires riveted upon me, and in our
single moment of intimacy she breathlessly revealed the secret of her
admiration: "You know, you always talk in complete sentences!"


Chewing the Wrapper


My God! She had thrown away the gum and was chewing the wrapper! What about my
passion? What about my social awareness? What about my obvious allegiance to
the greater good of mankind? No matter--she went home with some football
player, and I went home with my complete sentences. I guess in the long run we
both got what we deserved.
There's a lesson here. Rarely are our creative efforts admired for what we as
creators consider most admirable. Isaac Newton wanted to be remembered for his
theology--calculus was just a throwaway. The seminal object orientation of
Smalltalk was ignored for 15 years because people were too busy ooh-ing and
ahh-ing at its primordial GUI.
I expect this will happen more and more these days, as it ironically grows
easier and easier to create a flashy user interface and positively murder to
sort out an application's internals. It's humbling to keep in mind as you
struggle to master event-driven programming under Turbo Vision or Windows:
They're not going to admire the intricate subtlety or robustness of your event
loop. They're going to admire the color coordination of your scroll bars.


Wrestling Events


It's time to duck under the scroll bars and confront that goblin that's been
making so many self-educated programmers tear out their hair these modern
days. Event-driven programming is the way it's gonna be from now on. Get used
to it. And consider the fact that without it, those pretty scroll bars just
wouldn't come together as easily.
I've already laid a lot of the conceptual groundwork for Turbo Vision and
event-driven architectures generally, in my December 1990 column. If you have
it lying around, it wouldn't hurt to read it through once again. I don't have
the room to do much recap here...but a little won't hurt:
Deep in the heart of Turbo Vision's application object, TApplication, a loop
runs in endless circles waiting for certain things to happen. Mostly those
certain things are keystrokes, mouse clicks, and changes in mouse position.
When one of these significant user-generated happenings happens, TApplication
creates a Pascal record called an event record, fills it with a description of
what happened and where, and sends it bubbling upward for some other part of
the application to respond to.
A lot of the problem with Turbo Vision is that so much of this process happens
beneath the surface. A great many events the user generates never get high
enough for your own code to "see" them--other parts of the Turbo Vision
infrastructure grab them and react to them first. In general, your code sees
an event only after all the rest of Turbo Vision has had a crack at it.
But on the other hand, the major reason for creating an event-driven
application is to allow this submerging of detail beneath the level of the
application program. In one sense, the whole raison d'etre of event-driven
programming is to formalize the handling of user input, so the application
framework can handle much of that input on its own. Turbo Vision modularizes a
program into distinct and well-defined contexts, and separates relevant from
irrelevant input within each context. When you're not working specifically in
a given context, Turbo Vision hides input irrelevant to that context.
Once you've become familiar with the details of this service (which is
unprecedented in the Turbo Pascal world) its peculiarities cease to be
peculiar.


Where Events Come From


Turbo Vision doesn't really do away with the old-fashioned "loop for input"
style of programming. The loop is there. You just can't see it anymore. It's
down in the Run method of TApplication, constantly checking its built-in
indicators to see if any sort of input has appeared since it last looked. If
new input is available, Run wraps it up in a Pascal record called an event
record. It then routes the event record to its appropriate destination. What
that destination is depends on a number of things, most of which we'll cover
in time.
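The shape of that hidden loop can be sketched in C (the magazine's other language this month) rather than Turbo Vision's Pascal. Everything here is illustrative: the `Event` struct, `get_input`, and `route_event` are invented stand-ins for TV's TEvent record and its internal routing, not real TV identifiers.

```c
#include <stdio.h>

/* Hypothetical event record, loosely mirroring Turbo Vision's TEvent */
typedef struct {
    int what;   /* kind of input: keystroke, mouse click, mouse move */
    int data;   /* key code or mouse position, depending on 'what' */
} Event;

/* Stand-in input source: reports a keystroke on polling pass 2 only */
static int get_input(Event *ev, int tick)
{
    if (tick == 2) {
        ev->what = 1;       /* "keyboard" */
        ev->data = 'q';
        return 1;
    }
    return 0;
}

/* Route the event toward its destination; a 'q' keystroke means quit */
static int route_event(const Event *ev)
{
    printf("event: what=%d data=%d\n", ev->what, ev->data);
    return ev->data == 'q';
}

/* The endless loop at the heart of the application object: check for
   input, wrap it in an event record, and route it onward */
static int run_event_loop(void)
{
    int routed = 0;
    for (int tick = 0; ; tick++) {
        Event ev;
        if (get_input(&ev, tick)) {
            routed++;
            if (route_event(&ev))   /* a quit request ends the loop */
                break;
        }
    }
    return routed;
}
```

The point of the sketch is only the shape: poll, wrap, route, repeat--the same loop you used to write yourself, now buried in Run.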
Many events are handled internally to TApplication, without your ever knowing,
perhaps, that an event occurred. Some events are handled automatically in
methods inherited by views you "created"--so that even one of "your" views can
handle events without your knowledge or involvement.
If it were that simple--pass up an event from TApplication, and let an
appropriate view grab it and react to it--Turbo Vision wouldn't be the
conceptual problem that it is. The path an event takes from the user to the
end of one particular event road can be a tangled one. An event can change
shape while it winds its way to its destination as well, not once but several
times. You can watch the path an event takes (sort of) if you know where to
look, and when.


Following a Simple Event


Let me give you an example, from the HCalc program published here a couple of
months ago. HCalc has a fairly simple menu, with only two items: one that
creates a new mortgage window, and another that closes all open mortgage
windows. When your user pulls down the Mortgage menu and clicks the mouse on
the New menu option, that's an event.
Even though it may appear that way, the event doesn't go directly to the menu.
It goes first to TApplication, which determines (by reading the location of
the mouse when it was clicked) which menu option was being selected. It then
routes the event to the TMenuBar object.
TMenuBar performs a translation. The mouse click event it was handed (and this
happens where you can't see it) becomes a different sort of event, one called
a command. The transformed command is then sent to your specific application
object, THouseCalcApp, for processing. THouseCalcApp has its own
event-processing method called HandleEvent. When THouseCalcApp is routed an
event, it first gives its parent class (TApplication) a shot at handling the
command. (You can see this in THouseCalcApp.HandleEvent, the first statement
of which is TApplication.HandleEvent.) If TApplication chooses not to handle
the command, THouseCalcApp then handles the event itself--in this case, by
calling the constructor for a new mortgage window, thus creating the window.
I've drawn a picture of the process, with all its ups and downs, in Figure 1.
Whew. Let's think about what's gone on so far. The user clicks the mouse on a
menu item. This click is an event. TApplication detects the event, and creates
a mouse-click event record. It sends this record to the menu-bar object it
owns, TMenuBar. TMenuBar sends the event to the particular menu that was
accessed; in this case, the Mortgage menu. That menu is set up to respond to a
mouse click event by creating a new command. This command has a name
(cmNewMortgage) and the menu sends the command to the application object,
THouseCalcApp.
THouseCalcApp sends the event "upstairs" to its parent object type to look at
and perhaps handle. (Note this is the second time TApplication has worked with
this event!) TApplication does not know how to handle a cmNewMortgage command,
so it does nothing. Finally, THouseCalcApp handles the command by creating a
new mortgage.
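The parent-first dispatch at the heart of that recap--give the ancestor class first crack, then handle what it declined--can be sketched in C. This is an analogy, not TV's mechanism: TV's HandleEvent is a Pascal method that marks handled events with ClearEvent rather than returning a flag, and the return convention below is invented for illustration.

```c
#include <stdio.h>

enum { EV_NOTHING, EV_COMMAND };
enum { cmNewMortgage = 199 };        /* mirrors HCALC.PAS's constant */

typedef struct { int what; int command; } Event;

/* Ancestor handler: knows nothing about mortgages */
static int base_handle_event(Event *ev)
{
    (void)ev;
    return 0;                        /* not handled */
}

/* Descendant handler: parent gets first crack, then we act */
static int app_handle_event(Event *ev)
{
    if (base_handle_event(ev))       /* TApplication.HandleEvent analogue */
        return 1;
    if (ev->what == EV_COMMAND && ev->command == cmNewMortgage) {
        printf("constructing a new mortgage window\n");
        ev->what = EV_NOTHING;       /* analogue of TV's ClearEvent */
        return 1;
    }
    return 0;                        /* unrecognized: nothing bad happens */
}
```

Note the last line: a command nobody recognizes simply falls through unharmed, which is exactly what makes "stubbing out" menu items free.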


Getting There from Here



If you stare at it long enough, the process begins to make sense. Now let's
take a look and see what has to be done in the code you write to make all this
happen.
First of all, why this translation of a mouse event into a command? Why not
just route the mouse event through the menu and on to THouseCalcApp? The
answer is that two different kinds of events can select the New option on the
Mortgage menu. One is the mouse-generated event we followed in the example,
which we call a positional event, because it relates to a specific position on
the screen. The other is a keyboard event, which (for reasons I'll explain a
little later) is called a focused event. Either kind of event can pick a menu
selection, but once the menu selection is picked, there is only one
unambiguous path until the event is handled. The command translation serves to
force multiple inputs into the menu to a single input out of the menu and into
the application's event handler.
This connection of a menu item with a command is defined when you define the
menus in the menu bar. It's done in the THouseCalcApp.InitMenuBar method, near
line 190 in HCALC.PAS. The definition of the New item looks like this:
NewItem('~N~ew', 'F6', kbF6, cmNewMortgage, hcNoContext,...
Each item on a menu gets a line such as this in the InitMenuBar method. The
first parameter is the name of the menu item as it appears in the menu, in
string form. The tilde characters surround the single-key abbreviation for the
menu item.
This abbreviation may be typed, when the menu in question has the focus (more
on focus later on; for now consider the focus to be where the keyboard is
currently sending characters) and is highlighted in a distinctive color or
gray shade to indicate that fact.
The second parameter is the displayable text form of the single-key shortcut
that will select that menu item, even when the menu is not pulled down. The
third parameter is the key code for that shortcut. Here, "F6" is what is shown
on
the New menu line to the right of the word New, and kbF6 is what tells Turbo
Vision that the F6 function key is the shortcut to that menu item.
The fourth parameter is the name of the command this particular menu item
issues when selected, however it is selected (mouse, menu selection, or
shortcut key, regardless). A command is in fact a numeric value you define as
a constant and give (one would hope) a descriptive name. The cmNewMortgage
constant is defined at line 8 of HCALC.PAS. The value (here, 199) is
arbitrary, as long as it doesn't conflict with any predefined command
constants. I always begin with the value 199 and work down from there. The
"cm" in front of the command name is merely a convention--but there's enough
difficulty in understanding TV code as it is, and every reasonable convention
should be embraced like a life jacket in a hurricane.
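In C terms the convention amounts to nothing more than this (HCALC.PAS does the equivalent with a Pascal const declaration; only cmNewMortgage and its value 199 come from the article--the second constant is invented to show the work-downward habit):

```c
/* Command codes: arbitrary values, chosen so they don't collide with
   Turbo Vision's predefined cm* constants, working downward from 199 */
enum {
    cmNewMortgage     = 199,   /* from HCALC.PAS, line 8 */
    cmCloseAllExample = 198    /* hypothetical second command */
};
```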
Initializing a TMenuBar object with only one two-item menu is simple and
compact. Initializing a seven- or eight-menu menu bar, where each menu has six
or seven items, can take pages of nested calls to NewMenu and NewItem. Setting
up the menu bar is one of those cases where interactive tools (such as
Blaise's Turbo Vision Development Toolkit, described last month) really earn
their pay.


How Events Know Where to Go


One nice thing about events, TV-style, is that they allow you to "stub out" a
menu item without doing anything extra. If you've done your UI-design homework
and you have your complete menu structure down on paper before you start
coding, you can create the entire menu structure long before you write any of
the code that implements the menu choices. You "stub out" a menu item by
simply not writing anything that handles the command generated by that menu
item.
And then when you finally get around to writing code to support that menu
item, well, then it's supported.
This is because command events generated from the menu bar aren't sent to one
specific destination for handling. In a way, generated commands are passed
from hand to hand through the system until some object somewhere knows how to
respond to that command. If no object responds to the command, well, nothing
bad has to happen.
There are in fact several different ways for events to be routed through a
Turbo Vision application. One way is the broadcast event, which, when
generated, goes to literally everything within the application that has the
ability to respond to events. High-level commands such as "Close up everything
and shut down!" are generally broadcast events. I did this in HCALC.PAS, as
I'll explain a little later.
Broadcast events are rare, however. Most of the time, you'll be dealing with
focused events, which, while they don't have any specific destination, have a
definite path to follow.
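Broadcast delivery, as opposed to the focused chain, amounts to offering the event to every potential handler in turn. A minimal C sketch of the idea follows; the handler table, the counters, and the return convention are all invented for illustration and have no TV counterparts by these names.

```c
#include <stdio.h>

typedef struct { int what; int command; } Event;
typedef int (*Handler)(Event *);

static int views_notified = 0;

/* A view that merely observes the broadcast */
static int log_handler(Event *ev)
{
    printf("view saw command %d\n", ev->command);
    views_notified++;
    return 0;
}

/* A view that acts on the broadcast, e.g. by closing itself */
static int close_handler(Event *ev)
{
    printf("closing in response to command %d\n", ev->command);
    views_notified++;
    return 1;
}

/* Broadcast: unlike a focused event, every handler gets a look,
   whether or not any of them chooses to respond */
static int broadcast(Handler *handlers, int n, Event *ev)
{
    int responded = 0;
    for (int i = 0; i < n; i++)
        responded += handlers[i](ev);
    return responded;
}
```

A "close up everything and shut down" command fits this model naturally: every open window sees it, and each reacts on its own.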


Focused Activity


Somewhere in every active Turbo Vision application is the focus. Think of the
focus as the spotlight: Where it shines is where events go. The focus is
typically a control; that is, a software gadget such as a button, an edit
field, or some other construct that accepts input from the mouse or keyboard.
The focus is indicated by a distinctive color on color screens, or by some
sort of bracketing characters on monochrome screens. You can move the focus by
pressing the tab key, or by clicking on the new focus location with the mouse.
There are some shortcut keys as well; pressing F10 always brings the focus to
the menu bar, no matter where else you were before. (This isn't quite true.
F10 won't get you out of a modal view, such as a typical dialog box. This is
one of those "gotta remembers" that will drive you nuts during that crucial
first week or so of trying to comprehend TV....)
Think of the focus as the other end of an event hose that begins deep inside
TApplication. You can move one end of the hose--the end where the events come
out--but the opposite end of the hose is inextricably and invisibly connected
to the place where events come from.


Mode-ing You In


Moving the focus from a user perspective is simple: You select something, or
hit a shortcut key that moves the focus someplace specific (such as the menu
bar). Understanding the routing of events to the focus from a programmer's
perspective requires understanding a few other things, such as what a modal
view is.
In that seminal August 1981 issue of BYTE that introduced Smalltalk to a
slathering world, Larry Tesler (he of the somewhat later Object Pascal spec)
wrote of a T-shirt he had that read, "Don't Mode Me In!" The T-shirt's
complaint was of environments that got you into a mode of some sort, and while
in that mode, severely restricted the meaning and effects of ordinary user
operations. The Cuba Lake Effect has something to do with modes; while you're
in Menu Mode you can't jump out and check available memory, or clear
unnecessary files from disk to make room for something new--perhaps something
the menu is demanding before letting you move on. You're stuck in the menu
until the menu lets you out on the menu's own terms.
A modal view is a view that won't let you out until you actually close the
view. The best example is a typical dialog box such as the one that accepts
mortgage values within HCALC.PAS. Once you bring the dialog up, you either
have to cancel it or accept its values and close it before you can go on to do
anything else. Trying to click on another window or on the menu bar while a
dialog box (or any modal view) is open will just get your efforts ignored.
There is always a modal view in control while a TV application is running.
Nearly all of the time, that modal view is the application itself, which just
means you have to close the application and exit to DOS to get out of it.
Occasionally you'll open a more limited modal view like a dialog box. Think of
it as pulling the fences in a little closer to force you to focus (there's
that word again!) on the tasks immediately at hand.


The Focus Chain


Focused events begin their journey at the current modal view. Most of the time
that means the application itself. When a dialog box is open, all positional
(for example, mouse) events and focused events begin at the dialog box, and
the portions of the application outside the dialog box never see them. (This
is why clicking on things outside the dialog box doesn't get you anywhere--the
mouse events themselves are not being "seen" by anything outside the dialog
box!)
Mouse clicks are sent through the modal view's subviews in Z-order, one by
one, until the subview that contained the original positional event finds the
mouse event and handles it.
Focused events, on the other hand, must travel down the focus chain. The focus
chain starts at the current modal view, and moves to the modal view's selected
subview. For example, if the application itself is the modal view, the
selected window (if there is one) is the first step on the focus chain. If the
selected subview has subviews itself, the focus chain continues at the
selected subview owned by the current subview. And so it goes, from selected
subview to selected subview, until the end of the line is reached, at a
selected view that has no subviews. A view with no subviews is called a
terminal view. That selected terminal view at the end of the focus chain is
what we informally call "the focus"--it's where keystrokes and commands end
up.
So if you've selected a TInputLine object in--and thus owned by--a window,
that TInputLine is the focus, and when you press a key, that keyboard event
passes down the focus chain to be handled by the TInputLine object. I need to
emphasize again that the focus chain has nothing to do with relationships in
the Turbo Vision object hierarchy. Object ownership is the key here.
Keyboard events typically travel all the way to the terminal view at the
focus. Commands, on the other hand, can be handled anywhere along the focus
chain. Certainly they can be handled by the terminal view--but they can just
as easily be handled by the window that owns the terminal view.
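Turbo Vision implements all of this in Pascal, but the walk down the chain is
language-neutral. Here's a minimal C sketch of the idea; every name in it
(View, DispatchFocused, the demo handlers) is invented for illustration and is
not Turbo Vision's:

```c
#include <stddef.h>

/* Hypothetical sketch of focus-chain dispatch. Each view may own a
   selected subview, and has a handler that returns nonzero if it
   consumed the command. */
struct View {
    struct View *selected;              /* selected subview; NULL = terminal view */
    int (*handle)(struct View *, int);  /* returns nonzero if handled */
};

/* Walk from the modal view down the chain of selected subviews,
   offering the command at each step; stop when someone handles it.
   If nobody does, well, nothing bad has to happen. */
void DispatchFocused(struct View *modal, int command)
{
    struct View *v;
    for (v = modal; v != NULL; v = v->selected)
        if (v->handle && v->handle(v, command))
            return;   /* handled somewhere along the focus chain */
}

/* Tiny demo: a window owning an input line (the focus). */
static int last_handler;   /* records which view handled the last command */

static int WinHandle(struct View *v, int cmd)
{
    (void)v;
    if (cmd == 1) { last_handler = 100; return 1; }  /* window-level command */
    return 0;
}

static int EditHandle(struct View *v, int cmd)
{
    (void)v;
    if (cmd == 2) { last_handler = 200; return 1; }  /* handled at the focus */
    return 0;
}

int DemoDispatch(int cmd)
{
    struct View edit = { NULL,  EditHandle };  /* terminal view: "the focus" */
    struct View win  = { &edit, WinHandle  };  /* modal view that owns it */
    last_handler = 0;
    DispatchFocused(&win, cmd);
    return last_handler;
}
```

Note that a command the window understands never reaches the terminal view,
while one the window ignores falls through to the focus, exactly as described
above.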


Broadcast Events


A broadcast event is well-named: It is sent to every single subview of the
current modal view, regardless of the focus chain. A broadcast event is called
for in one of two general circumstances: When you want all views to be able to
respond in their own way to some sort of command; or when you don't know where
or even whether a certain kind of view is active in the application and need
for it to take some sort of action. So you send the event to everybody--and
anybody who knows how to react to it will.
I used a broadcast event in HCALC.PAS to tell all the mortgage windows to call
their destructors and close up. You can have any reasonable number of mortgage
windows on the screen at once, so you might as well broadcast the close up
message to all of them. The path the event takes from user to action is even
more complex than the one we followed for creating a new mortgage window.
Here's the sequence:
The user pulls down the Mortgage menu and selects Close all. The positional
event represented by the mouse click is sent to the menu bar, which determines
which menu item was selected. Just as with the New menu item, a command is
generated in the menu bar, this one called cmCloseAll. The command is sent to
the current modal view, which in this case is the application itself,
THouseCalcApp. Keep in mind that we have not yet generated a broadcast event.
The cmCloseAll command is not a broadcast event and is handled no differently
from the way cmNewMortgage was handled earlier. The cmCloseAll event finds its
way to the HandleEvent method of THouseCalcApp. There, the cmCloseAll command
triggers a call to the THouseCalcApp.CloseAll method, which contains this
single statement:
 Who := Message(Desktop, evBroadcast, cmCloseBC, @Self);
This statement generates a broadcast event that emanates from the desktop
object to all subviews owned by the desktop object. The Desktop parameter is a
pointer to the view from which the broadcast event is to be disseminated. The
evBroadcast parameter defines the event as a broadcast event. (The Message
function has other uses with other kinds of events.) The cmCloseBC is the
command defined as the broadcast event directing mortgage windows to call
their destructors and close up. The @Self parameter points to the object that
originated the broadcast event, in case any of the recipients of the event
need to know who's hollering. (In this case, they don't, so the @Self pointer
goes unused.)
At this point, the desktop begins passing the broadcast event to all of its
subviews. If you look in the TMortgageView.HandleEvent method, you'll see
there is a handler for cmCloseBC events. If a cmCloseBC event turns up, the
method handles it by calling the destructor Done on the mortgage window.
Subviews of the desktop that don't understand the cmCloseBC event get a shot
at handling it as well, but because cmCloseBC isn't present in their
HandleEvent methods, the broadcast is simply ignored.
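The mechanism is simple enough to sketch outside of Pascal. The following C
fragment is loosely modeled on the cmCloseBC sequence just described; the
names (BView, Broadcast, MortgageHandle) are invented for illustration, not
taken from Turbo Vision or HCALC.PAS:

```c
#include <stddef.h>

#define CM_CLOSE_BC 1   /* stand-in for the cmCloseBC command */

/* Hypothetical subview: a flag set when it shuts down, and an
   optional handler. */
struct BView {
    int  closed;
    void (*handle)(struct BView *, int cmd);
};

static void MortgageHandle(struct BView *v, int cmd)
{
    if (cmd == CM_CLOSE_BC)
        v->closed = 1;   /* "call the destructor and close up" */
}

/* The desktop passes the event to every subview it owns, regardless
   of the focus chain; views that don't understand the command simply
   ignore it. */
static void Broadcast(struct BView **subviews, int count, int cmd)
{
    int i;
    for (i = 0; i < count; i++)
        if (subviews[i]->handle)
            subviews[i]->handle(subviews[i], cmd);
}

/* Two mortgage windows and one view with no handler for the command. */
int DemoBroadcast(void)
{
    struct BView w1 = {0, MortgageHandle}, w2 = {0, MortgageHandle};
    struct BView deaf = {0, NULL};              /* ignores the broadcast */
    struct BView *desktop[3] = { &w1, &deaf, &w2 };
    Broadcast(desktop, 3, CM_CLOSE_BC);
    return w1.closed + w2.closed + deaf.closed; /* how many closed up */
}
```

Both mortgage windows close; the view that doesn't understand the command is
unaffected, just as with cmCloseBC.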


Practice Makes Comprehension



More on Turbo Vision next column (and the column after that, and maybe a few
more as well...). This is not easy stuff to swallow, and if you don't quite
get it yet you shouldn't feel bad, especially if you haven't tried your own
hand at a TV application. Just reading about TV isn't enough. You have to go
in there and start generating error messages and system crashes. By all means
read and tinker with HCALC.PAS and the demo programs Borland provides. But at
some point you have to come up with your own little program and have at it. If
the aggravation bothers you, keep this in mind: Once your little training
program works, its usefulness is gone. You learn nothing from your successes
compared to what you learn from your mistakes.




January, 1992
GRAPHICS PROGRAMMING


3-D Animation


 This article contains the following executables: 3D_1.ARC


Michael Abrash


When first I started programming micros, more than 11 years ago now, there
wasn't much money in it, or visibility, or anything you could call a promising
career. Sometimes, it was a way to accomplish things that would never have
gotten done otherwise because minicomputer time cost too much; other times, it
paid the rent; mostly, though, it was just for fun. Given free computer time
for the first time in my life, I went wild, writing versions of all sorts of
software I had seen on mainframes, in arcades, wherever. It was a wonderful
way to learn how computers work: trial and error in an environment where
nobody minded the errors, with no meter ticking.
Many sorts of software demanded no particular skills other than a quick mind
and a willingness to experiment: Space Invaders, for instance, or full-screen
operating system shells. Others, such as compilers, required a good deal of
formal knowledge. Still others required not only knowledge but also more
horse-power than I had available. The latter I filed away on my ever-growing
wish list, and then forgot about for a while.
Three-dimensional animation was the most alluring of the areas I passed over
long ago. The information needed to do rotation, projection, rendering, and
the like was neither so well developed nor so widely available then as it is
now, although, in truth, it seemed more intimidating than it ultimately proved
to be. Even had I possessed the knowledge, though, it seems unlikely that I
could have coaxed satisfactory 3-D animation out of a 4-MHz Z80 system with
160x72 monochrome graphics. In those days, 3-D was pretty much limited to
outrageously expensive terminals attached to minis or mainframes.
Times change, and they seem to do so much faster in computer technology than
in other parts of the universe. A 486 is capable of decent 3-D animation,
owing to its integrated math coprocessor; not in the class of, say, an i860,
but pretty good nonetheless. A 386 is less satisfactory, though; the 387 is no
match for the 486's coprocessor, and most 386 systems lack coprocessors.
However, all is not lost; 32-bit registers and built-in integer multiply and
divide hardware make it possible to do some very interesting 3-D animation on
a 386 with fixed-point arithmetic. Actually, it's possible to do a surprising
amount of 3-D animation in real mode, and even on lesser 80x86 processors; in
fact, the code in this article will perform real-time 3-D animation
(admittedly very simple, but nonetheless real-time and 3-D) on a 286 without a
287, even though the code is written in real-mode C and uses floating-point
arithmetic. In short, the potential for 3-D animation on the 80x86 family may
be quite a bit greater than you think.
This month, we kick off an exploration of some of the sorts of 3-D animation
that can be performed on the 80x86 family. Mind you, I'm talking about
arbitrary 3-D animation, with all calculations and drawing performed
on-the-fly; generating frames ahead of time and playing them back is an
excellent technique, but I'm interested in seeing how far we can push purely
real-time animation. Granted, we're not going to make it to the level of
Terminator 2, but we should have some fun nonetheless. The initial columns may
seem pretty basic to those of you experienced with 3-D programming, and, at
the same time, 3-D neophytes will inevitably be distressed at the amount of
material I skip or skim over. That can't be helped, but at least there'll be
working code, the references mentioned later, and some explanation; that
should be enough to start you on your way with 3-D.
Animating in three dimensions is a complex task, so this will be an ongoing
topic, with later columns building on previous ones; even this first 3-D
column will rely on polygon fill and page-flip code from earlier columns. From
time to time I'll skip to other topics, but I'll return to 3-D animation on a
regular basis, because, to my mind, it's one of the most exciting things that
can be done with a computer--and because, with today's hardware, it can be
done.


References on 3-D Drawing


There are several good sources for information about 3-D graphics. Foley and
van Dam's Computer Graphics: Principles and Practice (Second Edition,
Addison-Wesley, 1990) provides a lengthy discussion of the topic and a great
many references for further study. Unfortunately, this book is heavy going at
times; a more approachable discussion is provided in Principles of Interactive
Computer Graphics, by Newman and Sproull (McGraw-Hill, 1979). Although the
latter book lacks the last decade's worth of graphics developments, it
nonetheless provides a good overview of basic 3-D techniques, including many
of the approaches likely to work well in real time on a PC.
A source that you may or may not find useful is the series of six books on C
graphics by Lee Adams, as exemplified by High-Performance CAD Graphics in C
(Windcrest/Tab, 1986). (I don't know if all six books discuss 3-D graphics,
but the four I've seen do.) To be honest, this book has a number of problems,
including: relatively little theory and explanation; incomplete and sometimes
erroneous discussions of graphics hardware; use of nothing but global
variables, with cryptic names like "array3" and "B21"; and--well, you get the
idea. On the other hand, the book at least touches on a great many aspects of
3-D drawing, and there's a lot of C code to back that up. A number of people
have spoken warmly to me of Adams' books as their introduction to 3-D
graphics. I wouldn't recommend these books as your only 3-D references, but if
you're just starting out, you might want to look at one and see if it helps
you bridge the gap between the theory and implementation of 3-D graphics.


The 3-D Drawing Pipeline


Each 3-D object that we'll handle will be built out of polygons that represent
the surface of the object. Figure 1 shows the stages a polygon goes through
en route to being drawn on the screen. (For the present, we'll avoid
complications such as clipping, lighting, and shading.) First, the polygon is
transformed from object space, the coordinate system the object is defined in,
to world space, the coordinate system of the 3-D universe. Transformation may
involve rotating, scaling, and moving the polygon. Fortunately, applying the
desired transformation to each of the polygon vertices in an object is
equivalent to transforming the polygon; in other words, transformation of a
polygon is fully defined by transformation of its vertices, so it is not
necessary to transform every point in a polygon, just the vertices. Likewise,
transformation of all the polygon vertices in an object fully transforms the
object.
Once the polygon is in world space, it must again be transformed, this time
into view space, the space defined such that the viewpoint is at (0,0,0),
looking down the Z axis, with the Y axis straight up and the X axis off to the
right. Once in view space, the polygon can be perspective-projected to the
screen, with the projected X and Y coordinates of the vertices finally being
used to draw the polygon.
That's really all there is to basic 3-D drawing: transformation from object
space to world space to view space to the screen. Next, we'll look at the
mechanics of transformation.
One note: I'll use a purely right-handed convention for coordinate systems.
Right-handed means that if you hold your right hand with your fingers curled
and the thumb sticking out, the thumb points along the Z axis and the fingers
point in the direction of rotation from the X axis to the Y axis, as shown in
Figure 2. Rotations about an axis are counter-clockwise, as viewed looking
down an axis toward the origin. The handedness of a coordinate system is just
a convention, and left-handed would do equally well; however, right-handed is
generally used for object and world space. Sometimes, the handedness is
flipped for view space, so that increasing Z equals increasing distance from
the viewer along the line of sight, but I have chosen not to do that here, to
avoid confusion. Therefore, Z decreases as distance along the line of sight
increases; a view space coordinate of (0,0,-1000) is directly ahead, twice as
far away as a coordinate of (0,0,-500).


Projection


Working backward from the final image, we want to take the vertices of a
polygon, as transformed into view space, and project them to 2-D coordinates
on the screen, which, for projection purposes, is assumed to be centered on
and perpendicular to the Z axis in view space, at some distance from the
viewpoint. We're after visual realism, so we'll want to do a perspective
projection, in order that farther objects look smaller than nearer objects,
and so that the field of view will widen with distance. This is done by
scaling the X and Y coordinates of each point proportionately to the Z
distance of the point from the viewer, a simple matter of similar triangles,
as shown in Figure 3. It doesn't really matter how far down the Z axis the
screen is assumed to be; what matters is the ratio of the distance of the
screen from the viewpoint to the width of the screen. This ratio defines the
rate of divergence of the viewing pyramid--the full field of view--and is used
for performing all perspective projections. Once perspective projection has
been performed, all that remains before calling the polygon filler is to
convert the projected X and Y coordinates to integers, appropriately clipped
and adjusted as necessary to center the origin on the screen or otherwise map
the image into a window, if desired.
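The similar-triangles scaling can be sketched in a few lines of C. This is an
illustrative sketch, not code from the listings; the names are invented, and
the PROJECTION_RATIO constant stands in for the screen-distance-to-width ratio
just described. Remember that in our convention Z is negative in front of the
viewer, hence the division by -z:

```c
/* Perspective projection of a view-space point onto the screen plane.
   PROJECTION_RATIO is the ratio of the assumed screen distance to the
   screen width; it fixes the rate of divergence of the viewing
   pyramid (the field of view). Illustrative value only. */
#define PROJECTION_RATIO 2.0   /* larger ratio = narrower field of view */

void ProjectPoint(double x, double y, double z,
                  double *screen_x, double *screen_y)
{
    double scale = PROJECTION_RATIO / -z;  /* similar triangles */
    *screen_x = x * scale;
    *screen_y = y * scale;
}
```

Doubling the Z distance of a point halves its projected coordinates, which is
exactly the "farther objects look smaller" effect we're after.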


Translation


Translation means adding X, Y, and Z offsets to a coordinate in order to move
it linearly through space. Translation is as simple as it seems; it requires
nothing more than an addition for each axis. Translation is, for example, used
to move objects from object space, in which the center of the object is
typically the origin (0,0,0), into world space, where the object may be
located anywhere.
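In C, the sketch is barely longer than the description; the function name is
invented for illustration:

```c
/* Translation: one addition per axis. Moves a point by the given
   offsets--for example, from object space (where the object is
   centered on the origin) to its position in world space. */
void TranslatePoint(double *x, double *y, double *z,
                    double dx, double dy, double dz)
{
    *x += dx;
    *y += dy;
    *z += dz;
}
```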


Rotation


Rotation is the process of circularly moving coordinates around the origin.
For our present purposes, it's necessary only to rotate objects about their
centers in object space, so as to turn them to the desired attitude before
translating them into world space.
Rotation of a point about an axis is accomplished by transforming it according
to the formulas shown in Figure 4. These formulas map into the more generally
useful matrix-multiplication forms also shown in Figure 4. Matrix
representation is more useful for two reasons: First, it is possible to
concatenate multiple rotations into a single matrix by multiplying them
together in the desired order; that single matrix can then be used to perform
the rotations more efficiently. Second, 3x3 rotation matrices can become the
upper-left-hand portions of 4x4 matrices that also perform translation (and
scaling as well, but we won't need scaling in the near future), as shown in
Figure 5. A 4x4 matrix of this sort utilizes homogeneous coordinates; that's a
topic way beyond this column, but, basically, homogeneous coordinates allow
you to handle both rotations and translations with 4x4 matrices, thereby
allowing the same code to work with either, and making it possible to
concatenate a long series of rotations and translations into a single matrix
that performs the same transformation as the sequence of rotations and
translations.
There's much more to be said about transformations and the supporting matrix
math, but, in the interests of getting to working code this month, I'll leave
that to be discussed as the need arises.


A Simple 3-D Example



At this point, we know enough to be able to put together a simple 3-D
animation example. The example will do nothing more complicated than display a
single polygon as it sits in 3-D space, rotating around the Y axis. To make
things a little more interesting, we'll let the user move the polygon around
in space with the arrow keys, and with the "A" (away), and "T" (toward) keys.
The sample program requires two sorts of functionality: the ability to
transform and project the polygon from object space onto the screen (3-D
functionality), and the ability to draw the projected polygon (complete with
clipping) and handle the other details of animation (2-D functionality).
Happily (and not coincidentally), we put together a nice 2-D animation
framework back in the July, August, and September columns, so we don't have to
worry much about non-3-D details. Basically, we'll use mode X (320x240, 256
colors, as discussed in the above-mentioned columns), and we'll flip between
two display pages, drawing to one while the other is displayed. One new 2-D
element that we need is the ability to clip polygons; while we could avoid
this for the moment by restricting the range of motion of the polygon so that
it stays fully on the screen, certainly in the long run we'll want to be able
to handle partially or fully clipped polygons. Listing One (page 140) is the
low-level code for a mode X polygon filler that supports clipping. (The
high-level polygon fill code is mode independent, and is the same as in the
February and March columns, as noted further on.) The clipping is implemented
at the low level, by trimming the Y extent of the scan line list up front,
then clipping the X coordinates of each scan line in turn. This is not a
particularly fast approach to clipping--ideally, the polygon would be clipped
before it was scanned into a line list, avoiding potentially wasted scanning
and eliminating the line-by-line X clipping--but it's much simpler, and, as we
shall see, polygon filling performance is the least of our worries at the
moment.
The other 2-D element we need is some way to erase the polygon at its old
location before it's moved and redrawn. We'll do that by remembering the
bounding rectangle of the polygon each time it's drawn, then erasing by
clearing that area with a rectangle fill.
With the 2-D side of the picture well under control, we're ready to
concentrate on the good stuff. Listings Two through Five are the sample 3-D
animation program. Listing Two (page 140) provides matrix multiplication
functions in a straightforward fashion. Listing Three (page 140) transforms,
projects, and draws polygons. Listing Four (page 142) is the general header
file for the program, and Listing Five (page 143) is the main animation
program. Other modules required are: Listings One and Six from July (mode X
mode set, rectangle fill); Listing Six from September (page 146); Listing Four
from March (polygon edge scan); and the FillConvexPolygon() function from the
February column's Listing One. These modules, along with a make file, will be
made available as part of the listings from this issue wherever DDJ code is
posted.


Notes on the 3-D Animation Example


The sample program transforms the polygon's vertices from object space to
world space to view space to the screen, as described earlier. In this case,
world space and view space are congruent--we're looking right down the
negative Z axis of world space--so the transformation matrix from world to
view is the identity matrix; you might want to experiment with changing this
matrix to change the viewpoint. The sample program uses 4x4 homogeneous
coordinate matrices to perform transformations, as described above.
Floating-point arithmetic is used for all 3-D calculations. Setting the
translation from object space to world space is a simple matter of changing
the appropriate entry in the fourth column of the object-to-world
transformation matrix. Setting the rotation around the Y axis is almost as
simple, requiring only the setting of the four matrix entries that control the
Y rotation to the sines and cosines of the desired rotation. However,
rotations involving more than one axis require multiple rotation matrices, one
for each axis rotated around; those matrices are then concatenated together to
produce the object-to-world transformation. This area is trickier than it
might initially appear to be; more in the near future.
The maximum translation along the Z axis is limited to -40; this keeps the
polygon from extending past the viewpoint to positive Z coordinates. This
would wreak havoc with the projection and 2-D clipping, and would require 3-D
clipping, which is far more complicated than 2-D. We'll get to 3-D clipping
someday, but, for now, it's much simpler just to limit all vertices to
negative Z coordinates. The polygon does get mighty close to the viewpoint,
though; run the program and use the "T" key to move the polygon as close as
possible--the near vertex swinging past provides a striking sense of
perspective.
The performance of Listing Five is, perhaps, surprisingly good, clocking in at
16 frames per second on a 20-MHz 386 with a VGA of average speed and no 387,
although there is, of course, only one polygon being drawn, rather than the
hundreds or thousands we'd ultimately like. What's far more interesting is
where the execution time goes. Even though the program is working with only
one polygon, 73 percent of the time goes for transformation and projection. An
additional 7 percent is spent waiting to flip the screen. Only 20 percent of
the total time is spent in all other activity--and only 2 percent is spent
actually drawing polygons. Clearly, we'll want to tackle transformation and
projection first when we look to speed things up. (Note, however, that a math
coprocessor would considerably decrease the time taken by floating-point
calculations.)
In Listing Three, when the extent of the bounding rectangle is calculated for
later erasure purposes, that extent is clipped to the screen. This is due to
the lack of clipping in the rectangle fill code from July; the problem would
more appropriately be addressed by putting clipping into the fill code, but,
unfortunately, I lack the space to do that here.
Finally, observe the jaggies crawling along the edges of the polygon as it
rotates. This is temporal aliasing at its finest! It'll be a while before we
address antialiasing, real-time antialiasing being decidedly nontrivial, but
this should give you an idea of why antialiasing is so desirable.


Coming Up


Next time, we'll assign fronts and backs to polygons, and start drawing only
those that are facing the viewer. That will enable us to handle convex
polyhedrons, such as tetrahedrons and cubes. We'll also look at interactively
controllable rotation, and at more complex rotations than the simple rotation
around the Y axis that we did this month. If there's room, we'll try moving
the viewpoint, and perhaps we'll even use fixed-point arithmetic to speed
things up. If not, patience; we'll get to all that and more (shading, hidden
surfaces, maybe even a little rendering and antialiasing) soon.


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash


[LISTING ONE]


; Draws all pixels in the list of horizontal lines passed in, in
; mode X, the VGA's undocumented 320x240 256-color mode. Clips to
; the rectangle specified by (ClipMinX,ClipMinY),(ClipMaxX,ClipMaxY).
; Draws to the page specified by CurrentPageBase.
; C near-callable as:
; void DrawHorizontalLineList(struct HLineList * HLineListPtr,
; int Color);
;
; All assembly code tested with TASM 2.0 and MASM 5.0

SCREEN_WIDTH equ 320
SCREEN_SEGMENT equ 0a000h
SC_INDEX equ 03c4h ;Sequence Controller Index
MAP_MASK equ 2 ;Map Mask register index in SC

HLine struc
XStart dw ? ;X coordinate of leftmost pixel in line
XEnd dw ? ;X coordinate of rightmost pixel in line
HLine ends

HLineList struc
Lngth dw ? ;# of horizontal lines
YStart dw ? ;Y coordinate of topmost line
HLinePtr dw ? ;pointer to list of horz lines
HLineList ends

Parms struc
 dw 2 dup(?) ;return address & pushed BP

HLineListPtr dw ? ;pointer to HLineList structure
Color dw ? ;color with which to fill
Parms ends
 .model small
 .data
 extrn _CurrentPageBase:word,_ClipMinX:word
 extrn _ClipMinY:word,_ClipMaxX:word,_ClipMaxY:word
; Plane masks for clipping left and right edges of rectangle.
LeftClipPlaneMask db 00fh,00eh,00ch,008h
RightClipPlaneMask db 001h,003h,007h,00fh
 .code
 align 2
ToFillDone:
 jmp FillDone
 public _DrawHorizontalLineList
 align 2
_DrawHorizontalLineList proc
 push bp ;preserve caller's stack frame
 mov bp,sp ;point to our stack frame
 push si ;preserve caller's register variables
 push di
 cld ;make string instructions inc pointers
 mov dx,SC_INDEX
 mov al,MAP_MASK
 out dx,al ;point SC Index to the Map Mask
 mov ax,SCREEN_SEGMENT
 mov es,ax ;point ES to display memory for REP STOS
 mov si,[bp+HLineListPtr] ;point to the line list
 mov bx,[si+HLinePtr] ;point to the XStart/XEnd descriptor
 ; for the first (top) horizontal line
 mov cx,[si+YStart] ;first scan line to draw
 mov si,[si+Lngth] ;# of scan lines to draw
 cmp si,0 ;are there any lines to draw?
 jle ToFillDone ;no, so we're done
 cmp cx,[_ClipMinY] ;clipped at top?
 jge MinYNotClipped ;no
 neg cx ;yes, discard however many lines are
 add cx,[_ClipMinY] ; clipped
 sub si,cx ;that many fewer lines to draw
 jle ToFillDone ;no lines left to draw
 shl cx,1 ;lines to skip*2
 shl cx,1 ;lines to skip*4
 add bx,cx ;advance through the line list
 mov cx,[_ClipMinY] ;start at the top clip line
MinYNotClipped:
 mov dx,si
 add dx,cx ;bottom row to draw + 1
 cmp dx,[_ClipMaxY] ;clipped at bottom?
 jle MaxYNotClipped ;no
 sub dx,[_ClipMaxY] ;# of lines to clip off the bottom
 sub si,dx ;# of lines left to draw
 jle ToFillDone ;all lines are clipped
MaxYNotClipped:
 mov ax,SCREEN_WIDTH/4 ;point to the start of the first
 mul cx ; scan line on which to draw
 add ax,[_CurrentPageBase] ;offset of first line
 mov dx,ax ;ES:DX points to first scan line to
 ; draw
 mov ah,byte ptr [bp+Color] ;color with which to fill

FillLoop:
 push bx ;remember line list location
 push dx ;remember offset of start of line
 push si ;remember # of lines to draw
 mov di,[bx+XStart] ;left edge of fill on this line
 cmp di,[_ClipMinX] ;clipped to left edge?
 jge MinXNotClipped ;no
 mov di,[_ClipMinX] ;yes, clip to the left edge
MinXNotClipped:
 mov si,di
 mov cx,[bx+XEnd] ;right edge of fill
 cmp cx,[_ClipMaxX] ;clipped to right edge?
 jl MaxXNotClipped ;no
 mov cx,[_ClipMaxX] ;yes, clip to the right edge
 dec cx
MaxXNotClipped:
 cmp cx,di
 jl LineFillDone ;skip if negative width
 shr di,1 ;X/4 = offset of first rect pixel in scan
 shr di,1 ; line
 add di,dx ;offset of first rect pixel in display mem
 mov dx,si ;XStart
 and si,0003h ;look up left edge plane mask
 mov bh,LeftClipPlaneMask[si] ; to clip & put in BH
 mov si,cx
 and si,0003h ;look up right edge plane
 mov bl,RightClipPlaneMask[si] ; mask to clip & put in BL
 and dx,not 011b ;calculate # of addresses across rect
 sub cx,dx
 shr cx,1
 shr cx,1 ;# of addresses across rectangle to fill - 1
 jnz MasksSet ;there's more than one byte to draw
 and bh,bl ;there's only one byte, so combine the left
 ; and right edge clip masks
MasksSet:
 mov dx,SC_INDEX+1 ;already points to the Map Mask reg
FillRowsLoop:
 mov al,bh ;put left-edge clip mask in AL
 out dx,al ;set the left-edge plane (clip) mask
 mov al,ah ;put color in AL
 stosb ;draw the left edge
 dec cx ;count off left edge byte
 js FillLoopBottom ;that's the only byte
 jz DoRightEdge ;there are only two bytes
 mov al,00fh ;middle addresses are drawn 4 pixels at a pop
 out dx,al ;set the middle pixel mask to no clip
 mov al,ah ;put color in AL
 rep stosb ;draw the middle addresses four pixels apiece
DoRightEdge:
 mov al,bl ;put right-edge clip mask in AL
 out dx,al ;set the right-edge plane (clip) mask
 mov al,ah ;put color in AL
 stosb ;draw the right edge
FillLoopBottom:
LineFillDone:
 pop si ;retrieve # of lines to draw
 pop dx ;retrieve offset of start of line
 pop bx ;retrieve line list location
 add dx,SCREEN_WIDTH/4 ;point to start of next line

 add bx,size HLine ;point to the next line descriptor
 dec si ;count down lines
 jnz FillLoop
FillDone:
 pop di ;restore caller's register variables
 pop si
 pop bp ;restore caller's stack frame
 ret
_DrawHorizontalLineList endp
 end






[LISTING TWO]


/* Matrix arithmetic functions.
 Tested with Borland C++ 2.0 in the small model */

/* Matrix multiplies Xform by SourceVec, and stores the result in
 DestVec. Multiplies a 4x4 matrix times a 4x1 matrix; the result
 is a 4x1 matrix, as follows:
  -- --      -- --     -- --
 |     |    |  4  |   |  4  |
 | 4x4 | X  |  x  | = |  x  |
 |     |    |  1  |   |  1  |
  -- --      -- --     -- --  */
void XformVec(double Xform[4][4], double * SourceVec,
 double * DestVec)
{
 int i,j;

 for (i=0; i<4; i++) {
 DestVec[i] = 0;
 for (j=0; j<4; j++)
 DestVec[i] += Xform[i][j] * SourceVec[j];
 }
}

/* Matrix multiplies SourceXform1 by SourceXform2 and stores the
 result in DestXform. Multiplies a 4x4 matrix times a 4x4 matrix;
 the result is a 4x4 matrix, as follows:
  -- --      -- --     -- --
 |     |    |     |   |     |
 | 4x4 | X  | 4x4 | = | 4x4 |
 |     |    |     |   |     |
  -- --      -- --     -- --  */
void ConcatXforms(double SourceXform1[4][4], double SourceXform2[4][4],
 double DestXform[4][4])
{
 int i,j,k;

 for (i=0; i<4; i++) {
 for (j=0; j<4; j++) {
 DestXform[i][j] = 0;
 for (k=0; k<4; k++)
 DestXform[i][j] += SourceXform1[i][k] * SourceXform2[k][j];
 }
 }
}








[LISTING THREE]


/* Transforms convex polygon Poly (which has PolyLength vertices),
 performing the transformation according to Xform (which generally
 represents a transformation from object space through world space
 to view space), then projects the transformed polygon onto the
 screen and draws it in color Color. Also updates the extent of the
 rectangle (EraseRect) that's used to erase the screen later.
 Tested with Borland C++ 2.0 in the small model */
#include "polygon.h"

void XformAndProjectPoly(double Xform[4][4], struct Point3 * Poly,
 int PolyLength, int Color)
{
 int i;
 struct Point3 XformedPoly[MAX_POLY_LENGTH];
 struct Point ProjectedPoly[MAX_POLY_LENGTH];
 struct PointListHeader Polygon;

 /* Transform to view space, then project to the screen */
 for (i=0; i<PolyLength; i++) {
 /* Transform to view space */
 XformVec(Xform, (double *)&Poly[i], (double *)&XformedPoly[i]);
 /* Project the X & Y coordinates to the screen, rounding to the
 nearest integral coordinates. The Y coordinate is negated to
 flip from view space, where increasing Y is up, to screen
 space, where increasing Y is down. Add in half the screen
 width and height to center on the screen */
 ProjectedPoly[i].X = ((int) (XformedPoly[i].X/XformedPoly[i].Z *
 PROJECTION_RATIO*(SCREEN_WIDTH/2.0)+0.5))+SCREEN_WIDTH/2;
 ProjectedPoly[i].Y = ((int) (XformedPoly[i].Y/XformedPoly[i].Z *
 -1.0 * PROJECTION_RATIO * (SCREEN_WIDTH / 2.0) + 0.5)) +
 SCREEN_HEIGHT/2;
 /* Appropriately adjust the extent of the rectangle used to
 erase this page later */
 if (ProjectedPoly[i].X > EraseRect[NonDisplayedPage].Right)
 if (ProjectedPoly[i].X < SCREEN_WIDTH)
 EraseRect[NonDisplayedPage].Right = ProjectedPoly[i].X;
 else EraseRect[NonDisplayedPage].Right = SCREEN_WIDTH;
 if (ProjectedPoly[i].Y > EraseRect[NonDisplayedPage].Bottom)
 if (ProjectedPoly[i].Y < SCREEN_HEIGHT)
 EraseRect[NonDisplayedPage].Bottom = ProjectedPoly[i].Y;
 else EraseRect[NonDisplayedPage].Bottom = SCREEN_HEIGHT;
 if (ProjectedPoly[i].X < EraseRect[NonDisplayedPage].Left)
 if (ProjectedPoly[i].X > 0)
 EraseRect[NonDisplayedPage].Left = ProjectedPoly[i].X;
 else EraseRect[NonDisplayedPage].Left = 0;
 if (ProjectedPoly[i].Y < EraseRect[NonDisplayedPage].Top)
 if (ProjectedPoly[i].Y > 0)
 EraseRect[NonDisplayedPage].Top = ProjectedPoly[i].Y;
 else EraseRect[NonDisplayedPage].Top = 0;
 }
 /* Draw the polygon */
 DRAW_POLYGON(ProjectedPoly, PolyLength, Color, 0, 0);
}






[LISTING FOUR]

/* POLYGON.H: Header file for polygon-filling code, also includes
 a number of useful items for 3D animation. */

#define MAX_POLY_LENGTH 4 /* four vertices is the max per poly */
#define SCREEN_WIDTH 320
#define SCREEN_HEIGHT 240
#define PAGE0_START_OFFSET 0
#define PAGE1_START_OFFSET (((long)SCREEN_HEIGHT*SCREEN_WIDTH)/4)
/* Ratio: distance from viewpoint to projection plane / width of
 projection plane. Defines the width of the field of view. Lower
 absolute values = wider fields of view; higher values = narrower */
#define PROJECTION_RATIO -2.0 /* negative because visible Z
 coordinates are negative */
/* Draws the polygon described by the point list PointList in color
 Color with all vertices offset by (X,Y) */
#define DRAW_POLYGON(PointList,NumPoints,Color,X,Y) \
 Polygon.Length = NumPoints; \
 Polygon.PointPtr = PointList; \
 FillConvexPolygon(&Polygon, Color, X, Y);

/* Describes a single 2D point */
struct Point {
 int X; /* X coordinate */
 int Y; /* Y coordinate */
};
/* Describes a single 3D point in homogeneous coordinates */
struct Point3 {
 double X; /* X coordinate */
 double Y; /* Y coordinate */
 double Z; /* Z coordinate */
 double W;
};
/* Describes a series of points (used to store a list of vertices that
 describe a polygon; each vertex is assumed to connect to the two
 adjacent vertices, and the last vertex is assumed to connect to the
 first) */
struct PointListHeader {
 int Length; /* # of points */
 struct Point * PointPtr; /* pointer to list of points */
};

/* Describes the beginning and ending X coordinates of a single
   horizontal line */
struct HLine {
 int XStart; /* X coordinate of leftmost pixel in line */
 int XEnd; /* X coordinate of rightmost pixel in line */
};

/* Describes a Length-long series of horizontal lines, all assumed to
 be on contiguous scan lines starting at YStart and proceeding
 downward (used to describe a scan-converted polygon to the
 low-level hardware-dependent drawing code) */
struct HLineList {
 int Length; /* # of horizontal lines */
 int YStart; /* Y coordinate of topmost line */
 struct HLine * HLinePtr; /* pointer to list of horz lines */
};
struct Rect { int Left, Top, Right, Bottom; };

extern void XformVec(double Xform[4][4], double * SourceVec,
 double * DestVec);
extern void ConcatXforms(double SourceXform1[4][4],
 double SourceXform2[4][4], double DestXform[4][4]);
extern void XformAndProjectPoly(double Xform[4][4],
 struct Point3 * Poly, int PolyLength, int Color);
extern int FillConvexPolygon(struct PointListHeader *, int, int, int);
extern void Set320x240Mode(void);
extern void ShowPage(unsigned int StartOffset);
extern void FillRectangleX(int StartX, int StartY, int EndX,
 int EndY, unsigned int PageBase, int Color);
extern int DisplayedPage, NonDisplayedPage;
extern struct Rect EraseRect[];






[LISTING FIVE]

/* Simple 3D drawing program to view a polygon as it rotates in
 mode X. View space is congruent with world space, with the
 viewpoint fixed at the origin (0,0,0) of world space, looking in
 the direction of increasingly negative Z. A right-handed
 coordinate system is used throughout.
 Tested with Borland C++ 2.0 in the small model */
#include <conio.h>
#include <stdio.h>
#include <dos.h>
#include <math.h>
#include "polygon.h"
void main(void);

/* Base offset of page to which to draw */
unsigned int CurrentPageBase = 0;
/* Clip rectangle; clips to the screen */
int ClipMinX=0, ClipMinY=0;
int ClipMaxX=SCREEN_WIDTH, ClipMaxY=SCREEN_HEIGHT;
/* Rectangle specifying extent to be erased in each page */
struct Rect EraseRect[2] = { {0, 0, SCREEN_WIDTH, SCREEN_HEIGHT},
 {0, 0, SCREEN_WIDTH, SCREEN_HEIGHT} };

/* Transformation from polygon's object space to world space.
 Initially set up to perform no rotation and to move the polygon
 into world space -140 units away from the origin down the Z axis.
 Given the viewing point, -140 down the Z axis means 140 units away
 straight ahead in the direction of view. The program dynamically
 changes the rotation and translation */
static double PolyWorldXform[4][4] = {
 {1.0, 0.0, 0.0, 0.0},
 {0.0, 1.0, 0.0, 0.0},
 {0.0, 0.0, 1.0, -140.0},
 {0.0, 0.0, 0.0, 1.0} };
/* Transformation from world space into view space. In this program,
 the view point is fixed at the origin of world space, looking down
 the Z axis in the direction of increasingly negative Z, so view
 space is identical to world space; this is the identity matrix */
static double WorldViewXform[4][4] = {
 {1.0, 0.0, 0.0, 0.0},
 {0.0, 1.0, 0.0, 0.0},
 {0.0, 0.0, 1.0, 0.0},
 {0.0, 0.0, 0.0, 1.0}
};
static unsigned int PageStartOffsets[2] =
 {PAGE0_START_OFFSET,PAGE1_START_OFFSET};
int DisplayedPage, NonDisplayedPage;

void main() {
 int Done = 0;
 double WorkingXform[4][4];
 static struct Point3 TestPoly[] =
 {{-30,-15,0,1},{0,15,0,1},{10,-5,0,1}};
#define TEST_POLY_LENGTH (sizeof(TestPoly)/sizeof(struct Point3))
 double Rotation = M_PI / 60.0; /* initial rotation = 3 degrees */
 union REGS regset;

 Set320x240Mode();
 ShowPage(PageStartOffsets[DisplayedPage = 0]);
 /* Keep rotating the polygon, drawing it to the undisplayed page,
 and flipping the page to show it */
 do {
 CurrentPageBase = /* select other page for drawing to */
 PageStartOffsets[NonDisplayedPage = DisplayedPage ^ 1];
 /* Modify the object space to world space transformation matrix
 for the current rotation around the Y axis */
 PolyWorldXform[0][0] = PolyWorldXform[2][2] = cos(Rotation);
 PolyWorldXform[2][0] = -(PolyWorldXform[0][2] = sin(Rotation));
 /* Concatenate the object-to-world and world-to-view
 transformations to make a transformation matrix that will
 convert vertices from object space to view space in a single
 operation */
 ConcatXforms(WorldViewXform, PolyWorldXform, WorkingXform);
 /* Clear the portion of the non-displayed page that was drawn
 to last time, then reset the erase extent */
 FillRectangleX(EraseRect[NonDisplayedPage].Left,
 EraseRect[NonDisplayedPage].Top,
 EraseRect[NonDisplayedPage].Right,
 EraseRect[NonDisplayedPage].Bottom, CurrentPageBase, 0);
 EraseRect[NonDisplayedPage].Left =
 EraseRect[NonDisplayedPage].Top = 0x7FFF;
 EraseRect[NonDisplayedPage].Right =
 EraseRect[NonDisplayedPage].Bottom = 0;
 /* Transform the polygon, project it on the screen, draw it */
 XformAndProjectPoly(WorkingXform, TestPoly, TEST_POLY_LENGTH,9);
 /* Flip to display the page into which we just drew */
 ShowPage(PageStartOffsets[DisplayedPage = NonDisplayedPage]);
 /* Rotate 6 degrees farther around the Y axis */
 if ((Rotation += (M_PI/30.0)) >= (M_PI*2)) Rotation -= M_PI*2;
 if (kbhit()) {
 switch (getch()) {
 case 0x1B: /* Esc to exit */
 Done = 1; break;
 case 'A': case 'a': /* away (-Z) */
 PolyWorldXform[2][3] -= 3.0; break;
 case 'T': /* towards (+Z). Don't allow to get too */
 case 't': /* close, so Z clipping isn't needed */
 if (PolyWorldXform[2][3] < -40.0)
 PolyWorldXform[2][3] += 3.0; break;
 case 0: /* extended code */
 switch (getch()) {
 case 0x4B: /* left (-X) */
 PolyWorldXform[0][3] -= 3.0; break;
 case 0x4D: /* right (+X) */
 PolyWorldXform[0][3] += 3.0; break;
 case 0x48: /* up (+Y) */
 PolyWorldXform[1][3] += 3.0; break;
 case 0x50: /* down (-Y) */
 PolyWorldXform[1][3] -= 3.0; break;
 default:
 break;
 }
 break;
 default: /* any other key to pause */
 getch(); break;
 }
 }
 } while (!Done);
 /* Return to text mode and exit */
 regset.x.ax = 0x0003; /* AL = 3 selects 80x25 text mode */
 int86(0x10, &regset, &regset);
}







[LISTING SIX]

LISTING SIX IS CURRENTLY UNAVAILABLE












January, 1992
PROGRAMMER'S BOOKSHELF


How to Solve It




Ray Duncan


Every professional software developer is sure to have one of the great books
on algorithms close at hand, such as Knuth's
mother-of-all-programming-references Art of Computer Programming or
Sedgewick's kinder, gentler (if considerably less encyclopedic) Algorithms,
Second Edition. This is as it should be--books are tools too, and knowing
where to look up a technique when it is needed is a perfectly honorable
alternative to brute-force memorization and application of rules and recipes.
Besides, our field, though young, is already vast, and has progressed far
beyond the point where it's feasible for the average practitioner to master
every one of its many subdivisions.
Unfortunately, the algorithm cook-book approach short-circuits one of
programming's most intense emotional rewards and certainly its most profound
learning experience: the analysis of an unfamiliar problem, the search for a
solution, the realization of the solution in code, and finally the
satisfaction of seeing the solution in use by others. Polya, a Stanford
professor of mathematics who died in 1985, recognized the enormous
significance of finding one's own answers. He set out to teach a
problem-solving method, rather than a set of solutions--a sort of systematic
approach to invention--that would heighten the interest of his students and
serve their needs well in any situation.
How to Solve It has a high name-recognition factor. I daresay almost every
serious programmer has at least heard of the book, and most have a general
idea of its theme. And I venture to say hardly any--other than those who made
their way into the computer field via a major in mathematics--have ever
actually held a copy in their hands, let alone read it. I believe this to be
true because none of the scores of references to the book I've heard or seen
over 20 years gave me any feeling for the book that matched the book's true
personality. And what is that personality?
First and foremost, you must remember that Polya was a mathematician, and was
not concerned with computers at all. As a matter of fact, when the book was
originally written in 1944, general-purpose computers as we know them today
didn't even exist. So How to Solve It is written from the point of view of a
mathematician teaching other mathematicians how to teach mathematics to
students, and the style of writing is an odd admixture--sometimes direct,
fluent, and a tad ironic; sometimes stuffy in a typically mathematical sort of
way, with carefully numbered lists; sometimes Socratic, with imaginary dialogs
between teacher and student; sometimes pedantic; and sometimes exasperatingly
quaint. Nevertheless, when all is said and done, the book lives up to its
reputation.
Polya's "method" has four main stages or elements, which may be summarized as
follows:
1. Understand the problem. What is the unknown? What is the data? What is the
condition? Draw a figure. Introduce suitable notation.
2. Devise a plan. Find the connection between the data and the unknown. Have
you seen the problem before? Do you know a related problem which has already
been solved? Can you solve part of the problem? Can you restate the problem?
3. Carry out your plan. Check each step. Can you see clearly the step is
correct? Can you prove it is correct?
4. Examine the solution obtained. Can you check the result? Can you check the
argument? Can you derive the result differently? Can you see it at a glance?
Can you use the result, or the method, for some other problem?
Having set forth this list in the very first few pages of his book, Polya
proceeds to beat on the list repeatedly from every possible angle, using
various examples, analogies, anecdotes, and even startlingly modern-sounding
quotations from ancient philosophers. In short, he states and restates his
thesis until he gets his points across, just as he encourages the reader to
state and restate a problem until it has been massaged into a form more easily
understood. His thought experiments are all mathematical, of course, but the
mathematics are not complex and can be followed easily by anyone with a dim
recollection of algebra and solid geometry; the most sophisticated mathematics
in the book involve a simple differential.
Polya opts out, however, when faced with one of the topics that most intrigues
me: the phenomenon of subconscious work. Maybe once every year or two I run
into a program bug that is, at first, totally baffling. You know the kind of
bug I'm talking about--when a program's behavior simply can't be reconciled
with its source code, despite the most determined desk checking, execution
profiling, and runtime instrumentation. The first few times I ran into such
problems, I harbored the usual dark suspicions of hardware errors and even, I
must admit, entertained some paranoid (if temporary) fantasies about the
integrity of physical reality. But I've now learned the best approach to such
bugs is to scrutinize them with all my energy for a couple of days, learn
everything I can about the symptoms, and then put the problem completely aside
and go on to something else. In a few days, or at most a few weeks, an
explanation will suddenly pop into my mind--and the explanation is nearly
always right.
In such cases, the solution is welcome, but the nagging question remains: Who
found the solution? Polya says, "The fact is, that a problem, after prolonged
absence, may return into consciousness essentially clarified, much nearer to
its solution than it was when it first dropped out of consciousness. Who
clarified it, who brought it nearer to the solution? Obviously, oneself,
working at it subconsciously." This is, however, not a true answer but an
appeal to a higher (or lower) authority. It's disconcerting to realize there's
a more powerful problem-solving engine beneath the surface which is not under
our direct control. Could our conscious selves, our sense of time, and our
sensitive egos be little more than standing waves on a far deeper current,
ephemeral side-effects of a complex but secretive undermind going about its
unknown business? Polya remains adamantly practical and refuses to speculate,
noting only that:
Only such problems come back improved whose solution we passionately desire,
or for which we have worked with great tension; conscious effort and tension
seem to be necessary to set the subconscious work going. At any rate, it would
be too easy if it were not so; we could solve difficult problems just by
sleeping and waiting for a bright idea. Past ages regarded a sudden good idea
as an inspiration, a gift of the gods. You must deserve such a gift by work,
or at least by a fervent wish.




































January, 1992
OF INTEREST





New from Inmark Development is zApp, an applications framework for developing
Windows applications by encapsulating the Windows API into easy-to-use C++
objects. In addition, zApp provides complete compatibility with existing
C-based Windows applications, a hierarchical dynamic message handling
facility, high-level printing support, simplified dialog box creation, a
high-level forms package, and seamless integration with the Windows API. Also
included is an optimized memory-allocation system which affords faster
allocation and deallocation than with the standard Windows management routines
and the ability to allocate many more objects while minimizing the impact on
Windows' system resources.
zApp costs $195, or $295 with source code, and is compatible with both the
Zortech and Borland C++ compilers. Reader service no. 20.
Inmark Development Corp. 2065 Landings Drive Mountain View, CA 94043
415-691-9000
Base Technology has released Liana, a new C-like, object-oriented language
intended for developing Windows applications. Liana is based on C and C++ and
includes a suite of software tools and an easy-to-learn, object-oriented class
library. The interpretive runtime environment supports dynamic typing,
automatic memory management, and string manipulation. Default variable
initialization, strings, Boolean values, and dynamic arrays are all defined as
part of the language and supported by the runtime environment. Runtime errors
are reported, making note of the source filename and line number. The class
library contains over 90 classes and supports accessing DLLs and communication
via DDE. The details of the Windows API are hidden by the Liana interpreter.
Liana supports fast development of dialogs and menus and simplifies
development of underlying application logic. You develop applications as
usual, creating source files using a text editor, compiling, linking, and
running the .EXE file.
Tools are provided for foreign language localization, building libraries,
displaying .EXE files, generating class hierarchy maps, screen capture, and
bitmap manipulation.
The price is $495; an evaluation kit costs $49. Real, standard, and enhanced
modes are supported. Reader service no. 21.
Base Technology 1543 Pine Street Boulder, CO 80302 303-440-4558
C-scape, Liant Software Corp.'s library of C routines for creating user
interfaces, now supports X-Window. The new release, C-scape 3.2C UNIX, enables
development of user interfaces for graphical applications under the X-Window
system, as well as recompilation of existing C-scape applications under X.
C-scape's graphical features include menu systems, borders, and pop-up
windows. Text editing functions include wordwrap, search and replace, and
block commands.
C-scape 3.2C UNIX retails for $1499. VMS, DOS, OS/2, and QNX versions are also
available. Each version comes with a set of example programs and source code.
Reader service no. 22.
Liant Software 959 Concord St. Framingham, MA 01701 508-872-8700
Version 4.0 of GFA-BASIC for Windows from GFA Software Technologies now
includes a visual dialog box editor and an icon editor. GFA-BASIC is also
available for DOS, enabling porting of any program developed for Windows 3.0
to DOS, complete with a Windows-like GUI. The new release lets you build a
dialog box by pointing and clicking: You simply select a box option from a
pull-down menu, then visually position the check box, radio button, text box,
scroll bar, or other option selected using the mouse. Once the screen layout
is complete, GFA-BASIC automatically generates the program code.
The new icon editor allows you to visually create and edit icons: An existing
icon can be displayed as a bitmap image and graphics added or deleted and
colors changed by just pointing and clicking the mouse. New icons can be
developed by either importing a bitmap image or drawing an image using the
bitmap display, color palette, and mouse.
GFA-BASIC, including example code for developing an executive information
system, costs $295. Reader service no. 23.
GFA Software Technologies Inc. 27 Congress Street Salem, MA 01970 508-744-0201
New from Chesapeake Computing is the Process Server, a development tool with
its own macro-scripted language that uses event-driven, distributed parallel
processing on LANs. The Process Server comes with over 200 ready-to-run
commands and a relational database for report generation, audit trails, and
network statistical analysis. Custom programs can be scripted for routine and
spawned tasks and application servers can be built. All application servers
built with the Process Server are compatible when run on the same or multiple
machines.
The Process Server simultaneously controls any number of tasks and machines. A
screen-mirroring feature provides a view of and remote control over all the
screens of machines in the server mode and the number of machines can be
increased or decreased. Tasks can be segmented, and the segments distributed
among machines. Transaction tracking ensures that if any machine fails,
another one takes over. The Process Server starts tasks on preassigned days
and times and completes all segments in a prescribed, event-driven order. A
new "pinging" technology analyzes network activity, searches for tasks
assigned to Process Server programs and levels the workload, overcoming the
slowness, bottlenecks, and contention of large LAN applications.
Also available is TPS/FAXServer, an add-on module that enables multiple
network users to send and receive faxes online. It transmits volumes of
documents, including graphics, to one or many individuals at preassigned dates
and times. Supports ASCII, PCX, and Epson FX-85 formats.
The Process Server with a five-server license retails for $1995. TPS/FAXServer
costs $199. Reader service no. 24.
Chesapeake Computing Inc. 8401 Corporate Drive, Suite 560 Landover, MD 20785
800-899-2255 or 301-459-7376
Outrider Systems has released ButtonTool and EditTool, new developer aids for
Visual Basic. ButtonTool lets you place graphics symbols, captions, and
VCR-like symbols (fast-forward, stop, reverse, and so on) on buttons. Eighteen
predefined symbols are included, and you can specify bitmaps, icons, or
metafiles as the button's foreground. ButtonTool affords the ability to create
shadows, 3-D effects, and the effect generated whenever the button is
depressed; it can be used as an enhanced picture field or label.
EditTool's most significant feature is that it allows you to perform input
field masking, an example of which is limiting characters to alphabetic or
numeric. You can specify input fields for date, currency, social security
number, phone number, and so on. EditTool also has the ability to control
shadows, borders, and colors and provides a spin control that facilitates
cycling through a specific range of programmer-definable values.
ButtonTool and EditTool are priced at $49.95 each, or $89.95 for both. Both
load into the Visual Basic tool palette and contain runtime versions that can
be distributed without royalties. Reader service no. 25.
Outrider Systems Inc. 3701 Kirby Drive, Suite 1196 Houston, TX 77098
713-521-0486
Imaging for DataEase, from Solana Software International, is a new development
tool that allows you to combine images and documents in one database
environment. Imaging for DataEase manages and outputs full-color or
black-and-white images and documents as integral parts of DataEase forms and
reports. Images and documents are input through scanners, video sources, and
preexisting files, allowing for complete image and text database management
and providing fast, high-resolution output to screen or printer.
The package comes with a scanning module that allows the user to set
resolution, image size, color balance, contrast, brightness, and orientation.
Video input is possible using an optional video capture board and any video
source, including video cameras, VCRs, or cable television. NTSC, PAL, and
SECAM are supported.
Imaging for DataEase costs $995 for a single user; $2495 for a 10-station LAN.
For applications requiring high-quality halftone or graphics output and high
printing speeds, an optional XLI Laser-Pix printer controller board is
available for $995. Standard VGA and Super-VGA monitors and boards are also
supported. Reader service no. 26.
Solana Software International 3751 Sixth Ave. San Diego, CA 92103-9804
800-748-5596
Serius has released the AppEvent Object, which provides for communication
between applications built with the Serius Programmer and Developer desktop
programming environments and other Macintosh applications that support Apple
Events. Using the AppEvent Object allows you to incorporate support for Apple
Events--integrating tools that use standardized definitions of events and data
types--without understanding their implementation at the code level.
The AppEvent Object can send and receive Core Apple Events, the most common
events and data types in Mac apps. It allows you to define your own Apple
Events and subscribe to Events defined by others. Data types supported include
text; number values; sound; color, gray-scale, and black-and-white pictures;
lists of text or values, documents; tables; objects; and commands. The
AppEvent Object--including the Serius Object Library, with 45 prefabricated
components--costs $49.95. Reader service no. 27.
Serius Corp. 1981 East 4800 South Salt Lake City, UT 84117 801-272-7788
REXX/Windows, an interpreter of the REXX language, is available from Kilowatt
Software. The core of the product, the Portable/REXX interpreter, is
compatible with IBM SAA Level 2 specifications of all REXX statements and
built-in functions for the Windows environment. Additional built-in functions
enable REXX programs to define and control GUI objects. REXX/Windows can be
connected to other concurrent applications using Windows' DDE protocols, and
"hot" DDE links can be defined so that a REXX subprogram automatically
processes a DDE data change message. Interfaces are included for future
connectivity with Windows 3.1 object linking and embedding capabilities.
Also included in REXX/Windows are RxAid/Windows and RxText/Windows, which let
you access online help and the source text of example programs, respectively.
REXX/Windows supports three development techniques: traditional source program
development, visual program generation, and live GUI development. Although it
is an interpreter, REXX/Windows can also be used as a compiler.
The introductory price of REXX/Windows (which also includes Portable/REXX for
MS-DOS) is $109. Reader service no. 28.
Kilowatt Software 1945 Washington St. #410 San Francisco, CA 94109
800-848-9474 or 415-346-7353
The standard multiprocessing release of the UNIX System V, Release 4 operating
system can now be obtained from UNIX System Laboratories. SVR4 MP source code
is initially available for the Intel 386/486 architecture. The initial source
code is based on a Compaq Systempro as a reference platform. Wyse Series 9000i
VME and 7000 EISA and Corollary 486/smp and C-bus II architectures will
follow.
SVR4 MP affords compatibility with the UNIX SVR4 ABI and SVR4.0 device
drivers; an MP kernel debugger; a multi-threaded SVR4 kernel; support for
symmetric and asymmetric multiprocessing architectures; and support for VGA
and Super-VGA drivers, mouse drivers, and the 80387 coprocessor.
SVR4 MP is designed to scale up to 16 microprocessors and will be sold in
configurations of up to five CPUs in the Unisys U6000/65, up to eight CPUs in
the NCR 3550, up to nine CPUs with the Wyse Series 9000i, and up to six CPUs
with the Everex STEP MP. Reader service no. 29.
UNIX System Laboratories 190 River Road Summit, NJ 07901-8004 800-828-UNIX
Premia has announced Codewright, a programmer's editor for Windows. Designed
from the ground up to operate in Windows, Codewright is highly configurable
and extensible.
Configurability is provided by an .INI file that allows you to adjust settings
and preferences each time you run Codewright. A mechanism for assigning
keystrokes to editing functions lets you configure Codewright to operate like
an editor with which you're already familiar.
Codewright is extensible by an API of over 500 functions. Procedures can be
written in any language that supports the Windows DLL, and C source code is
provided for many of Codewright's higher-level functions. In addition to
standard editing features, Codewright lets you compile, link, and debug your
program without leaving the editor and includes an interface to version
control systems. Also included are selective text display and the ChromaCode
technique, with which designated parts of the file can be rendered in color.
Codewright's introductory price is $199. The regular price will be $249.
Reader service no. 30.
Premia Corp. 1075 NW Murray Blvd., Suite 268 Portland, OR 97229 503-647-9902
or 800-547-9902








January, 1992
SWAINE'S FLAMES


Name that Millionaire




Michael Swaine


Here's a little quiz to lighten the load in one of the least-loved months of
the year.
1. His company is giving away the SDK for its main product royalty-free,
license-free.
2. After nine years, he was ousted from the company that he cofounded.
3. He has announced that he expects to take his company public within about a
year.
4. He says that the processor wars of the '80s are over and that the '90s will
be the decade of operating system warfare.
5. His company is being sued by Texas Instruments for patent infringement.
6. In order to concentrate on networking and cross-platform connectivity
products, his company recently sold off its media tools.
7. He let it slip recently that his company may begin bundling application
software with its machines again.
8. He expects to put samples of his company's new processor in the hands of
strategic partners IBM and Apple by year's end.
9. His company will shortly be entering the database market on the Windows and
Macintosh platforms.
10. His company's product won't be viable until it has been scaled down, both
in cost and in size.
A. Bill Atkinson, General Magic
B. Rod Canion, Compaq
C. Les Crudele, Motorola
D. Mike Dell, Dell Computers
E. Bill Gates, Microsoft
F. David House, Intel
G. Steve Jobs, Next
H. Reese Jones, Farallon Computing
I. John Sculley, Apple Computer
J. Dave Winer, Userland Software


Answers:


1. J; the Frontier SDK includes sample programs, source code, and documents to
aid developers in adding IAC (interapplication communication) "wires" to their
applications, so they can communicate with other IAC-aware applications and
drive Userland's Frontier system scripting product. 2. B; last year the
company's sales dropped, and losses were significant; the board apparently
freaked. 3. G; most observers doubt that it will happen; at roughly the same
time, the company laid off five percent of its workforce. 4. F; Intel won the
former and will profit from the latter no matter how it goes, in his view. 5.
D; Compaq is also suing over Dell's advertising comparing the two companies'
machines. 6. H; Jones parlayed a trick involving unused bandwidth on home and
office phone wire into a major network products company on the Mac and is
developing some credibility in the PC market. 7. I; just one product, and that
uncertain: AccessPC, a PC emulator. 8. C; the PowerPC RISC chip is expected to
be used in computers from $1000 high-volume machines to 500-SPECmark
workstations. 9. E; the Windows version is apparently close; not so the Mac
product. 10. A; the personal communicator (cellular phone, modem, and network
link) is intended to be hand-size, not tablet-size as the prototype is, and is
intended to be affordable to a wide audience, which the current cost of parts
won't permit.






















February, 1992
February, 1992
EDITORIAL


A Room with a View




Jonathan Erickson


Rumors that we've up-and-moved, trading the salt flats of Redwood City for a
bird's-eye view of the San Mateo foothills, are true. In the process, we've
taken on a new address and phone number; effective immediately, you can
contact us at:
Dr. Dobb's Journal
411 Borel Ave.
San Mateo, CA 94402-3522
Phone: 415-358-9500
FAX: 415-358-9749
M&T Online: 415-358-8857
I'd also like to take a minute to refute the scurrilous scuttlebutt that the
sole motivation for the change was simply to get me to unclutter my office.
Neatness, I'm the first to say, counts.


Business or Pleasure?


Last month, I reported on the Baby Bells' flings at getting into the online
information service business. The ink was hardly dry before Southwestern Bell,
one of the seven Regional Bell Operating Companies, began trying to wring more
money from Missouri BBS operators.
At issue is that the RBOCs get to define your use of a public utility. If you
say your use of the phone line is personal and the RBOC says it's business,
it's business--and you'll start paying business rates that are three to four
times higher than residential charges. To this end, Southwestern Bell has
determined that every BBS is a business--even if it's a free service operated
by an individual as a hobby--and subject to the higher rates. The paranoid
among us might say that by upping charges for services the RBOCs control, and
on which BBSs depend, the Baby Bells can effectively force perceived
competitors (BBSs in this case) to pull their own plug.
I've no quibble over paying for the amount of electricity, water, or
phone-access time I use (although I may not like the rates). But when I pay
for it, it's mine to use as I want, including running a residential BBS on a
residential phone line. (My use must be within reason, of course. A while
back, a lady in my neighborhood was pumping thousands of gallons of water
daily onto her stamp-size lawn, destroying sidewalks, streets, and the
foundations of nearby houses. In this case, the utility company was correct in
putting restrictors on her input lines--her right to unlimited water use ended
at her neighbor's crumbling home. But then, she was a retired school teacher
and we all know how a lifetime of corralling other people's kids can alter
one's worldview.)
How do RBOCs know whether it's voice or data that's traveling over your phone
lines? They don't. However, if you have multiple residential phone lines (say,
four or five) at a single address, they'll assume you're using them for BBS
purposes. Or if you advertise that you're running a BBS (free or otherwise),
their fingers will do the walking right to your wallet. In any case, you'll
have to cough up special business deposits and start paying business rates.
There's another insect in the ointment, one that Southwestern Bell calls
"unrelated." In the past, businesses that transmitted data could pay
higher-than-business rates for "Information Terminal Services" that provided
access to special "clean" data transmission lines. But as standard lines have
been upgraded and the need to transmit data has increased, businesses have
shunned these services, connecting faxes and modems to the less-expensive,
business-rate phone lines. To make this legitimate, at about the same time the
courts gave RBOCs permission to launch online services, Southwestern Bell
petitioned the Missouri Public Services Commission to in effect phase out
terminal services. It isn't the sentiment behind this petition that's
bothersome, it's the wording. The petition appears to state that any use of
the phone lines for data transmission purposes is subject to business rates.
Chew on that the next time you're at home and dial up CompuServe or send a
fax.
I mentioned last month that CompuServe and other services seem blase about the
possibility of RBOC competition. If their customers have to start paying
higher access rates because residential lines are carrying data, I guarantee
you that online connect times will drop.
I'll continue to follow this tangled ball of twine as it unwinds, but I'd also
recommend you keep in touch with the folks at Boardwatch, a BBS-focused
magazine out of Loveland, Colorado that specializes in these issues.
In these emerging rate battles, BBSs have been chosen as the easiest of all
possible targets because there are BBSs that charge fees and function as a
business. But the timing of the BBS attacks, tariff revision request, and RBOC
online service announcements garbles the message, generating ominous overtones
for anyone who uses the phone line for data transmission.
For the pious, there is a loophole. One type of business office that has
residential status is a pastor's study in a church. If you get yourself
ordained and declare your BBS room a sanctuary, you just might be eligible for
residential rates. In doing so, you'll have my blessing, if not Southwestern
Bell's.





























February, 1992
LETTERS







One Good Answer...


Dear DDJ,
In the October 1991 issue, Steve Summit's "C Q&A" included a question on how
to write a generic macro to swap two values. The answer given was that there
was no good answer. Certainly this is true in general for pointers and
floating-point values. However, for integer and character values, the
following is a simple solution: #define SWAP(a,b) a^=b; b^=a; a^=b.
By using unions, pointer and floating-point values can also be swapped. But I
admit that it surely would be a special case if the above could be used
without a lot of setup.
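Wrapped in the classic do/while(0) guard, the macro behaves as a single
statement after an if or else; a minimal sketch (the guard and the wrapper
function below are illustrative, not from the letter):

```c
/* The letter's XOR swap wrapped in do/while(0) so it acts as one
   statement after if/else. Integer types only, and the two operands
   must be distinct objects: SWAP(x, x) would zero x. */
#define SWAP(a, b) do { (a) ^= (b); (b) ^= (a); (a) ^= (b); } while (0)

/* Small wrapper so the macro can be exercised; the pointer comparison
   guards against the self-swap (aliasing) case. */
void swap_ints(int *x, int *y)
{
    if (x != y)
        SWAP(*x, *y);
}
```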
Ron Harper
Houston, Texas


...Deserves Another


Dear DDJ,
In his October 1991 "Programmer's Bookshelf," Andrew Schulman reprinted one of
his favorite functions, cardinality(), from Harbison and Steele, saying, "I
still can't get how it works." Dammit, I couldn't sleep till I got it figured
out, and it is cute.
The trick quantity is (x & -x); x is unsigned. Assuming a nonzero x, the
result has one and only one high bit, in the position of the rightmost 1 in x.
It is a neat trick of twos-complement arithmetic. See Example 1.
Example 1

 x         = yyy...y100...0
 !x        = YYY...Y011...1   (Y = !y)
 -x = !x+1 = YYY...Y100...0   (unary - is twos-complement negation)
 x & -x    = 000...0100...0

Example 2 is a commented version of cardinality() in C++. I suspect the
technique has been reinvented many times for parity-bit check routines and
interrupt handlers.
Example 2

 // cardinality() from Harbison&Steele, over easy (lightly cooked)

 typedef unsigned SET;
 const SET emptyset = 0;
 inline SET rightmost1 (SET x) { return (x & -x); }
 // this returns a SET with one and only one bit set, at the rightmost
 // high bit of x.

 int cardinality (SET x) {
     int count = 0;
     while (x != emptyset) {
         x ^= rightmost1(x); // clears the rightmost 1 in x
         ++count;            // count the number of times we clear a 1
     }
     return count;           // this is the number of 1's that were in x
 }

Greg Goodknight
Reseda, California


What's a MODder to Do?



Dear DDJ,
In reference to Jeff Duntemann's "Structured Programming" column (November
1990) about the MOD function in Turbo Pascal, Jeff states that in Turbo
Pascal, -17 MOD 7 returns -3, whereas -17 modulo 7 should be 4. If I remember
my mathematics classes correctly, in modulo 7 arithmetic, -3 is equivalent
("congruent") to 4. There are 7 congruence classes in modulo 7 arithmetic.
They are:
 ...-14=-7=0=7=14=21...
 ...-13=-6=1=8=15=22...
 ...-12=-5=2=9=16=23...
 ...-11=-4=3=10=17=24...
 ...-10=-3=4=11=18=25...
 ...-9=-2=5=12=19=26...
 ...-8=-1=6=13=20=27...
This can be seen through use of the graphical interpretation of the modulus
operator Jeff gives in his article --count clockwise (for positive numbers) or
counterclockwise (for negative numbers) and see which numbers leave you at
each position on the "dial." It is conventional to take as the representative
of a congruence class modulo n the value between 0 and n-1. Turbo Pascal (and
QuickBasic and other programming languages) return a different representative.
To return the desired representative, Jeff's method of using Turbo Pascal's
TRUNC (truncation) operator and forcing it to round in the proper direction if
a negative value is encountered appears to work. In QuickBasic, the FIX
operator seems to return the same results as does Turbo Pascal's TRUNC. The
MOD operator appears to give the same results in both languages. There are two
options which are probably more straightforward than Jeff's method for
returning the desired representative:
Add n to the result of i MOD n if it is less than 0; that is, k=i MOD 7: IF
k<0 then k=k+7.
Build your own MOD using an equivalent of QuickBasic's INT operator as
k=i-7*INT(i/7).
INT x "returns the largest integer less than or equal to" x, whereas FIX x
"returns the truncated integer part" of x. The difference between INT and FIX
is that, for negative x, FIX returns the first negative integer greater than
x, while INT returns the first negative integer less than x.
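The first option can be sketched in C, whose % operator typically behaves like
Turbo Pascal's MOD for negative operands (the function name is illustrative):

```c
/* Mathematical modulus: for n > 0, always returns the class
   representative in 0..n-1, even for a negative dividend, by
   adding n back when the built-in remainder comes out negative. */
int math_mod(int i, int n)
{
    int k = i % n;
    if (k < 0)
        k += n;
    return k;
}
```

With this, math_mod(-17, 7) yields 4 rather than the -3 that -17 MOD 7 returns.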
So, in conclusion, I wouldn't consider the MOD operator to be "wrong" in Turbo
Pascal (or QuickBasic)--it just returns a representative of the correct
congruence class which may not be the one that you (or the rest of your
program) are expecting.
David Hall
Moscow, Idaho


Code B. Free


Dear DDJ,
I enjoy DDJ immensely. I thought the April 1991 "Editorial," "Mark's Modest
Patent Proposal," was highly interesting. The last comment, however, really
hit the spot.
If you study the patent situation carefully and in depth, you may come to the
same conclusion that I have: Government has no business interfering with
business. Dick Gabriel has the right idea. If you can't "out-engineer,
out-market, and generally out-perform the competition," you should not expect
the Government to pull your fat out of the fire. It's a good thing that Isaac
Newton couldn't patent his law of gravity or we'd all still be paying
royalties to his descendants. (Not really!)
The "patenting of software" issue points up the conceptual difficulties that
lie, lurking in the underbrush of legalisms. If we are going to have a free
market, let it be free!
Perhaps your readers can let their minds soar and think of the progress (and
money) that could be made when companies (and individuals) don't waste their
time trying to corner the market by inventing "claims." Of course, things
would change even faster, but there is only one thing that is certain in the
universe, change. (Who said that?)
David Michael Myers
Martinsburg, West Virginia


Don't Stop Now!


Dear DDJ,
In his August 1991 "Structured Programming" column, Jeff Duntemann stopped
just short of explaining the thing I most needed to know. He stated:
When an interrupt comes in from one of the second set of IRQs, the second 8259
enters an interrupt to IRQ2 of the first 8259. Then some additional protocols
must be followed to inform the CPU which of the second set of IRQs was the
ultimate source of the interrupt. Yes, it does get hairy, but the second eight
IRQs don't really involve serial communications in any way, and I won't be
discussing them further.
I am developing a system where I need to have interrupt-driven routines to
catch bytes arriving through COM3 and COM4, as well as COM1 and COM2. Could
you please explain the "additional protocols" which must be followed for IRQ8
through IRQ15, or could you point me to a source of this information? It would
be much appreciated.
Jonathan E. Kopke
Cincinnati, Ohio


String Class Opposition


Dear DDJ,
This is in regard to Steve Teale's October 1991 article, "Proposing a C++
String Class Standard." I strongly oppose the proposed ANSI string class. The
proposed class is a class for character arrays and should be named
accordingly. In C, there has always been a difference between character arrays
and strings; please let's keep it. Kernighan and Ritchie had very good reasons
for allowing both options. Is this just an effort to confuse C programmers, or
is there a trend to make C++ incompatible with C?
The need to have the "\0" character in a string arises very seldom. Let's have
a character array class for that purpose. Sometimes specialization is an
advantage, but there is no need to change names as it just confuses the issue.
A string class should "waste" the memory needed to store the terminating "\0"
character, because it saves the time and memory consuming conversion to a C
string if you call a library function accepting a string argument. Note that
the code in Example 3 would not work.
Example 3

 String name ="myfile";
 String ext=".dat";
 String fullname=name+ext;
 FILE *fp=fopen (fullname, "r"); // missing '\0' terminator
 FILE *fp=fopen (fullname+' \0',"r"); // calls malloc!!

The purpose of having a string class is to permit the code in Example 3
without the need of a time- and memory-consuming conversion.

The C standard library contains many functions accepting C string arguments.
Many compilers supply these functions in tight assembler code. Why should we
be discouraged from using these C standard library functions? What are the
advantages of rewriting all that proved and widely used code?
An often used argument against the C string is that determining the end of the
string can be time consuming. A String class can easily overcome that
disadvantage by storing the current length of the string. To speed up
concatenation, the class could also store the current size of the reserved
memory and assign memory in chunks.
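A minimal sketch of that idea in plain C (the names and layout are
illustrative, not the proposed class): store the length explicitly for O(1)
queries, but keep the terminating '\0' so the buffer can be handed straight to
fopen() and friends with no conversion.

```c
#include <stdlib.h>
#include <string.h>

/* A string that records its own length but still keeps the trailing
   '\0', so no conversion is needed before calling C library functions. */
typedef struct {
    char  *buf;   /* always '\0'-terminated */
    size_t len;   /* characters before the terminator; O(1) to query */
} LString;

LString ls_make(const char *s)
{
    LString t;
    t.len = strlen(s);
    t.buf = malloc(t.len + 1);
    memcpy(t.buf, s, t.len + 1);  /* copies the '\0' too */
    return t;
}

void ls_append(LString *a, const LString *b)
{
    a->buf = realloc(a->buf, a->len + b->len + 1);
    memcpy(a->buf + a->len, b->buf, b->len + 1);  /* copies the '\0' too */
    a->len += b->len;
}
```

Building "myfile" + ".dat" this way yields a buffer that goes directly to
fopen() with no temporary allocation.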
There is a reason why the proposed ANSI class, specifying that a string can
contain the "\0" character and that it is not terminated by the "\0"
character, will not be used very much:
Millions and millions of lines of code in libraries accept C strings as
arguments. To use the proposed string class, a conversion routine, probably
reserving heap memory for temporary storage, must be called before a call to a
function accepting a string argument. A major feature of C++ is that it is
easy to use existing C libraries. It would be a waste of programming power to
rewrite all that code just so functions would accept ANSI C++ strings.
What programmers really need is a dynamic string class that makes string
handling as easy as in Basic, and also allows the use of C libraries. Classes
like that have been published. The ANSI committee should not invent new
things or change accepted conventions, but should generate a useful standard
that combines the advantages of the different versions already in use.
Helmuth Schmalzl
San Luis Potosi, Mexico


Less is More


Dear DDJ,
This is in regard to Jack Woehr's September 1991 article, "Forth: A Status
Report." I have received copies of Forth BASIS proposals for a few years, only
to see that the standard was a growing set of words. I felt the standard
should be limited to the Core Word Set (the original Forth-83 word set?). All other words should
be in a routine library that can be chosen and compiled by the end user if
needed. The idea is to give the end user the tightest or smallest kernel to
start with for the platform being developed on.
William B. Higinbotham
Brookhaven, New York.


Why Buy the Cow When the Milk is Free?


Dear DDJ,
This is in response to the "Editorial" in the October 1991 issue concerning
"pay-per-use" software. There really is no reason to pay to use software, if
software is as free as air (paraphrasing the GNU manifesto). You just have to
know how to find it. Many talented people (including GNU) are replacing and
enhancing UNIX with free software. One problem: vendors tend to hide their
source code, making it difficult for users to fix bugs. It makes life much
easier when you have readable source code.
Marty Leisner
Rochester, New York


C Fans and Otherwise


Dear DDJ,
I like the new "C Language Q&A" feature very much. I tend to agree with Jeff
Duntemann's assessments of C (as a "sorry mess," for example -- March 1989).
Still, C is what I use. I am always glad to see warnings about hazy aspects of
the language. Gimpel's lint advertisements are likewise stimulating, though of
course they don't print the answers.
Mr. Duntemann may be interested to learn that I switched from Pascal because
it is so different on OS-9 and MS-DOS. I find C less confusing than attempting
to remember which Pascal I'm using.
Charles Marsh
Monroeville, Pennsylvania
Dear DDJ,
I enjoyed your May 1991 issue, as I do each issue. However, in reading the
"Programmer's Bookshelf" by Andrew Schulman I see that he could not be
considered a C fan. Can't fault him too much for that, as it seems my favorite
programming columnist (Jeff Duntemann) doesn't appear to be enamored with it
either. It is, however, in the C++ form, my favorite. I therefore feel
obligated to point out what appears to be a misunderstanding of operator
overloading.
On page 138, column 2, paragraph 1, the statement is made: "Nor can you
overload operator ^() to mean exponentiation and check if (n == 2^64 - 1)
because in C--I mean in C++, the ^ operator is unary not binary." Example 4
demonstrates that this is not the case.
Example 4

 /*******************************************************
 CLASS : cMyInt
 PURPOSE : demonstrate overloaded xor operator
 DESCRIPTION : useless
 LAST UPDATE : 4-15-1991 21:51:03
 PUBLIC :
 MEMBER FUNCTION CALL WITH RETURNS
 cMyInt void
 cMyInt int i
 operator = int i cMyInt&
 operator ^ int i int
 operator int void int
 ********************************************************/
 class cMyInt {
 long Val;
 public:
 cMyInt ( void ) { Val = 0; }
 cMyInt ( int i ) { Val = i; }
 cMyInt& operator =( int i ) { Val = (long) i; return *this; }

 int operator ^( int i );
 operator int( void ) { return (int) Val; }
 }; // class cMyInt
 int cMyInt::operator ^( int i ) {
 if ( i == 0 ) {
 if ( Val == 0 ) return 0;
 else return 1;
 }
 int exp = ( i < 0 ) ? ( i * -1 ) : i;
 long l = Val;
 if ( exp > 1 )
     for ( --exp; exp; exp-- ) l *= Val;
 return (int) l;
 }
 #include <stdio.h>
 main() {
 int i = 3;
 cMyInt m;
 m = i;
 if ( ( ( (cMyInt) 4^2 ) - 1 ) == 15 ) {
 printf("%d ^ 4 = %d\n", (int) m, m ^ 4 );
 printf("\tc's version is %d ^ 4 = %d\n", i, i^4);
 }
 else printf("\tIt didn't work!\n");
 }

If your example class BigInt has been properly defined to handle integers
large enough and includes the definitions for:
 BigInt::BigInt( int );
 BigInt& BigInt::operator ^( int );
your code would work in this form:
 if (n == ( (BigInt) 2^64 ) - 1 ) {...}
A cast is necessary in order for the compiler to "know" that we want the
overloaded operator for a power function, not the xor function that would be
used on the char 0x2. The extra parens are necessary because while we can
overload the operators all we want, we cannot change the precedence which they
receive, and xor has a lower precedence than minus.
xor is not unary in any language I have ever tried.
Compiled with Turbo C++, Version 1.01, the output from the above test program
looks like this:
 C:\> test
 3 ^ 4 = 81
         c's version is 3 ^ 4 = 7
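The precedence point is easy to check in plain C (the helper names below are
illustrative):

```c
/* Overloading cannot change precedence: ^ binds more loosely than
   binary minus, so 5 ^ 3 - 1 parses as 5 ^ (3 - 1). */
int sub_then_xor(void) { return 5 ^ 3 - 1; }    /* 5 ^ 2 == 7 */
int xor_then_sub(void) { return (5 ^ 3) - 1; }  /* 6 - 1 == 5 */
```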
I admit, however, that I don't know any way to pass a 20-digit integer in C++
except as a string. Nobody's perfect. C++ gives me the low-level control that
I want and the reusability and ability to build in error detection--correction
of objects. It also has the notation system that makes the most sense to me,
which is probably the only true reason for preferring it over Turbo Pascal. It
just seems more natural to code:
 int a, b, c; a = b = c = 0; a = ++c;
 b = ++c; ++c += a;
than the Pascal way:
 var a, b, c: integer;
 a := 0; b := 0; c := 0;
 c := c + 1; a := c;
 inc( c ); b := c;
 inc( c, a + 1 );
I do wish however, that there were a way to say:
 function funcA( a, b, c: integer): integer;
(although this would take more space), rather than:
 int funcA( int a, int b, int c ); // shouldn't have to repeat int three times
K&R missed the boat there, or rather the improvements to K&R did. With K&R you
could say:

 int funcA( a, b, c )
 int a, b, c; {... }
Prototypes are an improvement, but we lost something too.
As always I enjoyed DDJ. Thanks for a great mag, and keep it coming.
Jerry E. Howton
Madisonville, Kentucky


That's the Point



Dear DDJ,
I found the November 1991 interview with the creator of PenPoint ("A
Conversation with Robert Carr,") fascinating. I think more attention should be
given in your magazine to operating systems of the future. MS-DOS I think is
the Fortran of operating systems, and years from now people will wonder how it
took so long for a better operating system to arrive.
In particular, I believe an operating system should be integrated with and
complement all applications. Bits of program should be able to move around
freely; files, variables, and devices should all be objects; memory hassles
should be hidden from the user and the programmer; maybe even spreadsheets and
other applications could operate on general objects with a set of generic
functions. It sounds like Robert Carr is doing everything I have always
dreamed of doing.
On a completely different subject, has anyone explored the idea of using data
compression for information encryption? It stands to reason that if you take
all the redundancy out of a file, you render it impervious to cryptanalytic
attack. Arithmetic coding with a good adaptive model should create files in
which there is no detectable difference from a random file, so how could
anyone crack it?
Tim Cooper
Eastwood, Australia
























































February, 1992
USING DPMI TO HOOK INTERRUPTS IN WINDOWS 3


You can get there from here with DPMI


 This article contains the following executables: INTHOOK.ARC


Walter Oney


Walter is a principal software engineer at Rational Systems Inc. He is
responsible for architecting and developing Windows-related products. Rational
Systems' address is 220 N. Main St., Natick, MA 01760.


A classic technique for communicating between two DOS applications uses a
software interrupt. One program, which I'll call the "responder" in this
article, uses the DOS Set Vector function to establish itself as the handler
for an interrupt. The other program, the signaler, uses the INT instruction to
signal that interrupt, thereby invoking the responder's service routine. This
article shows you how a DOS application (a "DosApp") running in the enhanced
386 mode of Microsoft Windows 3.0 can signal a Windows application (a
"WinApp"). You could use the technique, for example, to allow a DosApp that
you haven't hosted directly under Windows to gather user input using the
Windows interfaces.


Roadblocks


When you first try to establish the interrupt-based communication path under
Windows, you quickly find a series of roadblocks that leave you thinking, "You
can't get there from here."
The first roadblock you encounter is that there are actually two interrupt
vectors--a protected-mode vector and a real-mode vector--and you can't easily
get at the right one. In enhanced mode, Windows runs Windows applications in
the protected mode of the Intel processor. An interrupt that occurs while the
processor is in protected mode gets routed through an Interrupt Descriptor
Table to a protected-mode service routine. In contrast, when the processor is
in real mode at the time of an interrupt, the address of a real-mode service
routine is located by indexing the interrupt vector table that begins at
address 0:0.
If a Windows application uses the normal DOS Set Vector function (INT 21h,
function 25h) to establish itself as an interrupt handler, Windows intercepts
the request and only changes the protected-mode vector. Because a DosApp
signaler doesn't run in protected mode (as explained in the accompanying text
box entitled, "V86 Mode"), its signal can't reach the WinApp responder.
Once you find a way to hook the real-mode interrupt vector, you encounter the
next roadblock: You can't just point the real-mode vector to your
protected-mode WinApp handler because they belong to two different address
spaces. The real-mode interrupt vector contains segment:offset addresses of
real-mode service routines, whereas the WinApp responder routine has a
selector:offset address that makes sense only in protected mode.
When you use a DOS extender, the existence of two interrupt vectors and the
problem of reaching a protected-mode handler for a real-mode interrupt can be
largely covered up. When a program executes a DOS Set Vector request, the
extender can install a real-mode handler instead of, or in addition to, a
protected-mode handler for the interrupt. The real-mode handler will actually
be a stub in the extender kernel that switches the processor to protected mode
and then transfers control to the protected-mode responder routine. This is
called "passing up" the interrupt. Windows doesn't simulate the Set Vector
function at this level of detail, however, and more heroic measures are
required to install pass-up handlers.
The third obstacle in the way of a DosApp that wants to signal a WinApp arises
from the way Windows uses different virtual machines to accomplish its
multitasking. Each virtual machine has its own real-mode interrupt vector.
Once you succeed in hooking a real-mode interrupt and arranging for the WinApp
responder to get control in protected mode, you discover that you can only
trap interrupts that occur in the system virtual machine. The DosApp signaler
executes in a different virtual machine and therefore can't reach the
responder.


Architecture for Success


You can get around these problems by using some arcane features of the DOS
Protected Mode Interface (DPMI) and an obscure task-switching function of the
multiplex interrupt (INT 2Fh). Although the Windows 3.0 implementation of DPMI
was intended to allow DOS extenders to work in a virtual machine created by
Windows, WinApps can also issue many of the DPMI interface calls. You can use
the Set Real-Mode Interrupt Vector and Allocate Real-Mode Callback Address
functions to hook a real-mode interrupt and provide for passing it up to a
protected-mode responder. Subfunction 1685h of INT 2Fh can be called from the
signaler's virtual machine to schedule a program, which I'll call the
forwarder, in the system virtual machine.
Figure 1 shows a concrete example of this technique. In a DOS window, we run
the signaler program, INT60.COM. The signaler uses 2F/1685 to execute the
forwarder, INT61.COM. (The address of the forwarder is stored in the interrupt
vector for INT 61h; no interrupt 61h is actually signaled in this example.)
The forwarder in turn issues an INT 60h instruction. The interrupt vectors
through the real-mode callback address to the operating system, which switches
to protected mode and invokes a protected-mode responder routine within a
small WinApp named INTHOOK. The responder calls PostMessage, and INTHOOK's
window procedure thereafter displays a message box announcing receipt of the
signal. Thus, there are three pieces to the example--INT60.COM, INT61.COM, and
INTHOOK.EXE. Listing One (page 78) shows a sample MAKE file for creating all
three pieces; I'll soon describe each piece in more detail.
INTHOOK.C (see Listing Two, page 78) is a simple Windows application that has
the usual parts, including a standard WinMain, a window procedure named
MainWndProc, and several helper functions that I'll describe further on.
Listing Three (page 80) shows the minimal module definition file needed to
actually build the application. When INTHOOK starts up, it calls hook60() to
hook interrupt 60h. When it terminates, it calls unhook60() to clean up. While
the application is active, it displays a dialog box each time an INT 60h
occurs.
DPMI function 0303h (Allocate Real-Mode Callback Address) is used by hook60()
to derive the segment:offset address of a real-mode stub belonging to Windows.
There are three parameters to this function, passed via registers, as follows:
First, AX contains 0303h, the function code for this DPMI request. Second,
DS:SI contains the address of the protected-mode program (namely, int60(), a
routine local to INTHOOK.C) to which Windows is to pass control when real-mode
code attempts to execute at the real-mode callback address. Lastly, ES:DI
holds the address of a static Real-Mode Callback Structure (cb60, in this
example) that will be used at callback time to store the state of the
real-mode program.
Windows returns the segment:offset address of the callback in CX:DX, and
hook60() saves this address in the callback variable. Then hook60() uses DPMI
function 0200h (Get Real-Mode Interrupt Vector) to preserve the original INT
60h vector for later restoration and function 0201h (Set Real-Mode Interrupt
Vector) to install the real-mode callback address as the handler for INT 60h.


Fielding the Real-Mode Callback


The business end of INTHOOK is the local subroutine named int60(), which gains
control via the DPMI real-mode callback mechanism and which must exit with an
IRET instruction.
When this subroutine is called, the register contents are as follows. DS:SI
contains the protected-mode (selector:offset) address of the interrupting
real-mode stack. ES:DI holds the selector:offset address of the real-mode
callback structure. We're using the static structure cb60 in this example, so
we don't actually need to use this pointer to access it. Finally, SS:SP points
to a special locked stack that's used just for interrupt handlers and callback
routines. This has no relation to the interrupting stack or to the data
segment for an instance of INTHOOK.
Because int60() is declared with the "interrupt" keyword, the Microsoft C
compiler generates a stylized function prolog and epilog, as shown in Example
1.
Example 1: The prolog and epilog for a function declared with the "interrupt"
keyword

 pusha
 push ds
 push es
 mov bp,sp
 mov ax,DGROUP
 mov ds,ax
 cld

 [function body]


 pop es
 pop ds
 popa
 iret

int60() is declared as having a single argument, "f," an instance of the
IFRAME structure declared just before the function. The IFRAME structure maps
the registers pushed by the function prolog. To refer to, say, the SI register
passed by DPMI, you would code f.si. Notice also that the function prolog
reloads the DS register from an immediate operand that corresponds to the
first instance of INTHOOK. As the example is coded therefore, there's no
provision for a "thunk" to set up a different DS for a second or subsequent
instance. This is just as well, because there's only one interrupt vector
entry for 60h to start with! This is the reason the WinMain function in
INTHOOK returns immediately if called for a second instance.
int60() has two responsibilities. First of all, it must arrange for control to
return properly to the instruction following the INT 60h instruction in the
forwarder by simulating an IRET instruction on behalf of the real-mode
process. Figure 2 diagrams this simulation, which involves copying the flags
and CS:IP from the interrupting stack into the real-mode callback structure
and adjusting the saved Stack Pointer register up by 6 bytes. The program
addresses the interrupting real-mode stack using the original contents of
DS:SI, which are provided by DPMI for just this purpose.
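The bookkeeping is small enough to model in ordinary C. In this sketch, rm_stack stands in for the three words on the interrupting stack (reached through DS:SI in the real handler), and callback_regs is just the slice of CBSTRUCT that the simulation touches:

```c
/* Model of the IRET simulation performed by int60().  rm_stack holds the
   three words the INT instruction pushed, lowest address first: return IP,
   return CS, caller's FLAGS.  Copying them into the callback structure and
   raising SP by 6 makes DPMI resume the real-mode code at the instruction
   after the INT, just as a genuine IRET would. */
struct callback_regs {
    unsigned short flags, ip, cs, sp;   /* subset of the article's CBSTRUCT */
};

void simulate_iret(const unsigned short rm_stack[3], struct callback_regs *cb)
{
    cb->ip    = rm_stack[0];   /* return address, offset part */
    cb->cs    = rm_stack[1];   /* return address, segment part */
    cb->flags = rm_stack[2];   /* flags as the caller left them */
    cb->sp   += 6;             /* discard the three pushed words */
}
```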
The other responsibility of int60() is to respond to the interrupt signal. In
an ordinary DOS program, a software interrupt handler gets control in the same
system context as the program that signaled the interrupt. Therefore, there
aren't any unusual restrictions on the system calls or other functions it can
perform. The WinApp responder in this example, unfortunately, gets control in
a somewhat fragile state because of the intervention of DPMI and the real-mode
callback mechanism. PostMessage is a safe call in this and other interrupt
handling situations within Windows, but many (if not most) other Windows API
calls are not safe. In coding the example, I opted for utmost safety and
restricted int60() to doing a PostMessage call that sends a WM_USER message to
the INTHOOK window procedure. That message elicits a suitable call to
MessageBox to demonstrate receipt of the interrupt signal.


The Signaling and Forwarding Partnership


The signaling program, INT60.COM (see Listing Four, page 80), is invoked from
the command level in a DOS window. It uses INT 2Fh, function 1685h, to make
sure that the INT 60h instruction is executed within the system virtual
machine.
The parameters to this function are passed in four registers, as follows:
First, the AX register holds 1685h, the code for this function. Second, BX
holds the index of the virtual machine in which a program is to be scheduled.
In Windows 3.0, the system virtual machine is numbered "1," but this may
change in future releases. The third parameter is passed in CX, and holds flag
bits. Flag bit 0 is set to 1 if the scheduler should wait for the target
machine to enable interrupts, 0 otherwise. Flag bit 1 is set to 1 if the
scheduler should wait until the critical section isn't claimed by anyone, 0
otherwise. The fourth parameter is in DX:SI, and is a 32-bit priority boost to
be given to the target virtual machine. Finally, the ES:DI register pair holds the
CS:IP address of the program to be scheduled in the target virtual machine.
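For reference, the same register interface can be written down as a C structure and a pair of flag names (the names are mine, not from any Microsoft header; the listings load the registers directly in assembly):

```c
/* The INT 2Fh, function 1685h register interface, restated as data. */
#define VM_WAIT_FOR_INTS 0x0001 /* bit 0: wait until target VM enables ints */
#define VM_WAIT_FOR_CRIT 0x0002 /* bit 1: wait until crit section is unowned */

struct vm_schedule_req {
    unsigned short ax;      /* 1685h, the function code */
    unsigned short bx;      /* index of the target VM (1 = system VM in 3.0) */
    unsigned short cx;      /* wait flags, above */
    unsigned long  dx_si;   /* 32-bit priority boost for the target VM */
    unsigned long  es_di;   /* CS:IP of the routine to schedule */
};

/* Build the request INT60.COM issues: system VM, both wait bits, no boost. */
struct vm_schedule_req make_int60_req(unsigned long forwarder)
{
    struct vm_schedule_req r;
    r.ax    = 0x1685;
    r.bx    = 1;
    r.cx    = VM_WAIT_FOR_INTS | VM_WAIT_FOR_CRIT;  /* CX = 3, as in Listing Four */
    r.dx_si = 0;
    r.es_di = forwarder;
    return r;
}
```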
To use the interface, we need to know the index of the system virtual machine
and the address of the forwarder program. This is an example and not a
product-grade program, so I relied on the fact that the system virtual machine
is number 1 in Windows 3.0. To put the forwarder where INT60.COM could easily
find it, I decided to install it as the service routine for interrupt 61h. But
note that we don't actually execute an INT 61h to call the forwarder; instead,
we load the address kept in the INT 61h vector into ES:DI before calling the
2F/1685 function.
The actual INT 60h signal is sent by the forwarding program, INT61.COM (
Listing Five, page 80). INT61.COM is a TSR utility because it must be present
in memory when you start Windows in order to be part of the system virtual
machine.


Discussion


There are some obvious limitations to the technique I've described. The most
formidable is the absence of any provision for passing data from the signaling
DosApp to the responding WinApp. The 2F/1685 interface doesn't allow the
signaler to pass any data to the forwarder, and it isn't very easy for the
forwarder to pass data along to the WinApp responder, either. You could
overcome the first part of the problem by reserving a data buffer area within
the forwarder. This data area would be part of the address space shared by all
DOS virtual machines because it would exist when Windows started, and the
signaler could copy data into it before executing the INT 2Fh. If you don't
need to pass much data to the responder, the forwarder could load up the
general registers before issuing the INT 60h, and the responder could read the
data out of the real-mode callback structure and pass it along in the message
it posts.
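If the forwarder loaded, say, BX and CX before issuing the INT 60h, DPMI snapshots those registers into cb60, and int60() could fold them into the lParam it posts. A sketch (make_lparam mirrors the Windows MAKELONG macro; the PostMessage variant shown in the comment is illustrative, not part of the listings):

```c
/* Fold two 16-bit words captured in the callback structure into the
   32-bit lParam of a posted message; equivalent to Windows's MAKELONG. */
unsigned long make_lparam(unsigned short lo, unsigned short hi)
{
    return (unsigned long)lo | ((unsigned long)hi << 16);
}

/* In int60(), instead of PostMessage(hMyWindow, WM_USER, NULL, NULL):
 *
 *   PostMessage(hMyWindow, WM_USER, 0,
 *               make_lparam((unsigned short)cb60.ebx,
 *                           (unsigned short)cb60.ecx)) ;
 */
```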
If you want to pass a large amount of data to the WinApp responder, you can
either use the clipboard (with an INT 2F interface described in Microsoft's
documentation) or you can create a protected-mode pointer to the forwarder's
data area. There is no documented way to create the pointer in Windows. My
experience shows that DPMI functions such as 000Ch (Set Descriptor) operate
correctly when issued by a WinApp, but Microsoft discourages their use. (In
contrast, the other DPMI calls described in this article are approved for use
by WinApps.) Whether Microsoft's policy is motivated by concerns about system
integrity, planned "desupport" in upcoming releases, or lack of portability to
non-Intel environments isn't apparent. There are also some undocumented
descriptor management interface calls in the Windows kernel, but I would guess
that they are less permanent than the DPMI functions.
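If you do go the descriptor route, the DPMI 0.9 sequence would be function 0000h (Allocate LDT Descriptors), then 0007h (Set Segment Base Address) and 0008h (Set Segment Limit); given Microsoft's discouragement, treat this strictly as a sketch. The only arithmetic the WinApp has to supply is the buffer's linear address, which in the first megabyte is simply the flattened segment:offset:

```c
/* Linear address of a real-mode seg:off pair, suitable for DPMI function
   0007h (Set Segment Base Address) after allocating a descriptor with
   function 0000h.  Valid for addresses in the first megabyte. */
unsigned long linear_base(unsigned short seg, unsigned short off)
{
    return ((unsigned long)seg << 4) + (unsigned long)off;
}
```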
Another limitation is that the WinApp can't actually do very much when it
wakes up after an interrupt. Though I haven't made an extensive check, I would
expect PostMessage and GetCurrentTask to just about fill the list. This has
implications for data sharing too, because it may be unsafe for the responder
program to do memory allocation calls in order to copy data from the forwarder
buffer into the Windows address space.
Finally, the interrupt management scheme described in the article doesn't
provide for the WinApp passing any information back to the signaler. There are
actually two problems here. One is that the WinApp can't use this mechanism to
originate a call to a DosApp in any particular virtual machine. A second, and
perhaps more crippling problem is that there aren't any synchronization tools
built into Windows that would let a signaling DosApp suspend execution until
the WinApp responder completes some task. A barely passable solution uses a
shared data word as a semaphore; the DosApp alternately tests the semaphore
and yields its time slice. Overall performance suffers with this hand-rolled
semaphore, however, and it's only suitable for low-frequency use. There are
better techniques for doing these more advanced functions, but elaboration is
beyond the scope of this article.
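For the record, the hand-rolled semaphore looks something like this on the DosApp side. The yield step is abstracted into a callback; under Windows it would be the documented idle call, INT 2Fh function 1680h (release current virtual machine's time slice). demo_sem and demo_yield are stand-ins so the loop can be exercised without a second VM:

```c
/* Hand-rolled semaphore wait for the DosApp side: poll a shared word and
   yield between polls.  Returns the number of polls made before the word
   cleared. */
typedef void (*yield_fn)(void);

unsigned sem_wait(volatile unsigned short *sem, yield_fn yield)
{
    unsigned polls = 0;
    while (*sem) {          /* responder writes 0 when its work is done */
        yield();            /* e.g., INT 2Fh/1680h under Windows */
        ++polls;
    }
    return polls;
}

/* Stand-ins so the loop can be demonstrated in one process. */
static volatile unsigned short demo_sem;
static void demo_yield(void) { demo_sem = 0; }   /* "responder" finishes */
```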
Despite its complexity and limitations, the interrupt hooking and signaling
technique outlined here will serve a class of applications. A suitable
application runs primarily as a DosApp and needs to exchange a small amount of
data with a WinApp on an occasional basis. There are many potential
applications that fit this profile, however, so you may find the technique
useful.


V86 Mode


A DOS application, sometimes called a "non-Windows" application in the
Microsoft documentation, executes in the virtual 8086 (V86) mode of the Intel
processor. With the help of system software, the processor closely mimics real
mode in this situation. In actual fact, the processor is executing in
protected mode, but the Virtual Machine (VM) bit in the extended flags
register is on. In V86 mode, addresses are interpreted as segment:offset
values instead of selector:offset values. Certain instructions that control
critical hardware and other system resources in real mode cause
General-Protection (code 0D) Faults in V86 mode. These faults allow the
operating system to virtualize functions that need to be shared by all
concurrent applications.
As an example of virtualization, consider what happens when a DosApp executes
an INT instruction. The instruction causes a General-Protection Fault, which
Windows disambiguates to discover its cause. Windows will dispatch the
appropriate handler in V86 mode. The handler may issue an STI instruction
(because real-mode handlers normally get control with interrupts disabled) and
will end with an IRET instruction. Both the STI and IRET instructions will
also cause General-Protection Faults so that Windows can maintain a virtual
Interrupt Flag. The real processor will be enabled for interrupts during most
of the process (except during the initial handling of the General-Protection
Fault) so that multitasking won't suffer.
There are additional details to the interrupt handling process that matter
only to operating systems programmers. For instance, the DosApp runs in
protection ring 3 (least privileged), whereas the Windows kernel runs in
protection ring 0 (most privileged), and the Input/Output Privilege Level is
set to 0 to prevent the DosApp from executing I/O instructions or affecting
the real Interrupt Flag in any way. Handling the General-Protection Faults
described in the previous paragraph requires the processor to switch stacks
and save additional state information beyond that required for a purely
real-mode process.
None of these details is apparent to the application programmer trying to
signal a software interrupt, however. At the level described in this article,
it appears that the DosApp runs in real mode and that a totally normal
real-mode interrupt service routine gains control directly from the INT
instruction.
--W.O.



_USING DPMI TO HOOK INTERRUPTS IN WINDOWS 3_
by Walter Oney


[LISTING ONE]

#############################################################################
# MAKE file for INTHOOK example and helper programs.
# By Walter Oney. Use with Microsoft C 6.0A & MASM 5.1 (or compatibles)
#############################################################################

all: inthook.exe int60.com int61.com

inthook.obj: inthook.c
 cl -AS -Zlipe -c -Gsw2 -Ows -W3 inthook.c >inthook.err
inthook.exe: inthook.obj
 link /noe /nod /co /m inthook,inthook,inthook,slibcaw libw,inthook;
 rc inthook.exe
 mapsym inthook

int60.com: int60.asm
 masm int60;
 link int60;
 exe2bin int60 int60.com
int61.com: int61.asm
 masm int61;
 link int61;
 exe2bin int61 int61.com






[LISTING TWO]

/****************************************************************************
 * INTHOOK.C -- by Walter Oney. Use with Microsoft C 6.0A.
 * Sample app illustrating interrupt hooking in Win3 using DPMI 0.9 services
 ****************************************************************************/

/* Include files */
#include "windows.h" /* MS windows dcls */
#include "dos.h" /* for FP_SEG, FP_OFF */

/* Local procedures and data */
 static HANDLE hInst ; /* current instance handle */
 static HWND hMyWindow ; /* handle for our window */
 static void (interrupt far *org60)() ; /* original INT 60 handler */
 static void (far *callback)() ; /* real mode callback address */
 typedef struct
 { /* real mode callback */
 unsigned long edi, esi, ebp, junk, ebx, edx, ecx, eax ;
 unsigned short flags, es, ds, fs, gs, ip, cs, sp, ss ;
 } CBSTRUCT ; /* real mode callback */
 CBSTRUCT cb60 ; /* callback for INT 60 handling */

 static void hook60(void) ; /* hook INT 60 */
 static void unhook60(void) ; /* unhook INT 60 */
 static void interrupt far int60() ; /* interrupt handler */

/**********************************************************************/
/* Main window procedure: */
 LONG FAR PASCAL MainWndProc
 (HWND hWnd, /* window handle */
 WORD iMessage, /* message code */
 WORD wParam, /* 1st parameter */
 LONG lParam) /* 2d parameter */
 { /* MainWndProc */
/* Local variables */
 long retcode = 0 ; /* return code */
/* Text */
 switch(iMessage)
 { /* process message */
 case WM_CREATE:
 hMyWindow = hWnd ; /* so INT60 can find it */
 hook60() ;
 break ;
 case WM_USER: /* posted by int60() */

 MessageBox(GetFocus(), "Wake up!", "Salutations",
 MB_ICONEXCLAMATION | MB_OK) ;
 break ;
 case WM_DESTROY: /* window being destroyed */
 unhook60() ;
 PostQuitMessage(0) ;
 break ;
 default: /* some other message */
 retcode = DefWindowProc(hWnd, iMessage, wParam, lParam) ;
 break ;
 } /* process message */
 return retcode ;
 } /* MainWndProc */

/**********************************************************************/
/* Window message loop: */
 int PASCAL WinMain
 (HANDLE hInstance, /* current instance */
 HANDLE hPrevInstance, /* previous instance (if any) */
 LPSTR lpCmdLine, /* command line */
 int nCmdShow) /* show window type (open/icon) */
 { /* WinMain */
/* Local variables */
 HWND hWnd ; /* window handle */
 MSG msg ; /* current message */
 WNDCLASS wc ; /* template for this class */
/* Text */
/* Only allow one instance of the application at a time -- there's only
 one interrupt vector to hook! */
 if (hPrevInstance)
 return FALSE ;
/* Create the window class. */
 wc.style = 0 ; /* default styles */
 wc.lpfnWndProc = MainWndProc ; /* window proc */
 wc.cbClsExtra = 0 ; /* no extra bytes for class */
 wc.cbWndExtra = 0 ; /* no extra bytes for instance */
 wc.hInstance = hInstance ; /* who created the class */
 wc.hIcon = LoadIcon(NULL, IDI_APPLICATION) ;
 wc.hCursor = LoadCursor(NULL, IDC_ARROW) ; /* default cursor */
 wc.hbrBackground = (HBRUSH) GetStockObject(WHITE_BRUSH) ;
 wc.lpszMenuName = NULL ; /* no menu */
 wc.lpszClassName = "AppWClass" ; /* name of window class */
 if (!RegisterClass(&wc))
 return FALSE ;
/* Create an instance of the class (i.e., our own window) */
 hInst = hInstance ; /* so window proc can access it */
 hWnd = CreateWindow("AppWClass",
 "Interrupt Hook Sample Application",
 WS_OVERLAPPEDWINDOW,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 NULL, NULL, hInstance, NULL) ;
 if (!hWnd)
 return FALSE ;
 ShowWindow(hWnd, SW_SHOWMINIMIZED) ;
 UpdateWindow(hWnd) ;
/* Main message loop */

 while (GetMessage(&msg, NULL, NULL, NULL))
 { /* until WM_QUIT message */
 TranslateMessage(&msg) ; /* xlate virtual key codes */
 DispatchMessage(&msg) ; /* dispatch handler */
 } /* until WM_QUIT message */
 return msg.wParam ; /* PostQuitMessage's arg */
 } /* WinMain */

/**********************************************************************/
/* HOOK60 hooks software interrupt 60h in real mode, using a real-mode
 callback to get control passed to int60(). */
 static void hook60()
 { /* hook60 */
 _asm
 {
 push ds ; save DS across call

 mov ax, ds ; ES:DI -> callback structure
 mov es, ax ; ..
 mov di, offset cb60; ..
 mov ax, cs ; DS:SI -> routine to call
 mov ds, ax ; ..
 mov si, offset int60 ; ..
 mov ax, 0303h ; fcn 0303: allocate real mode callback
 int 31h ; issue DPMI function request

 pop ds ; restore DS
 mov callback, dx ; CX:DX = callback address
 mov callback+2, cx ; ..

 mov bl, 60h ; BL = interrupt number (60h)
 mov ax, 0200h ; fcn 0200: get real mode interrupt vector
 int 31h ; issue DPMI function request
 mov org60, dx ; CX:DX = original real mode vector
 mov org60+2, cx ; ..

 mov dx, callback ; CX:DX = new REAL MODE handler address
 mov cx, callback+2 ; ..
 mov ax, 0201h ; fcn 0201: set real mode interrupt vector
 int 31h ; issue DPMI fcn request
 }
 } /* hook60 */

/**********************************************************************/
/* UNHOOK60 restores the original interrupt 60h vector. */
 static void unhook60()
 { /* unhook60 */
 _asm
 {
 mov dx, org60 ; CX:DX = original INT 60 vector
 mov cx, org60+2 ; ..
 mov bl, 60h ; BL = interrupt number (60h)
 mov ax, 0201h ; fcn 0201: set real mode interrupt vector
 int 31h ; issue DPMI fcn request

 mov dx, callback ; CX:DX = real-mode callback address
 mov cx, callback+2 ; ..
 mov ax, 0304h ; fcn 0304: free real mode callback
 int 31h ; issue DPMI fcn request

 }
 } /* unhook60 */

/**********************************************************************/
/* INT60 is the interrupt handler for interrupt 60h. It gains control from
 the real-mode callback address allocated by HOOK60() and uses PostMessage
 to send a message to the application's window procedure. */
 typedef struct
 { /* interrupt register structure */
 unsigned short es, ds ;
 unsigned short di, si, bp, sp, bx, dx, cx, ax ;
 unsigned short flags, ip, cs ;
 } IFRAME ; /* interrupt register structure */
 static void interrupt far int60(IFRAME f)
 { /* int60 */
 unsigned int far *isp ; /* interrupting stack pointer */
/* Simulate IRET in signalling process. */
 FP_SEG(isp) = f.ds ;
 FP_OFF(isp) = f.si ;
 cb60.ip = isp[0] ;
 cb60.cs = isp[1] ;
 cb60.flags = isp[2] ;
 cb60.sp += 6 ;
/* Post a private message leading to a message box. */
 PostMessage(hMyWindow, WM_USER, NULL, NULL) ;
 } /* int60 */






[LISTING THREE]

NAME INTHOOK
DESCRIPTION 'Interrupt Hook Sample Application'
EXETYPE WINDOWS
STUB 'WINSTUB.EXE'
CODE PRELOAD MOVEABLE DISCARDABLE
DATA PRELOAD MOVEABLE MULTIPLE
HEAPSIZE 1024
STACKSIZE 5120
EXPORTS
 MainWndProc @1






[LISTING FOUR]

;-----------------------------------------------------------------------------
; INT60.ASM -- Signaller program for INTHOOK example app. By Walter Oney.
;-----------------------------------------------------------------------------
 name int60
int60 segment byte public 'code'
 assume cs:int60, ds:int60
 org 100h ; required for .COM file usage


; See if forwarder program is present by checking interrupt vector for 61h.
begin: mov ax, 3561h ; get int 61 vector address
 int 21h ; ..
 mov ax, es ; be sure there is one
 or ax, bx ; ..
 jz cantcall ; if not, complain and quit

; Use 2F/1685 to switch to system virtual machine and call forwarder program.
; Note that no one actually does an INT 61h--we simply use the vector as a
; convenient (and facile) place to park the address of the callback
 mov ax, 1685h ; fcn 1685: switch VM's and callback
 mov di, bx ; ES:DI = callback address (int 61 hdlr)
 mov bx, 1 ; BX = VM to switch to (system VM)
 mov cx, 3 ; CX = .... .... .... ..11 -- wait until
 ; interrupts enabled and critical
 ; section unowned
 xor dx, dx ; DX:SI = priority boost (zero)
 xor si, si ; ..
 int 2Fh ; switch to system VM & do INT 60

; If the forwarder found an INT 60 handler, it saved the address beginning
; after a 3-byte JMP located at 100h. Check this in order to complain if
; the responder WinApp isn't loaded. Note that this test will sometimes
; fail because we're running asynchronously with the system VM.
 mov ax, word ptr es:[103h]; see if forwarder found an INT60 handler
 or ax, word ptr es:[105h]; ..
 jz no60 ; if not, complain about it
goback: mov ax, 4C00h ; terminate with errorlevel 0
 int 21h ; (does not return)
cantmsg db 'The forwarder program is not installed', 13, 10, '$'
nomsg db 'The responder WinApp is not running', 13, 10, '$'
no60: lea dx, nomsg ; responder not present
 jmp short complain ; ..
cantcall:lea dx, cantmsg ; forwarder not present
complain:mov ah, 9 ; fcn 9: print msg in DS:DX on screen
 int 21h ; ..
 jmp goback ; exit
int60 ends
 end begin






[LISTING FIVE]

;-----------------------------------------------------------------------------;
; INT61.ASM -- Forwarder program for the INTHOOK example application ;
; Written by Walter Oney ;
;-----------------------------------------------------------------------------;

 name int61
 .286p
int61 segment byte public 'code'
 assume cs:int61, ds:int61
 org 100h ; required for .COM file usage
begin: jmp doit ; jump around fixed data area


vect60 dw 0, 0 ; for inspection by signaller

doit: mov ax, 2561h ; hook INT 61h so INT60 can find us
 lea dx, callback ; from another virtual machine
 int 21h ; ..

 lea dx, endprog+15 ; point past all code in this module
 shr dx, 4 ; compute # paragraphs to keep
 mov ax, 3100h ; terminate and stay resident
 int 21h ; (does not return)
; CALLBACK is reached circuitously when INT60 does a 2F/1685 to schedule
; it to run in the system virtual machine.
callback:
 push es ; save working registers
 push ds ; ..
 pusha ; ..
 mov ax, cs ; set DS == CS
 mov ds, ax ; ..

 mov ax, 3560h ; get current INT 60 vector address
 int 21h ; ..
 mov vect60, bx ; save for debugging inspection
 mov vect60+2, es ; ..

 mov ax, es ; is there a handler?
 or ax, bx ; ..
 jz done60 ; if not, don't signal it!
 int 60h ; issue INT 60 to wakeup WinApp
done60: popa ; restore registers
 pop ds
 pop es ; ..
 iret ; return from callback to Windows
endprog: ; for 21/31 paragraph computation
int61 ends
 end begin




February, 1992
MIXING REAL- AND PROTECTED-MODE CODE


Using an intermode call buffer is one technique for moving data between real
and protected mode




Kerry Loynd


Kerry is a senior programmer at M & R Services Inc. He can be contacted at
1301 5th Ave., Suite 3600, Seattle, WA 98101.


You're probably already saying "Ugh, even if I can mix real- and protected-mode
code, why would I want to?" Well, there are times when it makes good sense.
Where I work, for example, we recently developed a mathematical tool for
actuaries that manipulates large amounts of data, anywhere from many hundreds
of kilobytes to several megabytes. The only sensible way to deal with such
large amounts of information was to take advantage of protected mode's large
address space. But (there's always that "but") some of our files were in
Novell's Btrieve format, and Btrieve, as of this writing, doesn't come in a
DOS-usable, protected-mode version.
Our alternatives were to redo the code with another file manager that could
run in protected mode, or stay with Btrieve and mix addressing modes. Several
products on the market include source code that runs under protected mode, and
if we started again, I would certainly examine that option. But because time
was short and we wanted to minimize our effort, we chose to mix modes.
If you think about it, you can probably come up with similar scenarios of your
own. Suppose, for example, that you have a highly specialized math library for
which you have only real-mode object code, or a special-purpose device attached
to your computer that uses a real-mode device driver. Mixing real- and
protected-mode code is a useful technique in both cases.


Moving Data Between Real and Protected Mode


It's no secret that real- and protected-mode addresses are very different
beasties. Real-mode addresses correspond directly to hardware addresses;
protected-mode addresses are logical addresses that don't map directly to
hardware addresses. They have to be decoded by the processor's
memory-management circuitry.
Because there is no direct correlation between real- and protected-mode
addresses, how do you move data between modes? You could (and sometimes
should) put the values you need into registers and switch modes. It's fast,
it's simple, but it's only good for very small amounts of data, and you would
probably have to build an assembly language routine to do it.
Another alternative would be to write what you need to disk and let DOS and
the extender slug it out. You could certainly handle large quantities of
information, but the performance would be excruciatingly slow.
A third method, and the focus of this article, is to use an intermode call
buffer. This is an area of memory guaranteed to be accessible to both real-
and protected-mode addressing. You get speedy access to your data, and you can
declare enough space to access realistic amounts of information.


How a Call Buffer Works


The first requirement for a call buffer is that it has to be accessible in
real mode. That means it has to live in the first megabyte of memory. (Sorry,
no silver bullets here.) The next requirement is to make sure there is an
entry in the protected-mode descriptor table that defines a segment of memory
with the same length, at the same physical address. You don't have to worry
about the details of this unless you really like to get under the hood. The
DOS extender will take care of allocating the call buffer space and mapping it
to protected-mode space. You determine the size of the call buffer with
command-line switches for the extender or with link options.
How do you get the address information on the call buffer? You have to ask the
extender to give it to you. There are two ways to do that: The first is to
link in the extender's Application Program Interface (API) library and simply
make a function call from your protected-mode program. Because the extender
starts your program in protected mode, that's easy to do: See APIBTRV.C
(Listing One, page 82).
Another way to get the scoop on the call buffer is to issue a software
interrupt directly to the extender. That avoids the use of an outside library,
but takes a little more effort to figure out and make work. Then you have to
dig through the extender documentation to get the particulars for each call
you need. I don't know about you, but to me int386() calls are harder to read
in the source code than function names. Anyway, 386BTRV.C shows how to use
Watcom's int386() to make the calls to the extender.


Real-Mode Switching and Executing


Protected-mode programs can issue real-mode interrupts and call real-mode
procedures. Real-mode programs can call protected-mode procedures. Though this
article focuses on the former, the accompanying text box entitled "The Flip
Side of the Coin: Going from Real to Protected Mode" provides guidelines for
the latter.
DOS extenders use DOS interrupt calls to invoke the functions you need. For
example, Phar Lap uses DOS interrupt 0x21, function 0x25. You specify a
subfunction code for the extender function you need. The functions take values
you specify in a parameter block, switch to real mode, and start execution of
the interrupt or procedure you asked for. When the real-mode code returns,
these functions clean up and switch back to protected mode.


A Sample Application


The sample application is a card file program; see Listing Two (page 84).
Although not fancy, it does illustrate all the concepts I've covered. There
are two ways to build the application, depending on which Btrieve interface
file you use. The 386BTRV.C file (Listing Three, page 86) calls the extender
with int386(). APIBTRV.C uses Phar Lap's API library calls from DOSX32.LIB.
These interface files were built for Btrieve for DOS, Version 5.10a. They
might run with earlier versions of Btrieve, but I haven't tried. The code was
compiled with Watcom C, Version 8.0, and targets Phar Lap's 386|DOS-Extender,
Version 3.0. Instructions on how to compile and run the application are in the
file headers for APIBTRV.C and 386BTRV.C. Listing Four (page 88) is
BTRV_DEF.H, and contains the manifest constants used to interface with Btrieve
and the structures used to set up Btrieve files. Listing Five (page 90) is
BTRVERRS.H, the file that contains the error codes returned by Btrieve.
When the program starts, a menu will come up allowing you to Find, Create, or
List records, or Exit. Find asks for a name and displays the first record
alphabetically greater than or equal to the name you entered. Then it asks if
you want to edit the record. Create allows a record to be entered and tells
whether it was successful or whether a matching record was already in the
file. List summarizes all the records in the file in alphabetic order. Exit
closes the file and terminates the program. Please note that the source code
shows the Btrieve file being opened in accelerated mode. If your program
crashes or stops executing without going through the Exit option, the data in
the file will probably be trashed.
The main points of interest for this article are in the Btrieve interface
files. These are modeled after the interface files Novell supplies with
Btrieve. They take care of setting up and executing the interrupt that
requests Btrieve services. The first thing they do is make sure Btrieve is
loaded before they issue the interrupt to Btrieve. The Btrieve TSR can be
detected by looking at the offset part of its real-mode interrupt vector.
APIBTRV gets the vector by calling DX_RMIV_GET(). BTR_INT is the number of the
interrupt vector to load, and realAddr is where the vector is returned. The
code in 386BTRV builds a register overlay by putting BTR_INT into CL, and
0x2503 into AX. The 0x25 is the extender function code, and 0x03 is a request
to fetch a real-mode interrupt vector. After calling DOS via int386(), the
vector is in EBX.
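The shape of that request can be written out in portable C. Here reg_overlay is a stand-in for Watcom's union REGS; in the real interface the overlay goes to int386(0x21, ...) and the vector comes back in EBX:

```c
/* Build the register overlay for Phar Lap's "get real-mode interrupt
   vector" request: AH = 0x25 selects the extender gate on INT 21h,
   AL = 0x03 the subfunction, CL = the interrupt number wanted. */
struct reg_overlay { unsigned long eax, ebx, ecx, edx; };

struct reg_overlay make_get_vector_req(unsigned char intno)
{
    struct reg_overlay r = { 0, 0, 0, 0 };
    r.eax = 0x2503;     /* extender function 0x25, subfunction 0x03 */
    r.ecx = intno;      /* CL = vector to fetch (BTR_INT for Btrieve) */
    return r;           /* after int386(), EBX = segment:offset */
}
```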
If Btrieve is loaded, the next step is to get the intermode call buffer
information. APIBTRV calls DX_RMLINK_GET() and gets the buffer's real-mode
address in realAddr, the buffer size (in bytes) in buffSize, and a
protected-mode far pointer to the buffer in xbuff. The size can be useful, but
it's not used in this application. DOS386RealProc gets the address of a
real-mode entry point that allows your real-mode code to call back to your
protected-mode procedures.
386BTRV gets the current segment register values and puts 0x250D in AX. 0x0D
is a request for the vitals on the call buffer. The protected-mode far address
is returned in ES:EDX, and that address is placed in xbuff. The real-mode
address comes back in EBX, and the size, in bytes, is in ECX. EAX holds the
real-mode to protected-mode entry point.
In both cases, the real-mode address is broken into its segment and offset
components. The Btrieve file control block, the key buffer, and the data are
copied into the call buffer through the structure laid out by xbuff, and the
Btrieve parameter block is loaded with the operation code and the real-mode
addresses of those copies.
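The decomposition is pure bit-shuffling. Assuming the conventional packing (segment in the high word, offset in the low word, which is what the interface files rely on), a sketch:

```c
/* Split a packed real-mode far pointer (segment in the high 16 bits,
   offset in the low 16 bits, as returned in EBX) into the pieces the
   Btrieve parameter block needs. */
unsigned short rm_segment(unsigned long farptr)
{
    return (unsigned short)(farptr >> 16);
}

unsigned short rm_offset(unsigned long farptr)
{
    return (unsigned short)(farptr & 0xFFFF);
}
```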
Once the call buffer has been loaded, both interfaces put the Btrieve
interrupt number and the real-mode address of the call buffer into a parameter
block. APIBTRV then calls DX_REAL_INT with the protected-mode address of the
parameter block (not the call buffer). This call sets up the machine
registers, switches to real mode, executes the interrupt, then cleans up and
returns to APIBTRV. 386BTRV, on the other hand, sets up the register overlay
with 0x2511 in AX (0x11 is a request to issue a real-mode interrupt with
registers specified), the protected-mode selector of the parameter block (not
the call buffer) in DS, and its offset in EDX. Then it calls int386x() to
execute the Btrieve request.
When the Btrieve interrupts return, both interfaces copy the data returned by
Btrieve from the call buffer back to the application's protected-mode space.
The Btrieve status code is returned to whichever routine called BTRV. Nothing
to it, eh?
Mixing real- and protected-mode code may not be the most intuitive thing in the
world, but it is fairly simple. If you run into trouble, I have found the DOS
extender vendors knowledgeable and helpful. Next time you have a memory
constraint problem, take a look at mixing modes. You just might find a good
solution to your problem.



The Flip Side of the Coin: Going from Real to Protected Mode


In addition to allowing protected-mode programs to invoke real-mode code, DOS
extenders give you a means of calling protected-mode code from real mode. When
your protected-mode code calls the extender to get the information about the
intermode call buffer, one of the parameters returned to you is a real-mode
far pointer to a function that you can use to call a protected-mode procedure.
To use this entry point, both your real-mode and protected-mode code must
already be loaded. The protected-mode code has to pass a protected-mode far
pointer to the real-mode code. The best way to do this is to put the pointer
into the call buffer along with the address of the real-to-protected-mode
entry point. The real-mode code can then get the entry point address from the
call buffer and put it in a far function pointer variable. Push the parameters
onto the stack just as for a normal function call. If you are using the Watcom
compiler, remember to declare your protected-mode function with the cdecl
keyword so it will expect the parameters on the stack. Then push either a long
word 0 or a real-mode far pointer to a block holding protected-mode segment
register values for DS, ES, FS, and GS. Finally, put the protected-mode
address of the procedure to execute on the stack. Your call should look
something like this:

 (*rmToPmEntryPoint) (pmFunctionOffset, pmFunctionSelector,
 &segRegBlock, parm1, parm2, parm3);

Remember, protected-mode far pointers have 16-bit selectors and 32-bit
offsets.

 The segRegBlock is shaped like this:

 struct {
 short DS;
 short ES;
 short FS;
 short GS;
 } sRegStruct;

If you give the address of such a block, the protected-mode code will use the
values in that block for its selector values. If you push 0L, it will use the
values those registers had when the protected-mode code first started.
The real-to-protected entry code takes the protected-mode address and the
selector values block from the stack. A protected-mode return address is put
on the stack, and the stack is adjusted. The protected-mode function will only
see the parameters you want it to use. When you declare the protected-mode
function, be sure to use the far keyword. You have to do this because the
routine has to execute a far return to get back to real mode with the stack
correctly aligned. After returning to real mode, the segment register
parameter block will have the values that were in the segment registers when
the protected-mode function returned. The stack will have zero where the
protected routine address and the segment register block address were stored.
There is one nonobvious point to look out for when calling from real to
protected mode. When control is given to the protected-mode routine, it is
using the same physical stack space as the real-mode code. If you want to use
library routines that were compiled with stack checking, you'll have to set
the protected-mode stack to the stack segment in use when the protected code
started. Otherwise you'll get memory protection violations. Be sure to reset
the stack to the real-mode segment before returning to real mode.

--K.L.



_MIXING REAL- AND PROTECTED-MODE CODE_
by Kerry Loynd



[LISTING ONE]

// Switcher.c
// This code should be easily portable to compilers other than Watcom's

#include <stdio.h>
#include <stddef.h>
#include <process.h>
#include <conio.h> /* kbhit() and getch() */
#include <ctype.h>
#include <string.h>
#include "btrv_def.h"
#include "btrverrs.h"

#define SINT short
typedef unsigned char UCHAR; /* unsigned 8-bit value */

extern SINT cdecl BTRV (int, UCHAR *, char *, int *, char *, int);

/* CREATE_STRUCT is used to create the card file. */
typedef struct tagCREATE_STRUCT {
 FILE_SPEC fileInfo;
 KEY_SPEC keyInfo;
} CREATE_STRUCT;
/* CARD_STRUCT is the structure of the card file records */
typedef struct tagCARD_STRUCT {
 char name[21];
 char address[21];
 char city[21];
} CARD_STRUCT;
#define CARD_LEN sizeof (CARD_STRUCT)

#define NAME_KEY 0
static unsigned char posBlock[128]; /* Btrieve's position control */
static char searchKey[255]; /* Search key buffer. MUST be 255 long.*/
static CREATE_STRUCT cardInfo = {
 sizeof(CARD_STRUCT), 1024, 1, 0L, NO_FILE_FLAGS, 0, 0,
 1, 21, EXTENDED_TYPE_KEY | MODIFIABLE_KEY, 1L, ZSTRING_KEY, 0, 0, 0, 0, 0
};
static void EditRecord (CARD_STRUCT *card)
{
 char done = 'n';
 char gunkCatcher[80];
 while (done != 'y') {
 gets(gunkCatcher); /* Flush out the input buffer. */
 printf ("\n\nName: %s\n:", card->name);
 flushall ();
 gets (card->name);
 printf ("\nAddress: %s\n:", card->address);
 flushall ();
 gets (card->address);
 printf ("\nCity: %s\n:", card->city);
 flushall ();
 gets (card->city);
 printf ("\n%-21s, %-21s, %-21s\n", card->name,
 card->address, card->city);
 printf ("\n\nDone? [y/n] ");
 flushall ();
 done = tolower (getchar());
 printf ("Done = %c\n", done);
 }
}
static void CreateRecord (void)
{
 CARD_STRUCT card;
 int rc = 0;
 int len = CARD_LEN;
 memset (&card, '\0', sizeof (CARD_STRUCT));
 EditRecord (&card);
 rc = BTRV (INSERT_BTR, &posBlock[0], (char *)&card, &len,
 searchKey, NAME_KEY);
 if (rc != 0)
 printf ("Could not insert record %s, BTRV error %d\n",
 card.name, rc);
}
static void FindRecord (void)
{
 CARD_STRUCT card;
 int rc = 0;
 int len = CARD_LEN;
 gets (searchKey); /* Flush out the input buffer. */
 printf ("Name to search on: ");
 flushall ();
 gets (searchKey);
 rc = BTRV (GET_GT_EQ, &posBlock[0], (char *)&card, &len,
 searchKey, NAME_KEY);
 if (rc == 0) {
 printf ("Found %s.\n\nEdit this record?", searchKey);
 if (tolower(getchar ()) == 'y') {
 EditRecord (&card);
 rc = BTRV (UPDATE_BTR, &posBlock[0], (char *)&card, &len,
 searchKey, NAME_KEY);
 if (rc != 0)
 printf ("Could not update record %s, BTRV error %d\n",
 card.name, rc);
 }
 } else
 printf ("Could not find a record.\n");
}
static void ListRecords ()
{
 CARD_STRUCT card;
 int rc = 0;
 int len = CARD_LEN;
 printf ("\n\nList of records:\n\n");
 rc = BTRV (GET_FIRST, &posBlock[0], (char *)&card, &len,
 searchKey, NAME_KEY);
 while (rc == 0) {
 printf ("%-21s, %-21s, %-21s\n", card.name, card.address,
 card.city);
 len = CARD_LEN;
 rc = BTRV (GET_GT, &posBlock[0], (char *)&card, &len,
 searchKey, NAME_KEY);
 }
 if (rc != END_OF_FILE_BTR)
 printf ("\n\nBTRV error %d\n", rc);
 printf ("Press any key to continue.\n");
 while (!kbhit()); /* wait for a key */
 getch (); /* clean out the dregs. */
}
int main (int argc, char **argv)
{
 int rc = 0;
 int menuChoice = 0;
 int len = 0;
 char fakeData[1];
 if (argc < 2) {
 printf ("No card file specified.\n");
 exit(1);
 }
 strcpy (searchKey, argv[1]);
 rc = BTRV (OPEN_BTR, &posBlock[0], fakeData, &len, searchKey,
 ACCELERATED);
 if (rc == BTRIEVE_INACTIVE_BTR) {
 printf ("Btrieve isn't loaded.\n");
 exit(1);
 }
 if (rc != 0) {
 len = sizeof(CREATE_STRUCT);
 rc = BTRV (CREATE_BTR, &posBlock[0], (char *)&cardInfo, &len,
 searchKey, NAME_KEY);
 if (rc != 0) {
 printf ("BTRV create returned %d\n", rc);
 BTRV( STOP_BTR, &posBlock[0], fakeData, &len, fakeData,
 NAME_KEY);
 exit (1);
 }
 len = 0;
 rc = BTRV (OPEN_BTR, &posBlock[0], fakeData, &len, searchKey,
 ACCELERATED);
 if (rc != 0) {
 printf ("BTRV open returned %d\n", rc);
 BTRV( STOP_BTR, &posBlock[0], fakeData, &len, fakeData, NAME_KEY);
 exit (1);
 }
 }
 menuChoice = 0;
 while (menuChoice != 9) {
 printf ("\n\n1. Find a record\n");
 printf ("2. Create a new record\n");
 printf ("3. List records\n");
 printf ("9. Exit\n\n:");
 scanf ("%d", &menuChoice);
 switch (menuChoice) {
 case 1:
 FindRecord ();
 break;
 case 2:
 CreateRecord ();
 break;
 case 3:
 ListRecords ();
 break;
 }
 }
 BTRV (CLOSE_BTR, &posBlock[0], fakeData, &len, searchKey,
 NAME_KEY);
 BTRV( STOP_BTR, &posBlock[0], fakeData, &len, searchKey,
 NAME_KEY);
 return 0;
}







[LISTING TWO]

/*********************************************************************
 * Filename....... APIBTRV.C Version........ 1.2
 * Author......... Kerry Loynd
 * Comments....... Watcom C interface to the Btrieve Record Manager
 This particular incarnation of the BTRV call interface is for the 32-bit
 Watcom compiler, and uses Pharlap API library calls to generate calls to
 BTRIEVE from programs operating in protected mode under Pharlap's 386DOS.
 To compile it use
 the command line
 C> WCL386 switcher.c apibtrv.c -l=<pharlap path>dosx32
 To run the application use
 C> BTRIEVE
 C> RUN386 -callbuf n switcher <btrFile>
 where n is the size, in KBytes, of the intermode call buffer. The size of that
 buffer must be greater than or equal to the size of the XBUFFER struct plus
 the maximum data record size used by your program. BtrFile is the name of the
 data file you want to use with the application. Use 1 for this application.
 There is a subtle point here that you must watch. ALWAYS pass all the
 defined parameters to a call to BTRV. Be especially careful to set *dataLen
 to 0 if it is not used in the operation you are performing. Otherwise, you
 may find that your program has gone over the wall...
 I could have placed a check for an oversize dataLen and returned a status
 code, but, since I can't guarantee whether Novell will use the same status
 code some time in the future, I did not. NOTE: You should be aware that
 Watcom's linker will not successfully search DOSX32.LIB unless you use
 Watcom's librarian to extract all the object modules and then re-insert them
 into the library. If you want to understand why, call Watcom or Pharlap.
 * COPYRIGHT (C) 1991. All Rights Reserved.
 * Kerry Loynd Seattle, WA
 ********************************************************************/

#include <stddef.h>
#include <dos.h>
#include <string.h>

typedef unsigned short int INT;
typedef unsigned long int UINT;
typedef unsigned long ULONG;
typedef unsigned long REAL_ADDR; /* used for real-mode addresses. */
typedef unsigned char UCHAR; /* unsigned 8-bit value */
typedef UCHAR far * FARPTR;

#define SINT short
#define BTR_ERR 20 /* record manager not started */
#define BTR_INT 0x7B /* Btrieve interrupt vector */
#define BTR_OFFSET 0x33 /* Btrieve offset within segment */
#define VARIABLE_ID 0x6176 /* id for variable length records */

/* INT_BLOCK is a structure used by 386DOS to invoke real mode interrupts. */
typedef struct intblock {
 SINT intNumber; /* Interrupt to invoke. */
 SINT rds; /* real mode ds. */
 SINT res; /* real mode es. */
 SINT rfs; /* real mode fs. */
 SINT rgs; /* real mode gs. */
 unsigned long reax; /* real mode eax. */
 unsigned long redx; /* real mode edx. */
} INT_BLOCK;

/* Whatever compiler you use, this structure must be byte aligned. */
typedef struct BTRIEVE_PARMS /* structure passed to Btrieve Rec Mgr */
 {
 REAL_ADDR bufAddress; /* caller's data buffer real mode address */
 INT bufLength; /* length of data buffer */
 REAL_ADDR curAddress; /* user position block real mode address */
 REAL_ADDR fcbAddress; /* Real mode address of disk FCB */
 INT function; /* requested function */
 REAL_ADDR keyAddress; /* Real mode address of user's key buffer */
 char keyLength; /* length of user's key buffer */
 char keyNumber; /* key of reference for request */
 REAL_ADDR statAddress; /* Real mode address of status word */
 INT xfaceID; /* language identifier */
 } BTR_PARMS;
typedef struct xbuffer { /* Structure of intermode call buffer. */
 BTR_PARMS xData; /* Btrieve parameter block. */
 INT stat; /* status of Btrieve call */
 char posBlock[128]; /* Btrieve file control block */
 char keyBuff[255]; /* key buffer for this Btrieve file. */
 char dataBuffer[1]; /* Data buffer space. */

} XBUFFER;

// The following function prototypes are copied from pharlap.h. They have to
// be declared with cdecl because they expect their parameters on the stack.
extern int cdecl DX_RMIV_GET(UINT, REAL_ADDR *);
extern int cdecl DX_REAL_INT(INT_BLOCK *);
extern int cdecl DX_RMLINK_GET(REAL_ADDR *, REAL_ADDR *, ULONG *, FARPTR *);

SINT cdecl BTRV (SINT operation,
 UCHAR *posBlock,
 char *dataBuf,
 INT *dataLen,
 char *keyBuf,
 SINT keyNum
 )
{
XBUFFER far *xbuff;
SINT realSeg;
SINT realOffset;
REAL_ADDR realAddr;
REAL_ADDR DOS386RealProc;
ULONG buffSize;
INT_BLOCK callBlock;

/* Check to see that the Btrieve Record Manager has been started. */
 DX_RMIV_GET (BTR_INT, &realAddr); /* Get real-mode int vector. */
 if ((realAddr & 0xFFFF) != BTR_OFFSET) /* is Btrieve installed? */
 return (BTR_ERR);
/* Get the real and protected addresses of the call buffer. */
/* DOS386RealProc and buffSize are not used. */
 DX_RMLINK_GET (&DOS386RealProc, &realAddr, &buffSize,
 (FARPTR *)&xbuff);
 realSeg = (realAddr >> 16) & 0xFFFF; /* real-mode segment. */
 realOffset = realAddr & 0xFFFF; /* real-mode offset. */
/* Get the key and data info from the caller. */
 _fmemcpy (xbuff->posBlock, (char far *)posBlock, 128);
 _fmemcpy (xbuff->keyBuff, (char far *)keyBuf, 255);
 _fmemcpy (xbuff->dataBuffer, (char far *)dataBuf, *dataLen);
/* Move user parameters to xbuff, where Btrieve can find them. */
 xbuff->stat = 0;
 xbuff->xData.function = operation;
 xbuff->xData.statAddress = realAddr + offsetof (XBUFFER,stat);
 xbuff->xData.fcbAddress = realAddr + offsetof (XBUFFER,posBlock);
 xbuff->xData.curAddress = xbuff->xData.fcbAddress + 38;
 xbuff->xData.bufAddress = realAddr + offsetof (XBUFFER,dataBuffer);
 xbuff->xData.bufLength = *dataLen;
 xbuff->xData.keyAddress = realAddr + offsetof (XBUFFER, keyBuff);
 xbuff->xData.keyLength = 255; /* use max since we don't know */
 xbuff->xData.keyNumber = keyNum;
 xbuff->xData.xfaceID = VARIABLE_ID;

/* Make call to the Btrieve Record Manager. */
/* Set up for Extended DOS call. Put real-mode interrupt call data into
the call block. */
 callBlock.intNumber = BTR_INT;
 callBlock.rds = realSeg;
 callBlock.redx = realOffset;
 DX_REAL_INT (&callBlock); /* Issue real mode int, regs specified.*/
 *dataLen = xbuff->xData.bufLength;

/* Copy the key and data info back to the caller. */
 _fmemcpy ((char far *)posBlock, xbuff->posBlock, 128);
 _fmemcpy ((char far *)keyBuf, xbuff->keyBuff, 255);
 _fmemcpy ((char far *)dataBuf, xbuff->dataBuffer, *dataLen);
 return (xbuff->stat); /* return status */
}




[LISTING THREE]



/*********************************************************************
 * Filename....... 386BTRV.C
 * Version........ 1.2
 * Version Date... August 8, 1991
 * Author......... Kerry Loynd
 * Comments....... Watcom C interface to the Btrieve Record Manager

 This particular incarnation of the BTRV call interface is for the
 32-bit Watcom compiler, and uses Pharlap API library calls to
 generate calls to BTRIEVE from programs operating in protected
 mode under Pharlap's 386DOS. To compile it use the command line

 C> WCL386 switcher.c 386btrv.c -l=<pharlap path>dosx32

 To run the application use

 C> BTRIEVE
 C> RUN386 -callbuf n switcher <btrFile>

 where n is the size, in KBytes, of the intermode call buffer. The
 size of that buffer must be greater than or equal to the size of
 the XBUFFER struct plus the maximum data record size used by your
 program. BtrFile is the name of the data file you want to use
 with the application. Use 1 for this application.

 There is a subtle point here that you must watch. ALWAYS pass all
 the defined parameters to a call to BTRV. Be especially careful
 to set *dataLen to 0 if it is not used in the operation you are
 performing. Otherwise, you may find that your program has gone
 over the wall...

 I could have placed a check for an oversize dataLen and returned
 a status code, but, since I can't guarantee whether Novell will
 use the same status code some time in the future, I did not.

 NOTE: You should be aware that Watcom's linker will not
 successfully search DOSX32.LIB unless you use Watcom's librarian
 to extract all the object modules and then re-insert them into the
 library. If you want to understand why, call Watcom or Pharlap.

 *
 * COPYRIGHT (C) 1991. All Rights Reserved.
 * Kerry Loynd Seattle, WA
 * (206)624-7970
 ********************************************************************/


#include <stddef.h>
#include <dos.h>
#include <string.h>

typedef unsigned short int INT;
typedef unsigned long int UINT;
typedef unsigned long ULONG;
typedef unsigned long REAL_ADDR; /* used for real-mode addresses. */
typedef unsigned char UCHAR; /* unsigned 8-bit value */
typedef UCHAR far * FARPTR;

#define SINT short
#define BTR_ERR 20 /* record manager not started */
#define BTR_INT 0x7B /* Btrieve interrupt vector */
#define BTR_OFFSET 0x33 /* Btrieve offset within segment */
#define VARIABLE_ID 0x6176 /* id for variable length records */

/*
 INT_BLOCK is a structure used by 386DOS to invoke real mode
 interrupts.
*/
typedef struct intblock {
 SINT intNumber; /* Interrupt to invoke. */
 SINT rds; /* real mode ds. */
 SINT res; /* real mode es. */
 SINT rfs; /* real mode fs. */
 SINT rgs; /* real mode gs. */
 unsigned long reax; /* real mode eax. */
 unsigned long redx; /* real mode edx. */
} INT_BLOCK;


/* Whatever compiler you use, this structure must be byte aligned. */

typedef struct BTRIEVE_PARMS /* structure passed to Btrieve Rec Mgr */
 {
 REAL_ADDR bufAddress; /* caller's data buffer real mode address */
 INT bufLength; /* length of data buffer */
 REAL_ADDR curAddress; /* user position block real mode address */
 REAL_ADDR fcbAddress; /* Real mode address of disk FCB */
 INT function; /* requested function */
 REAL_ADDR keyAddress; /* Real mode address of user's key buffer */
 char keyLength; /* length of user's key buffer */
 char keyNumber; /* key of reference for request */
 REAL_ADDR statAddress; /* Real mode address of status word */
 INT xfaceID; /* language identifier */
 } BTR_PARMS;

typedef struct xbuffer { /* Structure of intermode call buffer. */
 BTR_PARMS xData; /* Btrieve parameter block. */
 INT stat; /* status of Btrieve call */
 char posBlock[128]; /* Btrieve file control block */
 char keyBuff[255]; /* key buffer for this Btrieve file. */
 char dataBuffer[1]; /* Data buffer space. */
} XBUFFER;

// The following function prototypes are copied from pharlap.h.
// They have to be declared with cdecl because they expect their
// parameters on the stack.
extern int cdecl DX_RMIV_GET(UINT, REAL_ADDR *);
extern int cdecl DX_REAL_INT(INT_BLOCK *);
extern int cdecl DX_RMLINK_GET(REAL_ADDR *, REAL_ADDR *, ULONG *,
 FARPTR *);

SINT cdecl BTRV (SINT operation,
 UCHAR *posBlock,
 char *dataBuf,
 INT *dataLen,
 char *keyBuf,
 SINT keyNum
 )

{
XBUFFER far *xbuff;
SINT realSeg;
SINT realOffset;
REAL_ADDR realAddr;
REAL_ADDR DOS386RealProc;
ULONG buffSize;
INT_BLOCK callBlock;

/* */
/* Check to see that the Btrieve Record Manager has been started. */
/* */
 DX_RMIV_GET (BTR_INT, &realAddr); /* Get real-mode int vector. */
 if ((realAddr & 0xFFFF) != BTR_OFFSET) /* is Btrieve installed? */
 return (BTR_ERR);

/* Get the real and protected addresses of the call buffer. */
/* DOS386RealProc and buffSize are not used. */
 DX_RMLINK_GET (&DOS386RealProc, &realAddr, &buffSize,
 (FARPTR *)&xbuff);
 realSeg = (realAddr >> 16) & 0xFFFF; /* real-mode segment. */
 realOffset = realAddr & 0xFFFF; /* real-mode offset. */

/* Get the key and data info from the caller. */
 _fmemcpy (xbuff->posBlock, (char far *)posBlock, 128);
 _fmemcpy (xbuff->keyBuff, (char far *)keyBuf, 255);
 _fmemcpy (xbuff->dataBuffer, (char far *)dataBuf, *dataLen);

/* */
/* Move user parameters to xbuff, where Btrieve can find them. */
/* */
 xbuff->stat = 0;
 xbuff->xData.function = operation;
 xbuff->xData.statAddress = realAddr + offsetof (XBUFFER,stat);
 xbuff->xData.fcbAddress = realAddr + offsetof (XBUFFER,posBlock);
 xbuff->xData.curAddress = xbuff->xData.fcbAddress + 38;
 xbuff->xData.bufAddress = realAddr + offsetof (XBUFFER,dataBuffer);
 xbuff->xData.bufLength = *dataLen;
 xbuff->xData.keyAddress = realAddr + offsetof (XBUFFER, keyBuff);
 xbuff->xData.keyLength = 255; /* use max since we don't know */
 xbuff->xData.keyNumber = keyNum;
 xbuff->xData.xfaceID = VARIABLE_ID;

/* */
/* Make call to the Btrieve Record Manager. */

/* */

/*
 Set up for the Extended DOS call. Put the real-mode interrupt
 call data into the call block.
*/
 callBlock.intNumber = BTR_INT;
 callBlock.rds = realSeg;
 callBlock.redx = realOffset;
 DX_REAL_INT (&callBlock); /* Issue real mode int, regs specified.*/

 *dataLen = xbuff->xData.bufLength;

/* Copy the key and data info back to the caller. */
 _fmemcpy ((char far *)posBlock, xbuff->posBlock, 128);
 _fmemcpy ((char far *)keyBuf, xbuff->keyBuff, 255);
 _fmemcpy ((char far *)dataBuf, xbuff->dataBuffer, *dataLen);

 return (xbuff->stat); /* return status */
}







[LISTING FOUR]
//BTRV_DEF.H
/* This file contains the manifest constants used to interface with Btrieve
 and the structures used to set up Btrieve files. */

#ifndef _BTRV_DEF_H_
#define _BTRV_DEF_H_

#define OPEN_BTR 0
#define CLOSE_BTR 1
#define INSERT_BTR 2
#define UPDATE_BTR 3
#define DELETE_BTR 4
#define GET_EQUAL 5
#define GET_NEXT 6
#define GET_PREVIOUS 7
#define GET_GT 8
#define GET_GT_EQ 9
#define GET_LT 10
#define GET_LT_EQ 11
#define GET_FIRST 12
#define GET_LAST 13
#define CREATE_BTR 14
#define GET_STATUS 15
#define EXTEND_BTR 16
#define SET_DIRECTORY 17
#define GET_DIRECTORY 18
#define BEGIN_TRANS 19
#define END_TRANS 20
#define ABORT_TRANS 21
#define GET_POSITION 22
#define GET_DIRECT 23

#define STEP_NEXT 24
#define STOP_BTR 25
#define VERSION_BTR 26
#define UNLOCK_BTR 27
#define RESET_BTR 28
#define SET_OWNER 29
#define CLEAR_OWNER 30
#define CREATE_SUPP_INDEX 31
#define DROP_SUPP_INDEX 32
#define STEP_FIRST 33
#define STEP_LAST 34
#define STEP_PREVIOUS 35
#define GET_NEXT_EXTENDED 36
#define GET_PREV_EXTENDED 37
#define STEP_NEXT_EXTENDED 38
#define STEP_PREV_EXTENDED 39
#define INSERT_EXTENDED 40
#define GET_KEY 50
#define SINGLE_WAIT_LOCK 100
#define MULTIPLE_WAIT_LOCK 300
#define SINGLE_NOWAIT_LOCK 200
#define MULTIPLE_NOWAIT_LOCK 400

/* These are for key number options. */
#define UNACCELERATED 0
#define ACCELERATED -1
#define READ_ONLY -2
#define VERIFY -3

/* Key flag bit definitions */
#define NO_KEY_FLAGS 0x0000
#define DUPLICATE_KEY 0x0001
#define MODIFIABLE_KEY 0x0002
#define BINARY_KEY 0x0004
#define NULL_KEY 0x0008
#define SEGMENTED_KEY 0x0010
#define ALT_SORT_KEY 0x0020
#define DESCENDING_KEY 0x0040
#define SUPPLEMENTAL_KEY 0x0080
#define EXTENDED_TYPE_KEY 0x0100
#define MANUAL_KEY 0x0200

/* Extended key type definition bits */
#define STRING_KEY 0
#define INTEGER_KEY 1
#define FLOAT_KEY 2
#define DATE_KEY 3
#define TIME_KEY 4
#define DECIMAL_KEY 5
#define MONEY_KEY 6
#define LOGICAL_KEY 7
#define NUMERIC_KEY 8
#define BFLOAT_KEY 9
#define LSTRING_KEY 10
#define ZSTRING_KEY 11
#define UNSIGNED_BINARY_KEY 14
#define AUTOINCREMENT_KEY 15

/* File attribute bit definitions */

#define NO_FILE_FLAGS 0x00
#define VARIABLE_LENGTH_FILE 0x01
#define TRUNCATE_BLANKS_FILE 0x02
#define PREALLOCATION_FILE 0x04
#define DATA_COMPRESSION_FILE 0x08
#define KEY_ONLY_FILE 0x10
#define FREE_SPACE_10_FILE 0x40
#define FREE_SPACE_20_FILE 0x80
#define FREE_SPACE_30_FILE 0xC0

#define BTRV_EQ 1
#define BTRV_GT 2
#define BTRV_LT 3
#define BTRV_NE 4
#define BTRV_GE 5
#define BTRV_LE 6
#define BTRV_AND 1
#define BTRV_OR 2

/* Make sure that the following structures are packed, using whatever
 mechanism your compiler defines. */
#pragma pack(1)

/* Typedefs to define the return data from a stat call. */
typedef struct {
 short int recordLength;
 short int pageSize;
 short int indexCount;
 long int recordTotal;
 short int fileFlags;
 char reserved[2];
 short int preAlloc;
} STATS;

/* typedefs needed for CREATE operation */
typedef struct {
 short int keyPosition;
 short int keyLength;
 short int keyFlag;
 long int keyTotal;
 char keyType;
 char nullValue;
 char rsvp[4];
} KEY_SPEC;
typedef struct {
 short int recordLength;
 short int pageSize;
 short int indexCount;
 long int unused;
 short int fileFlags;
 short int reserved;
 short int allocation;
} FILE_SPEC;

/* typedefs needed for EXTENDED operations */
typedef struct
 {
 char dataType;
 short int fieldLength;
 short int fieldOffset;
 char comparisonCode;
 char andOrExpression;
 union
 {
 int Offset;
 char Value[2];
 } field2;
 }
 FILTER_TERM;
typedef struct
 {
 short int bufferLength;
 char command[2];
 short int maxSkip;
 short int termCount;
 }
 FILTER_HEAD;
typedef struct
 {
 short int recordCount;
 short int fieldCount;
 }
 FILTER_TAIL;
typedef struct
 {
 short int fieldLength;
 short int fieldOffset;
 }
 FILTER_FIELD;
#endif







[LISTING FIVE]

//BTRVERRS.H
/* This file contains the error codes returned by Btrieve. */

#define SUCCESS_BTR 0
#define INVALID_OP_BTR 1
#define IO_ERROR_BTR 2
#define FILE_NOT_OPEN_BTR 3
#define KEY_NOT_FOUND_BTR 4
#define DUPLICATE_KEY_BTR 5
#define INVALID_KEY_NUM_BTR 6
#define DIFF_KEY_NUM_BTR 7
#define INVALID_POSITION_BTR 8
#define END_OF_FILE_BTR 9
#define MOD_KEY_ERROR_BTR 10
#define BAD_FILE_NAME_BTR 11
#define FILE_NOT_FOUND_BTR 12
#define EXT_FILE_ERROR_BTR 13
#define PREIMG_OPEN_ERR_BTR 14
#define PREIMG_IO_ERR_BTR 15

#define EXPANSION_ERROR_BTR 16
#define CLOSE_ERROR_BTR 17
#define DISK_FULL_BTR 18
#define UNRECOVERABLE_BTR 19
#define BTRIEVE_INACTIVE_BTR 20
#define KEY_BUFF_TOO_SHORT_BTR 21
#define DATA_BUFF_TOO_SHORT_BTR 22
#define POS_BLOCK_LEN_ERR_BTR 23
#define PAGE_SIZE_ERR_BTR 24
#define CREATE_IO_ERR_BTR 25
#define KEY_COUNT_ERR_BTR 26
#define INVALID_KEY_POS_BTR 27
#define INVALID_REC_LEN_BTR 28
#define INVALID_KEY_LEN_BTR 29
#define NOT_BTRIEVE_FILE_BTR 30
#define ALREADY_EXTENDED_BTR 31
#define EXTEND_IO_ERR_BTR 32
#define INVALID_EXT_NAME_BTR 34
#define DIR_ERROR_BTR 35
#define TRANSACTION_ERR_BTR 36
#define ACTIVE_TRANS_BTR 37
#define TRANSACT_CTL_IO_ERR_BTR 38
#define END_TRANSACT_ERR_BTR 39
#define TRANSACT_MAX_FILES_BTR 40
#define OP_NOT_ALLOWED_BTR 41
#define INCOMPLETE_ACCEL_ACCESS_BTR 42
#define INVALID_REC_ACCESS_BTR 43
#define NULL_KEY_PATH_BTR 44
#define INCONSISTENT_KEY_FLAGS_BTR 45
#define ACCESS_DENIED_BTR 46
#define MAX_FILES_OPEN_BTR 47
#define BAD_ALT_DEF_FILE_BTR 48
#define KEY_TYPE_ERR_BTR 49
#define OWNER_ALREADY_SET_BTR 50
#define INVALID_OWNER_BTR 51
#define CACHE_WRITE_ERR_BTR 52
#define BAD_INTERFACE_BTR 53
#define VARIABLE_PAGE_ERR_BTR 54
#define AUTOINCREMENT_ERR_BTR 55
#define INCOMPLETE_INDEX_BTR 56
#define EXP_MEMORY_ERR_BTR 57
#define SQUEEZE_BUFF_TOO_SHORT_BTR 58
#define FILE_EXISTS_BTR 59
#define FILTER_LIMIT_REACHED_BTR 64
#define CONFLICT_BTR 80
#define LOCK_ERR_BTR 81
#define LOST_POSITION_BTR 82
#define READ_OUTSIDE_TRANSACT_BTR 83
#define RECORD_IN_USE_BTR 84
#define FILE_IN_USE_BTR 85
#define FILE_TABLE_FULL_BTR 86
#define HANDLE_TABLE_FULL_BTR 87
#define BAD_OPEN_MODE_BTR 88
#define BAD_LOCK_TYPE_BTR 93
#define PERMISSION_ERR_BTR 94




February, 1992
PORTING UNIX TO THE 386: DEVICE DRIVERS


Drivers for the basic kernel


 This article contains the following executables: UXDRIVER.ARC


William Frederick Jolitz and Lynne Greer Jolitz


Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect
of National Semiconductor's GENIX project, the first virtual memory
microprocessor-based UNIX system. Prior to establishing TeleMuse, a market
research firm, Lynne was vice president of marketing at Symmetric Computer
Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions
or comments to lynne@berkeley.edu. (c) 1991 TeleMuse.


In our November 1991 installment, we discussed I/O device initialization via
automatic configuration. Unlike predetermined or static configuration,
automatic configuration is a powerful mechanism that reduces the complexity of
different configurations, adjusts the operating system to best make use of
resources (such as mass storage), and discovers and configures dynamically
changing services (such as networks). Device initialization is an important
element of any operating system -- how else could you use even the simplest
disk drive or floppy or communicate on a network? By incorporating an active
mechanism into our design, our drivers become a stronger base upon which to
build a system. Automatic configuration, or autoconfiguration, is a key item
which differentiates Berkeley UNIX from other systems, and is the hallmark of
any Berkeley-derived system. By thinking ahead from the very beginning (for
example, during initialization), we can anticipate the scope of a porting
project.
Unfortunately, programmers often become so involved in the minutiae of a
particular device type that they minimize the driver interface itself, leading
to inefficient or faulty design. In the case of initialization, the driver is
usually written only for the singular device desired, rather than in
anticipation of future requirements. Later, it is altered to suit the dozen or
so variations of device eventually used. This approach almost always results
in more tinkering than actual coding. However, under the gun, many programmers
just glue the driver onto the side of the operating system and hope it works.
In 386BSD, we considered the UNIX kernel as merely a pervasive "interface,"
and the driver as the extension of this interface -- the last step to contact
with the device itself -- instead of viewing the driver and UNIX as separate
entities. Thus, we incorporated the "interface" into our driver design from
the beginning and ignored the minutiae involved in driver operations which
might confuse the issue.
In November we also introduced some of the vagaries of ISA devices that may be
encountered in our device driver, and attempted to delineate possible
conflicts that autoconfiguration might need to resolve. Here we will describe
in more detail the support macros and functions of the 386BSD kernel that
drivers use to work devices, such as splx() (the interrupt priority-level
management) and the "interrupt vector code." We will also continue to build
upon our "UNIX as interface" paradigm -- the kernel's interface to drivers --
by studying the mechanisms by which we can extend the kernel's grasp through
the driver.
Finally, we examine some important "points of light" (sorry George -- we don't
have a thousand) in our sample drivers, including the console, disk, and clock
interrupt drivers, especially with respect to BSD autoconfiguration and
interfaces. We'll also discuss the basic structure of these drivers, minimal
requirements, and extending the functionality through procedures such as disk
labels.


Device Drivers Needed for the Basic Kernel


As a reminder, our goal was to create a "basic" system that could ultimately
compile itself, so we could use it to accelerate the progress of the port.
(The alternative, cross-support, is considerably more tedious and error prone
because of the communications burden.)
The drivers in the basic kernel must provide a root file system on a hard disk
drive, some kind of terminal function, and a rescheduling clock (because UNIX
is a multiprogramming system). Though a fully functional system will require
more than this, we can build a self-supporting system and refine it later.
Because we desire 386BSD to be available to a broad audience, our choice of
devices is relatively fixed. Our standard 386 ISA PC contains an AT-styled
disk controller and a garden-variety 40-Mbyte disk, the same as hundreds of
thousands of PCs everywhere. A standard PC keyboard and display adapter are
our terminal devices, and our rescheduling clock is implemented from the
onboard programmable timer on the PC's motherboard.


Contents of the Device Driver: A Dumping Ground?


Device drivers frequently suffer from middle-age spread. They accumulate
features from the past, both bad and good, and grow to gigantic proportions.
After a while, they become accumulations of baggage -- "dumping grounds" for
unchallenged code.
For example, one megafirm's research lab was stymied in trying to cudgel a
driver for a special disk and controller. Unfortunately, the lab's entire
budget was shot on servers built with these drivers, and the researchers were
therefore committed to seeing the project through to completion. Over the
years, the driver had grown to be an obscure 3000-plus line monument that
never worked. Yet the firm clung to it in the vague hope that with just one
more line, it might all work out.
Finally, the company hired an outside expert in drivers. In a week, a
completely new driver was written, tested, and completed, using (tiny)
fragments of the old driver that were isolated and incrementally proven. The
new, clearly written driver specifically implemented only the features needed
by the server, and was a fraction of the size. Because it used a "minimalist"
approach, the critical portions of the driver stood out in detail. The
"crisis" came to a deterministic conclusion. (The servers went online.)
The moral here is to distrust anything overbuilt and underjustified. And when
in doubt, simplify, simplify, simplify. With our early drivers, we must
enforce Spartan discipline with respect to "featurism." We want simple and
direct code that provides basic functionality. This is not to say we will be
devoid of any features -- some flexibility is needed because our equipment is
not uniform. For example, though the disk controllers on ISA PCs are almost
identical, the disk drives (capacities and geometries, for example) are quite
different. We would also like to support common portions of different display
adapters, so that if we need to run on an MDA in a pinch, we can. Finally, we
would like to use display editors and buffer kernel error printouts on
separate screens. All in all, though not elaborate, our needs are something
more than stone knives and bearskins.
It's not our intention to expand the content of these drivers much. They
represent the "default" case of bare minimum support. Drivers targeted
specifically at a device (say, the VGA) will also be the appropriate place for
more elaborate functionality (such as bitmap and color palette support), and
they will autoconfigure ahead of the default display driver.


Haven't We Met Somewhere Before?


Many drivers are essentially "copies" of other drivers, because one controller
generally looks pretty much like another. However, when the framework of the
driver is incompletely or incorrectly replicated, new bugs are introduced.
Terminal drivers are alike. In fact, the early UNIX terminal drivers were so
similar that the "pseudo" or "super-driver" tty.c was created just to share
the common code. Likewise, the 386BSD kernel contains support routines for
disk and network devices to minimize driver code replication.


Display and Keyboard Driver


The Display/Keyboard driver (cons.c, available electronically; see page 3)
provides two kinds of output. For user processes, it filters character I/O
from the keyboard and to the display screen through the tty terminal driver.
For the kernel, it provides a character output routine used by the kernel to
print disaster or panic messages. Multiple virtual screen support is
implemented to allow separate screens for the kernel, user session, and
editors. A tiny terminal emulator is present to allow vi, emacs, or jove
editors to run on the basic kernel.


Hard Disk Driver


The hard disk driver (wd.c and wdreg.h, also available electronically; see
page 3) supports the AT-style, programmed I/O disk controllers (WD100[2347]).
The driver reads in a data structure called a "disk label" off a known sector
(in this case, the second sector of the disk). This allows the driver to
configure itself for arbitrary drives, because it consults the drive first to
obtain information about how to use the drive, before any other transfers are
attempted. By doing this, one driver covers all possible disk drives (and
there are many) -- in other words, one size fits all! Someone, however, must
craft such a disk label and use a program to tack it on before the disk can
be used. As you might guess, we get into a "chicken and egg"
situation here -- we need the disk label to be on the drive before we can
write it on the drive! This is not a problem in practice, because all drives
have at least the first 17 sectors in common (that is, the first track of
even the smallest drive). So we use a default disk label that corresponds to the
smallest drive, and overwrite that "logically" when we label the disk to give
it its own identity.
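The label-or-default logic can be sketched as follows. The structure, field
names, and magic number here are simplified stand-ins of my own devising; the
real 386BSD disk label carries partition tables, bad-sector information, and
much more:

```c
#include <stdint.h>
#include <string.h>

#define DISKLABEL_MAGIC 0x82564557u  /* hypothetical label magic number */

/* Much-simplified stand-in for the on-disk label structure. */
struct disklabel {
    uint32_t magic;       /* sanity check that a label is present */
    uint32_t ncylinders;
    uint32_t ntracks;     /* heads */
    uint32_t nsectors;    /* sectors per track */
};

/* Default label describing only the geometry all drives share: one
 * track of 17 sectors, enough to read and write the label itself. */
static const struct disklabel default_label = {
    DISKLABEL_MAGIC, 1, 1, 17
};

/* Interpret a raw copy of the label sector; if no valid label is
 * found, fall back to the lowest-common-denominator default. */
void read_disklabel(const uint8_t sector[512], struct disklabel *lp)
{
    memcpy(lp, sector, sizeof(*lp));
    if (lp->magic != DISKLABEL_MAGIC)
        *lp = default_label;  /* unlabeled disk: assume the minimum */
}
```

The labeling program then writes a real label over this default view,
giving the disk its own identity.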



Clock Driver


The clock driver (Listing One, page 93) is the simplest driver in some ways.
We merely need to tickle the 386 ISA motherboard's timer device to generate
clock pulses and gate them to the interrupt control unit every cycle. The interrupt
itself will enter the kernel's machine-independent clock processing routine,
hardclock(), for everything else.
We will discuss this in detail when we describe process scheduling. We'll also
see how hardclock() postpones work to a software interrupt clock processing
routine called softclock(), where the work does not degrade real-time
response.
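For the curious, the arithmetic behind that tickling is simple. The ISA
timer chip divides a fixed 1.19318-MHz input clock, so the divisor latched
into channel 0 sets the interrupt rate. The sketch below (the rate constant
HZ is our choice; the port numbers mentioned are the standard PC values)
does only the arithmetic:

```c
#include <stdint.h>

#define TIMER_FREQ 1193182u  /* 8253/8254 input clock on ISA PCs, in Hz */
#define HZ         100u      /* rescheduling-clock rate we want */

/* The 16-bit divisor latched into timer channel 0.  On real hardware
 * we would follow this by writing mode byte 0x36 (channel 0, square
 * wave, low-then-high byte access) to port 0x43, then the divisor's
 * low and high bytes to port 0x40. */
uint16_t timer_divisor(void)
{
    return (uint16_t)(TIMER_FREQ / HZ);  /* 11931 for HZ = 100 */
}
```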


Driver Operations


To get the feel of how the system uses drivers, we need to look at the
functional interfaces between 386BSD and its drivers from the perspective of
the system. Note that devices may fit into one of many different arrangements
when interacting with the system:


Configuration.


During system device autoconfiguration, the system probe()s for the existence
of a device and attempts to attach() it to the system for possible use via the
driver. If the device has subdevices, it attaches each of these slave()s
successively. Because each device/driver combination takes up resources that
may interfere or interact with other resources, it's the responsibility of the
driver(s) to resolve conflicts.
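A minimal sketch of this probe/attach pass follows, with the real
configuration tables (I/O addresses, IRQs, slave subdevices) stripped away;
the driver names and the flat table are hypothetical:

```c
#include <stddef.h>

/* Hypothetical, stripped-down driver switch entry. */
struct driver {
    const char *name;
    int  (*probe)(void);   /* nonzero if the device is present */
    void (*attach)(void);  /* hook the device into the system */
};

static int  wd_found;
static int  wdprobe(void)  { return 1; }  /* pretend the disk exists */
static void wdattach(void) { wd_found = 1; }
static int  xxprobe(void)  { return 0; }  /* a device that is absent */
static void xxattach(void) { /* never reached */ }

static struct driver drivers[] = {
    { "wd", wdprobe, wdattach },
    { "xx", xxprobe, xxattach },
};

/* Walk the table: attach() only what probe() actually finds. */
void autoconfigure(void)
{
    for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++)
        if (drivers[i].probe())
            drivers[i].attach();
}
```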


Normal Devices.


Most devices are accessed as files so the system can interact with them.
During normal use, one needs to open() the device, read() or write() to
exchange data with it, select operating modes via ioctl(), and ultimately
close() the device. From the perspective of the system, these events satisfy a
given need and do not necessarily completely resemble the semantics of
ordinary UNIX files. They are similar, but far from exact; that's why they are
called "special files."
For example, driver writers are often surprised when they try to use device
open and close routines to increment and decrement a reference count,
respectively. Many UNIX implementations don't preserve a one-to-one
relationship here; the "closes," for instance, can outnumber the "opens" by a
fair amount (as with a disk driver). The reason is that the kernel may view
the device through many aliases. The point of the open/close routines is to
present the semantics of what should be done to the device to put it in a
consistent state for the requested action. Another surprise is that some
drivers only read/write in units of integral record size, because they operate
with the restrictions of the device underlying the driver. (For example, the
hard disk controller only operates with integral sector size records.)
Only as much of the file paradigm as matches the given device should be
used: the open/close/read/write mechanisms fit nicely with a record-oriented
device, even if it's just a unit record device. Operating mode shifts should be a
function of different driver filename, raw/block device partitions, or ioctl
modes, so that they correspond to a different view or organization of the same
information.


XXintr().


Devices with interrupts need a means of asynchronously informing the device
driver of the event. Typically, this involves using as small a routine as
possible to minimize time spent with interrupts masked out. The
interrupt-driven portion of the driver is the "bottom," or lower part;
portions that run on the kernel stack of the process that has the driver open
are the "top," or upper part. In common use, the top part initiates a device
operation which causes one to n interrupts to ultimately occur. The top
portion then sleep()s, and eventually the interrupt routine supplies a
wakeup() to allow the top half to finish processing the request.
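In outline, the split looks like the sketch below. It is collapsed into a
single thread for illustration, so the "interrupt" is delivered by hand where
a real one would arrive asynchronously, and the spin loop stands in for a
real sleep(); the names xxread and xxintr are hypothetical:

```c
static volatile int io_done;   /* set by the "bottom half" */

/* Bottom half: in a real driver this runs at interrupt level when the
 * controller raises its IRQ; here it is called by hand. */
void xxintr(void)
{
    io_done = 1;               /* the wakeup() in real code */
}

/* Top half: runs on the kernel stack of the process that has the
 * device open.  It starts the transfer, then waits until the
 * interrupt routine signals completion. */
int xxread(void)
{
    io_done = 0;
    /* ...poke the controller to start the transfer here... */
    while (!io_done)           /* stand-in for sleep(); a real kernel */
        xxintr();              /* blocks, and the interrupt arrives   */
                               /* asynchronously                      */
    return 0;                  /* transfer complete */
}
```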


Special Use.


Beyond the more obvious device driver entry points, other operations may be
less clear.
To provide a means for loading/unloading the block buffer cache used in
implementing the complex UNIX file system, the strategy() routine of mass
storage device drivers encapsulates the methodology for bounds-checking the
requested transfer. strategy() first enqueues the transfer, then starts the lead
item in the queue's I/O request. Thus, it is possible to sort the queue so the
resulting transactions are conducted in an order that minimizes a disk drive's
head movement (thus reducing the time spent seeking on the disk).
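A toy version of that sorted enqueue might insert each request in ascending
block order, as below; the real BSD disksort() is an elevator algorithm that
also accounts for the current head position, and struct buf carries far more
than a block number:

```c
#include <stddef.h>

/* Minimal stand-in for struct buf: just the starting block number. */
struct buf {
    long        b_blkno;
    struct buf *b_next;
};

static struct buf *io_queue;   /* pending transfers, sorted by block */

/* Simplified strategy(): after bounds-checking (omitted here), insert
 * the request so the queue stays in ascending block order, keeping
 * the head sweeping one way across the disk. */
void strategy(struct buf *bp)
{
    struct buf **pp = &io_queue;
    while (*pp != NULL && (*pp)->b_blkno < bp->b_blkno)
        pp = &(*pp)->b_next;
    bp->b_next = *pp;
    *pp = bp;
    /* ...then start the lead request if the controller is idle... */
}
```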
Two other driver entry points are of note to disk device drivers: psize(),
which returns the size (in blocks) of a partition that the system uses to
dynamically determine the size of swap space; and dump(), which saves a
snapshot of physical memory on a partition of the disk (usually the swap
space) if the system crashes, so we can find the cause of the crash.
With all special devices, the select() routine provides the inner primitives
of the 386BSD select() system call that scans a file for activity. The mmap()
routine maps, via the mmap() system call, the device's I/O memory address
space into a portion of the calling process's address space. Although this is
the common way a user process (such as an X server) maps in a bitmap display's
frame buffer, it can also be used to map in other kinds of device memory for
direct manipulation by a user process (such as the shared memory of a DSP
chip).


Network Devices.


Network devices interface to the system differently from other devices. They
either send or receive a packet of information from a computer network. The
packets don't go directly to a process, but instead interact with the protocol
modules that implement the necessary processing of a packet. In a sense, the
model here is more akin to "stimulus-response" than the file abstraction of
special devices. Protocol processing may bounce the incoming message out
again without its ever reaching a user process (as with a redirected packet),
coalesce the message with another (as with a transport layer segment), or
drop it (as with a redundant or corrupted packet).
These protocol modules are internal to the kernel, and they process the
link-level packets that the packet drivers send/receive. The network packet
driver is concerned with recognizing which kind of link or network-level
protocol the packet should be sent to, as well as how to address packets from
these different protocols to the link address of the destination host. As with
Ethernet, thousands of network protocols can simultaneously use the same wire
without interfering with each other.
These devices have no filename associated with them; instead, they have a name
built into the driver itself. A network driver is configured for operation by
passing it parameters (such as its address) via an init() entry point. (It
does this by attaching itself into the protocol.) From that point on,
protocols may choose to direct outbound packets to the device, solely on the
basis of address. (That is, interfaces are selected not by name but by their
ability to reach other machines.) Such packets are passed via the output()
routine, which wraps a packet from a given protocol into a form suited to the
device's requirements, tacks on the appropriate link-layer address, and
enqueues the transfer. Because many device classes do this in an identical
fashion, a support function is available in the kernel (in the case of
Ethernet devices, ether_output()) that implements the common code. The lead
packet on the queue is then passed to the driver's start() routine, which
passes the packet to the device and reclaims the temporary storage assigned to
the packet.
Packet reception is a simpler process. Upon interrupt, a packet is extracted
from the device in the interrupt routine, placed in a freshly allocated
portion of temporary storage, and matched to a given protocol by means of its
incoming address and form. It is then enqueued on the input queue of the
appropriate protocol, where it will be processed after the conclusion of all
remaining interrupt-level code. Because this processing is common to many
drivers, we also have an ether_input() routine for Ethernet drivers (as with
the common output routines) to share like processing.
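The demultiplexing step of such a shared input routine can be sketched as
below, with the protocol input queues reduced to counters and only the
Ethernet type field examined; the counter names are illustrative:

```c
#include <stdint.h>

#define ETHERTYPE_IP  0x0800
#define ETHERTYPE_ARP 0x0806

static int ip_queued, arp_queued, dropped;

/* Route a received frame to the right protocol's input queue based
 * on the type field of its link-level header. */
void ether_input(uint16_t ether_type)
{
    switch (ether_type) {
    case ETHERTYPE_IP:  ip_queued++;  break;  /* enqueue for IP */
    case ETHERTYPE_ARP: arp_queued++; break;  /* enqueue for ARP */
    default:            dropped++;    break;  /* no protocol wants it */
    }
}
```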
To gain access to the device while it is operating or to change operating
modes, fetch statistics, and so forth, an ioctl() entry point allows utility
programs to manipulate the device (by name).


Next Month


Many devices work on an interrupt-driven basis -- they signal an asynchronous
event by generating an exception, which tells the processor to come and
service them. To support this need, we must have the ability to enter, exit,
and mask various processor interrupts. This topic is fairly complex, deserving
detailed discussion, and will be taken up next month.




[LISTING ONE]


/* [Excerpted from /sys/i386/include/param.h] */
 ...
#ifndef __ORPL__
/* Interrupt Group Masks */
extern u_short __highmask__; /* interrupts masked with splhigh() */
extern u_short __ttymask__; /* interrupts masked with spltty() */
extern u_short __biomask__; /* interrupts masked with splbio() */
extern u_short __netmask__; /* interrupts masked with splimp() */
extern u_short __protomask__; /* interrupts masked with splnet() */
extern u_short __nonemask__; /* interrupts masked with splnone() */

asm(" .set IO_ICU1, 0x20 ; .set IO_ICU2, 0xa0 ");

/* adjust priority level to disable a group of interrupts */
#define __ORPL__(m) ({ u_short oldpl, msk; \
 msk = (m); \
 asm volatile (" \
 cli ; /* modify interrupts atomically */ \
 movw %1, %%dx ; /* get mask to OR in */ \
 inb $ IO_ICU1+1, %%al ; /* get low order mask */ \
 xchgb %%dl, %%al ; /* switch the old with the new */ \
 orb %%dl, %%al ; /* finally, OR it in! */ \
 outb %%al, $ IO_ICU1+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 inb $ IO_ICU2+1, %%al ; /* next, get high order mask */ \
 xchgb %%dh, %%al ; /* switch the old with the new */ \
 orb %%dh, %%al ; /* finally, or it in! */ \
 outb %%al, $ IO_ICU2+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 movw %%dx, %0 ; /* return old mask */ \
 sti /* allow interrupts again */ " \
 : "=&g" (oldpl) /* return values */ \
 : "g" ((m)) /* arguments */ \
 : "ax", "dx" /* registers used */ \
 ); \
 oldpl; /* return the "old" value */ \
})
/* force priority mask to a set value */
#define __SETPL__(m) ({ u_short oldpl, msk; \
 msk = (m); \
 asm volatile (" \
 cli ; /* modify interrupts atomically */ \
 movw %1, %%dx ; /* get mask to OR in */ \
 inb $ IO_ICU1+1, %%al ; /* get low order mask */ \
 xchgb %%dl, %%al ; /* switch the old with the new */ \
 outb %%al, $ IO_ICU1+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 inb $ IO_ICU2+1, %%al ; /* next, get high order mask */ \
 xchgb %%dh, %%al ; /* switch the old with the new */ \
 outb %%al, $ IO_ICU2+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 movw %%dx, %0 ; /* return old mask */ \
 sti /* allow interrupts again */ " \
 : "=&g" (oldpl) /* return values */ \
 : "g" ((m)) /* arguments */ \
 : "ax", "dx" /* registers used */ \
 ); \
 oldpl; /* return the "old" value */ \
})
#define splhigh() __ORPL__(__highmask__)
#define spltty() __ORPL__(__ttymask__)
#define splbio() __ORPL__(__biomask__)
#define splimp() __ORPL__(__netmask__)
#define splnet() __ORPL__(__protomask__)
#define splsoftclock() __ORPL__(__protomask__)
#define splx(v) ({ u_short val; \
 val = (v); \
 if (val == __nonemask__) (void) spl0(); /* zero is special */ \
 else (void) __SETPL__(val); \
})
#endif /* __ORPL__ */
 ...


[LISTING TWO]

LISTING TWO IS CURRENTLY UNAVAILABLE








































February, 1992
PSEUDO-RANDOM SEQUENCE GENERATOR FOR 32-BIT CPUS


A fast, machine-independent generator for 32-bit Microprocessors




Bruce Schneier


Bruce has an MS in Computer Science and has worked in computer and data
security for a number of public and private concerns. He can be reached at 730
Fair Oaks Ave., Oak Park, IL 60302.


Does the computer world really need another random sequence generator when
there's one built into most every compiler, a mere function call away? Just
use it and be done with it. Unfortunately, if randomness plays a large role in
your program, you'd better pay attention to the code that generates the random
sequence. Considering the average quality of the standard random sequence
generators, it is probably smart to ignore the RND function that comes with
your compiler and roll your own. The generators described here are almost
definitely better and probably faster than the one that came with your
compiler. But first, I'll explain why I even bother.


How Random is Random?


Most random sequence generators are not very random. Simple applications such
as computer games need so few random numbers they hardly notice, but
large-scale Monte-Carlo simulations that use millions or even billions of
random bits to model complex systems are extremely sensitive to the properties
of random number generators. Use a poor random sequence generator, and you
start getting weird correlations and strange results.
Why? Because the generator doesn't produce a random sequence. It probably
doesn't produce anything that looks remotely like a random sequence. Of
course, it is impossible to produce something truly random on a computer.
Computers are deterministic beasts: Stuff goes in one end, completely
predictable operations occur inside, and different stuff comes out the other
end. Put the same stuff in on two separate occasions and the same stuff comes
out both times. Put the same stuff into two identical computers, and the same
stuff comes out of both of them. There are only a finite number of states a
computer can be in (a large finite number, but a finite number nonetheless),
and the stuff that comes out will always be a deterministic function of the
stuff that went in and the computer's current state. That means that any
random sequence generator on a computer (at least, on a Turing machine) is, by
definition, periodic. Anything that is periodic is, by definition,
predictable. And if something is predictable, it can't be random. A true
random sequence generator requires some random input; a computer can't provide
that.
In the military, where these things take on a degree of seriousness not found
anywhere else, random sequence generators tap the natural randomness of the
real world. Noisy diodes, devices that measure atmospheric static, and Geiger
counters all can serve to produce random bit sequences. However, your typical
computer programmer will find this sort of specialized hardware out of reach.
But even if you can plug your Geiger counter into your computer, you're going
to have a problem with repeatability. You might get a beautiful random
sequence, but unless you saved every bit of that sequence there's no way to
reproduce the simulation. The random number generator that came with your
compiler may have lousy statistical properties and repeat after only 16,000
bits, but at least it can reproduce the same sequence on demand.
Where have we gotten? We can't have a true random sequence generator, and even
if we could we couldn't reproduce a given sequence anyway. So if we're stuck
with a periodic and deterministic "pseudo-random" sequence generator, we might
as well choose a good one.


What does a Good Pseudo-Random Sequence Generator Look Like?


Many people have taken a stab at defining this formally (see Knuth's
Seminumerical Algorithms for an example), but an intuitive understanding
should suffice here. The sequence's period should be long enough so the finite
sequence actually used is not periodic. That is, if you need a billion random
bits for a simulation, don't choose a sequence generator that repeats after
only 16,000 bits. These relatively short, nonperiodic subsequences should be
as indistinguishable as possible from random sequences. For example, they
should have about the same number of 1s and 0s, about half the runs should be
runs of 0s and the other half should be runs of 1s, half the runs should be of
length one, one quarter of length two, one eighth of length three, and so on.
These properties can be empirically measured and then compared to statistical
expectations using a chi-square test. (This, of course, assumes that the
sequence has a flat distribution. If you want a sequence which is 0
three-quarters of the time and 1 one-quarter of the time, my advice is to
start with a flat sequence and then manipulate it -- it's much easier that
way.)
A lot of effort has gone into producing good pseudo-random sequences on
computers. Generators abound in the academic literature, along with various
tests of randomness. All of these generators are periodic (there's no escaping
that), but with potential periods of 2{256} bits and higher, they can be used
for most simulations that expect to terminate before the next ice age.
In the October 1990 issue of Communications of the ACM, a detailed and
comprehensive article by Pierre L'Ecuyer discussed a family of linear
congruential generators and other pseudo-random sequence generators based on
them. The simplest of these have the form X[n] = (A * X[n-1] + C) mod m,
where X[n] is the nth number of the sequence, X[n-1] the previous number, and
A, C, and m are large constants chosen to make everything just so. (m
should be a prime number, for example.) Many of the generators in this family
can seriously bottleneck a complex program. They can require a large number of
multiplications and divisions per cycle.
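As a concrete illustration, here is one well-known member of the linear
congruential family: the "minimal standard" generator, with A = 16807, C = 0,
and m = 2{31}-1 (a prime). This is a sketch for exposition, not one of
L'Ecuyer's tuned recommendations; 64-bit arithmetic keeps the product A*X
from overflowing:

```c
#include <stdint.h>

static uint32_t lcg_state = 1;   /* the seed; must be in 1..m-1 */

/* X[n] = (16807 * X[n-1]) mod (2^31 - 1) */
uint32_t lcg_next(void)
{
    lcg_state = (uint32_t)((16807ull * lcg_state) % 2147483647u);
    return lcg_state;
}
```

Starting from a seed of 1, the first few values are 16807, 282475249, and
1622650073.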
The pseudo-random sequence generator described in this article is both fast
and statistically sound. Its period is long enough for most applications, and
it has been optimized for fast execution on 32-bit microprocessors. In
addition, it has no machine-dependent operations, so a specific sequence
generated on one machine will be exactly the same as a sequence generated on
another. The generator produces a pseudo-random sequence of bits. If you need
larger random numbers, take a series of bits and combine them. Three
sequential bits form a random number between 0 and 7. Collect 4 bits at a
time, trying again whenever you get a value greater than 1001 (binary for 9),
and you have a random number between 0 and 9.
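That collect-and-retry trick might look like this in C. The bit source here
reuses the 32-bit feedback taps of the RANDOM function presented later in the
article, seeded with an arbitrary nonzero value; the function names
random_bit and random_digit are my own:

```c
#include <stdint.h>

static uint32_t lfsr = 0x12345678u;  /* arbitrary nonzero seed */

/* One step of the 32-bit LFSR (taps at bits 32, 7, 5, 3, 2, 1). */
int random_bit(void)
{
    uint32_t bit = (lfsr >> 31) ^ (lfsr >> 6) ^ (lfsr >> 4) ^
                   (lfsr >> 2)  ^ (lfsr >> 1) ^ lfsr;
    lfsr = ((bit & 1u) << 31) | (lfsr >> 1);
    return (int)(lfsr & 1u);
}

/* Collect four bits; retry on values above 9 (binary 1001) so every
 * decimal digit is equally likely. */
int random_digit(void)
{
    int n;
    do {
        n = (random_bit() << 3) | (random_bit() << 2) |
            (random_bit() << 1) |  random_bit();
    } while (n > 9);   /* 10..15 are thrown away */
    return n;
}
```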
The generator is based on a Linear Feedback Shift Register, or LFSR. Feedback
shift registers have been generating random sequences for a long time. They're
discussed in Numerical Recipes in C and by Knuth. Basically, they consist of a
shift register and a feedback sequence. Every time a random bit is needed, all
the bits in the register shift 1 bit to the right (the low-order bit falls
into the bit bucket), and a new high-order bit is calculated as a function of
the other bits and appended to the left side of the register. The generator
returns the low-order bit. An LFSR is a special case of a feedback shift
register, where the generator calculates the new high-order bit by taking the
XOR of some subset of the bits in the register (see Figure 1).
In theory, an n-bit LFSR can generate a 2{n}-1-bit pseudo-random sequence
before repeating. To do this, the shift register must cycle through all 2{n}-1
combinations. (It's 2{n}-1 and not 2{n} because a shift register filled with
0s will cause the LFSR to output a never-ending stream of 0s, which is not
particularly useful.) Only certain feedback sequences produce LFSRs with this
maximum-length sequence. I'll spare you the number theory in this article (see
almost any cryptography text for details), and have taken the liberty of
choosing three maximum-length LFSRs for my generator, one each of 32 bits, 31
bits, and 29 bits. The 32-bit LFSR is described in the function RANDOM in
Example 1; it has a period of 2{32}-1, or about 4*10{9} (four billion).
Example 1: Generating random bits with an LSFR

 int RANDOM () {
 static unsigned long reg;     /* Must be unsigned so right shift works
                                  properly; "register" is a C keyword,
                                  so we call it reg instead. */
 /* reg should be initialized with some nonzero random value. */
 reg = ((((reg >> 31)          /* Shift the 32nd bit to the first bit */
  ^ (reg >> 6)                 /* XOR it with the seventh bit */
  ^ (reg >> 4)                 /* XOR it with the fifth bit */
  ^ (reg >> 2)                 /* XOR it with the third bit */
  ^ (reg >> 1)                 /* XOR it with the second bit */
  ^ reg)                       /* and XOR it with the first bit. */
  & 0x00000001)                /* Strip all the other bits off and */
  << 31)                       /* move it back to the 32nd bit. */
  | (reg >> 1);                /* OR with the register shifted right. */
 return reg & 0x00000001;      /* Return the first bit. */
 }


LFSRs are competent random sequence generators all by themselves, but they
have some annoying nonrandom properties. Sequential bits are linear, which
makes them a poor candidate for encryption. And large random numbers generated
from sequential bits of this sequence are highly correlated and, for certain
types of applications, not very random at all.
The algorithm VERYRANDOM in Example 2 uses three LFSRs, combined in such a
way as to produce a nonlinear sequence of bits. Two of the LFSRs provide
inputs to a 2:1 multiplexer; the third LFSR chooses which of the inputs to
output. The lengths of the three LFSRs are relatively prime to each other;
this ensures that the sequence will not repeat before its maximal length,
which is (2{32}-1) * (2{31}-1) * (2{29}-1), or about 2{92} (roughly 10{27},
one billion billion billion). Three-quarters of the time the output of
VERYRANDOM is equal to the output of each of the LFSRs that act as inputs to
the multiplexer, but only half the time is it equal to the output of the LFSR
that switches the multiplexer. This fact is useful to cryptanalysts trying to
break ciphers based on this generator, and in fact makes the sequence
generator all but useless for encryption. Still, it should be perfectly
acceptable as a random sequence generator for simulations and the like.
Example 2: Combining three LFSRs to increase the sequence length

 int VERYRANDOM () {
 static unsigned long regA, regB, regC;
 /* regA, regB, and regC should be initialized with some nonzero
    random value. */
 regA = ((((regA>>31)^(regA>>6)^(regA>>4)^(regA>>2)^(regA>>1)^regA)
  & 0x00000001)<<31) | (regA>>1);
 regB = ((((regB>>30)^(regB>>2)) & 0x00000001)<<30) | (regB>>1);
 regC = ((((regC>>28)^(regC>>1)) & 0x00000001)<<28) | (regC>>1);
 /* regB is a 31-bit LFSR. regC is a 29-bit LFSR. */
 /* Both feedback sequences are chosen to be maximum length. */
 return ((regA & regB) | (~regA & regC)) & 0x00000001;
 /* Above is equivalent to: if A then return B else return C. */
 /* Variants: return ((regA & regB) | (regA & regC) | (regB & regC)) &
    0x00000001;  Above variant returns the majority of A, B, and C.
    return (regA ^ regB ^ regC) & 0x00000001;
    Above variant returns the XOR of A, B, and C. */
 }

Two variants of VERYRANDOM are also provided in the source code. Both modify
the way the three LFSRs interact to produce the output bit. In the first
variant, the output bit is the XOR of the three LFSR inputs. In the second
variant, the output bit is the majority of the three LFSR inputs. Note that
these two variants also produce a nonlinear sequence, and also have a period
of 2{92}-1.
The random bits produced by RANDOM and VERYRANDOM are repeatable; the same
input seeds will produce the same output sequences. For applications where
this is not a requirement, XORing the output of this generator with some
system-dependent register (the low-order bit of some clock or garbage
collection register, for example), will produce a sequence so close to random
for most applications that it might as well be.
Other variants are also possible. You could decimate the sequence; that is,
only use some of the bits produced. Collecting, for example, only every third
bit of the sequence will produce a different sequence. And if 2{n}-1 is not
divisible by 3, then the decimated sequence will have the same length as the
original sequence. There are various other decimation techniques as well. None
of them really does much to salvage LFSRs for encryption purposes, though.
With any simulation, it is always wise to check a few different generators.
Sometimes you'll get strange correlations with a particular generator and a
particular application. Make sure that any output from your program is real,
and not just an artifact of the pseudo-random number generator.
One final note of caution: There are many more feedback arrangements for
various-length LFSRs that produce maximum-length sequences, but don't fiddle
with the feedback sequences without the proper mathematical theory. The
particular bits that are XORed together may seem arbitrary, but they are
chosen to ensure that the sequence takes 2{n}-1 bits to repeat. Blindly
choosing a different feedback sequence can easily make the output sequence
repeat after only a couple of hundred bits, and you would be better off
sticking with your store-bought RND function.


_PSEUDO-RANDOM SEQUENCE GENERATOR FOR 32-BIT CPUs_
by Bruce Schneier



[LISTING ONE]


int RANDOM () {
 static unsigned long reg;     /* Must be unsigned so right shift works
                                  properly; "register" is a C keyword,
                                  so we call it reg instead. */
 /* reg should be initialized with some nonzero random value. */
 reg = ((((reg >> 31)          /* Shift the 32nd bit to the first bit */
  ^ (reg >> 6)                 /* XOR it with the seventh bit */
  ^ (reg >> 4)                 /* XOR it with the fifth bit */
  ^ (reg >> 2)                 /* XOR it with the third bit */
  ^ (reg >> 1)                 /* XOR it with the second bit */
  ^ reg)                       /* and XOR it with the first bit. */
  & 0x00000001)                /* Strip all the other bits off and */
  << 31)                       /* move it back to the 32nd bit. */
  | (reg >> 1);                /* OR with the register shifted right. */
 return reg & 0x00000001;      /* Return the first bit. */
}








[LISTING TWO]


int VERYRANDOM () {
 static unsigned long regA, regB, regC;
 /* regA, regB, and regC should be initialized with some nonzero
    random value. */
 regA = ((((regA>>31)^(regA>>6)^(regA>>4)^(regA>>2)^(regA>>1)^regA)
  & 0x00000001)<<31) | (regA>>1);
 regB = ((((regB>>30)^(regB>>2)) & 0x00000001)<<30) | (regB>>1);
 regC = ((((regC>>28)^(regC>>1)) & 0x00000001)<<28) | (regC>>1);
 /* regB is a 31-bit LFSR. regC is a 29-bit LFSR. */
 /* Both feedback sequences are chosen to be maximum length. */
 return ((regA & regB) | (~regA & regC)) & 0x00000001;
 /* Above is equivalent to: if A then return B else return C. */
 /* Variants: return ((regA & regB) | (regA & regC) | (regB & regC)) &
    0x00000001;  Above variant returns the majority of A, B, and C.
    return (regA ^ regB ^ regC) & 0x00000001;
    Above variant returns the XOR of A, B, and C. */
}











































February, 1992
 PROTECTED-MODE DEBUGGING USING IN-CIRCUIT EMULATORS


Making a case for emulation




Tovey Barron


Tovey is a senior engineer for Intel's Development Tools Operation and can be
contacted at 5200 NE Elam Young Parkway, Hillsboro, OR 97124.


Since the introduction of the 80386 and 80486, software development based on
them has been done in the PC platform arena. While PC manufacturers have made
good use of many of the hardware enhancements incorporated in these CPUs, the
constraint of backward compatibility to the original PC and its real-mode
operating system has limited use of numerous architectural features that can
provide a powerful, protected-mode programming environment.
Although application software writers have only recently begun to develop code
that is making the PC world aware of the advantages of protected-mode
programming, in embedded systems development these features have long been
recognized and utilized to their fullest extent. In either case, writing and
debugging protected-mode code to run on the 386/486 requires an understanding
of the built-in architectural features that make protected mode possible. Both
chips share features -- hidden registers, segment access rights and privilege
levels, cached segment descriptor values and descriptor tables, multitasking,
and paging -- that can make debugging even simple programs a daunting task.
Fortunately, tools such as the in-circuit emulators can make access to the CPU
-- and these seemingly complicated features -- as easy as typing in a single
command.
A typical in-circuit emulator has two main sections: The emulator chassis,
containing circuit boards on which the trace buffer, breakpoint logic, and so
on, are found; and the probe, containing a microprocessor (much like the one
being emulated) and some support logic. The emulator is installed by replacing
the CPU in the target system with the emulation probe. Software controlling
the functions of the emulator runs on a host computer, communicating with the
emulator chassis via a parallel or serial link. The emulator probe dimensions
must fit comfortably within the target system. Beyond that, the only
significant special provision needed to support an emulator is a host computer
(typically a PC). Commands entered at the host computer are transferred to the
emulator chassis, and from there to the probe. The CPU on the probe matches or
surpasses the capability of the CPU on the target, so performance during
emulation is the same as performance during normal operation; timings,
interrupt latencies, software functions, and so on, remain the same. The
difference is that, via the host computer, you control operation of the
emulation probe and by extension, the target system. That is, the target can
be made to either run at full speed or execute one assembly language
instruction at a time.
The emulator also has access to registers and data structures hidden inside
the CPU. Manipulating these hidden features without an emulator would require
elaborate procedures, and even then you wouldn't have full access. The examples
in this article are based on debug sessions conducted using an Intel ICE-386
DX. Similar features are available on ICEs from Intel and other in-circuit
emulator vendors.
Generating valid protected-mode data structures such as descriptors,
descriptor tables, and tasks is normally done by the system integrator -- the
person who ties together the disparate parts of the programs that make up the
overall protected-mode environment. It's up to the system integrator to keep a
list of all the various segments and their access rights, and the tasks and
their associated task data structures: the task state segment and local
descriptor table, the interrupt handlers and interrupt descriptor table, and
the global descriptor table (where just about everything else is defined).
Once these different protected-mode data structures have been built, it's the
system integrator's job to load the program to the target system memory and
try executing code.


Stepping into Protected Mode


Often, the first difficulty encountered by the system integrator occurs when
stepping through the code that transfers the CPU from real to protected mode.
The code itself is not terribly complex, just a number of register loads.
However, the values placed in the registers set up the initial conditions for
protected-mode operation, and can easily be totally wrong, generating
immediate protection exceptions. Perhaps even worse, the register values can
be subtly incorrect, so that similar-type segments are accessed in error,
generating an exception only after damage has been done.
Example 1 shows 386 code that places the CPU into protected mode. Once the
Global Descriptor Table (GDT) register has been loaded to point to the area of
memory where the GDT will reside, and the Protection Enabled (PE) bit in the
CR0 register has been set, the CPU is in protected mode. The next instructions
load up the various segment registers with protected-mode values. To answer
the questions that arise at this point, you single-step through the code,
checking appropriate memory and register values as you go. Unfortunately,
without an emulator it is almost impossible to single-step this code without
running into problems.
Example 1: 386 code that places the CPU into protected mode

 lgdt pword ptr gdt_reg_values ; Load global descriptor table register
 mov eax,cr0 ; Set Protection Enabled bit to go into
 or eax,1 ; protected mode
 mov cr0,eax
 jmp next ; Flush prefetch queue to get rid of
 ; instructions decoded in real mode
 next:
 mov bx,d_seg_selector ; Initialize data selectors with appropriate
 mov ds,bx ; values - here, we see FLAT model
 mov es,bx ; initialization
 mov fs,bx
 mov gs,bx
 mov ss,bx
 pejump:
 jmp full_prot_code ; FAR jump, loads CS register with protected
 ; mode value and branches to full protected mode code

Debuggers create the effect of single-stepping by setting the Trap Flag (TF)
bit in the EFLAGS register of the CPU. With the TF bit set, starting execution
causes the chip to execute one instruction, then automatically generate an
interrupt. The interrupt clears the TF bit while the interrupt code places the
debugger back in command mode, ready for another single-step. Debuggers
typically start executing application code by setting up a return address on
the stack which points to the instruction where execution is to begin, then
performing a RET instruction. This works quite well, except when there is a
transition between real and protected mode. Then the debugger starts out in
real mode, the application switches to protected mode, and suddenly the
debugger gets confused.
An emulator lets you single-step through this type of code, and gives you
access to the important hidden values, allowing you to answer the following
questions:
1. Is there a problem because the GDT register hasn't been correctly loaded
and is pointing somewhere other than at the actual GDT space? It's simple
enough to use an emulator to display the contents of the GDT register; see
Example 2(a). The values returned by the emulator show the base address of the
GDT set at physical address 11000, and a GDT limit of 77H. (The default
numeric base of the emulator is hex.) A limit of 77H means the table occupies
78H (120) bytes; at 8 bytes per descriptor, starting at offset 0 from the
beginning of the GDT, there must be 15 descriptors in the table.
Example 2: (a) Using an emulator to display the contents of the GDT register;
(b) displaying the address of the variable holding the initial GDT register
values; (c) displaying the contents of the variable.

 (a)

 hlt> gdtbas /* Display the base field of the GDT register */

 11000
 hlt>
 hlt> gdtlim /* Display the limit field of the GDT register */
 77

 (b)

 hlt> &gdt_reg_values /* Display address of variable. */
 some-address-here /* The address would be displayed in */
 /* virtual format, i.e., seg:offset or */
 /* ldt:seg:offset, depending on whether */
 /* code is in real or protected mode */

 (c)

 hlt> byte &gdt_reg_values L length 6
 0ffff0168L 77 00 00 10 01 00

If the values in the GDT register don't make sense, it may be that the program
is loading them from the wrong memory location, or that the correct location
is being accessed but the values at the location are incorrect. In either
case, the emulator can help prove the accuracy of the actions being taken.
Displaying the address of the variable holding the initial GDT register values
is simple; see Example 2(b). Displaying the contents of the variable is almost
as simple; see Example 2(c). Here, the BYTE command is used to display 6 bytes
of memory starting at the linear (L) address of the variable. The response
displays the address as well. (Bytes are displayed in little-endian format.)
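The 6 bytes shown in Example 2(c) follow the layout of the LGDT pseudo-descriptor: a 16-bit limit followed by a 32-bit linear base, both little-endian. As a sanity check, the decoding can be sketched as follows (Python used purely for illustration; the original work is done at the emulator prompt):

```python
def parse_gdt_reg(raw):
    """Decode the 6-byte pseudo-descriptor loaded by LGDT.

    Bytes 0-1 hold the 16-bit limit; bytes 2-5 hold the 32-bit
    linear base address. Both fields are little-endian.
    """
    limit = raw[0] | (raw[1] << 8)
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[5] << 24)
    return base, limit

# The bytes shown by the emulator in Example 2(c): 77 00 00 10 01 00
base, limit = parse_gdt_reg(bytes([0x77, 0x00, 0x00, 0x10, 0x01, 0x00]))
# base == 0x11000 and limit == 0x77, matching Example 2(a).
# A limit of 77H means the table is 78H (120) bytes long, so it
# holds 120 // 8 == 15 eight-byte descriptors.
descriptors = (limit + 1) // 8
```
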
2. Is the memory location d_seg_selector holding the correct value? As we've
seen, it is simple to display the contents of memory. If the program has been
loaded with full symbolic debug information, the emulator can display the
contents of a variable directly; see Example 3.
Example 3: If the program has been loaded with full symbolic debug
information, the emulator can display the contents of a variable directly.

 hlt> d_seg_selector
 003B

3. Have the correct segment descriptor values been read from the GDT and
cached into the segment registers? If program execution gets beyond
protected-mode initialization and begins generating exceptions on data or
stack accesses, the segment registers may have been loaded with invalid
values. The code in Example 1 shows that all the segment registers (except CS)
have been loaded with the same selector value. This is typical of a program
that runs in FLAT mode and means we only need to check one descriptor to find
out whether or not the selector is valid.
We already know the value of d_seg_selector is 3BH. We can decode this by
hand: 003BH = 0000 0000 0011 1011, where: bits 0 and 1 (the RPL, or Requested
Privilege Level) are 3, the least privileged; bit 2 (the TI, or Table
Indicator bit) shows that the descriptor is in the GDT; and bits 3 through 15
(the table index) show that the descriptor is in slot #7.
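That by-hand decoding follows directly from the selector bit layout and is easy to mechanize. A minimal sketch (Python used for illustration):

```python
def decode_selector(sel):
    """Split a protected-mode segment selector into its three fields."""
    rpl = sel & 0x3          # bits 0-1: requested privilege level
    ti = (sel >> 2) & 0x1    # bit 2: table indicator (0 = GDT, 1 = LDT)
    index = sel >> 3         # bits 3-15: descriptor table slot number
    return rpl, ti, index

rpl, ti, index = decode_selector(0x3B)
# rpl == 3 (least privileged), ti == 0 (GDT), index == 7
```
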
Now that we know selector 3BH refers to descriptor #7 in the GDT, and we know
the base address of the GDT, we can display the appropriate memory locations
and decode the descriptor contents. While an example of this work would be
instructive, it is much easier to use emulator descriptor table commands to do
the work for us.
In Example 4(a), the DT command decodes the selector, reads the appropriate
descriptor, and decodes its value. The first line of the response shows us
that the selector 3BH refers to descriptor #7 in the GDT, and also shows the 8
bytes of data contained in the descriptor. The second line decodes the data
and gives information about the segment: access rights, privilege level, and
so on. See Table 1 for details.
Example 4: (a) The DT command decodes the selector, reads the appropriate
descriptor, and decodes its value; (b) and (c) show two ways of using the DT
command to change a segment limit that appears too small.

 (a)

 hlt> dt(3BH)
 GDT(7T) 0040F30116C0017B
 DSEG BASE=000116C0 LIMIT=0017B DPL=3 P=1 G=0 V=0 B=1 E=0 W=1 A=1

 (b)

 hlt> dt(38).limit = 27B

 (c)

 hlt> dt(38).limit = dt(38).limit + 100

Table 1: Meaning of the response to the DT command shown in Example 4

 Response Meaning
 -------------------------------------------------------------------------

 DSEG The segment is a data segment.
 BASE The base address of the segment is 116C0H.
 LIMIT The length of the segment is 17BH bytes.
 DPL The descriptor privilege level is 3, least privileged.
 P The present bit shows that the descriptor is marked present
 in memory.
 G The granularity bit shows that the limit field contains a
 value that should be considered the least significant bits

 of a 20-bit limit (maximum length 1 Mbyte).
 V The V bit is user-definable and may be used for any
 purpose. A typical use is marking a segment "locked," so
 it's never swapped out of memory.
 B The B bit determines the default stack pointer register used
 for stack accesses (SP or ESP); for stack segments only.
 E The E bit determines whether a data segment expands up (0)
 or down (1).
 W The W bit determines if a segment is read only (0) or
 read/write (1).
 A The A bit is set by the CPU when the segment has been
 accessed.

With the information displayed by the DT command, it is simple to determine
whether appropriate descriptor values are present. If on examination, a
descriptor appears to hold incorrect values, the DT command can also be used
to change descriptor field values. For instance, if the segment limit appears
too small, the DT command can be used to change it. See Example 4(b) or 4(c).
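The decoding that the DT command performs follows the standard 386 descriptor layout. The sketch below (Python for illustration; not an emulator facility) reproduces the fields shown in Example 4(a) from the raw 8-byte value:

```python
def decode_data_descriptor(raw):
    """Decode an 8-byte data-segment descriptor, given as a 64-bit
    integer whose most significant byte is byte 7 of the descriptor."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)    # limit 19:0
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF   # P, DPL, type bits
    flags = (raw >> 52) & 0xF     # G, B, reserved, AVL
    return {
        "base": base,
        "limit": limit,
        "dpl": (access >> 5) & 0x3,
        "p": (access >> 7) & 0x1,
        "e": (access >> 2) & 0x1,   # expand-down
        "w": (access >> 1) & 0x1,   # writable
        "a": access & 0x1,          # accessed
        "g": (flags >> 3) & 0x1,    # granularity
        "b": (flags >> 2) & 0x1,    # big/default
        "v": flags & 0x1,           # user-definable (AVL)
    }

d = decode_data_descriptor(0x0040F30116C0017B)
# Matches the DT output in Example 4(a):
# BASE=000116C0 LIMIT=0017B DPL=3 P=1 G=0 V=0 B=1 E=0 W=1 A=1
```
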
4. Is the FAR jump really branching to the right place? This question can be
answered by simply displaying code at the branch address shown in Example 5.
Example 5: Using the ASM command to display memory at the specified address

 hlt> asm full_prot_code length 5
 ; :TASK_1.PROC_A.full_prot_code
 0098:0014:00000000H 1E PUSH DS
 0098:0014:00000000H 66B9F900 MOV CX,0F9H
 0098:0014:00000000H 8ED9 MOV DS,CX
 0098:0014:00000000H 8EC1 MOV ES,CX
 0098:0014:00000000H 55 PUSH EBP

Here we see the ASM command, which displays memory at the specified address as
assembly language instructions. The emulator responds by displaying the fully
qualified address of the variable full_prot_code, including the module and
procedure in which the variable is defined. The instructions follow, with full
logical address (LDT_selector:segment_selector:offset) and opcode/operands.


Descriptor Table Access/Display


Just as emulators make it easy to access and change the contents of single
descriptors, they also make it easy to display descriptor tables. This is
handy when working with protected-mode programs that include numerous
descriptors and multiple local descriptor tables.
The GDT command displays the contents of the global descriptor table as
pointed to by the GDT register. For brevity, Example 6 shows only excerpts
from the table.
Example 6: The GDT command displays the contents of the global descriptor
table as pointed to by the GDT register.

 hlt> gdt /* Display the contents of the GDT */
 GDT(1T) 00009201100000FF
 DSEG BASE=00011000 LIMIT=000FF DPL=0 P=1 G=0 V=0 B=0
 E=0 W=1 A=0
 GDT(17T) 00409A01146C0055
 ESEG BASE=0001146C LIMIT=00055 DPL=0 P=1 G=0 V=0 D=1
 C=0 R=1 A=0
 GDT(19T) 0000820112000027
 DTABL BASE=00011200 LIMIT=00027 DPL=0 P=1 G=0 V=0
 GDT(26T) 0000EC0000150000
 CALLG3 SSEL=0015 SOFF=00000000 DPL=3 P=1 WCO=00

The example includes display of descriptors for data segments (DSEG),
executable code segments (ESEG), a descriptor for a segment containing a local
descriptor table (DTABL), and a descriptor containing a call gate to 32-bit
code (CALLG3). Descriptor types not shown in the example include available and
busy task state segments (ATSSD3, BTSSD3), task gates (TASKG), and so on. The
bits seen in the ESEG display that differ from those described for the data
segment in Example 6 are listed in Table 2.
Table 2: The bits seen in the ESEG display that are different from those
described in the data segment in Example 6

 Response Meaning
 -------------------------------------------------------------------------

 D The D bit determines the default code type, either 16-bit code
 or 32-bit code. The code type determines the default register
 size and addressing capabilities (that is, 16- or 32-bit index
 registers).
 C The C bit determines whether or not a code segment is
 conforming.
 R The R bit determines whether the code segment is execute only

 or execute/read.

In a similar fashion, it's possible to display the Interrupt Descriptor Table
(IDT). This is useful when debugging protected-mode interrupts, where two
levels of indirection can make it difficult to access interrupt handler code.
(Interrupt descriptors contain selectors for code segments which in turn
contain interrupt handler code.) Again, for brevity, Example 7 shows only a
portion of the full display. Field names within the interrupt descriptors are
listed in Table 3.
Example 7: Displaying the Interrupt Descriptor Table (IDT). This can be useful
when debugging protected-mode interrupts.

 hlt> idt

 IDT(0T) FFFF8E00001803A4
 INTG3 SSEL=0018 SOFF=FFFF03A4 DPL=0 P=1
 IDT(1T) FFFF8E00001803A8
 INTG3 SSEL=0018 SOFF=FFFF03A8 DPL=0 P=1
 IDT(2T) FFFF8E00001803AC
 INTG3 SSEL=0018 SOFF=FFFF03AC DPL=0 P=1

Table 3: Field names within the interrupt descriptors; see Example 7.

 Response Meaning
 -------------------------------------------------------------------------

 INTG3 Defines the descriptor as an Intel386 interrupt gate.
 SSEL This field contains the code segment selector for the interrupt
 handler.
 SOFF This field contains the offset to the interrupt handler.
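
Gate descriptors use a different field layout than segment descriptors: the handler offset is split across the low and high words, with the code segment selector in between. A sketch of the decoding (Python for illustration) applied to the first entry of Example 7:

```python
def decode_int_gate(raw):
    """Decode an 8-byte 386 interrupt-gate descriptor (64-bit integer)."""
    soff = (raw & 0xFFFF) | (((raw >> 48) & 0xFFFF) << 16)  # handler offset
    ssel = (raw >> 16) & 0xFFFF     # code segment selector for the handler
    access = (raw >> 40) & 0xFF
    return {
        "ssel": ssel,
        "soff": soff,
        "dpl": (access >> 5) & 0x3,
        "p": (access >> 7) & 0x1,
        "type": access & 0xF,       # 0xE = 386 interrupt gate
    }

g = decode_int_gate(0xFFFF8E00001803A4)
# Matches IDT(0T) in Example 7: SSEL=0018 SOFF=FFFF03A4 DPL=0 P=1
```
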

It is also possible to display the current local descriptor table, as
described by the LDT register, using the LDT command. All three descriptor
table commands not only allow display of entire descriptor tables, but also
allow access to display single descriptors. Thus, if you already know the
descriptor number, you can display its contents. See Example 8(a).
Example 8: (a) Displaying the current local descriptor table using the LDT
command; (b) writing the limit field of the descriptor in slot #3 of the LDT,
whose descriptor is in slot #7 of the GDT.

 (a)

 hlt> gdt(19t)
 GDT(19T) 0000820112000027
 DTABL BASE=00011200 LIMIT=00027 DPL=0 P=1 G=0 V=0

 (b)

 gdt(7).ldt(3).limit = 12345H

More importantly, if you are working with multitasking programs, there are
multiple LDTs. At times, you might wish to display or change the contents of a
descriptor in an LDT not currently in use. An LDT is really nothing more than
a data segment containing special operating system-related data: the
descriptors for the segments in a given task. The LDT is a special segment, so
its descriptor resides in the GDT. Example 8(b) shows how to write the limit
field of the descriptor in slot #3 of the LDT whose descriptor is in slot #7 of
the GDT.


Multitasking


Protected-mode programs meant to run under multitasking operating systems are
not difficult to write. Each program is written separately, as if it were to
run on its own. Even operating system code, which must handle task switching,
is not that difficult to write because of on-chip hardware that assists in the
task switching process. However, debugging multitasking protected-mode systems
can be a difficult job because all the on-chip hardware features that aid in
task switching can obscure the different operations and functions that make up
the task switching process.
As shown in many of the previous examples, an emulator can be used to display
and access many of the more difficult hardware and software aspects of
protected-mode programming. Task switching not only adds multiple LDTs, but
also adds Task State Segments (TSSs) for each task. Each TSS is a special
segment used to save/restore the processor registers when entering or leaving
a task. In this way, the processor can restore the TSS values, reentering a
task with the registers set up just as they were before the task was last
exited.
An emulator can be used to display/change the contents of a TSS; see Example
9(a). The TSS command shows us the current TSS referred to by the task
register. We see that the current task is a 386 task, rather than a 286 task.
We also see the stack segment register and associated stack pointer register
for each of the four privilege levels. This information makes it much easier
to debug stack-related problems when multiple privilege levels are in use. We
also see the selector for the current LDT and the "back link" field, which
contains a return TSS selector value if the current task has been called by a
previous task.
Example 9: (a) Using an in-circuit emulator to display/change the contents of
a TSS; (b) displaying the contents of any task using the task's TSS selector;
(c) displaying the contents of the privilege level #2 stack pointer for the
TSS; (d) locating the contents of a segment in a noncurrent task by knowing
the selector to the task's TSS; (e) using the DT command to display the task's
LDT descriptor; (f) displaying the LDT.

 (a)

 hlt> tss
 386 TSS

 SS0= 00f0 ESP0= 00000101 SS1= 001d ESP1= 00000101
 SS2= 0000 ESP2= 00000000
 EAX= 00000000 EBX= 00000000 ECX= 00000000 EDX= 00000000
 DS= 00fb ES= 00fb FS= 00fb GS= 00fb
 ESI= 00000000 EDI= 00000000
 SS= 001d CS= 0025
 ESP= 00000101 EIP= 00000000
 EBP= 00000101 LDTR=00b0
 LINK= 0068 EFLAGS= 00000000 CR3= 00000000

 (b)

 hlt> tss(50)

 (c)

 hlt> tss(50).esp2

 (d)

 hlt> tss(50).ldtr
 0068

 (e)

 hlt> dt(68)
 GDT(13T) 0000820112000027
 DTABL BASE=00011200 LIMIT=00027 DPL=0 P=1 G=0 V=0

 (f)

 hlt> gdt(13t).ldt /* For brevity, the LDT will not be shown */

It is also possible to display the contents of any task state segment, as long
as you know the task's TSS selector. The emulator simply looks up the TSS's
descriptor in the GDT and displays the appropriate TSS's contents. Example
9(b) shows the command that displays the contents of a TSS whose descriptor
has a selector of 50H.
Each of the fields within the TSS is accessible via a field name. For example,
it's possible to display the contents of the privilege level #2 stack pointer
for the TSS in Example 9(b) by way of Example 9(c). This type of command can
be extremely useful if you wish to locate the contents of a segment in a
noncurrent task, but you don't know the selector for the segment or its base
address. As long as you know the selector to the task's TSS you can first
display the LDT register field of the task; see Example 9(d). (If you don't
know the task's TSS, you can always display the GDT and experiment with the
different TSSs found there. The GDT will also display the different LDT
descriptors, from which you can choose and experiment.)
Then you can use the DT command with that selector to display the task's LDT
descriptor; see Example 9(e). We now know the slot number of the task's
LDT descriptor; with this information we can display the LDT; see Example
9(f).
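The chain from Example 9(d) to 9(e) works because the LDTR field is itself an ordinary selector into the GDT; stripping its RPL and TI bits yields the slot number the DT command reports. In sketch form (Python for illustration):

```python
# The LDTR value read from the TSS in Example 9(d)
ldtr = 0x68
# Bits 3-15 of a selector form the table index, so shifting away the
# RPL and TI bits gives the GDT slot holding the task's LDT descriptor.
gdt_slot = ldtr >> 3
# gdt_slot == 13, which is why dt(68) responds with GDT(13T)
```
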
The display of the LDT should allow you to choose the segment in which you are
interested. You now know the base address of the segment, its limit, and
whether or not it is present in memory. At this point, it is possible to use
emulator commands to display memory within the segment of interest.


Hidden Register Access


I've already shown that an emulator allows display and change of the hidden
fields within the GDT register. An emulator also allows the same access to
other hidden registers and register fields. Example 10 shows emulator commands
to display the contents of the LDT register, the task register, the IDT
register, and various fields within the segment registers.
Example 10: Emulator commands to display the contents of the LDT register,
task register, IDT register, and various other fields

 hlt> ldtbas /* Display base field of the current LDT */
 00011200H

 hlt> idtlim /* Display limit field of current IDT */
 00ffH

 hlt> tr /* Display selector field of current TR */
 0080H

 hlt> dslim = dslim + 35H /* Change limit of current data segment */

 hlt> cs /* Display selector in CS register */
 0025H

 hlt> csar /* Display the access rights bits as they */
 0bbH /* appear in the current CS register */



Summary



The commands shown in this article highlight just a few emulator debug
features. Emulators also provide many other features, such as breaking
emulation after a specific or nonspecific task switch, displaying the contents
of page tables and the page directory, single-stepping high-level code, mixed
assembly and high-level code, and others too numerous to mention.
Having an emulator present is not a substitute for understanding the
protected-mode nature of the 386/486. However, an emulator does make it
possible to easily access just about every possible architectural feature,
making debugging protected-mode, multitasking programs almost as easy as
working with real-mode code.



_PROTECTED-MODE DEBUGGING USING IN-CIRCUIT EMULATORS_
by Tovey Barron


February, 1992
 PROGRAMMING WITH PHAR LAP'S 286/DOS-EXTENDER


A DOS-extended Turtle graphics language


 This article contains the following executables: TURTLE.ARC


Al Williams


Al is a freelance writer and a consultant on the Space Station Freedom
project. His book, DOS 5: A Developer's Guide, is available from M&T Books. He
can be reached at 310 Ivy Glen Court, League City, TX 77573.


Traditional graphics programming techniques are difficult to master. Graphics
languages such as LOGO, however, simplify many types of graphics programming.
In fact, school children routinely use LOGO (developed at MIT) to draw
fractals, recursive patterns, and other sophisticated drawings by controlling
a graphics "turtle." I present in this article TURTLE, an extensible graphics
language (written in Microsoft C 6.0) that's loosely based on LOGO. Because I
wanted several graphics buffers available for animation purposes, I turned to
the Phar Lap 286/DOS-Extender for access to more memory.
TURTLE challenges 286/DOS-Extender in several areas I consider crucial: the
ease with which you can access memory with the extender; the ease with which
you can manipulate physical addresses (for example, the screen buffer); the
complexity of interrupt handling; the usefulness of the DLL tools; and how
much performance penalty the extender incurs.
In addition to examining the Phar Lap 286/DOS-Extender, I'll also provide an
in-depth look at TURTLE. For instance, in developing TURTLE, I created a
general-purpose, extensible command interpreter based on DLLs. I'll also
explore some useful protected-mode techniques. To follow along, you'll need
Microsoft C 6.0, the DOS Extender, an 80286 (or better) PC, and a VGA graphics
card.


Why a 286 DOS Extender?


Many common DOS extenders can only run on 80386/486 machines. Many PCs have
80286 processors, so this is a serious obstacle for developers who want to use
a DOS extender. However, the Phar Lap tool is an 80286-based DOS extender (that
also runs on the 80386 or 80486, of course). One interesting point to note
about it is its split personality. Like all DOS extenders, it supports most
DOS and BIOS calls directly--your programs still use interrupt 10H to write to
the screen, or interrupt 21H to call DOS. It also supports calls that you need
to interact with the DOS extender. The unusual part of the Phar Lap DOS
Extender is that it also supports most of the basic OS/2 API calls.
The benefits are threefold. First, you can run the Microsoft compiler, linker,
and debugger under the DOS Extender--they believe they are running under OS/2.
Also, you can use almost any language that can create OS/2 applications to
create programs. Finally, you can use the API calls yourself, if you wish. You
may think you don't need to use OS/2 calls in your programs, but some of them
are very useful. In particular, your programs can use Dynamic Link Libraries
(DLLs), and threads (although their implementation is different from OS/2's).


TURTLE


I've kept TURTLE simple. It maintains a text screen for entering commands and
a graphics screen. You can also enter commands at the top of the graphics
screen, but any error messages will destroy part of the graphics screen. Table
1 shows a summary of TURTLE's operation.
Table 1: TURTLE command summary -- arguments are in italics, items in brackets
([]) are optional.

 Command Meaning Operators
 -------------------------------------------------------------------------

 setx x Set x-coordinate. (Highest Precedence)
 sety y Set y-coordinate. * Multiply
 setxy x y Set both coordinates. / Divide (integer)
 move dist Move turtle specified % Modulus (integer
 distance. remainder)
 home Move turtle to location + Addition
 (0,0). - Subtract
 turn heading Turn turtle to new & Logical and (like &&
 heading. in C)
 show Show graphics screen | Logical or (like ||
 and wait for a key. in C)
 show ON Turn graphics screen on. = Equality
 show OFF Turn graphics screen off. # Inequality
 pencolor color Set color. < Less than
 background color Set background color. > Greater than
 penup Make turtle move without <= Less than or equal to
 drawing. >= Greater than or
 pendown Make turtle draw as it equal to
 moves. (Lowest Precedence)
 clear Clear screen and home
 turtle.

 fill Fill region. Constants
 sto buffer Store screen to buffer Constants are hexadecimal
 (1-10). if you use the "0x"
 prefix (that is,
 rcl buffer Recall buffer to screen. 0xFF). All other
 prompt string Write string as prompt. constants are decimal.
 text string Write string on graphics
 screen. Variables
 textcolor color Set text color. The variables A-Z can hold
 set var value Set variable (A-Z) to integer values. You may
 value. use a variable any place
 do file Execute file -- resume you can use a constant.
 current file when it
 ends. Special variables
 %c Current color
 goto file Transfer control to file %h Current heading
 repeat n file Execute file n times. %i Numeric input from
 if expr THEN EXIT Exit current program user (no checking is
 file if expr is nonzero. done -- illegal input is
 if expr DO file Execute file if expr is returned as zero)
 nonzero. %r Random integer from
 if expr GOTO file Transfer to file if expr 0-32,767
 is nonzero. %x Current x-coordinate
 help [command] Get help. %y Current y-coordinate
 push expr Push expr on stack.
 pop var Pop top of stack to var. Expression rules
 delay time Delay time/10 seconds. Expressions must not
 dos [command] Run DOS command or contain spaces.
 COMMAND.COM Parentheses can be used
 if command is absent. freely and override
 edit [file] Edit file with EDIT precedence.
 (DOS 5) or program Most expressions can be
 named in TURTLEEDIT relative. For instance,
 environment variable. SETXY -10 +0 will move
 the cursor ten places to
 dir [arguments] Run DOS DIR command. the left. This is not
 cd directory Change directory. the same as SETXY 10 0,
 quit Exit TURTLE. which moves the cursor to
 location (10,0).

TURTLE works in the VGA's 320 x 200, 256-color mode. The top-left corner of the
screen is at (0,0) and the bottom-right corner is (319,199). TURTLE allows
coordinates to range from -999 to 999. Any points that fall outside the screen
do not appear. Arguments to TURTLE commands can usually be absolute or
relative. For example, SETXY 100 100 moves the current position (or turtle) to
(100,100). However, SETXY -10 +0 moves the turtle ten units to the left (the
X-axis), and doesn't move it at all on the Y-axis. TURTLE provides 26 global
integer variables (A-Z). Any numeric argument can be an algebraic expression
(see Table 1). Expressions must not contain spaces -- a space marks the start
of a new argument.
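The absolute-versus-relative rule can be captured by treating an explicit leading sign as a delta. The sketch below is an inference from the behavior described here, not TURTLE's actual C parser, and parse_coord is a hypothetical name:

```python
def parse_coord(arg, current):
    """Interpret a TURTLE coordinate argument (hypothetical sketch).

    An explicit leading '+' or '-' marks the value as relative to the
    current position; a bare number is an absolute coordinate.
    """
    if arg.startswith(("+", "-")):
        return current + int(arg)   # relative move
    return int(arg)                 # absolute position

# SETXY -10 +0 with the turtle at (50, 80):
x = parse_coord("-10", 50)   # 40: ten units to the left
y = parse_coord("+0", 80)    # 80: unchanged
# SETXY 10 0 would instead jump to the absolute point (10, 0)
```
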
To support animation, TURTLE supplies ten screen storage locations in memory.
Each storage buffer requires 64,000 bytes. In addition, one storage location
stores the current screen when TURTLE is in text mode. These data buffers
alone require 704,000 bytes -- more memory than is available with unaugmented
DOS.
Of course, a DOS extender isn't the only option available for buffers of this
size. EMS, extended memory, XMS, or disk paging would have worked too.
However, these would add considerable complexity to the application.


The Command Interpreter


At the heart of TURTLE is the extensible command interpreter, XCI (see
Listings One and Two, page 94). This interpreter is generic -- any program
could use it. It supplies only four basic commands: DO, HELP, LINK, and QUIT.
An application that uses XCI (a client) can also enable a fifth command, GOTO.
The client can directly add commands to XCI's command table. In addition, the
client or the user can add commands dynamically from a DLL to XCI. Figure 1
shows the format of an XCI command function. XCI.H (Listing One) defines the
type XCICMD for these functions.
Figure 1: Prototype for an XCI command function

 XCICMD function (int cmd, char *string, void *data);

 where:

 cmd is one of the following: 0, execute the function; 1, print a
 detailed help screen; 2, print a one-line help message.
 The cmd parameter has special significance for startup
 functions (see below).
 string is the command line (not including the command's name).
 data is a pointer to a user-defined structure. This structure must
 contain any data the application commands need to function.

XCI passes an integer command to an XCICMD function. If the command is zero,
the function should perform its duty. If it is one, the function should print
detailed help information. If the command is two, the function should print a
one-line help message. To save space in the listings, TURTLE often uses the
same message for long and short help. You can modify the help text to suit
your preference.
The client can register a function that XCI will call before the program
starts and as it is ending. This function can install commands and do other
client-specific processing. TURTLE uses the startup() function for this
purpose. XCI calls the function with the cmd argument equal to 0 when the
program starts, and equal to 1 when it ends.
The client specifies several parameters when it starts command(), XCI's main
function. These parameters allow the client to link a DLL, execute a file of
start-up commands (TURTLE uses TURTLE.CMD), and control XCI's behavior. Figure
2 shows a prototype for command(). The client also can control XCI via several
global variables (see Table 2). These variables have defaults -- you may not
need to set them.
Figure 2: Prototype for XCI's command() function

 int command (char *dll, char *startfile, int cases, void far *user,
              XCICMDP userfunc);

 where:

 dll is the name of a DLL to load.
 startfile is the name of a file to execute with the DO command.
 cases is 0 if upper- and lower-case commands are the same, nonzero
 if case is important.
 user is a pointer to a user-defined structure.
 userfunc is a pointer to a command function that will run when XCI
 begins and ends. The cmd argument to this function will be
 0 when XCI starts and nonzero when it ends.

Table 2: XCI global variables

 Variable Meaning
 -------------------------------------------------------------------------

 char *xci_prompt; String used to prompt user for input (default:
 "? ").
 void (*xcif_prompt)(); Pointer to a function to print a prompt string:
 The string is passed to the function as an
 argument.
 void (*xcif_prehelp)(); Pointer to a function to call before handling a
 help command: By default, this function does
 nothing. An application may want to switch
 screens or do other processing here.
 void (*xcif_posthelp)(); Pointer to a function to call after a help
 command.
 char *(*xcif_input)(); Pointer to the input function (normally
 fgets()).

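The cmd protocol that every command function follows can be sketched in isolation (mycmd and its messages are hypothetical -- the real commands, such as dofunc(), print with printf() rather than returning strings):

```c
/* A command function receives cmd = 0 to do its work, 1 to give a
   detailed help screen, and 2 to give a one-line help message --
   the same convention XCI's XCICMD functions follow. */
const char *mycmd(int cmd, const char *args)
{
    if (cmd == 1)
        return "mycmd does something to its argument\nUsage: mycmd ARG\n";
    if (cmd == 2)
        return "Do something\n";
    /* cmd == 0: execute, complaining (as XCI's commands do) when
       the argument is missing */
    return (args && *args) ? "done\n" : "missing argument\n";
}
```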
The two key data structures in XCI are the cmds array and the instack pointer.
XCI dynamically allocates the cmds array. It contains a list of commands and
function pointers for each command. When XCI calls a command file (via the DO
command), it maintains a linked list of files via the instack pointer. During
the processing for a DO command (the dofunc() function in Listing Two), XCI
makes an entry in the linked list that contains the filename and an fseek()
offset in the file. It then closes the file before opening the new one. In
this way, XCI avoids exceeding the DOS open file limit. At the end of a file,
XCI unravels the linked list to reopen the previous file for processing.
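The close-and-reopen trick is independent of the DOS-Extender. A stripped-down sketch in portable C (suspend and resume are hypothetical names -- XCI's real version is woven through dofunc()):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One record per suspended command file: its name and the offset to
   resume at. The file itself is closed while a nested DO runs, so
   only one script file is ever open at a time. */
struct fstack {
    char *name;
    long pos;
    struct fstack *next;
};

/* Suspend the current file: record its name and offset, close the
   file, and push the record. Returns the new stack head. */
struct fstack *suspend(struct fstack *top, FILE *f, const char *name)
{
    struct fstack *rec = malloc(sizeof *rec);
    rec->name = malloc(strlen(name) + 1);
    strcpy(rec->name, name);
    rec->pos = ftell(f);
    rec->next = top;
    fclose(f);
    return rec;
}

/* Resume the most recently suspended file: reopen it, seek back to
   the saved offset, and pop the record. */
FILE *resume(struct fstack **top)
{
    struct fstack *rec = *top;
    FILE *f = fopen(rec->name, "r");
    if (f) fseek(f, rec->pos, SEEK_SET);
    *top = rec->next;
    free(rec->name);
    free(rec);
    return f;
}
```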
The adddll() function (in Listing Two) links in a DLL. This function uses the
DOS-Extender/OS/2 function DosGetModHandle() to decide if the DLL is already
present. It then loads the DLL using DosLoadModule(). Finally, XCI calls
DosEnumProc() to find the names of the functions the DLL exports. XCI expects
these names to be C functions (not Pascal functions), and therefore strips off
the first character of the function name before entering it into the cmds
array. This is necessary because the compiler prefixes C function names with
an underscore.
For our personal use, adding commands with a DLL is not very important. Yet
many sophisticated products allow users to add custom code to do specialized
processing (user algorithms, user blocks, exits, and so on). The use of DLLs
allows users to do this in a very straightforward manner. You can export
functions from your code that they can use in their DLLs. DLLs under the
DOS-Extender don't work exactly like OS/2 DLLs. See the text box entitled
"DOS-Extender DLLs" for more details.


The TURTLE Program


Due to space limitations, many of TURTLE's files are not included in this
article, but are available electronically. (See "Availability," page 3.)
Therefore, I'll simply refer to their filenames when appropriate. However, the
XCI command interpreter, which the TURTLE program depends upon heavily, is
included. TURTLE.C (Listing Four, page 99) simply sets up XCI and turns
control over to it. XCI then calls various functions within TURTLE in response
to certain commands and events. Most of the commands are in TCMDS.C. TEXPR.C
parses algebraic expressions in command arguments. Also, every source file
#includes TURTLE.H (Listing Three, page 99) for global definitions.
The SAVE and LOAD commands reside in a DLL, TSAVE.C. An end user of the TURTLE
program could load and save other graphics formats simply by replacing this
DLL, without needing the source or object code for TURTLE.


Accessing Physical Addresses


TURTLE uses the Microsoft C graphics functions for simplicity. It also uses
direct access to the screen buffer to save and restore screens. In a normal
DOS application, this is simple. The text buffer is at segment B800H, and the
graphics buffer is at A000H. How can we access these segments with the
DOS-Extender?
The setptr() function in TCMDS.C uses the DOS-Extender function
DosMapRealSeg() to address the video buffers. It creates a protected-mode
segment at the specified address and of a specified length. Using this
function, our programs can directly access any real-mode address. Once
setptr() initializes the pointers to the screen buffers, we can't tell that
they are special pointers. We use them just like any other pointer in a normal
C program.
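Once setptr() has built the pointer, a screen save is nothing more than a block copy. This sketch substitutes a malloc'd buffer for the mapped A000H segment, since the copy itself is identical either way (save_screen and restore_screen are hypothetical names):

```c
#include <stdlib.h>
#include <string.h>

#define SCREEN_BYTES 64000  /* 320 x 200 pixels, one byte per pixel */

/* Copy the visible screen into one of the storage buffers. In
   TURTLE, 'screen' would be the protected-mode pointer built by
   setptr() over segment A000H. */
void save_screen(const unsigned char *screen, unsigned char *store)
{
    memcpy(store, screen, SCREEN_BYTES);
}

/* Copy a stored image back onto the screen. */
void restore_screen(unsigned char *screen, const unsigned char *store)
{
    memcpy(screen, store, SCREEN_BYTES);
}
```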

More Details.


Unsupported Calls


Although the Phar Lap DOS-Extender supports most DOS and BIOS calls
transparently, there are a few functions that it does not directly support.
TURTLE uses interrupt 15H function 86H in its DELAY command. The Phar Lap
DOS-Extender doesn't support any interrupt 15H functions directly, but this
isn't a problem. Phar Lap provides a DosRealIntr() function that can call
almost any real-mode interrupt. The delaycmd() function in TCMDS.C uses this
call. There are similar functions for making far calls to real-mode code.


Handling Interrupts


Interrupt handling is the one area where a DOS-extended program is different
from a normal DOS program. Interrupts fall into three classes in a DOS
extended application: interrupts that occur while in real mode; interrupts
that occur while in protected mode; and processor exceptions (such as GP
fault). The DOS extender allows you to have real- or protected-mode interrupt
handlers. You also can have a real-mode handler that services protected-mode
interrupts and vice versa.
Handling the DOS Ctrl-C interrupt was the most difficult part of writing the
XCI module. If the screen contents were not important, it would have been best
to use the signal() function from the C library. However, signal() allows DOS
to corrupt the screen by printing a ^C when a break occurs, so XCI hooks
interrupt 16H, the BIOS keyboard interrupt, instead. The new interrupt 16H handler
(xci_int16() in Listing Two) does not allow DOS to see any break characters.
Instead, it sets the broke flag to signal XCI that the user wants to return to
the top command level.
We can't install xci_int16() as a normal protected-mode interrupt handler
because DOS will call INT 16H from real mode. But we can install a
protected-mode handler that receives interrupts from both real and protected
mode. This is done using the DosSetPassToProtVec() call.
The 286DOS-Extender has separate vectors for real- and protected-mode
interrupts. Calling DosSetPassToProtVec() changes both of them. Before the
program exits, XCI must restore both interrupt vectors using
DosSetRealProtVec(). XCI registers its shut-down code (the xci_clean()
function) using DosExitList(). This is similar to using atexit() or onexit()
with one important exception. A function set up with DosExitList() will
execute when the program terminates for any reason. Exit functions using the C
library's atexit() or onexit() only run when the program terminates normally.
Versions of the DOS-Extender earlier than 1.4 had a problem using
DosSetPassToProtVec() with INT 16H. In these early versions, a DOS call
triggers an INT 16H after the extender has set the real-mode interrupt vector,
but before it has set the protected-mode vector. This causes an endless loop
of mode switches. The latest versions don't have this problem, but TURTLE uses
a workaround in any case: It sets the protected-mode vector for INT 16H before
calling DosSetPassToProtVec().
Inside xci_int16(), we need to call the old INT 16H handler to check for
Ctrl-C characters. The DOS-Extender provides the DosChainToRealIntr()
function, but it does not return control to our program when the real-mode
interrupt completes. Luckily, DosRealFarCall() can call an interrupt handler
with a little subterfuge.
TURTLE uses DosRealFarCall(oldbreal,&r1,0,-1,r1.flags); to call the old
interrupt handler (whose address is in oldbreal). The pointer to r1 contains
the registers we want to pass to the interrupt handler. Be careful to set the
flags in r1 to a legitimate value. If the single-step flag is set, for
example, your program will crash. You should usually make sure the interrupt
enable flag is clear, too. The zero is a reserved parameter -- it must be zero
and doesn't mean anything. The next argument (-1) informs DosRealFarCall()
that it should push one word on the stack before calling the real-mode
routine. The number of words is negative, so the DOS-Extender expects the
real-mode code to clean up the stack before returning (like a Pascal function
does). The last zero in the call is the one word to push on the stack.
You may be wondering why this works. When an interrupt occurs, the CPU will
push the flags and the far return address on the stack. The IRET instruction
at the end of the interrupt service routine will restore the flags and return
to the calling routine. If DosRealFarCall() pushes a word on the stack, the
IRET will restore that value to the flags and return to the proper address.


286 Pointers


Writing programs for a 286 DOS extender doesn't provide you with the same
luxuries as writing for a 386 extender. In particular, with the 286, segments
cannot exceed 64 Kbytes in length. You still have to resort to huge pointers
to handle data larger than 64 Kbytes. Of course, for our TURTLE program, this
isn't an issue -- the large graphics buffers are 64,000 bytes long.


Compiling


Compilation of the program is similar to compilation of a real-mode program.
The Microsoft C compiler thinks it is creating an OS/2 program and DLLs.
The makefile (Listing Five, page 99) builds TURTLE.EXE and TSAVE.DLL, the two
main portions of the program. It also creates TURTLE.LIB. This file is an
import library for TURTLE DLLs. By linking with an import library, a DLL can
reference routines that reside in TURTLE.EXE. The DOS-Extender will provide
the correct address at runtime. If you were distributing TURTLE to end users,
they could write their own DLLs by using TURTLE.LIB and a header file to
declare the appropriate functions and types.
Note that protected-mode programs use the -Lp compiler switch to link with the
protected-mode libraries. TURTLE.EXE can't run on an 8086-based computer, so
the -G2 compiler switch creates 286-specific code for better performance.
The makefile compiles the TSAVE DLL with the -ML option. The -Gs option is
also necessary because the DLL's data segment will differ from TURTLE.EXE's
stack segment. DLLSTART.ASM provides an entry point for DLL initialization.
Once you create TURTLE.EXE you can run it by using the RUN286 program supplied
with the DOS-Extender. You also can use the BIND286 utility to patch
TURTLE.EXE to load RUN286 automatically when you execute it from the DOS
prompt. If you own the distribution kit, you can also bind in a complete copy
of the DOS-Extender, making TURTLE.EXE a complete stand-alone program.


Gauging Performance


Overall, the Phar Lap 286DOS-Extender was a good choice for the TURTLE
program. It provided enough memory to store screens, and the DLL system makes
TURTLE easily extensible. Try some of the example programs (also available
electronically). You'll see that performance suffers little, if any, compared
to real-mode programs using the Microsoft graphics library.
Interactive programs such as TURTLE are hard to time, so you may want to
experiment with TIMING.C (see Listing Six, page 100). TIMING.C will compile in
real or protected mode. It exercises the VGA and writes large files to disk
(similar to the TURTLE program). Table 3 shows the average times for TIMING.C
to run with different DOS extender parameters and in real mode.
Table 3: Results from the TIMING.C program. All times are in seconds,
averaged over five runs. All tests ran on a 386DX/25 with no memory or disk
cache.

                Real Mode    DOS-Extender     DOS-Extender w/
                             w/no options     -XFER 32 option
  ---------------------------------------------------------------

  Graphics:        19             22                22
  File:            12             24                14

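If you want to collect numbers like these yourself, the skeleton of such a benchmark is just a wrapper around the standard clock() call (time_run and busy_work are placeholders -- TIMING.C's real work functions exercise the VGA and the disk):

```c
#include <time.h>

/* Time one run of fn() in seconds using the C library clock. */
double time_run(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Placeholder workload standing in for TIMING.C's graphics and
   file tests. */
static volatile long sink;
void busy_work(void)
{
    long i;
    for (i = 0; i < 1000000L; i++)
        sink += i;
}
```

Averaging several time_run() calls per configuration, as Table 3 does, smooths out disk and interrupt jitter.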
Accessing the video RAM directly wasn't much more trouble with the
DOS-Extender than with a normal program. The most troublesome aspect was the
interrupt handling.
Unless you change the options to RUN286, most of TURTLE runs in extended
memory. This leaves plenty of conventional memory free to run an external
editor in real mode (for the EDIT command). Calling a real-mode program from
TURTLE was as simple as it would be in a regular DOS program.


Bibliography



286DOS-Extender Reference Manual, Cambridge, Mass.: Phar Lap Software Inc.,
1991.
Berentes, Drew. Apple Logo. Blue Ridge Summit, Penn.: Tab Books, 1984.
Duncan, Ray. IBM ROM BIOS. Redmond, Wash.: Microsoft Press, 1988.
Williams, Al. DOS 5: A Developer's Guide. Redwood City, Calif.: M&T
Publishing, 1991.


Products Mentioned


Phar Lap 286DOS-Extender Phar Lap Software Inc. 60 Aberdeen Avenue Cambridge,
MA 02138 617-661-1510 $495


DOS-Extender DLLs


DLLs under 286DOS-Extender are slightly different from their OS/2 or Windows
counterparts. The primary difference arises because the DOS-Extender doesn't
multitask. With OS/2 or Windows, a DLL may have to serve several clients
simultaneously. This requires the DLL to allocate private data for each client
and worry about concurrency problems. With the DOS-Extender, we don't have
these concerns. The DLL is more like an overlay--the DOS-Extender loads it for
our program's exclusive use at runtime.
Another difference stems from the way 286DOS-Extender loads DLLs. With OS/2 or
Windows, you may specify that a DLL loads only when your program uses it. If
the program doesn't use the code in the DLL, it doesn't take the time and
space to load it. The DOS-Extender loads any DLLs that you link with your
program immediately. The only way to achieve true dynamic loading is to manage
the process manually, as XCI does for its LINK command.

--A.W.



_PROGRAMMING WITH PHAR LAP'S 286DOS-EXTENDER_
by Al Williams


[LISTING ONE]

/*****************************************************************
 * XCI.H Header for XCI command interpreter -- Al Williams *
 *****************************************************************/
#ifndef XCI_HEADER
#define XCI_HEADER

/* type for command functions */
#define XCICMD void far

/* Pointer to command function */
typedef void (far * XCICMDP)(int cmd,char far *line,void *udata);

/* Various hooks */
extern char *xci_prompt; /* string to prompt with */
extern FILE *xci_infile; /* input file */
extern int xci_exitflag; /* set to exit XCI */
extern int xci_defaultbrk; /* default break handling */
extern void (*xcif_prompt)(); /* function to prompt with */
extern void (*xcif_prehelp)(); /* function to call before help */
extern void (*xcif_posthelp)(); /* function to call after help */
extern char *(*xcif_input)(); /* function to get input */

/* main function prototype */
int command(char *dll,char *startfile,int caseflag, void far *ustruc,XCICMDP
userfunc);

/* add command (not from DLL) */
int addcmd(char *cmdnam,XCICMDP fn);

#endif







[LISTING TWO]

/**********************************************************
 * XCI.C An extensible command interpreter for the *
 * Phar Lap 286 DOS Extender -- Al Williams *
 **********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>
#include <dos.h>
#include <phapi.h>
#include <setjmp.h>
#include "xci.h"

/* Table of commands (dynamically allocated) */
static struct cmdtbl
 {
 char far *cmd;
 XCICMDP f;
 } *cmds=NULL;

/* Number of commands in table */
static unsigned int nrcmds=0;

/* Case sensitive? */
static int truecase=0;

/* default hook function prototypes */
void xci_prompter(); /* func to prompt */
char *xci_input(); /* func to get input */
void xci_preposthelp(); /* pre & post help command */

/* default prompt string -- can be changed by client */
char *xci_prompt="? ";

/* default routines -- can be changed by client */
void (*xcif_prompt)()=xci_prompter;
void (*xcif_prehelp)()=xci_preposthelp;
void (*xcif_posthelp)()=xci_preposthelp;
char *(*xcif_input)()=xci_input;

/* flag set when break detected */
static int broke;
/* Jump to top level command loop */
jmp_buf cmdloop;

/* default command function prototypes */
XCICMD dofunc(int cmd,char *s,struct udata *data);
XCICMD linkfunc(int cmd,char *s,struct udata *data);
XCICMD quitfunc(int cmd,char *s,struct udata *data);
XCICMD helpfunc(int cmd,char *s,struct udata *data);

/* default commands (client must enable goto if desired) */

static char *defcmd[]= { "quit", "help", "link", "do" };
/* addresses of default commands */
static XCICMDP deffunc[]={quitfunc,helpfunc,linkfunc,dofunc};
/* non-zero if running a script via DO */
static int interactive=0;
/* stack of file positions for nested DO commands */
/* Files are closed and reopened to avoid DOS file limit */
static struct fstack
 {
 char *fp; /* file name */
 long pos; /* position in file */
 struct fstack * next; /* next fstack record */
 } *instack;

/* default stdin handle */
static FILE *baseio;
/* Current input file */
FILE *xci_infile;
/* Set to 1 when someone wants to exit */
int xci_exitflag=0;
/* Default break action */
int xci_defaultbrk=1;
/* Break vectors */
PIHANDLER oldbreak;
REALPTR oldbreal;
PIHANDLER old1b;
REALPTR old1breal;

/* Bios segment (you can't call DosGetBIOSseg from ISR) */
USHORT biosseg;
/* ^Break handlers */
void _interrupt _far xci_int1b(REGS16 r)
 {
 union REGS rr;
 unsigned int *keyhead,*keytail;
 if (!xci_defaultbrk)
 {
/* Chain to old break handler (never returns) */
 DosChainToRealIntr(old1breal);
 }
 keyhead=MAKEP(biosseg,0x1A);
 keytail=MAKEP(biosseg,0x1C);
 broke=1;
/* purge keyboard buffer */
 *keyhead=*keytail;
/* push ^C at head */
 rr.h.ah=5;
 rr.x.cx=3;
 int86(0x16,&rr,&rr);
 }
void _interrupt _far xci_int16(REGS16 r)
 {
 REGS16 r1;
 unsigned ah=r.ax>>8;
 _enable();
 if (xci_defaultbrk&&(ah==0||ah==0x10||ah==1||ah==0x11))
 {
 do
 {

 r1.ax=0x100;
 r1.flags=0;
/* Simulate interrupt to old INT 16H handler */
 DosRealFarCall(oldbreal,&r1,0,-1,r1.flags);
 if ((r1.flags&64)&&(ah==1||ah==0x11))
 {
 r.flags=r1.flags;
 return;
 }
 } while (r1.flags&64);
/* If break character -- replace it with a carriage return */
 if ((r1.ax&0xff)==3||r1.ax==0x300)
 {
 unsigned int *keyhead;
 keyhead=MAKEP(biosseg,0x1A);
 keyhead=MAKEP(biosseg,*keyhead);
 *keyhead='\r';
 broke=1;
 }
 }
 DosChainToRealIntr(oldbreal);
 }
/* XCI Clean up */
/* Note: DosExitList requires this to be a pascal function */
void pascal far xci_clean(unsigned int reason)
 {
/* restore interrupt vectors */
 DosSetRealProtVec(0x16,oldbreak,oldbreal,NULL,NULL);
 DosSetRealProtVec(0x1b,old1b,old1breal,NULL,NULL);
/* Exit handler must call DosExitList with EXLST_EXIT
 to proceed with the termination */
 DosExitList(EXLST_EXIT,NULL);
 }
/* default functions */
void xci_prompter(char *s)
 {
 printf("%s",s);
 }
char *xci_input(char *inbuf,unsigned int siz,FILE *input)
 {
 return fgets(inbuf,siz,input);
 }
void xci_preposthelp()
 {
 }
/* Main command routine */
/* dll is initial DLL to load
 startfile is initial file to DO
 cases is 1 if case sensitivity is required
 userfunc is pointer to user function called at
 start and end */
int command(char *dll, char *startfile, int cases,void far *user,XCICMDP
userfunc)
 {
 int i;
 char inbuf[129],*p;
 if (!cmds)
 {
/* first time (not done for recursive calls) */
 DosGetBIOSSeg(&biosseg);

/* Due to a bug in versions prior to 1.4, you must set
 the INT 16H ProtVec before using PassToProtVec... */
 DosSetProtVec(0x16,xci_int16,&oldbreak);
 DosSetPassToProtVec(0x16,xci_int16,NULL,&oldbreal);
 DosSetPassToProtVec(0x1b,xci_int1b,&old1b,&old1breal);
/* set up exit handler */
 DosExitList(EXLST_ADD,xci_clean);
 truecase=cases;
 xci_infile=stdin;
/* install default commands */
 cmds=(struct cmdtbl *)malloc(4*sizeof(struct cmdtbl));
 if (!cmds) return 1;
 nrcmds=4;
 for (i=0;i<nrcmds;i++)
 {
 cmds[i].cmd=defcmd[i];
 cmds[i].f=deffunc[i];
 }
/* load default DLL (if specified) */
 if (dll&&*dll)
 if (adddll(dll))
 printf(
 "Warning: unable to load default command DLL\n");
/* call user function */
 if (userfunc) userfunc(0,NULL,user);
/* execute default DO file */
 if (startfile&&*startfile) dofunc(0,startfile,user);
/* set jump buffer for future longjmp's */
 setjmp(cmdloop);
 }
/* initialization done -- begin main processing */
 while (1)
 {
 char *token,*tail;
/* if someone wants to quit then quit */
 if (xci_exitflag)
 {
/* call user function */
 if (userfunc) userfunc(1,NULL,user);
/* reset some things in case we are called again */
/* restore interrupt vectors */
 DosSetRealProtVec(0x16,oldbreak,oldbreal,NULL,NULL);
 DosSetRealProtVec(0x1b,old1b,old1breal,NULL,NULL);
 DosExitList(EXLST_REMOVE,xci_clean);
 xci_infile=stdin;
 interactive=0;
 instack=NULL;
 free((void *)cmds);
 cmds=NULL;
 return 0;
 }
/* If interactive then prompt */
 if (!interactive) (*xcif_prompt)(xci_prompt);
/* get input from user or file */
 *inbuf='\0';
 (*xcif_input)(inbuf,sizeof(inbuf),xci_infile);
/* If break detected then go to top level */
 if (broke)
 {

 struct fstack *f;
 broke=0;
 /* free fstack entries */
 for (f=instack;f;f=f->next) free(f->fp);
 instack=NULL;
 interactive=0;
 xci_infile=stdin;
 longjmp(cmdloop,1);
 }
/* If end of do file, return. If end of console, ignore */
 if (!*inbuf&&feof(xci_infile))
 {
 if (interactive)
 {
 return 0;
 }
 clearerr(xci_infile);
 continue;
 }
/* got some input -- lets look at it */
 i=strspn(inbuf," \t");
/* skip blank lines and comments */
 if (inbuf[i]=='\n') continue;
 if (inbuf[i]=='#') continue;
/* eat off \n from line */
 p=strchr(inbuf+i,'\n');
 if (p) *p='\0';
/* get a token */
 token=strtok(inbuf+i," \t");
 if (!token) continue; /* this should never happen */
/* do we recognize the command? */
 i=findcmd(token);
/* NO: error */
 if (i==-1)
 {
 printf("Unknown command %s\n",token);
 continue;
 }
/* YES: compute command's tail (arguments) */
 tail=token+strlen(token)+1;
 tail+=strspn(tail," \t");
/* execute command */
 cmds[i].f(0,tail,user);
 }
 }
/* Find a command -- search backwards so new commands
 replace old ones */
static int findcmd(char *s)
 {
 int i,stat;
 for (i=nrcmds-1;i>=0;i--)
 {
 if (!(truecase?
 strcmp(s,cmds[i].cmd)
 :
 stricmp(s,cmds[i].cmd)))
 return i;
 }
 return -1;

 }
/* Add a DLL to the command input table
 returns 0 if successful */
static adddll(char *dll)
 {
 char cmdnam[33],*p;
 HMODULE h=0;
 unsigned ord=0;
 p=strrchr(dll,'\\');
/* check to see if module is already loaded */
 if (!DosGetModHandle(p?p+1:dll,&h))
 {
 printf("%s already loaded\n",p?p+1:dll);
 return 1;
 }
/* Load module if possible */
 if (DosLoadModule(0,0,dll,&h))
 return 1;
/* find all exported functions in module */
 while (!DosEnumProc(h,cmdnam,&ord))
 {
 PFN fn;
/* Get function's address */
 DosGetProcAddr(h,cmdnam,&fn);
/* add command -- skip 1st character (it is a _) */
 if (addcmd(cmdnam+1,(XCICMDP) fn)) return 1;
 }
 return 0;
 }
/* add a command -- returns 0 for success */
addcmd(char *cmdnam,XCICMDP fn)
 {
 struct cmdtbl *ct;
/* make more room in table */
 ct=(struct cmdtbl *)
 realloc(cmds,(nrcmds+1)*sizeof(struct cmdtbl));
 if (!ct) return 1;
 cmds=ct;
/* add name and function */
 cmds[nrcmds].cmd=strdup(cmdnam);
 if (!cmds[nrcmds].cmd) return 1;
 cmds[nrcmds++].f=(XCICMDP) fn;
 return 0;
 }
/* currently executing file name */
static char curfile[67];
/* Command to transfer execution from one file to another
 Only works from inside a file, and must be enabled by
 client program: addcmd("GOTO",gotocmd); */
XCICMD gotofunc(int cmd,char *s,struct udata *data)
 {
 FILE *f;
 if (cmd==2)
 {
 printf("Execute commands from an ASCII file\n");
 return;
 }
 if (cmd==1||!s||!*s)
 {

 printf("goto executes commands from an ASCII file\n"
 "Usage: goto FILENAME\n");
 return;
 }
/* open file */
 f=fopen(s,"r");
 if (!f)
 {
 printf("Can't open %s\n",s);
 perror(s);
 return;
 }
 if (!interactive)
 {
 printf("Use goto only from command files\n"
 "Use do to execute a file\n");
 return;
 }
/* register as current file */
 strcpy(curfile,s);
 fclose(xci_infile);
 xci_infile=f;
 }
/* Do a command file */
XCICMD dofunc(int cmd,char *s,struct udata *data)
 {
 FILE *ifile;
 struct fstack recall;
 if (cmd==2)
 {
 printf("Do commands from an ASCII file\n");
 return;
 }
 if (cmd==1||!s||!*s)
 {
 printf("Do executes commands from an ASCII file\n"
 "Usage: do FILENAME\n");
 return;
 }
/* open file */
 ifile=fopen(s,"r");
 if (!ifile)
 {
 printf("Can't open %s\n",s);
 perror(s);
 return;
 }
 if (interactive)
 {
/* store current file name so we can resume later */
 if (!(recall.fp=strdup(curfile)))
 {
 printf("Out of memory\n");
 fclose(ifile);
 return;
 }
/* store position in current file and close it */
 recall.pos=ftell(xci_infile);
 fclose(xci_infile);

 }
 else
 {
/* no current file, so remember this handle but don't close it */
 baseio=xci_infile;
 recall.fp=NULL;
 }
/* add recall to linked list of nested files */
 recall.next=instack;
/* make new file current */
 strcpy(curfile,s);
 xci_infile=ifile;
/* mark nesting level */
 interactive++;
/* make recall the head of the fstack linked list */
 instack=&recall;
/* call command recursively */
 command(NULL,NULL,0,data,NULL);
/* close useless file */
 fclose(xci_infile);
/* restore old file */
 if (instack->fp!=NULL) /* is it a file? */
 {
/* open it */
 xci_infile=fopen(instack->fp,"r");
 if (!xci_infile)
 {
/* serious error! file vanished! reset to top level */
 printf("Error opening %s\n",instack->fp);
 xci_infile=baseio;
 interactive=0; /* bad error if nested */
 }
 else
 {
/* reposition old file */
 fseek(xci_infile,instack->pos,SEEK_SET);
/* make it current */
 strcpy(curfile,instack->fp);
 }
/* release memory used for file name */
 free(instack->fp);
 }
 else
 {
/* reset to console */
 xci_infile=baseio;
 }
/* fix up linked list */
 instack=instack->next;
 interactive--;
 }
/* Link a dll */
XCICMD linkfunc(int cmd,char *s,struct udata *data)
 {
 if (cmd==2)
 {
 printf("Add user-defined commands\n");
 return;
 }

 if (cmd==1||!s||!*s)
 {
 printf("Add user-defined commands via a DLL\n"
 "Usage: link DLLNAME\n");
 return;
 }
 if (adddll(s))
 {
 printf("Unable to load dll: %s\n",s);
 }
 }
/* Quit */
XCICMD quitfunc(int cmd,char *s,struct udata *data)
 {
 if (cmd==0) { xci_exitflag=1; return; }
/* long and short help message */
 printf("Exits to DOS\n");
 }
/* provide general help (scan from end to 0 call with cmd==2)
 or specific help find command and call with cmd==1 */
XCICMD helpfunc(int cmd,char *s,struct udata *data)
 {
 int i,j=0;
 if (cmd==2) printf("Get help\n");
 if (cmd==1) printf(
 "Use the help command to learn about the available"
 " commands\nUse HELP for a list of help topics"
 " or \"HELP topic\""
 " for help on a specific topic.\n");
 if (cmd) return;
/* call user's prehelp */
 (*xcif_prehelp)();
/* if specific command... */
 if (s&&*s)
 {
/* find it and ask it about itself (command==1) */
 i=findcmd(s);
 if (i==-1) printf("No such command: %s\n",s);
 else cmds[i].f(1,NULL,NULL);
 }
 else
/* No specific command -- do them all (command==2) */
 for (i=nrcmds-1;i>=0;i--)
 {
 char buf[22];
/* might be a lot of commands -- pause on screenfulls */
 if (!(++j%25))
 {
 printf("--More--");
 j=0;
 if (!getch()) getch();
 putchar('\n');
 }
/* print header */
 strncpy(buf,cmds[i].cmd,20);
 strcat(buf,":");
 printf("%-21.21s",buf);
/* ask command for short help */
 cmds[i].f(2,NULL,NULL);

 }
/* call user's post help */
 (*xcif_posthelp)();
 }






[LISTING THREE]

/*****************************************************************
 * TURTLE.H Header for TURTLE.EXE -- Al Williams *
 *****************************************************************/
#include <graph.h>
typedef unsigned long ulong;
typedef unsigned int uint;

/* graphics buffer */
extern char _gbuf[64000];

/* Application data (passed to XCI commands) */
struct udata
 {
 char *gbuf; /* pointer to graphics buffer */
 char tbuf[4000]; /* text buffer */
 char *gptr; /* pointer to graphics screen */
 char *tptr; /* pointer to text screen */
 struct xycoord graphxy; /* x,y of graphic screen */
 struct rccoord textxy; /* x,y of text screen */
 int color; /* color */
 long backcolor; /* background color */
/* store[10] & store[11] are for internal use */
 char *store[12]; /* screen storage */
 unsigned int mode:1; /* draw or move */
 unsigned int textgraph:1; /* if 1, don't exit graphic mode */
 int heading; /* turtle heading */
/* X and Y are stored as reals too to combat rounding errors */
 double realx;
 double realy;
/* 26 variables A-Z */
 long vars[26];
/* text color */
 int tcolor;
 };
/* Application data structure */
extern struct udata appdata;






[LISTING FOUR]

/*****************************************************************
 * TURTLE.C Main program for TURTLE.C -- Al Williams *
 * TURTLE assumes large model -- see the MAKEFILE for compile *

 * instructions. *
 *****************************************************************/
#include <stdio.h>
#include <graph.h>
#include <dos.h>
#include <phapi.h>
#include "turtle.h"
#include "xci.h"

/* XCI client's application data (see TURTLE.H) */
struct udata appdata;
int installcmds(void);

/* XCI startup command -- install commands */
XCICMD startup(int cmd, char far *dummy)
 {
 if (cmd) return;
 if (installcmds())
 {
 printf("Out of memory\n");
 exit(1);
 }
 }
/* Reset things before normal exit */
void preexit()
 {
 _setvideomode(_DEFAULTMODE);
 }
/* MAIN PROGRAM */
main()
 {
 void turtleprompt();
/* register exit routine */
 atexit(preexit);
/* Set some graphics things */
 _setvideomode(_TEXTC80);
 _setactivepage(0);
 _setvisualpage(0);
 appdata.tcolor=appdata.color=15;
 appdata.backcolor=0x003f0000L; /* blue background */
/* clear screen */
 clearcmd(0,"",&appdata);
/* Print banner */
 printf("TURTLE VGA by Al Williams\n"
 "Type HELP for help\n");

/* Take over XCI prompt function */
 xcif_prompt=turtleprompt;
 command("TSAVE.DLL","TURTLE.CMD",0,
 &appdata,(XCICMDP) startup);
 }
/* XCI prompt -- if in graphics mode keep input on top line */
void turtleprompt(char *s)
 {
 union REGS r;
 if (appdata.textgraph)
 {
/* don't do newline in graphic mode */
 if (*s=='\n')

 {
 printf(" ");
 return;
 }
/* but do clear the line */
 r.h.ah=2;
 r.h.bh=0;
 r.x.dx=0;
 int86(0x10,&r,&r);
 r.x.ax=0x0a00|' '; /* AH=0Ah write char, AL=space */
 r.x.bx=appdata.tcolor;
 r.x.cx=40;
 int86(0x10,&r,&r); /* clear to end of line */
 }
 printf("%s",s);
 }






[LISTING FIVE]

######################################################
# Makefile for TURTLE #
# Use NMAKE to compile #
######################################################

all : turtle.exe tsave.dll

turtle.exe : turtle.obj tcmds.obj xci.obj texpr.obj
 cl -AL -Lp turtle.obj tcmds.obj xci.obj \
 texpr.obj c:\run286\lib\graphp.obj \
 LLIBPE.LIB GRAPHICS.LIB
 implib turtle.lib turtle.exe
turtle.obj : turtle.c xci.h turtle.h
 cl -AL -Ox -G2 -c turtle.c
tcmds.obj : tcmds.c xci.h turtle.h
 cl -AL -Ox -G2 -c tcmds.c
texpr.obj : texpr.c turtle.h
 cl -AL -Ox -G2 -c texpr.c
xci.obj : xci.c xci.h
 cl -AL -Ox -G2 -c xci.c
tsave.dll : tsave.c dllstart.asm turtle.h xci.h turtle.lib
 cl -ML -Gs -Lp -Ox -G2 tsave.c dllstart.asm turtle.lib






[LISTING SIX]

/******************************************************************
 * TIMING.C - simple non-rigorous benchmark for 286DOS Extender *
 * Compile with: *
 * CL -AL -Lp -G2 -Ox timing.c graphp.obj llibpe.lib graphics.lib *
 * (protected mode)                                               *
 * OR:                                                            *
 * CL -AL -G2 -Ox timing.c graphics.lib *
 * (real mode) *
 ******************************************************************/
#include <stdio.h>
#include <graph.h>
#include <time.h>

#define time_mark time_it(0)
#define time_done time_it(1)

main()
 {
 printf("Timing graphics operations\n");
 time_mark;
 gtest();
 time_done;
 printf("Timing file operations\n");
 time_mark;
 ftest();
 time_done;
 exit(0);
 }
/* Function to mark times */
int time_it(int flag)
 {
 static clock_t sttime;
 unsigned s;
 if (!flag)
 {
 sttime=clock();
 }
 else
 {
 s=(clock()-sttime)/CLK_TCK;
 printf("Elapsed time: %u seconds\n",s);
 }
 return 0;
 }
/* Graphics test -- must have VGA */
int gtest()
 {
 int i,x,y;
 _setvideomode(_MRES256COLOR);
 for (i=1;i<11;i++)
 {
 _setcolor(i);
 for (y=0;y<199;y++)
 for (x=0;x<319;x++)
 _setpixel(x,y);
 }
 _setvideomode(_DEFAULTMODE);
 return 0;
 }
/* File test -- assumes 320K free on current drive */
char filedata[64000];
int ftest()
 {
 FILE *tfile;

 int i,j;
 for (j=0;j<10;j++)
 {
 tfile=fopen("~~TIMING.~@~","w");
 if (!tfile)
 {
 perror("TIMING");
 exit(1);
 }
 for (i=0;i<5;i++)
 fwrite(filedata,sizeof(filedata),1,tfile);
 if (fclose(tfile))
 {
 perror("TIMING");
 }
 unlink("~~TIMING.~@~");
 }
 return 0;
 }











































February, 1992
 UNDOCUMENTED DOS FROM PROTECTED-MODE WINDOWS 3


Adding standard file dialogs


 This article contains the following executables: SFEDIT.ARC


Paul Chui


Paul is a software engineer at Claris and is the coauthor of the Turbo C++
DiskTutor (Osborne/McGraw-Hill). He can be reached through the DDJ office.


Under a true protected-mode system, an ill-behaved application cannot corrupt
the rest of the system. Windows, however, is not a pure protected-mode
environment. Windows lives on top of real-mode MS-DOS. So, how does Windows
communicate with DOS, the BIOS, network drivers, and other real-mode services
that are unaware of protected mode? That--and how you go about making
undocumented DOS calls--is the focus of this article.
To examine these concepts, I used Microsoft's QuickC for Windows to write a
DLL that provides standard file dialog boxes similar to those found in Windows
3.1. I chose QuickC/Win largely because it is a Windows-hosted environment
that provides the essentials of the Windows SDK.


Extending Windows


Hidden within Windows is a DOS Protected-Mode Interface (DPMI) compliant DOS
extender. The DOS extender is an "invisible" layer between protected-mode
applications and DOS. Protected-mode Windows applications can use the same
real-mode services that DOS applications use. The Windows DOS extender
accomplishes this by trapping each call to the system. When a Windows program
calls DOS, Windows handles the call, switches the processor to real mode, and
relays the call to DOS. When the call is complete, the extender code switches
back to protected mode and returns control to the application. (See the
accompanying text box, "DPMI and the Windows DOS Extender.")
Transparent to the application, the Windows DOS extender must also translate
protected-mode buffers to real-mode buffers. For example, to call the DOS Open
File function (INT 21h, Function 3Dh), the DOS extender first copies the
protected-mode filename to a real-mode addressable buffer before issuing the
DOS call. This transparency allows unmodified standard C library functions
such as fopen to work in Windows applications.


Standard File Dialog Boxes


As Windows users know, a file dialog box allows the user to select a file
from a list of available files, enter a filename directly into an edit box,
and switch drives and directories. Windows' lack of a standardized interface
element for selecting files has always been a mystery to me. Unlike the
Macintosh, which includes a Standard File Package, each Windows 3.0
application must implement its own version of a file dialog box. (Note that
Windows 3.1 will include standard file dialog boxes in COMMDLG.DLL.)
One benefit of standard file dialog boxes is shared code. The Open File and
File Save As dialog boxes are common to a large class of applications, so it
is natural to put them in a Dynamic Link Library. In a DLL, these dialog boxes
are not statically bound to any one application, but are available to any
application that wants to provide a user interface for specifying filenames.


SFILEDLG


SFILEDLG is a DLL containing a File Open and a File Save As dialog box.
(Because of space considerations, SFILEDLG and other listings are provided
electronically. See "Availability" on page 3 for details.) It uses an
owner-draw list box to draw bitmaps representing different drive types. The
exported function, SFGetFileDlg, is the interface to these dialog boxes.
SFileDlgProc, the callback function, processes the Windows messages. When the
DLL is loaded, LibMain creates the resources used by the dialog boxes. These
resources are discarded when the DLL is removed from the system.


Undocumented DOS Calls


The LibMain initialization routine in SFILEDLG calls GetDriveTypeX to
determine the type of drive for each possible drive in the system. The Windows
API function GetDriveType recognizes floppy, hard, and network drives.
GetDriveTypeX extends GetDriveType by also recognizing RAM, CD-ROM, and SUBST
drives. Unfortunately, neither Windows nor DOS provides a one-stop solution to
recognize all types of drives. This solution requires a combination of
documented and undocumented DOS.
Recall that the Windows DOS extender transparently reflects each DOS call for
protected-mode Windows applications. The DOS extender traps many documented
system calls. To make undocumented DOS calls, however, Windows applications
need to employ their own protected- to real-mode interface.
To make undocumented DOS calls, GetDriveTypeX uses the DPMI Simulate Real-Mode
Interrupt facility (INT 31h, Function 0300h). The function DPMI_RealModeInt in
Listing One (page 102) works similarly to the int86 function supplied by C
libraries. The comments in the header of DPMI_RealModeInt document the
protocol for DPMI's Simulate Real-Mode Interrupt facility.


CD-ROM Drives


The CD-ROM extensions, MSCDEX, are documented extensions of DOS. These
extensions are translated by the Windows DOS extender, so there is no need to
provide a real-mode translation. To detect a CD-ROM drive, isCDROM calls INT
2Fh, Function 150Bh. The CX register specifies the drive number, which is 0
for drive A:, 1 for drive B:, 2 for drive C:, and so on. If a CD-ROM drive is
detected, the function returns ADADh in register BX.


SUBST



The DOS command SUBST is a TSR that associates a drive letter with a
directory. Issuing the DOS command SUBST W: C:\WINDOWS maps all requests for
drive W: to the C:\WINDOWS directory. Drives created by SUBST can be
recognized using the Current Directory Structure (CDS), an undocumented DOS
structure. (See the textbox entitled "The DOS Lists of Lists.") This structure
contains a bit flag that indicates whether the drive is a SUBST drive.
In Listing One, isSubstDrive calls the function GetCDS to obtain a real-mode
pointer to the CDS array. GetCDS calls the undocumented DOS function INT 21h,
Function 5200h (Get DOS List of Lists). Because INT 21h, Function 5200h is not
serviced by the Windows DOS extender, a Windows application must never call it
directly. GetCDS uses DPMI_RealModeInt as the interface to this undocumented
DOS function. On return, the register pair ES:BX holds a real-mode address of
the lists of lists. This address must be converted to a protected-mode address
with PMODE_ADDR. PMODE_ADDR takes a selector created by the Windows API
function AllocSelector, a real-mode address, and returns a protected-mode
address.
From the protected-mode address of the List of Lists, GetCDS finds the
real-mode address of the CDS for the specified drive. Before this address can
be used, isSubstDrive calls on PMODE_ADDR again to convert it to a
protected-mode address. Finally, the flag field in the CDS is checked to
determine if the drive is a SUBST drive.


RAM Drives


Like SUBST drives, there is no official method for detecting RAM drives. The
function isRamDrive uses the assumption that RAM drives have only one File
Allocation Table (FAT) instead of the usual two. If a drive is not recognized
by any of the above drive types, GetDriveTypeX tries isRamDrive to see if it
has only one FAT. The CDS contains a pointer to the drive's Drive Parameter
Block (DPB), which holds the number of FATs used by the drive. Like the List
of Lists, the CDS is an internal DOS structure and thus contains real-mode
pointers. Again, a conversion to a protected-mode pointer using PMODE_ADDR is
required before the DPB can be dereferenced. (Note: The RAM drive detection
method used by the Windows File Manager appears to check the disk volume label
for the string "MS-RAMDRIVE.")


True Pathnames


Network and SUBST drives are aliases to another device or pathname. LibMain
calls GetCanonicalPath to resolve the true paths of each drive.
GetCanonicalPath calls the undocumented DOS INT 21h Function 60h for this
purpose. DPMI_RealModeInt is called to invoke INT 21h Function 60h, which asks
for a pointer to a relative path string in DS:SI and returns the canonical
fully qualified string in ES:DI. These pointers must be real-mode addressable.
GetCanonicalPath uses the Windows API function GlobalDOSAlloc to create this
buffer. GlobalDOSAlloc allocates a buffer in the first megabyte of RAM and
returns a protected-mode selector in the low word and a real-mode segment in
the high word of its result. GetCanonicalPath uses this buffer as a transfer
area for the real-mode interrupt. The transfer buffer is returned to the
precious real-mode memory pool using GlobalDOSFree.


Real-Mode Buffers


SFILEDLG demonstrates two cases of a real- to protected-mode buffer transfer.
These solutions can be applied more generally to accessing any absolute
real-mode address such as TSRs, network areas, BIOS data area, and so on. In
GetDriveTypeX, the protected-mode application must access existing buffers
(DOS List of Lists, Current Directory Structure, Drive Parameter Block)
created by real-mode programs. GetDriveTypeX calls AllocSelector and
SetSelectorBase to map these buffers into the protected-mode address space. In
GetCanonicalPath, the interrupt asks the calling routine for a buffer to
fill. GlobalDOSAlloc provides the real-mode addressable buffer and a selector
the protected-mode program uses to access it.
Also, the file dialog boxes in SFILEDLG keep track of the current filename
and file type. Typically, this data would be kept in static or global
variables. However, precautions must be taken with static and global data in
DLLs. At most, one instance of a DLL is loaded by Windows, even if many
applications are using it. Consequently, global and static data are stored in
a DLL's single data segment, which is shared among all applications that call
it.
To associate a separate data buffer with each instance of a file dialog box,
SFILEDLG uses window properties. A property list contains a list of words,
indexed by strings. Each window may have its own property list. Unlike window
extra bytes, properties may be assigned to a window even if that window was
not registered by the application. SFILEDLG allocates a global SFDlg structure
containing the dialog state. The handle to this structure is then placed in
the dialog box's property list.


Using SFILEDLG


SFEDIT (provided electronically) is a text editor that uses SFILEDLG.DLL for
selecting filenames for opening and saving. The application itself is not
particularly interesting; the editor is a multiline EDIT control which
provides no additional functionality over Windows' own NOTEPAD. But SFEDIT
shows how easily SFILEDLG.DLL can be incorporated into any application needing
file dialog boxes.
Also provided electronically is SFILEDLG.MAK, a QuickC for Windows 1.0 project
file for making SFILEDLG.DLL and the bitmap resource files referenced in
SFILEDLG.DLL. QuickC/Win will automatically create an import library,
SFILEDLG.LIB, which is used by SFEDIT.MAK and linked with the application
SFEDIT.EXE. (With minor modifications, SFILEDLG.DLL and SFEDIT.EXE can be
created with Microsoft C or Borland C++.)


A Word About Tools


QuickC for Windows offers a less painful entry into the world of Windows
development than Microsoft C 6.0 and the Windows Software Development Kit.
QuickC/Win offers an integrated editor, compiler, and debugger in a completely
Windows-hosted environment. This is undoubtedly a more convenient environment
than the DOS-hosted Microsoft C 6.x Programmer's WorkBench. Furthermore,
QuickC/Win is packaged with the essential elements of the Windows SDK. A
dialog editor, image editor, and CASE tool are included. CodeView for Windows
is not included, but I found the integrated Windows-hosted debugger to be
generally very capable. However, when debugging user-interface code,
QuickC/Win has a tendency to grab input focus from the application being
debugged. Two other SDK utilities that I missed in QuickC/Win are SPY and
HEAPWALKER, which Microsoft is now selling separately.
QuickC/Win is an evolutionary tool for Windows software development. It is not
Visual Basic for C programmers and it doesn't compile C++ code. Microsoft has
revised its QuickC for DOS compiler and packaged it with existing SDK tools
and documentation. It's a good package, and will entice many users to delete
Microsoft C 6.0 and the Windows SDK from their hard drives.


References


Petzold, Charles. Programming Windows. Redmond, Wash.: Microsoft Press, 1990.
Schulman, Andrew, Raymond J. Michels, Jim Kyle, Tim Paterson, David Maxey, and
Ralf Brown. Undocumented DOS. Reading, Mass.: Addison-Wesley, 1990.
DOS Protected Mode Interface Specification, Santa Clara, Calif.: Intel, 1991.
Windows INT 21h and NetBios Support for DPMI, Redmond, Wash.: Microsoft Press,
1990.


DPMI and the Windows DOS Extender


The DOS Protected-Mode Interface (DPMI) allows protected-mode programs to
access DOS by specifying a set of low-level functions that manage extended
memory and call real-mode programs. The services provided by DPMI are not
transparent as in the case of DOS extenders. In fact, DPMI is not principally
designed to be called by applications. The DPMI "client" is intended to be DOS
extenders, such as the one in Windows. The DOS extender is responsible for
translating protected-mode system calls for the application.
Generally, Windows will translate any software interrupts that are register
based (that is, those that don't pass pointers or stack parameters and don't
use segment registers). The reason is that, in protected-mode Windows, segment
registers hold "selectors," which are indexes into the Local Descriptor Table
(LDT). The actual linear address of the data segment is stored in the LDT.
Most DOS functions work under Windows 3.0, even those that use pointers or
segment registers. Windows provides the necessary protected-mode pointer
translations for DOS functions. However, the Windows DOS extender either does
not support or supports in limited fashion the DOS interrupts shown in Figure
1.
Figure 1: The Windows DOS Extender either does not support or supports in
limited fashion these DOS interrupts.

 INT Description
 -------------------------------------


 20h Terminate Program
 25h Absolute Disk Read
 26h Absolute Disk Write
 27h Terminate And Stay Resident

 INT 21h functions not supported
 Function Description
 -------------------------------------

 00h Terminate Process
 0Fh Open File with FCB
 10h Close File with FCB
 14h Sequential Read
 15h Sequential Write
 16h Create File with FCB
 21h Random Read
 22h Random Write
 23h Get File Size
 24h Set Relative Record
 27h Random Block Read
 28h Random block Write

 INT 21h functions supported with restrictions
 Function Description
 -----------------------------------------------------------------------

 25h, 35h Set and Get Interrupt Vector
 Software interrupts issued in real mode are not reflected to
 the protected mode interrupt handlers, except for INT 23h,
 INT 24h, and INT 1Ch.
 44h IOCTL
 Subfunctions 02h, 03h, 04h, and 05h, used to receive data
 from and send data to a device; the transfer buffer must be
 less than 4K unless it is below the 1-Mbyte boundary.
 Subfunction 0Ch minor codes 4Ah, 4Ch, 4Dh, 6Ah, and 6Bh; the
 code-page functions are not supported.
 38h, 65h Get Country Data and Extended Country
 DWORD pointers returned by these functions are real-mode
 addresses.

Unlike extenders, which try to be fully DOS compatible, Windows does not
translate many undocumented DOS functions. If necessary, a combination of
direct DPMI calls and Windows segment functions may be used. Windows 3.0
supports version 0.9 of the DPMI specification. (The DPMI 1.0 specification
can be obtained from the Intel Literature Center, P.O. Box 58065, Santa Clara,
CA 95052, or by calling for the Intel Reference Literature Packet # JP 26,
800-548-4725.)
Microsoft recommends that Windows applications not call any DPMI functions
other than those listed in Figure 2. The Windows kernel contains a number of
functions that may be used instead of calling DPMI directly. Not all of these
functions are documented. Some of the more useful ones, along with their
equivalent DPMI functions, are listed in Figure 3.
Figure 2: DPMI functions that may be safely called by a Windows application

 DPMI (INT 31h)
 Functions Description
 -------------------------------------------------------------

 0200h Get Real-Mode Interrupt Vector
 0201h Set Real-Mode Interrupt Vector
 0300h Simulate Real-Mode Interrupt
 0301h Call Real-Mode Procedure with Far Return Frame
 0302h Call Real-Mode Procedure with IRET Frame
 0303h Allocate Real-Mode Call-Back Address
 0304h Free Real-Mode Call-Back Address

Figure 3: Subset of DPMI functions supported within the Windows API


 DPMI (INT 31h)
 Functions Description Windows API Function
 ---------------------------------------------------------------

 0000h Allocate Local Descriptor AllocSelector
 0001h Free Local Descriptor FreeSelector
 0006h Get Segment Base Address GetSelectorBase
 0007h Set Segment Base Address SetSelectorBase
 0008h Set Segment Limit SetSelectorLimit
 0100h Allocate DOS Memory Block GlobalDOSAlloc
 0101h Free DOS Memory Block GlobalDOSFree



The DOS List of Lists


MS-DOS maintains a list of system variables near the beginning of its kernel's
data segment. This undocumented area, coined the "DOS List of Lists," is a
gateway for accessing many other undocumented structures of DOS. Among other
secrets, the List of Lists contains: the address of the first memory control
block; information on the BUFFERS and LASTDRIVE commands in CONFIG.SYS; and
the address of the System File Tables, which maintain the state of open files
in the system. A pointer to the List of Lists can be obtained by calling INT
21h, Function 52h. The address of the List of Lists is returned in the ES:BX
register pair.
Among the undocumented structures in the List of Lists is the address for the
Current Directory Structure (CDS), located at offset 16h. The CDS was
introduced as part of the network enhancements made in DOS 3.0. In addition to
working with network files, the CDS is used for manipulating foreign file
systems. There is one CDS for each possible drive on the system. For example,
if LASTDRIVE=Z is set in the CONFIG.SYS file, then DOS will maintain an array
of 26 Current Directory Structures. The size of each CDS varies depending on
the version of DOS. In DOS 3.x, the size of the CDS is 81 bytes. From DOS 4.0
and up, the size of the CDS is 88 bytes. But this is an undocumented internal
data structure, so the size and the contents of the CDS are not guaranteed.
--P.C.



_MAKING UNDOCUMENTED DOS CALLS FROM WINDOWS 3_
by Paul Chui


[LISTING ONE]

#define NOCOMM
#include <windows.h>
#include <dos.h>
#include <memory.h>
#include "getdrvx.h"

#define CDS_SUBST (0x1000)

#define DOS3_CDS_SIZE 81
#define DOS4_CDS_SIZE 88
#define CDS_PATH 0x00
#define CDS_FLAGS 0x43
#define CDS_DPB 0x45

#define DPB_FATS 8

#define MK_FP(seg,ofs) ((void far *) (((unsigned long)(seg) << 16) | \
 (unsigned)(ofs)))
#define _DS GetDS()

/* Documented Windows functions not declared in WINDOWS.H */
DWORD FAR PASCAL GlobalDosAlloc(DWORD dwBytes);
WORD FAR PASCAL GlobalDosFree(WORD wSelector);

/* Undocumented Windows functions */
VOID FAR PASCAL SetSelectorBase(WORD wSelector, DWORD dwBase);
VOID FAR PASCAL SetSelectorLimit(WORD wSelector, DWORD dwLimit);

/* Global Variables */
WORD _wSelector; // a data selector


/* Types */
// DOS list of lists structure (for DOS 3.1 or better)
typedef struct {
 BYTE x[22]; // (don't care)
 BYTE far* cds;
 BYTE xx[7]; // (don't care)
 BYTE lastdrive;
 // ... the rest of DOS List of Lists
} DOSLISTS;

// DPMI real-mode call structure
typedef struct {
 DWORD edi, esi, ebp, reserved, ebx, edx, ecx, eax;
 WORD flags, es, ds, fs, gs, ip, cs, sp, ss;
} REALMODECALL;


int GetDosVersionMajor(void)
{
 int dosver;

 _asm mov ah, 0x30;
 _asm int 0x21;
 _asm mov dosver, ax;

 return (BYTE)dosver;
}

unsigned GetDS(void)
{
 _asm mov ax, ds;
 return; // return value in AX
}


/****************************************************************************
 BOOL DPMI_RealModeInt(int intno, REALMODECALL far* r)

 PURPOSE: DPMI simulate real-mode interrupt function. The
 following is an excerpt from the DPMI Specification:

 PARAMETERS:
 int intno real-mode interrupt to simulate
 REALMODECALL far* r

 RETURNS: TRUE if interrupt is successful, FALSE on error

 NOTES: The following is an excerpt from INTEL's DPMI specs:

 To Call

 AX = 0300h
 BL = Interrupt number
 BH = Flags
 Bit 0 = 1 resets the interrupt controller and A20
 line
 Other flags reserved and must be 0
 CX = Number of words to copy from protected mode to
 real-mode stack

 ES:(E)DI = Selector:Offset of real-mode call structure

 Returns

 If function was successful:
 Carry flag is clear.
 ES:(E)DI = Selector:Offset of modified real-mode call
 structure

 If function was not successful:
 Carry flag is set.
****************************************************************************/
BOOL DPMI_RealModeInt(int intno, REALMODECALL far* r)
{
 intno &= 0x00FF; // reset high byte (flags)

 _asm {
 push di
 mov bx, intno // bh = flags (0), bl = real-mode interrupt to call
 mov cx, 0 // don't copy anything from protected mode stack
 les di, r // es:di -> real-mode call structure
 mov ax, 0x0300 // DPMI simulate real-mode interrupt function
 int 0x31
 pop di
 jc error
 }
 return TRUE;

error:
 return FALSE;
}

/****************************************************************************
 BYTE far* PMODE_ADDR(WORD wSel, BYTE far* RMODE_ADDR)

 PURPOSE: Get the Current Directory Structure

 PARAMETERS:
 WORD wSel A valid selector
 BYTE far* RMODE_ADDR A real mode address

 RETURNS: a protected-mode address of RMODE_ADDR

 NOTES: The base address of selector wSel is set to the segment of
 RMODE_ADDR.
*****************************************************************************/
BYTE far* PMODE_ADDR(WORD wSel, BYTE far* RMODE_ADDR)
{
 SetSelectorBase(wSel, (DWORD) FP_SEG(RMODE_ADDR) << 4);
 return MK_FP(wSel, FP_OFF(RMODE_ADDR));
}

/****************************************************************************
 BYTE far* GetCDS(int nDrive)

 PURPOSE: Get the Current Directory Structure

 PARAMETERS: int nDrive The drive to retrieve


 RETURNS: a real-mode pointer to the Current Directory Structure of
 drive nDrive.
*****************************************************************************/
BYTE far* GetCDS(int nDrive)
{
 DOSLISTS far* doslists;
 REALMODECALL r;

 // Get DOS list of lists (INT 21h, Function 52h)
 _fmemset(&r, 0, sizeof(REALMODECALL));
 r.eax = 0x5200;
 if (!DPMI_RealModeInt(0x21, &r))
 return NULL;

 // Pointer to DOS list of lists is returned in ES:BX
 doslists = (DOSLISTS far*)
 PMODE_ADDR( _wSelector, MK_FP(r.es, LOWORD(r.ebx)) );

 if (GetDosVersionMajor() < 4)
 return doslists->cds + (nDrive * DOS3_CDS_SIZE);
 else
 return doslists->cds + (nDrive * DOS4_CDS_SIZE);
}

/****************************************************************************
 BOOL isCDROM(int nDrive)

 PURPOSE: Tests if nDrive is a CD ROM drive

 PARAMETERS: int nDrive The drive to test

 RETURNS: TRUE if nDrive is a CD ROM

*****************************************************************************/
BOOL isCDROM(int nDrive)
{
 WORD saveAX, saveBX;

 _asm mov cx, nDrive;
 _asm mov ax, 0x150B;
 _asm int 0x2F;
 _asm mov saveAX, ax;
 _asm mov saveBX, bx;

 return saveBX == 0xADAD && saveAX != 0;
}


/****************************************************************************
 BOOL isSubstDrive(int nDrive)

 PURPOSE: Tests for drives created using the DOS SUBST command

 PARAMETERS: int nDrive The drive to test

 RETURNS: TRUE if nDrive is a SUBST drive
*****************************************************************************/
BOOL isSubstDrive(int nDrive)
{
 BYTE far* cds; // Current Directory Structure

 WORD cds_flags;

 cds = PMODE_ADDR(_wSelector, GetCDS(nDrive));
 cds_flags = *(WORD far*)(cds+CDS_FLAGS);
 if (cds_flags & (CDS_SUBST))
 return TRUE;

 return FALSE;
}

/****************************************************************************
 BOOL isRamDrive(int nDrive)

 PURPOSE: Tests for Ram drives

 PARAMETERS: int nDrive The drive to test

 RETURNS: TRUE if nDrive is a ram drive

 NOTES: This function tests to see if the drive has only one FAT. It
 is assumed that if this is true, then the drive must be a RAM drive.
*****************************************************************************/
BOOL isRamDrive(int nDrive)
{
 BYTE far* cds; // Current Directory Structure
 BYTE far* dpb; // Drive Parameter Block

 cds = PMODE_ADDR(_wSelector, GetCDS(nDrive));

 dpb = *(BYTE far* far*)(cds+CDS_DPB);
 dpb = PMODE_ADDR(_wSelector, dpb);

 if (*(dpb+DPB_FATS) == 1)
 return TRUE;

 return FALSE;
}

/****************************************************************************
 WORD FAR PASCAL GetDriveTypeX(int nDrive)

 PURPOSE: Determines drive type

 PARAMETERS:
 int nDrive The drive number. 0 = A:, 1 = B:,
 2 = C:, 3 = D:, ...
 RETURNS:
 DRIVE_UNKNOWN Unknown drive type
 DRIVE_NOTEXIST Drive does not exist
 DRIVE_REMOVE Removable (floppy) drive
 DRIVE_FIXED Fixed (hard) drive
 DRIVE_REMOTE Remote (network) drive
 DRIVE_CDROM CD ROM drive
 DRIVE_RAM Ram drive
 DRIVE_SUBST SUBST drive
*****************************************************************************/
WORD FAR PASCAL GetDriveTypeX(int nDrive)
{
 WORD wDriveType;


 // Get a new selector, using the current DS as the prototype
 _wSelector = AllocSelector(_DS);
 SetSelectorLimit(_wSelector, 0xFFFF);

 wDriveType = GetDriveType(nDrive);

 if (isCDROM(nDrive))
 wDriveType = DRIVE_CDROM;
 else
 if (wDriveType != DRIVE_REMOTE && isSubstDrive(nDrive))
 wDriveType = DRIVE_SUBST;
 else
 if (isRamDrive(nDrive))
 wDriveType = DRIVE_RAM;

 FreeSelector(_wSelector);
 return wDriveType;
}

/****************************************************************************
 BOOL FAR PASCAL GetCanonicalPath(LPSTR lpszRelPath, LPSTR lpszTruePath)

 PURPOSE: Resolve path string to canonical path string

 PARAMETERS:
 LPSTR lpszRelPath Relative path string or directory name
 LPSTR lpszTruePath Destination for canonical fully qualified
 path
 RETURNS:
 TRUE if successful
*****************************************************************************/
BOOL FAR PASCAL GetCanonicalPath(LPSTR lpszRelPath, LPSTR lpszTruePath)
{
 BOOL retval;
 DWORD dw;
 WORD wSelector;
 WORD wSegment;
 LPSTR lpszDosBuf;
 REALMODECALL r;

 dw = GlobalDosAlloc(128);
 if (dw == 0)
 return FALSE;

 wSelector = LOWORD(dw);
 wSegment = HIWORD(dw);

 lpszDosBuf = MK_FP(wSelector, 0);
 _fmemcpy(lpszDosBuf, lpszRelPath, 128);

 r.eax = 0x6000;
 r.ds = wSegment;
 r.esi = 0;
 r.es = wSegment;
 r.edi = 0;

 retval = DPMI_RealModeInt(0x21, &r);


 _fmemcpy(lpszTruePath, lpszDosBuf, 128);
 GlobalDosFree(wSelector);

 return retval;
}

























































February, 1992
PROGRAMMING PARADIGMS


Understanding Multimedia




Michael Swaine


Multimedia has arrived. I read it in PC Computing. What? You're not convinced
by this authoritative testimonial? Sigh. We live in an age of cynicism.
Personally, I'm so jaded I don't even trust cynicism any more. Talking with a
musician/programmer recently convinced me that, for all the hype and hoopla
over multimedia, there are real technological issues here, and there is a real
market.
Or rather, there are several technologies and several markets. And the
markets--or the various views of the market for multimedia stuff -- don't map
in any very useful way onto the different technologies involved.
It's all rather messy, which may help to explain why the fluff flags go up
whenever the subject is raised. The musician/programmer I mentioned admits
that, for his company, today's multimedia market isn't necessarily tomorrow's
multimedia market.
That musician/programmer is named Roger Powell, and he's the manager for audio
applications for Silicon Graphics. Talking with Powell set me to thinking
about, and then to reading up on, the multimedia phenomenon: the emerging
standards and the forming markets and the players who appear to be molding
this clay, and the very pragmatic question one keeps coming back to: What is
this stuff for, anyway?


After Utopia, What?


Powell's story is more or less the standard computer meets music, music goes
on the road, computer finds music again story. Originally a professional
keyboard artist, Powell got involved early on with the Arp synthesizer
company. "I've always had this dual interest in music and technology," he
recalls, "and when personal computers became available I gravitated toward
them and began learning programming as well as making records." Make records
he did, with Todd Rundgren's group Utopia, but he never got far from computer
technology. When S-100 machines arrived on the scene in the late '70s, Powell
began assembling the hardware and writing assembly language sequencers to
control analog synthesizers.
"Things got a little more sophisticated when the Apple II and the IBM PC came
out, and there were actually good C compilers," he told me. "And then MIDI was
invented, which provided a protocol and some hardware to actually talk to
these instruments." Around 1985 he released a product named "Textures," one of
the first MIDI sequencing programs for the IBM PC. He was still with Utopia
then, still working as a professional musician.
When Utopia quit touring, Powell got into engineering heavily, working for a
Boulder, Colorado company that was producing digital audio workstations,
machines targeted at the professional audio production market. He worked there
"for about four years, primarily writing user interface code in Microsoft
Windows, which [ran] on a computer that connected to the workstation." When he
got a call from Silicon Graphics describing what they had in mind for audio,
he soon found himself working there.
It's easy enough to see why Powell is doing what he's doing. Like Steve
Wozniak building the Apple I or Bill Gates writing Altair Basic, he's engaged
in work that is largely indistinguishable from what he would call play. He's
doing it at least in part for his own satisfaction.
The analogy gets stressed when you press it. What Woz and Gates did turned out
to be eminently marketable. Nobody needs to defend or define the market for
personal computers or development software. It's a different story with
multimedia.
It's tempting to ask, even if there is a multimedia market, why is there a
multimedia market? Bill Gates is one of the people I've heard answer that
question, essentially, in this way: Because today's machines have more power
than can be justified if all you're going to do is run spreadsheets and word
processors. There needs to be a power-hungry application to justify the new
hardware. A story well suited to a cynical age.


Lights, Camera, Action


The product that inspired PC Computing to announce, in its December 1991
issue, that multimedia had arrived is a program from Macromind called
"Action." Action is a Windows application that builds on some emerging
standards for multimedia delivery platforms and on multimedia extensions for
Windows from Microsoft. It represents one answer to the question, What is
multimedia for? In the Action paradigm, multimedia is advanced presentation
software. The user developing a presentation can use Action to add sound,
video clips, animation, or interactivity. More impressively, Action lets one
add sound, video clips, animation, and interactivity to presentations: It
facilitates the simultaneous, synchronized presentation of information in
different media.
That's one definition of multimedia: advanced presentations. The Silicon
Graphics path to multimedia suggests a different definition, one based on the
democratization of professional audio and video production. That view grows
out of the company's existing customer base, and it explains why Silicon
Graphics hires people such as Roger Powell. But, as we shall see presently,
Silicon Graphics doesn't have just one view of multimedia.
But back to Macromind. Macromind is worth watching. Besides Action, Macromind
also makes multimedia tools for the Mac; in fact, it is the multimedia
company, more than any other software company. Its name isn't Macromind any
more, though, since it merged with another big media company, Paracomp.
Farallon used to be another player in multimedia, with a set of sound tools, a
base on the Mac, and a serious commitment to Windows. But Farallon recently
decided to concentrate on its connectivity products, and sold off its sound
tools to--right, Macromind-Paracomp. The company is gobbling up the multimedia
applications market for Macs and PCs.
But an application software company isn't in the best position to create the
standards. Apple and Microsoft have taken serious steps toward establishing
standards for multimedia.
Basically, multimedia requires tools and standards for sound, video,
animation, music, and why not 3-D graphics while you're at it. Multimedia
systems need to be able to store and retrieve video and sound in real time,
support acceptable and/or professional quality sound and video, deal with
existing formats and devices, edit, synchronize--there are many problems and
questions.
Apple's QuickTime is both software and a standard for time-based media: video,
sound, animation. It includes compression and a standard format named "Movie"
for digital video and sound. On existing hardware, QuickTime Movies are going
to be small, slow, and monophonic. Not exactly Michael Jackson, but the fault
is more in the current state of Macintosh hardware than in QuickTime, which is
generally regarded as pretty spiffy. QuickTime is software, so it can make
almost any Mac multimedia ready, by this limited definition.
Microsoft led the way in creating a standard for multimedia-ready machines in
the DOS/Windows universe. The Multimedia PC (MPC) standard is pretty minimal,
but the existence of a standard let Microsoft get on with extending Windows to
support multimedia. Microsoft is turning technology licensed from Macromind
into a movie format akin to Apple's. MPC is essentially a delivery platform
standard.
The first MPC-compliant machines announced were Tandy's five boxes, introduced
last summer, each containing the requisite CD-ROM drive, audio circuitry,
Windows with multimedia extensions, and stereo sound drivers. Without
monitors, the machines start at $2599.
The Apple-IBM multimedia venture, Kaleida, hints at a higher bar, both for
authoring and delivery. The most salient feature of Kaleida appears to be a
media scripting system that will evolve from QuickTime and be usable on DOS,
OS/2, and Mac systems.


Defining the Development System


And then there is the definition provided by Silicon Graphics' Personal IRIS
line, in particular the Indigo.
Indigo is breaking new ground for Silicon Graphics, but gradually. "The
premier customer for Silicon Graphics," Powell told me, "is the creative or
technical professional. Our traditional markets have been the scientist's
desktop, but we're also very strong in the entertainment industry, from the
computer animation standpoint. Our machines are used quite heavily in the film
industry for doing animation and effects. 'Terminator II,' Michael Jackson
videos, that sort of thing."
Indigo puts some of this professional video and audio capability on a machine
that compares favorably in price with heavily decked out, high-end Macs and
PCs. The $8000 entry price for an Indigo can quickly ramp up to $10-12,000
with additional disk drives and add-ons, but for media development, a lot of
what you'd want is already there. Powell paints the picture of the short-term
market, which is the existing customer base. "The low-end machines can
populate the desktops and connect to the larger machines, which might be
render servers. Things that take an enormous amount of processing you might
offload to the bigger machines, but a lot of the work can be done at the
desktop [with an Indigo]."
But what about emerging markets? I asked him. He agreed that the Indigo was
viewed as a wedge for opening new markets for Silicon Graphics, particularly
digital media authoring, "because the price factor puts you in the category of
a high-end Mac or PC, and the capability is far greater than you get with
those machines. Those machines are pushing the envelope on the processors
installed in them and the peripherals that you can plug into them. You start
plugging cards into a Mac to try to get it to be a digital audio workstation
...you're sucking up all the power of the machine and you have no expansion
path."
If he's right, it puts an ironic twist to the argument that the multimedia
market exists because the machines are powerful enough to support it.
Is he right? I waded through descriptions of reasonable development systems
based on PCs, Macs, and Amigas, culled from press releases, magazine articles,
and conversations with developers. Here's the picture I get:
It's possible to put together a Mac, PC, or Amiga system that displays color
3-D animations; gives the user control over on-screen motion video and lets
the developer capture and edit video; accepts and presents CD-quality sound;
and has software tools for multimedia integration, painting, and 3-D modeling.
I come up with something like $12,000 for a minimal Mac system. You can get
into multimedia development for well under $10,000 on an Amiga or a PC, but
you'll definitely be cutting some corners.
The Indigo system that checks all the same boxes will probably run closer to
$12,000 than to the entry $8,000, but in general you're getting more with the
Indigo. It's built around a RISC architecture; is ACE-compatible; has an audio
subsystem that isn't affected by CPU load; and has a video bus that cuts the
costs of video cards because video and graphics share common back-end
circuitry. The operating system is IRIX, which is SVR3 UNIX with 4.3BSD
extensions such as TCP/IP and NFS network protocols. IRIX implements the X11R4
window system, Motif, and Display PostScript.
A look at the back of the case shows how much is built in: a stereo digital
I/O jack for input and output from or to a DAT deck, CD player, or MIDI
device; stereo line-in and line-out jacks for analog I/O; a microphone input
jack; a stereo headphone jack; two Mac-compatible serial ports; a SCSI II
port; an Ethernet port; and a bidirectional Centronics parallel port.
The figures are for system, memory, monitor, necessary cards, disk storage,
and software. Video cameras, scanners, and other such devices are extra.
The Indigo system strikes me as a gorgeous machine to play with while trying
to decide what the multimedia market is.




February, 1992
C PROGRAMMING


D-Flat Clipboard, Searches, and Pictures


 This article contains the following executables: DFLT10.ARC D10TXT.ARC


Al Stevens


This month we introduce a new D-Flat window class, the PICTUREBOX, and provide
the code that supports the clipboard and text searching for the EDITBOX class.
For those of you who came to this project in the middle, D-Flat is a public
domain C function library that implements the Common User Access
(CUA) user interface in text mode on DOS computers. First, though, a
travelogue.


Changes in Latitudes


I spent the better part of this week in Key West following Judy around and
watching the bicycles, scooters, tourists, and sunsets. When you get away from
Duval Street and the endless array of gift shops and restaurants, you can
stroll quietly and aimlessly among a motley collection of souls and domiciles
in the best climate imaginable. It is November, and The Season hasn't started.
There is a gentle sea breeze, the humidity is down, and the temperature is
hanging at just around 80. Forget work ethics. This is a haven and a hideout
for the formerly overworked. You can forget about home and office. I can't
imagine how Truman, Hemingway, or Jimmy Buffett got anything done here. My
column is two days late and so the truth is out. I am a closet bum. Even the
sight of Judy flashing the MasterCard around town fails to stir me into
action. The laptop is in the hotel room, idle. It might as well be at the
bottom of that cool, deep, green Gulf of Mexico.
You find time to reflect, though. Thinking doesn't take much motion, and all
the bars are outdoors where bars ought to be. Only last week I was at the
extreme other end of the forty-eight in Washington State. It was cold and
raining. An intense group of young men was telling me about how programming is
going to be done soon. We were inside. There wasn't a bar in sight. They are
well into this new technology and building some of the tools, and they want me
to write about it. They talk fast and with precision. Their waking moments are
spent knowing about and shaping things that the dirty, unshaven old guy over
there on that ratty old bicycle doesn't know or care about. He's playing a
harmonica and talking to his hat. The two ends of knowledge. Maybe these
programmers don't know it, but they'd trade places with him for a while. He
wouldn't. You can't quite see the Pacific Northwest from Key West. You can't
see very far at all.


Thirty-Something


Do you know what a DOS extender is? The difference between 32-bit flat and
32-bit small memory models? How many bytes are in a gigabyte? How protected-mode
programming really works? Many of you know these things. Many more of you do
not. If you are a PC programmer, you will be hearing more about the 32-bit
world. It is where DOS should have been a long time ago. The hardware wasn't
there. One thing and another kept memory chips and snappy processors out of
reach for a while. The cost of sand, they told us. International trade
embargoes. Now you can get a 20-MHz 386 at the grocery store. The software is
going to have to catch up. We already have the C compilers and the DOS
extenders. There will be more, and there will be operating environments that
make use of all that wasted computer power.


The D-Flat Clipboard


D-Flat has a clipboard that works like the one defined in the CUA
specification. A clipboard is a data store into which you cut, copy, and paste
information. In a GUI, the clipboard can contain graphics or text. D-Flat is a
text-mode system. Therefore, it can store only text in its clipboard.
Users use the clipboard to move data around in an application. You cut or copy
a block of text to the clipboard from a marked location in an edit box.
Copying the text does just that; it copies the text into the clipboard.
Cutting the text copies it and then deletes it from its original location. We
discussed marking text blocks last month.
When the clipboard has text in it, the user can paste that text somewhere else
in the document. Pasting inserts the text at the current keyboard cursor
location just as if the user had retyped it.
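The clipboard model just described is simple enough to sketch outside D-Flat. The following is a minimal illustration with hypothetical names (Clip, ClipCopy, ClipPaste), not the library's actual code; Listing One shows the real thing.

```c
#include <stdlib.h>
#include <string.h>

/* A minimal sketch of CUA clipboard semantics, not D-Flat's actual
   code: one heap buffer that each new cut or copy replaces. */
static char *Clip = NULL;
static int ClipLen = 0;

/* copy: duplicate 'len' marked characters into the clipboard */
void ClipCopy(const char *block, int len)
{
    char *p = realloc(Clip, len);
    if (p != NULL) {
        memmove(p, block, len);
        Clip = p;
        ClipLen = len;
    }
}

/* paste: insert the clipboard contents at offset 'at' in 'buf';
   the caller must guarantee room for ClipLen extra characters */
void ClipPaste(char *buf, int at)
{
    if (Clip != NULL && ClipLen > 0) {
        /* open a hole, then fill it */
        memmove(buf + at + ClipLen, buf + at, strlen(buf + at) + 1);
        memmove(buf + at, Clip, ClipLen);
    }
}
```

A cut is then just a copy followed by deleting the marked block, which is exactly how the editbox code layers cutting on top of clipbord.c.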
Listing One, page 138, is clipbord.c, the code that implements copying and
pasting. The editbox.c source file from last month has code that calls the
functions in clipbord.c. When the user performs a cut operation, the editbox
code uses the clipboard's copy routine and then uses the block delete routine
in editbox.c to delete the text block. The Clipboard character pointer in
clipbord.c points to the clipboard's text. When that pointer is NULL, the
clipboard is empty. When the user has done the first cut or copy, the
clipboard then has contents for as long as the program runs. Each new cut or
copy replaces the clipboard's current contents. The ClipboardLength variable
contains the number of characters in the clipboard.
The CopyToClipboard function copies the marked text into the clipboard. First
it computes pointers to the beginning and end of the text block and the
block's length. Then it reallocates the clipboard's memory buffer and copies
the marked block into the new memory buffer.
The PasteText function inserts the clipboard's text into the editbox at the
current cursor location. This function accepts the clipboard's buffer address
and text length as parameters rather than from the global Clipboard and
ClipboardLength variables because the editbox's undo feature uses the
PasteText function to reinsert text into the editbox from the undo buffer. The
function first computes a new length for the editbox's buffer by summing the
current length of the buffer and the length of the text to be pasted. If the
new length is greater than the current size of the buffer, the function
reallocates the buffer to the new length. Next, it must open a hole in the
text to accept the pasted text. The function builds a pointer to the current
cursor location and one to that location plus the length of the text to be
pasted. Then it shifts the editbox text from the first location to the second
using the length of the text from the first location to the end of the buffer.
Finally, the PasteText function moves the text to be pasted into the hole that
it just created. It marks the text as having been changed and calls the
BuildTextPointers function to rebuild the text buffer's text pointer array.


Searching for Text


Listing Two, page 138, is search.c, the code that allows a user to search for
and replace string values in an editbox's text. There are three functions that
the application program can call. The ReplaceText function uses the
ReplaceTextDB dialog box for the user to enter a text string to search for and
a text string to replace matches with. The SearchText function uses the
SearchTextDB dialog box for the user to enter a text string to search for with
no replacement. The SearchNext function assumes that there was a previous call
to the SearchText function and continues the search past the point where the
previous search found a match. All three functions are called from the
application program. In the example Memopad application that I use, the
window-processing module for the editbox document windows calls these
functions when the user makes the corresponding selection from the application
window's Search menu.
All three functions use the common SearchTextBox function, which performs the
searches and replacements. The first parameter is the window handle of the
edit box being searched. The second parameter is a true/false indicator that
tells the function whether or not a replacement is involved. The third
parameter is true to tell the function to start at one past the current cursor
position and false to tell it to start at the current cursor position. This
parameter prevents the function from matching the same string on successive
calls from SearchNext. The search algorithm is a simple text scan that
compares the text at each location with the search string. The comparison is
case-insensitive only when the CheckCase variable is false. Before calling the
SearchTextBox function, the program sets this variable based on the setting of
a check box in the dialog box. When the program finds a match on the search
string, it marks the matching text block in the editbox and positions the
keyboard cursor at the first character of the block. If the operation includes
a replacement string, the program calls the internal replacetext function to replace
the matching text with the specified string. That function will expand or
collapse the window's text buffer depending on whether the new text is bigger
or smaller than the text it is replacing.
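Distilled from that description, a standalone version of the scan might look like this. The names (chardiff, ScanFor) are hypothetical; D-Flat's actual implementation is in Listing Two.

```c
#include <ctype.h>

/* A sketch of the scan described above.  Nonzero means the two
   characters differ; a newline in the text is treated as a space
   so a pattern can match across line ends. */
static int chardiff(int pat, int txt, int checkcase)
{
    if (txt == '\n')
        txt = ' ';
    if (checkcase)
        return pat != txt;
    return tolower(pat) != tolower(txt);
}

/* Return the offset of the first match of 'pat' in 'text' at or
   after 'start', or -1 if there is none. */
int ScanFor(const char *text, const char *pat, int start, int checkcase)
{
    const char *cp1 = text + start;
    while (*cp1) {
        const char *s1 = pat, *s2 = cp1;
        while (*s1 && chardiff(*s1, *s2, checkcase) == 0)
            s1++, s2++;
        if (*s1 == '\0')
            return (int)(cp1 - text);
        cp1++;
    }
    return -1;
}
```

The incr parameter described above corresponds to passing a start one past the previous hit, which is how SearchNext avoids rematching the same text.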


The PICTUREBOX Window Class


The D-Flat PICTUREBOX window class allows you to use the PC's graphics
character set to draw horizontal and vertical lines and bars into a text
window. This feature supports windows that have custom borders and frames
inside the data space. It also lets you draw bar charts. The PICTUREBOX window
class records the presence of vectors and bars in the VectorList array, which
is an array of VECT structures, each of which defines a vector or bar. A VECT
structure contains a RECT structure that defines the dimensions of the vector
or bar, and a VectTypes variable that indicates whether the item is a vector
or a bar and, if it is a bar, the character that is used to display the bar. A
vector is a single line made from the characters that draw single lines. You
used those same characters to build window frames. A bar is a line made of the
block characters from the graphics character set. There are four different
blocks with different textures.
Listing Three, page 138, is pictbox.c, the code that implements the
PICTUREBOX class. The applications program interface to a picture box is
fairly simple. You create a window of the class and call functions or send
messages to draw vectors and bars.
For example, the DRAWVECTOR message draws a horizontal or vertical line on a
picture box. It would seem that using the graphics character set to draw a
line is a straightforward operation. This would be true if the lines you drew
never intersected with other lines you drew. But lines do intersect, and the
line-drawing algorithm must deal with those intersections appropriately. That
means that as the program is drawing the line, it must look at the character
that is being overwritten. If that character is another vector character, the
program must draw that element of the new vector with a character that
correctly represents the intersection. The substituted character will be
different, depending on whether the new vector is horizontal or vertical;
whether the vector being intersected is horizontal or vertical, whether the
intersecting character is the first, middle, or last character of the new
vector; whether the character being intersected is the first, middle, or last
character of the intersected vector; and what the intersected vector character
is. It might be a simple line character or it might be a corner or other
character that resulted from an earlier intersection. The PC's graphics
character set includes horizontal and vertical straight lines, a character
where horizontal and vertical lines intersect to form a cross, the four corners of
a box, and four different T formations where a line begins or ends at the
intersection of another line.
The PaintVector function in pictbox.c begins with a straight line character,
either horizontal or vertical. It then proceeds to write that character into
the window for the length of the vector--except when the character it is
replacing is itself a vector character. The CharInWnd array contains all the
characters that, when intersected, indicate that another vector is being
intersected. When that happens, the program must make a substitution. The
VectCvt table is an array with four dimensions. Its purpose is to return the
substitution character to draw the current element of the new vector. Its
first dimension is 0 through 2, depending on whether the intersected vector
character is the first, middle, or last character of that vector. The
FindVector function computes that subscript by searching the window's VectList
array for the last vector in the array that passes through the intersected
character position. Then, based on the relative position of the character to
the vector, the FindVector function returns the 0 through 2 subscript. The
second dimension of the VectCvt array is 0 through 10, depending on which of
the 11 vector-drawing characters was intersected. This value is computed from
the relative position of the character in the CharInWnd array. The third
dimension of the VectCvt array is 0 if the new vector is horizontal and 1 if
it is vertical. The last dimension of the array is 0, 1, or 2, depending on
whether the intersecting character is the first, middle, or last character of
the new vector. Those four subscripts will return the correct character to
display from the VectCvt array.
The DrawVector function sends the DRAWVECTOR message, the DrawBox function
sends the DRAWBOX message, and the DrawBar function sends the DRAWBAR message.
You pass the DrawVector function the window handle, x and y coordinates of the
first position of the vector, the vector's length in character positions, and
a true/false indicator that is true for a horizontal vector and false for a
vertical vector. The DrawVector function sends the DRAWVECTOR message to the
window. The message passes a pointer to a RECT structure that defines the
vector. The RECT has either a one-character height or width, depending on
whether the vector is horizontal or vertical. You could send the message
yourself, but the function is easier to work with. Often a function is simpler
and easier to understand than sending a message. One of the ways that the
Windows SDK evolved was that more and more functions were introduced that took
over the drudgery of some of the less-intuitive messages.
The DRAWBOX message expands itself into four DRAWVECTOR messages to draw the
four vectors of the box. The DRAWBAR message is similar to the DRAWVECTOR
message except that it does not care about intersections.



The Calendar


Listing Four, page 140, is calendar.c, a program that uses the vectors in the
PICTUREBOX class to display a calendar. The PICTUREBOX class derives from the
TEXTBOX class, so a program can display text as well as vectors and bars. The
Calendar function gets the current time and date and creates a PICTUREBOX
window to display the calendar. When the window-processing module,
CalendarProc, gets the CREATE_WINDOW message, it draws a box around where the
days of the month will display and builds the month's 5 x 7 grid by drawing
intersecting vectors. The PAINT message displays the dates of the month in the
correct grids by calculating the day of the week that the first of the month
starts on and filling the grid from there. The program displays the current
date in red and the others in black. The KEYBOARD message intercepts the PgUp
and PgDn keys to allow the user to page through the calendar a month at a
time.
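The one step in that paragraph that deserves a closer look is calculating the weekday of the first of the month. One portable sketch, not necessarily how calendar.c does it, lets mktime() do the work:

```c
#include <time.h>

/* Return the weekday of the first day of a month,
   0 = Sunday through 6 = Saturday.  'month' runs 1..12. */
int FirstWeekday(int year, int month)
{
    struct tm t = {0};
    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = 1;
    t.tm_hour = 12;     /* midday sidesteps DST boundary quirks */
    t.tm_isdst = -1;    /* let mktime decide about DST */
    mktime(&t);         /* normalizes and fills in tm_wday */
    return t.tm_wday;
}
```

With that number in hand, day d of the month lands in row (first + d - 1) / 7 and column (first + d - 1) % 7 of a Sunday-first grid.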


The Bar Chart


Listing Five, page 142, is barchart.c, which displays an example of a bar
chart. It begins by creating a PICTUREBOX window and writing some text into
the window. To simulate how a program would dynamically build a bar chart from
an array of information, the bar chart program uses a simple structure that
contains descriptive text and the start and stop positions of the associated
bars on the chart. This simple example imitates a project schedule. The start
and stop fields are the start and stop months of each project in the array.
The program draws a horizontal bar for each project, varying the display
character for the bar so that the chart will provide visual separation of the
bars. Figure 1 shows the calendar and the bar chart as the example memopad
program displays them.
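That structure-driven scheme is easy to sketch. The names below (project, BarRow) are hypothetical and the fill characters are ASCII stand-ins for the graphics block characters, but the shape of the logic follows the description:

```c
/* A sketch of the schedule data described above: each project has
   a start and stop month, and each bar gets its own fill character
   so adjacent bars stay visually distinct. */
struct project {
    const char *name;
    int start, stop;        /* months, 1 through 12 */
};

/* Render one project's 12-column bar into 'out' (13+ chars). */
void BarRow(const struct project *p, int index, char *out)
{
    static const char fill[] = "#=*+";  /* stand-ins for block chars */
    int m;
    for (m = 1; m <= 12; m++)
        out[m - 1] = (m >= p[index].start && m <= p[index].stop)
                     ? fill[index % 4] : ' ';
    out[12] = '\0';
}
```

The real barchart.c sends DRAWBAR messages to the picture box instead of building strings, but the start/stop bookkeeping is the same.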


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library 0 of DDJ Forum and on M&T
Online. If you cannot use either online service, send a formatted 360K or 720K
diskette and an addressed, stamped diskette mailer to me in care of DDJ, 411
Borel Ave., Suite 100, San Mateo, CA 94402-3522. I'll send you the latest
version of D-Flat. The software is free, but if you care to, stick a dollar
bill in the mailer for the Brevard County Food Bank. They give help to
homeless and hungry families. We've collected about $900 so far from generous
D-Flat "careware" users. If you want to discuss D-Flat with me, use
CompuServe. My CompuServe ID is 71101,1262, and I monitor DDJ Forum daily.


Just You Wait, 'enry 'iggins


In a letter to the editor, a reader takes me to task for my disapproval of the
ANSI committee's coinage of the non-word "stringize." I discuss the matter
with Kathy, my teenage niece. She carries the question to a higher authority,
her English teacher. Should we accept without issue, as the reader suggests,
all new mutations of language and usage, treating them as the natural
evolution of communication in a growing society? Kathy reports back, "She went
whoa and I'm like hey get out of my face." Pardon me, but I do not know how to
punctuate that sentence.





[LISTING ONE]

/* ----------- clipbord.c ------------ */
#include "dflat.h"

char *Clipboard;
int ClipboardLength;

void CopyToClipboard(WINDOW wnd)
{
 if (TextBlockMarked(wnd)) {
 char *bbl=TextLine(wnd,wnd->BlkBegLine)+wnd->BlkBegCol;
 char *bel=TextLine(wnd,wnd->BlkEndLine)+wnd->BlkEndCol;
 ClipboardLength = (int) (bel - bbl);
 Clipboard = realloc(Clipboard, ClipboardLength);
 if (Clipboard != NULL)
 memmove(Clipboard, bbl, ClipboardLength);
 }
}

void PasteText(WINDOW wnd, char *SaveTo, int len)
{
 if (SaveTo != NULL && len > 0) {
 int plen = strlen(wnd->text) + len;
 char *bl, *el;

 if (plen > wnd->textlen) {
 wnd->text = realloc(wnd->text, plen+2);
 wnd->textlen = plen;

 }
 if (wnd->text != NULL) {
 bl = CurrChar;
 el = bl+len;
 memmove(el, bl, strlen(bl)+1);
 memmove(bl, SaveTo, len);
 BuildTextPointers(wnd);
 wnd->TextChanged = TRUE;
 }
 }
}






[LISTING TWO]

/* ---------------- search.c ------------- */
#include "dflat.h"

extern DBOX SearchTextDB;
extern DBOX ReplaceTextDB;
static int CheckCase = TRUE;

/* - compare search and text chars, honoring CheckCase; a newline matches a space - */
static int SearchCmp(int a, int b)
{
 if (b == '\n')
 b = ' ';
 if (CheckCase)
 return a != b;
 return tolower(a) != tolower(b);
}

/* ----- replace a matching block of text ----- */
static void replacetext(WINDOW wnd, char *cp1, DBOX *db)
{
 char *cr = GetEditBoxText(db, ID_REPLACEWITH);
 char *cp = GetEditBoxText(db, ID_SEARCHFOR);
 int oldlen = strlen(cp); /* length of text being replaced */
 int newlen = strlen(cr); /* length of replacing text */
 int dif;
 if (oldlen < newlen) {
 /* ---- new text expands text size ---- */
 dif = newlen-oldlen;
 if (wnd->textlen < strlen(wnd->text)+dif) {
 /* ---- need to reallocate the text buffer ---- */
 int offset = (int)(cp1-wnd->text);
 wnd->textlen += dif;
 wnd->text = realloc(wnd->text, wnd->textlen+2);
 if (wnd->text == NULL)
 return;
 cp1 = wnd->text + offset;
 }
 memmove(cp1+dif, cp1, strlen(cp1)+1);
 }
 else if (oldlen > newlen) {

 /* ---- new text collapses text size ---- */
 dif = oldlen-newlen;
 memmove(cp1, cp1+dif, strlen(cp1)+1);
 }
 strncpy(cp1, cr, newlen);
}

/* ------- search for the occurrence of a string ------- */
static void SearchTextBox(WINDOW wnd, int Replacing, int incr)
{
 char *s1, *s2, *cp1;
 DBOX *db = Replacing ? &ReplaceTextDB : &SearchTextDB;
 char *cp = GetEditBoxText(db, ID_SEARCHFOR);
 int rpl = TRUE, FoundOne = FALSE;

 while (rpl == TRUE && cp != NULL) {
 rpl = Replacing ?
 CheckBoxSetting(&ReplaceTextDB, ID_REPLACEALL) :
 FALSE;
 if (TextBlockMarked(wnd)) {
 ClearTextBlock(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 /* search for a match starting at cursor position */
 cp1 = CurrChar;
 if (incr)
 cp1++; /* start past the last hit */
 /* --- compare at each character position --- */
 while (*cp1) {
 s1 = cp;
 s2 = cp1;
 while (*s1 && *s1 != '\n') {
 if (SearchCmp(*s1, *s2))
 break;
 s1++, s2++;
 }
 if (*s1 == '\0' || *s1 == '\n')
 break;
 cp1++;
 }
 if (*s1 == '\0' || *s1 == '\n') {
 /* ----- match at *cp1 ------- */
 FoundOne = TRUE;

 /* mark a block at beginning of matching text */
 wnd->BlkEndLine = TextLineNumber(wnd, s2);
 wnd->BlkBegLine = TextLineNumber(wnd, cp1);
 if (wnd->BlkEndLine < wnd->BlkBegLine)
 wnd->BlkEndLine = wnd->BlkBegLine;
 wnd->BlkEndCol =
 (int)(s2 - TextLine(wnd, wnd->BlkEndLine));
 wnd->BlkBegCol =
 (int)(cp1 - TextLine(wnd, wnd->BlkBegLine));

 /* position the cursor at the matching text */
 wnd->CurrCol = wnd->BlkBegCol;
 wnd->CurrLine = wnd->BlkBegLine;
 wnd->WndRow = wnd->CurrLine - wnd->wtop;


 /* align the window scroll to matching text */
 if (WndCol > ClientWidth(wnd)-1)
 wnd->wleft = wnd->CurrCol;
 if (wnd->WndRow > ClientHeight(wnd)-1) {
 wnd->wtop = wnd->CurrLine;
 wnd->WndRow = 0;
 }

 SendMessage(wnd, PAINT, 0, 0);
 SendMessage(wnd, KEYBOARD_CURSOR,
 WndCol, wnd->WndRow);

 if (Replacing) {
 if (rpl || YesNoBox("Replace the text?")) {
 replacetext(wnd, cp1, db);
 wnd->TextChanged = TRUE;
 BuildTextPointers(wnd);
 }
 if (rpl) {
 incr = TRUE;
 continue;
 }
 ClearTextBlock(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 return;
 }
 break;
 }
 if (!FoundOne)
 MessageBox("Search/Replace Text", "No match found");
}

/* ------- search for the occurrence of a string,
 replace it with a specified string ------- */
void ReplaceText(WINDOW wnd)
{
 if (CheckCase)
 SetCheckBox(&ReplaceTextDB, ID_MATCHCASE);
 if (DialogBox(wnd, &ReplaceTextDB, TRUE, NULL)) {
 CheckCase=CheckBoxSetting(&ReplaceTextDB,ID_MATCHCASE);
 SearchTextBox(wnd, TRUE, FALSE);
 }
}

/* ------- search for the first occurrence of a string ------ */
void SearchText(WINDOW wnd)
{
 if (CheckCase)
 SetCheckBox(&SearchTextDB, ID_MATCHCASE);
 if (DialogBox(wnd, &SearchTextDB, TRUE, NULL)) {
 CheckCase=CheckBoxSetting(&SearchTextDB,ID_MATCHCASE);
 SearchTextBox(wnd, FALSE, FALSE);
 }
}

/* ------- search for the next occurrence of a string ------- */
void SearchNext(WINDOW wnd)
{

 SearchTextBox(wnd, FALSE, TRUE);
}






[LISTING THREE]

/* -------------- pictbox.c -------------- */

#include "dflat.h"

typedef struct {
 enum VectTypes vt;
 RECT rc;
} VECT;

/* NOTE: the printable letters below stand in for IBM box-drawing
 characters whose high bit was stripped in transmission; add 0x80
 to each byte to restore them (e.g. 'D' was 0xC4, '3' was 0xB3).
 The same substitution applies to the VectCvt table below. */
unsigned char CharInWnd[] = "D3Z?Y@EC4AB";

unsigned char VectCvt[3][11][2][4] = {
 { /* --- first character in collision vector --- */
 /* ( drawing D ) ( drawing 3 ) */
 {{"DDD"}, {"ZC@"}},
 {{"ZB?"}, {"333"}},
 {{"ZBB"}, {"ZCC"}},
 {{"???"}, {"???"}},
 {{"YYY"}, {"YYY"}},
 {{"@AA"}, {"CC@"}},
 {{"EEE"}, {"EEE"}},
 {{"CEE"}, {"CCC"}},
 {{"444"}, {"444"}},
 {{"AAA"}, {"AAA"}},
 {{"BBB"}, {"BEE"}} },
 { /* --- middle character in collision vector --- */
 /* ( drawing D ) ( drawing 3 ) */
 {{"DDD"}, {"BEA"}},
 {{"CE4"}, {"333"}},
 {{"ZZZ"}, {"ZZZ"}},
 {{"???"}, {"???"}},
 {{"YYY"}, {"YYY"}},
 {{"@@@"}, {"@@@"}},
 {{"EEE"}, {"EEE"}},
 {{"CCC"}, {"CCC"}},
 {{"EE4"}, {"444"}},
 {{"AAA"}, {"EEA"}},
 {{"BBB"}, {"BBB"}} },
 { /* --- last character in collision vector --- */
 /* ( drawing D ) ( drawing 3 ) */
 {{"DDD"}, {"?4Y"}},
 {{"@AY"}, {"333"}},
 {{"ZZZ"}, {"ZZZ"}},
 {{"BB?"}, {"?44"}},
 {{"AAY"}, {"44Y"}},
 {{"@@@"}, {"@@@"}},
 {{"EEE"}, {"EEE"}},
 {{"CCC"}, {"CCC"}},
 {{"EE4"}, {"444"}},
 {{"AAA"}, {"EEA"}},
 {{"BBB"}, {"BBB"}} }
};

/* -- compute whether character is first, middle, or last -- */
static int FindVector(WINDOW wnd, RECT rc, int x, int y)
{
 RECT rcc;
 VECT *vc = wnd->VectorList;
 int i, coll = -1;
 for (i = 0; i < wnd->VectorCount; i++) {
 if ((vc+i)->vt == VECTOR) {
 rcc = (vc+i)->rc;
 /* --- skip the colliding vector --- */
 if (rcc.lf == rc.lf && rcc.rt == rc.rt &&
 rcc.tp == rc.tp && rc.bt == rcc.bt)
 continue;
 if (rcc.tp == rcc.bt) {
 /* ---- horizontal vector,
 see if character is in it --- */
 if (rc.lf+x >= rcc.lf && rc.lf+x <= rcc.rt &&
 rc.tp+y == rcc.tp) {
 /* --- it is --- */
 if (rc.lf+x == rcc.lf)
 coll = 0;
 else if (rc.lf+x == rcc.rt)
 coll = 2;
 else
 coll = 1;
 }
 }
 else {
 /* ---- vertical vector,
 see if character is in it --- */
 if (rc.tp+y >= rcc.tp && rc.tp+y <= rcc.bt &&
 rc.lf+x == rcc.lf) {
 /* --- it is --- */
 if (rc.tp+y == rcc.tp)
 coll = 0;
 else if (rc.tp+y == rcc.bt)
 coll = 2;
 else
 coll = 1;
 }
 }
 }
 }
 return coll;
}

static void PaintVector(WINDOW wnd, RECT rc)
{
 int i, cw, fml, vertvect, coll, len;
 unsigned int ch, nc;

 if (rc.rt == rc.lf) {
 /* ------ vertical vector ------- */
 nc = '3';
 vertvect = 1;
 len = rc.bt-rc.tp+1;
 }
 else {
 /* ------ horizontal vector ------- */
 nc = 'D';
 vertvect = 0;
 len = rc.rt-rc.lf+1;
 }

 for (i = 0; i < len; i++) {
 unsigned int newch = nc;
 int xi = 0, yi = 0;
 if (vertvect)
 yi = i;
 else
 xi = i;
 ch = videochar(GetClientLeft(wnd)+rc.lf+xi,
 GetClientTop(wnd)+rc.tp+yi);
 for (cw = 0; cw < sizeof(CharInWnd); cw++) {
 if (ch == CharInWnd[cw]) {
 /* ---- hit another vector character ---- */
 if ((coll=FindVector(wnd, rc, xi, yi)) != -1) {
 /* compute first/middle/last subscript */
 if (i == len-1)
 fml = 2;
 else if (i == 0)
 fml = 0;
 else
 fml = 1;
 newch = VectCvt[coll][cw][vertvect][fml];
 }
 }
 }
 PutWindowChar(wnd, newch, rc.lf+xi, rc.tp+yi);
 }
}

static void PaintBar(WINDOW wnd, RECT rc, enum VectTypes vt)
{
 int i, vertbar, len;
 /* originally the shaded-block characters 0xDB, 0xB2, 0xB1, 0xB0
 with the high bit stripped in transmission */
 unsigned int tys[] = {'[', '2', '1', '0'};
 unsigned int nc = tys[vt-1];

 if (rc.rt == rc.lf) {
 /* ------ vertical bar ------- */
 vertbar = 1;
 len = rc.bt-rc.tp+1;
 }
 else {
 /* ------ horizontal bar ------- */
 vertbar = 0;
 len = rc.rt-rc.lf+1;
 }

 for (i = 0; i < len; i++) {
 int xi = 0, yi = 0;
 if (vertbar)
 yi = i;
 else
 xi = i;
 PutWindowChar(wnd, nc, rc.lf+xi, rc.tp+yi);
 }
}

static void PaintMsg(WINDOW wnd)
{
 int i;
 VECT *vc = wnd->VectorList;
 for (i = 0; i < wnd->VectorCount; i++) {
 if (vc->vt == VECTOR)
 PaintVector(wnd, vc->rc);
 else
 PaintBar(wnd, vc->rc, vc->vt);
 vc++;
 }
}

static void DrawVectorMsg(WINDOW wnd,PARAM p1,enum VectTypes vt)
{
 if (p1) {
 /* use a temporary so the list isn't lost if realloc fails */
 VECT vc, *newlist = realloc(wnd->VectorList,
 sizeof(VECT) * (wnd->VectorCount + 1));
 if (newlist != NULL) {
 wnd->VectorList = newlist;
 vc.vt = vt;
 vc.rc = *(RECT *)p1;
 *(((VECT *)(wnd->VectorList))+wnd->VectorCount)=vc;
 wnd->VectorCount++;
 }
 }
}

static void DrawBoxMsg(WINDOW wnd, PARAM p1)
{
 if (p1) {
 RECT rc = *(RECT *)p1;
 rc.bt = rc.tp;
 SendMessage(wnd, DRAWVECTOR, (PARAM) &rc, TRUE);
 rc = *(RECT *)p1;
 rc.lf = rc.rt;
 SendMessage(wnd, DRAWVECTOR, (PARAM) &rc, FALSE);
 rc = *(RECT *)p1;
 rc.tp = rc.bt;
 SendMessage(wnd, DRAWVECTOR, (PARAM) &rc, TRUE);
 rc = *(RECT *)p1;
 rc.rt = rc.lf;
 SendMessage(wnd, DRAWVECTOR, (PARAM) &rc, FALSE);
 }
}

int PictureProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case PAINT:
 BaseWndProc(PICTUREBOX, wnd, msg, p1, p2);
 PaintMsg(wnd);
 return TRUE;
 case DRAWVECTOR:
 DrawVectorMsg(wnd, p1, VECTOR);
 return TRUE;
 case DRAWBOX:
 DrawBoxMsg(wnd, p1);
 return TRUE;
 case DRAWBAR:
 DrawVectorMsg(wnd, p1, p2);
 return TRUE;
 case CLOSE_WINDOW:
 if (wnd->VectorList != NULL)
 free(wnd->VectorList);
 break;
 default:
 break;
 }
 return BaseWndProc(PICTUREBOX, wnd, msg, p1, p2);
}

static RECT PictureRect(int x, int y, int len, int hv)
{
 RECT rc;
 rc.lf = rc.rt = x;
 rc.tp = rc.bt = y;
 if (hv)
 /* ---- horizontal vector ---- */
 rc.rt += len-1;
 else
 /* ---- vertical vector ---- */
 rc.bt += len-1;
 return rc;
}

void DrawVector(WINDOW wnd, int x, int y, int len, int hv)
{
 RECT rc = PictureRect(x,y,len,hv);
 SendMessage(wnd, DRAWVECTOR, (PARAM) &rc, 0);
}

void DrawBox(WINDOW wnd, int x, int y, int ht, int wd)
{
 RECT rc;
 rc.lf = x;
 rc.tp = y;
 rc.rt = x+wd-1;
 rc.bt = y+ht-1;
 SendMessage(wnd, DRAWBOX, (PARAM) &rc, 0);
}

void DrawBar(WINDOW wnd,enum VectTypes vt,int x,int y,int len,int hv)
{
 RECT rc = PictureRect(x,y,len,hv);
 SendMessage(wnd, DRAWBAR, (PARAM) &rc, (PARAM) vt);
}







[LISTING FOUR]

/* ------------- calendar.c ------------- */
#include "dflat.h"

#ifndef TURBOC

#define CALHEIGHT 17
#define CALWIDTH 33

static int DyMo[] = {31,28,31,30,31,30,31,31,30,31,30,31};
static struct tm ttm;
static int dys[42];
static WINDOW Cwnd;

static void FixDate(void)
{
 /* ---- adjust Feb for leap year ---- */
 DyMo[1] = (ttm.tm_year % 4) ? 28 : 29;
#ifndef BCPP
 /* bug in the Borland C++ mktime function prohibits this */
 ttm.tm_mday = min(ttm.tm_mday, DyMo[ttm.tm_mon]);
#endif
}

/* ---- build calendar dates array ---- */
static void BuildDateArray(void)
{
 int offset, dy = 0;
 memset(dys, 0, sizeof dys);
 FixDate();
 /* ----- compute the weekday for the 1st ----- */
 offset = ((ttm.tm_mday-1) - ttm.tm_wday) % 7;
 if (offset < 0)
 offset += 7;
 if (offset)
 offset = (offset - 7) * -1;
 /* ----- build the dates into the array ---- */
 for (dy = 1; dy <= DyMo[ttm.tm_mon]; dy++)
 dys[offset++] = dy;
}
static void CreateWindowMsg(WINDOW wnd)
{
 int x, y;
 DrawBox(wnd, 1, 2, CALHEIGHT-4, CALWIDTH-4);
 for (x = 5; x < CALWIDTH-4; x += 4)
 DrawVector(wnd, x, 2, CALHEIGHT-4, FALSE);
 for (y = 4; y < CALHEIGHT-3; y+=2)
 DrawVector(wnd, 1, y, CALWIDTH-4, TRUE);
}
static void DisplayDates(WINDOW wnd)
{
 int week, day;
 char dyln[10];
 int offset;
 char banner[CALWIDTH-1];
 char banner1[30];

 SetStandardColor(wnd);

 PutWindowLine(wnd, "Sun Mon Tue Wed Thu Fri Sat", 2, 1);
 memset(banner, ' ', CALWIDTH-2);
 strftime(banner1, 16, "%B, %Y", &ttm);
 offset = (CALWIDTH-2 - strlen(banner1)) / 2;
 strcpy(banner+offset, banner1);
 strcat(banner, " ");
 PutWindowLine(wnd, banner, 0, 0);
 BuildDateArray();
 for (week = 0; week < 6; week++) {
 for (day = 0; day < 7; day++) {
 int dy = dys[week*7+day];
 if (dy == 0)
 strcpy(dyln, " ");
 else {
 if (dy == ttm.tm_mday)
 sprintf(dyln, "%c%c%c%2d %c",
 CHANGECOLOR,
 SelectForeground(wnd)+0x80,
 SelectBackground(wnd)+0x80,
 dy, RESETCOLOR);
 else
 sprintf(dyln, "%2d ", dy);
 }
 SetStandardColor(wnd);
 PutWindowLine(wnd, dyln, 2 + day * 4, 3 + week*2);
 }
 }
}
static int KeyboardMsg(WINDOW wnd, PARAM p1)
{
 switch ((int)p1) {
 case PGUP:
 if (ttm.tm_mon == 0) {
 ttm.tm_mon = 12;
 ttm.tm_year--;
 }
 ttm.tm_mon--;
 FixDate();
 mktime(&ttm);
 DisplayDates(wnd);
 return TRUE;
 case PGDN:
 ttm.tm_mon++;
 if (ttm.tm_mon == 12) {
 ttm.tm_mon = 0;
 ttm.tm_year++;
 }
 FixDate();
 mktime(&ttm);
 DisplayDates(wnd);
 return TRUE;
 default:
 break;
 }
 return FALSE;
}
static int CalendarProc(WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2)
{
 switch (msg) {

 case CREATE_WINDOW:
 DefaultWndProc(wnd, msg, p1, p2);
 CreateWindowMsg(wnd);
 return TRUE;
 case KEYBOARD:
 if (KeyboardMsg(wnd, p1))
 return TRUE;
 break;
 case PAINT:
 DefaultWndProc(wnd, msg, p1, p2);
 DisplayDates(wnd);
 return TRUE;
 case COMMAND:
 if ((int)p1 == ID_HELP) {
 DisplayHelp(wnd, "Calendar");
 return TRUE;
 }
 break;
 case CLOSE_WINDOW:
 Cwnd = NULL;
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
void Calendar(WINDOW pwnd)
{
 if (Cwnd == NULL) {
 time_t tim = time(NULL);
 ttm = *localtime(&tim);
 Cwnd = CreateWindow(PICTUREBOX,
 "Calendar",
 -1, -1, CALHEIGHT, CALWIDTH,
 NULL, pwnd, CalendarProc,
 SHADOW |
 MINMAXBOX |
 CONTROLBOX |
 MOVEABLE |
 HASBORDER
 );
 }
 SendMessage(Cwnd, SETFOCUS, TRUE, 0);
}

#endif






[LISTING FIVE]

/* ------------ barchart.c ----------- */
#include "dflat.h"

#define BCHEIGHT 12
#define BCWIDTH 44

#define COLWIDTH 4

static WINDOW Bwnd;

/* ------- project schedule array ------- */
static struct ProjChart {
 char *prj;
 int start, stop;
} projs[] = {
 {"Center St", 0,3},
 {"City Hall", 0,5},
 {"Rt 395 ", 1,4},
 {"Sky Condo", 2,3},
 {"Out Hs ", 0,4},
 {"Bk Palace", 1,5}
};

static char *Title = " PROJECT SCHEDULE";
static char *Months = " Jan Feb Mar Apr May Jun";

static int BarChartProc(WINDOW wnd, MESSAGE msg,PARAM p1, PARAM p2)
{
 switch (msg) {
 case COMMAND:
 if ((int)p1 == ID_HELP) {
 DisplayHelp(wnd, "BarChart");
 return TRUE;
 }
 break;
 case CLOSE_WINDOW:
 Bwnd = NULL;
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
void BarChart(WINDOW pwnd)
{
 int pct = sizeof projs / sizeof(struct ProjChart);
 int i;

 if (Bwnd == NULL) {
 Bwnd = CreateWindow(PICTUREBOX,
 "BarChart",
 -1, -1, BCHEIGHT, BCWIDTH,
 NULL, pwnd, BarChartProc,
 SHADOW |
 CONTROLBOX |
 MOVEABLE |
 HASBORDER
 );
 SendMessage(Bwnd, ADDTEXT, (PARAM) Title, 0);
 SendMessage(Bwnd, ADDTEXT, (PARAM) "", 0);
 for (i = 0; i < pct; i++) {
 SendMessage(Bwnd,ADDTEXT,(PARAM)projs[i].prj,0);
 DrawBar(Bwnd, SOLIDBAR+(i%4),
 11+projs[i].start*COLWIDTH, 2+i,
 (1 + projs[i].stop-projs[i].start) * COLWIDTH,
 TRUE);
 }
 SendMessage(Bwnd, ADDTEXT, (PARAM) "", 0);
 SendMessage(Bwnd, ADDTEXT, (PARAM) Months, 0);
 DrawBox(Bwnd, 10, 1, pct+2, 25);
 }
 SendMessage(Bwnd, SETFOCUS, TRUE, 0);
}




February, 1992
STRUCTURED PROGRAMMING


Stuck Windows




Jeff Duntemann KG7JF


My first house was one of those "Polish battleships," as we called them; a
skinny little boxcar of a place on a lot measuring 30 feet by 180 feet, in the
thick of the north side of Chicago. The guy next door (whose bedroom window
was probably eight feet from ours) would set his clock radio to go off at 5:45
AM, playing early punk rock loud enough to wake a patronage worker, let alone
the dead. (The dead had long since moved to Hoffman Estates, from which they
returned only on election days.)
It was, in short, your classic American "starter home," meaning that almost
every day, one damned thing or another about it was starting up.
Our first house, like most old houses, had openable windows and nonopenable
windows. The nonopenable windows were exactly like the openable windows,
except that they had 47 coats of dark green paint on them, and had not been
opened since 1934.
The north kitchen window was an exception, in that it looked like it might
have been opened as recently as 1958. I did the usual scraping and tugging and
pulling and razorblading to no avail. And that's where it sat for quite a
while, and at best I would pick at the cracks from time to time, furious at my
inability to get a better view of the wall of the next house over.
My friend George Ewing was visiting one spring weekend, and he spent some time
over coffee watching me thump and shove and fiddle with the just-barely-stuck
kitchen window. "What's the problem, Jeff?" he finally asked.
"Anything with handles oughta open," was my disgruntled response. He shrugged,
shoved me aside, and grabbed one of the window's handles in each hand. George
outmasses me by a considerable fraction, and has the sort of hands you would
imagine could crush rocks. He sucked in his breath and heaved upward, hard.
The window popped and crackled and resisted, and then with a crumbling crunch
both handles came off in his oversized hands. The window hadn't budged a
fraction of an inch.
"I guess it doesn't have handles anymore," I said, and went looking for the
Plastic Wood.


Opening Windows


Regardless of what happens under the surface to manage user input events, most
people consider Turbo Vision to be a text-based windowing manager. The action
in a Turbo Vision app happens in one or more windows. Behind the windows is
just the empty pattern of the desktop.
So it's time to talk about what it takes to open and use windows in Turbo
Vision. Again, I'll be using my HCALC mortgage calculator program as example
code. I've submitted a new and somewhat improved version of the program as a
whole to DDJ, and it can be downloaded from CompuServe and M&T Online. I won't
be listing the full program here, but I will list program segments that
illustrate the creation and use of Turbo Vision windows.
First, some overview and a little recap. A Turbo Vision window is a group--a
collection of views whose operation is coordinated by a "boss" object of the
TWindow type. The TWindow has no visible elements within itself; everything
you see of a TWindow actually belongs to one of the TWindow's views.
Typically, a window owns a frame, one or more interiors called panes, and
often one or more scroll bars. At minimum, a window needs a frame and a pane.
The frame is necessary because the top edge of the frame contains the two
buttons that zoom and close the window. The pane is necessary because all
Turbo Vision views are responsible for the screen space they enclose. A window
cannot be a frame around nothing. The window must be able to redraw itself
whenever it moves or changes size; hence it must have some sort of redrawable
interior.


Window Design


I've long quibbled with the wisdom of subsetting an already-tiny
25x80-character text screen into smaller entities. (My first project published
in DDJ, in fact, was a sort of "antiwindowing" system that treated the whole
screen as a window into a larger, 66x80 virtual screen.) Some of my
reservations remain, but in fact you can put the tininess of a screen window
to good use hiding application complexity if you put some forethought into
your window design. HCALC provides a good example. Its mortgage display window
actually exists in two levels. Figure 1 shows the first level, which you see
as soon as you instantiate a new mortgage by selecting the initial values of
principal, interest, and periods from the dialog box. All you see are the
three most important elements of the mortgage amortization table with a
mortgage summary at the top of the window.
You can instantiate several mortgage windows with varying initial values and
have them all on the screen and visible at once. This allows you to do some
real-time comparisons of payments when you're trying to decide what sort of
mortgage you can afford.
There is, however, an additional level a mortgage window can display. If you
click on the zoom button of a mortgage window, you'll see the full-screen
display shown in Figure 2. Here there are three additional columns: one for
additional principal values, and two more that summarize the cumulative totals
of principal and interest that hold for any given mortgage payment. This
allows more detailed analysis of how many payments you can lop off the end of
a mortgage by remitting extra principal during the mortgage's course--and
reminds you how much money goes down the interest rathole compared to
principal. (HCALC has been a wonderful goad toward paying off our mortgage
early!)
The interior layout of the window was designed to display only the essential
elements of a mortgage amortization table when in a "normal" (that is,
non-zoomed) state, and only display the full information table when the window
is zoomed. This was done mostly by choosing an initial window size to place
the three extra columns outside the right window margin.
So work smart when you put your windows together. Windows are windows
primarily so that you can display more than one on the screen at once. Think
through the question of what use it might be to display multiple windows at
once--and design the display of information within the window to support
whatever comparison would be useful.


Defining Windows


Putting Turbo Vision windows together is complicated by the fact that you
don't define everything within a single object definition. You have to define
a TWindow descendent to "be" the window--but you have to define one or more
interior objects separately and then use the Insert method to insert them into
the window object. (Windows, remember, are groups, and you have to insert a
window's views into the window.)
Listing One (page 145) contains the definitions for three classes:
TMortgageView, the mortgage window itself (a group) and the two panes:
TMortgageTopInterior and TMortgageBottomInterior, both of which are views.
Note that there is nothing explicit in the TMortgageView definition to connect
it to either of the pane objects. That connection is done strictly at runtime,
through the Insert method.
This, however, causes a problem. Both of the panes need to display information
contained in the mortgage class, TMortgage, present in the TMortgageView
object that owns the panes. The panes do not descend from TMortgageView, so
they do not have access to TMortgageView's fields. It would be wasteful to
give each pane its own mortgage object, so instead each pane carries a pointer
to the mortgage object contained in TMortgageView. When the mortgage window is
instantiated, the two Mortgage pointers in the two panes
must be set to point to the Mortgage field in TMortgageView. This is fast and
memory efficient, but lordy, something about it still makes me wince.
The lesson learned here is something to put in your notebook: The fields of a
group object are not automatically available to the objects owned by the
group!


Drawing Windows


At minimum, the panes of a window need to have two methods: a constructor,
which sets up the pane, and a Draw method, which draws the pane to the screen
on demand. A destructor is optional; the default destructor is inherited from
the pane's parent class and for simple panes serves quite well.
The Draw method of a view is called whenever the view changes size or moves.
It must draw every portion of the rectangle occupied by the view. Generally,
you the programmer don't have to call a view's Draw method directly. Turbo
Vision knows when a view needs to be redrawn, and it will call Draw for
you--assuming you set the Draw method up correctly to begin with.
The TMortgageTopInterior.Draw method is fairly simple, and is a good example
of how a view must draw the space it owns. Note that the drawing is not done
with Write or Writeln! What you have to do is declare a buffer (of type
TDrawBuffer), fill the buffer with text using the MoveStr library routine, and
then use one of TView's methods, WriteLine, to actually write the buffer to
the screen.
In between uses, you must clear the buffer variable to spaces using the
MoveChar routine. If you wish, you can use some other character (one of the
"halftone" characters, perhaps) to write text against a slightly fancier
background.
Again, this whole process of writing text to a view's interior seems
needlessly prolix to me, and I would like to see it simplified in some future
adaptation of Turbo Vision.



Scrollers and Scroll Bars


The bottom pane is a little different and a little more complex. The top pane
only serves to summarize the mortgage and (as a side benefit) provides some
column headers for the mortgage amortization table. The bottom pane does
something a lot tougher, but much more characteristic of a window: It has to
display some subset of a much larger block of data, not all of which can fit
in the window at once.
Turbo Vision contains most of the machinery to do this, in the form of a
special TView descendent class called TScroller. Generically, we'll call
objects of type TScroller simply "scrollers."
Functionally, scrollers are windows into a larger block of text. A scroller
scrolls through the larger block of text in either X (across) or Y (up and
down) or both, as needed. It's possible to scroll in only one dimension if the
data fits entirely within the scroller in the other dimension.
For a scroller to work, you must pass one or two scroll bars to it when
you call its constructor. Turbo Vision scroll bars are "finished" views, and
you don't need to specialize them by subclassing them. In fact, if your scroll
bar is typical and extends across one whole side of the window (the right side
for a vertical scroll bar and the bottom for a horizontal scroll bar) you can
automatically set the sizing parameters to the scroll bars by calling a Turbo
Vision library routine called StandardScrollBar, as I've done in
TMortgageView.Init.
But note that I've only called StandardScrollBar for the horizontal scroll
bar. The vertical scroll bar isn't quite typical, in that it only embraces the
bottom pane of the mortgage window, and not the full vertical height of the
window. To play tricks like that you have to set the scroll bar's Origin and
Size records explicitly. I've done this for the vertical scroll bar in
TMortgageView.Init, as you can see in Listing Two (page 145).


Using Scroll Bars


Scroll bars are one of the nifty-neato aspects of Turbo Vision, and they
definitely illustrate the advantages of event-driven programming. The scroll
bars are highly independent and self-contained and don't require a lot of
fooling-with. The user sets the state of a scroll bar by pushing the slider
character with the mouse, or else using the arrow keys in the keypad. (PgUp
and PgDn also work with the vertical scroll bar.) All of that is done beneath
the surface, below the level of your application. All you have to do is build
the state of the scroll bar into your algorithm for redrawing the pane.
When you initialize a scroller with scroll bars, you must call
TScroller.SetLimit, which sets the maximum value of "travel" that each scroll
bar may work through. (This is done in HCALC near the bottom of
TMortgageView.Init.) In HCALC, the maximum horizontal travel is 80 (the width
of the mortgage amortization table) and the maximum vertical travel is the
number of periods in the mortgage being displayed by the window. The default
mortgage is the 30-year mortgage, which has 360 entries; hence the default
maximum travel in Y is 360. These maximum travel values are set in the
scroller's Limit record by SetLimit.
The state of the scroll bars is read through a single record called Delta,
contained in the TScroller object that owns the scroll bars. The value of
Delta.X is proportional to the position of the slider character within the
scroll bar (running from 0 to Limit.X), as it exists at the time you read
Delta.X. (Turbo Vision updates the values in Delta automatically.) Similarly,
the value of Delta.Y is proportional to the position of the slider character
within the scroll bar, running from 0 to Limit.Y. For example, if Limit.Y is
360 (the default), Delta.Y will be at 180 when the vertical scroll bar's
slider character is exactly halfway through its travel.
To make the contents of a scroller reflect the position of its scroll bars,
you have to take the Delta values into account in your Draw method. Read
TMortgageBottomInterior.Draw carefully, and you'll see how it's done. Vertical
scrolling is handled by using Delta.Y as part of the index value that selects
which elements of the mortgage amortization table array are displayed to the
screen. Horizontal scrolling is even easier: You simply use Delta.X to select
the starting point within the buffer that you display to the screen:
WriteLine(0, YRun, Size.X, 1, B[Delta.X]); where B contains the string data that
must be displayed to the screen. B contains the full width of the amortization
table--all 80 characters' worth. If Delta.X is 0, you start the display at the
beginning of the buffer. If Delta.X is 40, you start the display with the
middle of the buffer--and hence half the horizontal display will appear to
have scrolled out of the window to the left.


Growing Panes


Take a look at the two constructors for the top and bottom panes of the
mortgage window. You'll see a couple of low-key statements that make a lot of
difference in how a window looks and acts. Look first at
MortgageBottomInterior.Init. Note the statement that sets the "grow mode":
GrowMode := gfGrowHiX + gfGrowHiY;. The two gf constants dictate which way the
window can expand and contract. This statement says that the window can be
expanded in both X and Y by tugging on the lower-right corner. The bottom pane
is a scroller, and is really a window onto something larger than the window
can initially display in both dimensions (that is, the mortgage amortization
table, which is typically 360 lines long and has some "extra" display fields
normally hidden beyond the right margin).
Now look at MortgageTopInterior.Init. It sets its grow mode like
this: GrowMode := gfGrowHiX;. This statement allows the top pane of the mortgage
window to grow in X (that is, to the right) only. You cannot increase or
decrease the vertical height of the top pane. This makes sense--the top pane
really just summarizes the mortgage, and as a side-service provides column
headers for the display of the amortization table in the lower pane. There's
nothing more to see in the Y dimension that isn't already shown--so there's no
need to grow it. Furthermore, closing up the top pane in Y wouldn't gain you
more than a few lines, and would greatly obscure the meaning of the window as
a whole. Therefore, you can't shrink the pane, either.
Because the top pane contains some additional header information to the right
of the right margin, it makes sense to grow the top pane in X, hence the
gfGrowHiX constant.
I recommend messing around with the GrowMode statements in both panes. Allow
the top pane to grow in both X and Y and see what happens when you pull on the
corner of the window. Comment out the GrowMode statement in the top pane's
constructor and see what happens when you try to grow the top pane in X.


Framed!


Although we usually only think of the window as a whole being framed,
individual panes can, in fact, have frames. The bottom pane of the mortgage
window has one, and its frame is the source of the line that divides the top
from the bottom pane. If you comment out the following line in
MortgageBottomInterior.Init, the frame will vanish: Options := Options OR
ofFramed;. It's a cosmetic touch, true, but it adds a lot to the readability of
the data in the window. If you add a frame to the top pane, nothing much will
happen. However, if you give the top pane a frame (by ORing the ofFramed
constant with its Options value) and then comment out the top pane's GrowMode
statement, you'll see the right edge of the top pane's frame when you try to
grow the window in X. Furthermore, you won't see the additional column titles
to the right of the frame. The frame marks the edge of the pane--but you've
grown the window in X, beyond the now-fixed extent of the pane.


Don't Paint 'Em Shut!


That's the quick tour of creating and using windows under Turbo Vision. The
subject is a lot deeper than that, but much of the rest is refinement. If you
fully grasp the material I've covered in this column and in HCALC.PAS, you
should be able to open some reasonably workable (if not excessively fancy)
windows. The key point is understanding what you're doing, and what's going on
inside Turbo Vision. Running blind and lifting Turbo Vision code from other
people without knowing how it works is the software equivalent of painting
your windows shut. You're fine until you want to change the view, as it
were--and then, like me back on Campbell Street in Chicago, you'll just
be...stuck.
And if it seemed like a hairy business, well, shove your cowboy hats on tight,
buckaroos, because next time we have to rope and hawgtie Turbo Vision streams.
Yippee-I/O-ki-yikes!


Q & A


Whew. Another Halloween. Three years here in DDJ, revelling in this fine
madness. It seems like maybe an hour and a half, even considering that that
was two houses, a state, an earthquake, the crumbling of the Communist Threat,
and two major releases of Turbo Pascal ago.
I want to thank you for the mail. You've taught me a lot, especially about
Zeller's Congruence and the freakiness of the PC serial port. And in closing
out another year (with no end in sight) of "Structured Programming," I'd like
to share a handful of the comments I've received that have had nothing
whatsoever to do with programming at all. For example, someone asked if I am
consciously imitating Dave Barry. The answer is yes. (I am not making this
up!)
Or: Why is the Magic Van magic? Easy: Because it's gone over 90,000 miles and
I still have it.
Mr. Byte remains a popular topic: "The next time Mr. Byte has puppies, we'd be
interested." So would Ripley. But hey, I know what you mean. Trouble is, Mr.
Byte is a factory second (pink nose, horrors!) and he had to surrender the
family jewels before we could get clear title to him. Check the Bichon Frise
section of the breeder directory in Dog World. Don't buy puppies from pet
stores!
Not everybody wants puppies: "Stick to business. I don't care what Mr. Byte
urinates on."
Ahhh. You must have cats.


_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann


[LISTING ONE]

{---------------------------------}

{ METHODS: TMortgageTopInterior }
{---------------------------------}

CONSTRUCTOR TMortgageTopInterior.Init(VAR Bounds : TRect);

BEGIN
 TView.Init(Bounds); { Call ancestor's constructor }
 GrowMode := gfGrowHiX; { Permits pane to grow in X but not Y }
END;

PROCEDURE TMortgageTopInterior.Draw;
VAR
 YRun : Integer;
 Color : Byte;
 B : TDrawBuffer;
 STemp : String[20];
BEGIN
 Color := GetColor(1);
 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,' Principal Interest Periods',Color);
 WriteLine(0,0,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 { Here we convert payment data to strings for display: }
 Str(Mortgage^.Principal:7:2,STemp);
 MoveStr(B[2],STemp,Color); { At beginning of buffer B }
 Str(Mortgage^.Interest*100:7:2,STemp);
 MoveStr(B[14],STemp,Color); { At position 14 of buffer B }
 Str(Mortgage^.Periods:4,STemp);
 MoveStr(B[27],STemp,Color); { At position 27 of buffer B }
 WriteLine(0,1,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,
 ' Extra Principal Interest',
 Color);
 WriteLine(0,2,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,
 'Paymt # Prin. Int. Balance Principal So far So far ',
 Color);
 WriteLine(0,3,Size.X,1,B);

END;

{------------------------------------}
{ METHODS: TMortgageBottomInterior }
{------------------------------------}

CONSTRUCTOR TMortgageBottomInterior.Init(VAR Bounds : TRect;
 AHScrollBar, AVScrollBar : PScrollBar);
BEGIN
 { Call ancestor's constructor: }
 TScroller.Init(Bounds,AHScrollBar,AVScrollBar);
 GrowMode := gfGrowHiX + gfGrowHiY;
 Options := Options OR ofFramed;
END;


PROCEDURE TMortgageBottomInterior.Draw;
VAR
 Color : Byte;
 B : TDrawBuffer;
 YRun : Integer;
 STemp : String[20];
BEGIN
 Color := GetColor(1);
 FOR YRun := 0 TO Size.Y-1 DO
 BEGIN
 MoveChar(B,' ',Color,80); { Clear the buffer to spaces }
 Str(Delta.Y+YRun+1:4,STemp);
 MoveStr(B,STemp+':',Color); { At beginning of buffer B }
 { Here we convert payment data to strings for display: }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PayPrincipal:7:2,STemp);
 MoveStr(B[6],STemp,Color); { At position 6 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PayInterest:7:2,STemp);
 MoveStr(B[15],STemp,Color); { At position 15 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].Balance:10:2,STemp);
 MoveStr(B[24],STemp,Color); { At position 24 of buffer B }
 { There isn't an extra principal value for every payment, so }
 { display the value only if it is nonzero: }
 STemp := '';
 IF Mortgage^.Payments^[Delta.Y+YRun+1].ExtraPrincipal > 0
 THEN
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].ExtraPrincipal:10:2,STemp);
 MoveStr(B[37],STemp,Color); { At position 37 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PrincipalSoFar:10:2,STemp);
 MoveStr(B[50],STemp,Color); { At position 50 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].InterestSoFar:10:2,STemp);
 MoveStr(B[64],STemp,Color); { At position 64 of buffer B }
 { Here we write the line to the window, taking into account the }
 { state of the X scroll bar: }
 WriteLine(0,YRun,Size.X,1,B[Delta.X]);
 END;
END;

{------------------------------}
{ METHODS: TMortgageView }
{------------------------------}

CONSTRUCTOR TMortgageView.Init(VAR Bounds : TRect;
 ATitle : TTitleStr;
 ANumber : Integer;
 InitMortgageData :
 MortgageDialogData);
VAR
 TopInterior : PMortgageTopInterior;
 BottomInterior : PMortgageBottomInterior;
 HScrollBar,VScrollBar : PScrollBar;
 R,S : TRect;
BEGIN
 TWindow.Init(Bounds,ATitle,ANumber); { Call ancestor's constructor }
 { Call the Mortgage object's constructor using dialog data: }
 WITH InitMortgageData DO
 Mortgage.Init(PrincipalData,
 InterestData / 100,
 PeriodsData,
 12);

 { Here we set up a window with *two* interiors, one scrollable, one }
 { static. It's all in the way that you define the bounds, mostly: }
 GetClipRect(Bounds); { Get bounds for interior of view }
 Bounds.Grow(-1,-1); { Shrink those bounds by 1 for both X & Y }

 { Define a rectangle to embrace the upper of the two interiors: }
 R.Assign(Bounds.A.X,Bounds.A.Y,Bounds.B.X,Bounds.A.Y+4);
 TopInterior := New(PMortgageTopInterior,Init(R));
 TopInterior^.Mortgage := @Mortgage;
 Insert(TopInterior);

 { Define a rectangle to embrace the lower of two interiors: }
 R.Assign(Bounds.A.X,Bounds.A.Y+5,Bounds.B.X,Bounds.B.Y);

 { Create scroll bars for both mouse & keyboard input: }
 VScrollBar := StandardScrollBar(sbVertical + sbHandleKeyboard);
 { We have to adjust vertical bar to fit bottom interior: }
 VScrollBar^.Origin.Y := R.A.Y; { Adjust top Y value }
 VScrollBar^.Size.Y := R.B.Y - R.A.Y; { Adjust size }
 { The horizontal scroll bar, on the other hand, is standard: }
 HScrollBar := StandardScrollBar(sbHorizontal + sbHandleKeyboard);

 { Create bottom interior object with scroll bars: }
 BottomInterior :=
 New(PMortgageBottomInterior,Init(R,HScrollBar,VScrollBar));
 { Make copy of pointer to mortgage object: }
 BottomInterior^.Mortgage := @Mortgage;
 { Set the limits for the scroll bars: }
 BottomInterior^.SetLimit(80,InitMortgageData.PeriodsData);
 { Insert the interior into the window: }
 Insert(BottomInterior);
END;




[LISTING TWO]

 PMortgageTopInterior = ^TMortgageTopInterior;
 TMortgageTopInterior =
 OBJECT(TView)
 Mortgage : PMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect);
 PROCEDURE Draw; VIRTUAL;
 END;

 PMortgageBottomInterior = ^TMortgageBottomInterior;
 TMortgageBottomInterior =
 OBJECT(TScroller)
 { Points to Mortgage object owned by TMortgageView }
 Mortgage : PMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect;
 AHScrollBar, AVScrollbar : PScrollBar);
 PROCEDURE Draw; VIRTUAL;
 END;

 PMortgageView = ^TMortgageView;
 TMortgageView =
 OBJECT(TWindow)

 Mortgage : TMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect;
 ATitle : TTitleStr;
 ANumber : Integer;
 InitMortgageData :
 MortgageDialogData);
 PROCEDURE HandleEvent(Var Event : TEvent); VIRTUAL;
 PROCEDURE ExtraPrincipal;
 PROCEDURE PrintSummary;
 DESTRUCTOR Done; VIRTUAL;
 END;




[LISTING THREE]

PROGRAM HCalc; { By Jeff Duntemann; Update of 10/31/91 }
 { Requires Turbo Pascal 6.0! }

USES App,Dialogs,Objects,Views,Menus,Drivers,
 FInput, { By Allen Bauer; on CompuServe BPROGA }
 Mortgage; { By Jeff Duntemann; from DDJ 10/91 }

CONST
 cmNewMortgage = 199;
 cmExtraPrin = 198;
 cmCloseAll = 197;
 cmCloseBC = 196;
 cmPrintSummary = 195;
 WindowCount : Integer = 0;

TYPE
 MortgageDialogData =
 RECORD
 PrincipalData : Real;
 InterestData : Real;
 PeriodsData : Integer;
 END;

 ExtraPrincipalDialogData =
 RECORD
 PaymentNumber : Integer;
 ExtraDollars : Real;
 END;

 THouseCalcApp =
 OBJECT(TApplication)
 InitDialog : PDialog; { Dialog for initializing a mortgage }
 ExtraDialog : PDialog; { Dialog for entering extra principal }
 CONSTRUCTOR Init;
 PROCEDURE InitMenuBar; VIRTUAL;
 PROCEDURE CloseAll;
 PROCEDURE HandleEvent(VAR Event : TEvent); VIRTUAL;
 PROCEDURE NewMortgage;
 END;

 PMortgageTopInterior = ^TMortgageTopInterior;
 TMortgageTopInterior =

 OBJECT(TView)
 Mortgage : PMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect);
 PROCEDURE Draw; VIRTUAL;
 END;


 PMortgageBottomInterior = ^TMortgageBottomInterior;
 TMortgageBottomInterior =
 OBJECT(TScroller)
 { Points to Mortgage object owned by TMortgageView }
 Mortgage : PMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect;
 AHScrollBar, AVScrollbar : PScrollBar);
 PROCEDURE Draw; VIRTUAL;
 END;

 PMortgageView = ^TMortgageView;
 TMortgageView =
 OBJECT(TWindow)
 Mortgage : TMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect;
 ATitle : TTitleStr;
 ANumber : Integer;
 InitMortgageData :
 MortgageDialogData);
 PROCEDURE HandleEvent(Var Event : TEvent); VIRTUAL;
 PROCEDURE ExtraPrincipal;
 PROCEDURE PrintSummary;
 DESTRUCTOR Done; VIRTUAL;
 END;


CONST
 DefaultMortgageData : MortgageDialogData =
 (PrincipalData : 100000;
 InterestData : 10.0;
 PeriodsData : 360);


VAR
 HouseCalc : THouseCalcApp; { This is the application object itself }



{------------------------------}
{ METHODS: THouseCalcApp }
{------------------------------}


CONSTRUCTOR THouseCalcApp.Init;

VAR
 R : TRect;
 aView : PView;

BEGIN
 TApplication.Init; { Always call the parent's constructor first! }


 { Create the dialog for initializing a mortgage: }
 R.Assign(20,5,60,16);
 InitDialog := New(PDialog,Init(R,'Define Mortgage Parameters'));
 WITH InitDialog^ DO
 BEGIN
 { First item in the dialog box is input line for principal: }
 R.Assign(3,3,13,4);
 aView := New(PFInputLine,Init(R,8,DRealSet,DReal,0));
 Insert(aView);
 R.Assign(2,2,12,3);
 Insert(New(PLabel,Init(R,'Principal',aView)));

 { Next is the input line for interest rate: }
 R.Assign(17,3,26,4);
 aView := New(PFInputLine,Init(R,6,DRealSet,DReal,3));
 Insert(aView);
 R.Assign(16,2,25,3);
 Insert(New(PLabel,Init(R,'Interest',aView)));
 R.Assign(26,3,27,4); { Add a static text "%" sign }
 Insert(New(PStaticText,Init(R,'%')));

 { Up next is the input line for number of periods: }
 R.Assign(31,3,36,4);
 aView := New(PFInputLine,Init(R,3,DUnsignedSet,DInteger,0));
 Insert(aView);
 R.Assign(29,2,37,3);
 Insert(New(PLabel,Init(R,'Periods',aView)));

 { These are standard buttons for the OK and Cancel commands: }
 R.Assign(8,8,16,10);
 Insert(New(PButton,Init(R,'~O~K',cmOK,bfDefault)));
 R.Assign(22,8,32,10);
 Insert(New(PButton,Init(R,'Cancel',cmCancel,bfNormal)));
 END;

 { Create the dialog for adding additional principal to a payment: }
 R.Assign(20,5,60,16);
 ExtraDialog := New(PDialog,Init(R,'Apply Extra Principal to Mortgage'));
 WITH ExtraDialog^ DO
 BEGIN
 { First item in the dialog is the payment number to which }
 { we're going to apply the extra principal: }
 R.Assign(9,3,18,4);
 aView := New(PFInputLine,Init(R,6,DUnsignedSet,DInteger,0));
 Insert(aView);
 R.Assign(3,2,12,3);
 Insert(New(PLabel,Init(R,'Payment #',aView)));

 { Next item in the dialog box is input line for extra principal: }
 R.Assign(23,3,33,4);
 aView := New(PFInputLine,Init(R,8,DRealSet,DReal,2));
 Insert(aView);
 R.Assign(20,2,35,3);
 Insert(New(PLabel,Init(R,'Extra Principal',aView)));

 { These are standard buttons for the OK and Cancel commands: }
 R.Assign(8,8,16,10);
 Insert(New(PButton,Init(R,'~O~K',cmOK,bfDefault)));
 R.Assign(22,8,32,10);

 Insert(New(PButton,Init(R,'Cancel',cmCancel,bfNormal)));
 END;

END;


{ This method sends out a broadcast message to all views. Only the }
{ mortgage windows know how to respond to it, so when cmCloseBC is }
{ issued, only the mortgage windows react--by closing. }

PROCEDURE THouseCalcApp.CloseAll;

VAR
 Who : Pointer;

BEGIN
 Who := Message(Desktop,evBroadcast,cmCloseBC,@Self);
END;


PROCEDURE THouseCalcApp.HandleEvent(VAR Event : TEvent);

BEGIN
 TApplication.HandleEvent(Event);
 IF Event.What = evCommand THEN
 BEGIN
 CASE Event.Command OF
 cmNewMortgage : NewMortgage;
 cmCloseAll : CloseAll;
 ELSE
 Exit;
 END; { CASE }
 ClearEvent(Event);
 END;
END;


PROCEDURE THouseCalcApp.NewMortgage;

VAR
 Code : Integer;
 R : TRect;
 Control : Word;
 ThisMortgage : PMortgageView;
 InitMortgageData : MortgageDialogData;

BEGIN
 { First we need a dialog to get the initial mortgage values from }
 { the user. The dialog appears *before* the mortgage window! }
 WITH InitMortgageData DO
 BEGIN
 PrincipalData := 100000;
 InterestData := 10.0;
 PeriodsData := 360;
 END;
 InitDialog^.SetData(InitMortgageData);
 Control := Desktop^.ExecView(InitDialog);
 IF Control <> cmCancel THEN { Create a new mortgage object: }
 BEGIN

 R.Assign(5,5,45,20);
 Inc(WindowCount);
 { Get data from the initial mortgage dialog: }
 InitDialog^.GetData(InitMortgageData);
 { Call the constructor for the mortgage window: }
 ThisMortgage :=
 New(PMortgageView,Init(R,'Mortgage',WindowCount,
 InitMortgageData));

 { Insert the mortgage window into the desktop: }
 Desktop^.Insert(ThisMortgage);
 END;
END;


PROCEDURE THouseCalcApp.InitMenuBar;

VAR
 R : TRect;

BEGIN
 GetExtent(R);
 R.B.Y := R.A.Y + 1; { Define 1-line menu bar }

 MenuBar := New(PMenuBar,Init(R,NewMenu(
 NewSubMenu('~M~ortgage',hcNoContext,NewMenu(
 NewItem('~N~ew','F6',kbF6,cmNewMortgage,hcNoContext,
 NewItem('~E~xtra Principal ','',0,cmExtraPrin,hcNoContext,
 NewItem('~C~lose all','F7',kbF7,cmCloseAll,hcNoContext,
 NewItem('E~x~it','Alt-X',kbAltX,cmQuit,hcNoContext,
 NIL))))),
 NIL)
 )));
END;


{---------------------------------}
{ METHODS: TMortgageTopInterior }
{---------------------------------}

CONSTRUCTOR TMortgageTopInterior.Init(VAR Bounds : TRect);

BEGIN
 TView.Init(Bounds); { Call ancestor's constructor }
 GrowMode := gfGrowHiX; { Permits pane to grow in X but not Y }
END;


PROCEDURE TMortgageTopInterior.Draw;

VAR
 YRun : Integer;
 Color : Byte;
 B : TDrawBuffer;
 STemp : String[20];

BEGIN
 Color := GetColor(1);
 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }

 MoveStr(B,' Principal Interest Periods',Color);
 WriteLine(0,0,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 { Here we convert payment data to strings for display: }
 Str(Mortgage^.Principal:7:2,STemp);
 MoveStr(B[2],STemp,Color); { At position 2 of buffer B }
 Str(Mortgage^.Interest*100:7:2,STemp);
 MoveStr(B[14],STemp,Color); { At position 14 of buffer B }
 Str(Mortgage^.Periods:4,STemp);
 MoveStr(B[27],STemp,Color); { At position 27 of buffer B }
 WriteLine(0,1,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,
 ' Extra Principal Interest',
 Color);
 WriteLine(0,2,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,
 'Paymt # Prin. Int. Balance Principal So far So far ',
 Color);
 WriteLine(0,3,Size.X,1,B);

END;


{------------------------------------}
{ METHODS: TMortgageBottomInterior }
{------------------------------------}

CONSTRUCTOR TMortgageBottomInterior.Init(VAR Bounds : TRect;
 AHScrollBar, AVScrollBar :
 PScrollBar);

BEGIN
 { Call ancestor's constructor: }
 TScroller.Init(Bounds,AHScrollBar,AVScrollBar);
 GrowMode := gfGrowHiX + gfGrowHiY;
 Options := Options OR ofFramed;
END;


PROCEDURE TMortgageBottomInterior.Draw;

VAR
 Color : Byte;
 B : TDrawBuffer;
 YRun : Integer;
 STemp : String[20];

BEGIN
 Color := GetColor(1);
 FOR YRun := 0 TO Size.Y-1 DO
 BEGIN
 MoveChar(B,' ',Color,80); { Clear the buffer to spaces }
 Str(Delta.Y+YRun+1:4,STemp);
 MoveStr(B,STemp+':',Color); { At beginning of buffer B }

 { Here we convert payment data to strings for display: }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PayPrincipal:7:2,STemp);
 MoveStr(B[6],STemp,Color); { At position 6 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PayInterest:7:2,STemp);
 MoveStr(B[15],STemp,Color); { At position 15 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].Balance:10:2,STemp);
 MoveStr(B[24],STemp,Color); { At position 24 of buffer B }
 { There isn't an extra principal value for every payment, so }
 { display the value only if it is nonzero: }
 STemp := '';
 IF Mortgage^.Payments^[Delta.Y+YRun+1].ExtraPrincipal > 0
 THEN
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].ExtraPrincipal:10:2,STemp);
 MoveStr(B[37],STemp,Color); { At position 37 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PrincipalSoFar:10:2,STemp);
 MoveStr(B[50],STemp,Color); { At position 50 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].InterestSoFar:10:2,STemp);
 MoveStr(B[64],STemp,Color); { At position 64 of buffer B }
 { Here we write the line to the window, taking into account the }
 { state of the X scroll bar: }
 WriteLine(0,YRun,Size.X,1,B[Delta.X]);
 END;
END;


{------------------------------}
{ METHODS: TMortgageView }
{------------------------------}

CONSTRUCTOR TMortgageView.Init(VAR Bounds : TRect;
 ATitle : TTitleStr;
 ANumber : Integer;
 InitMortgageData :
 MortgageDialogData);
VAR
 TopInterior : PMortgageTopInterior;
 BottomInterior : PMortgageBottomInterior;
 HScrollBar,VScrollBar : PScrollBar;
 R,S : TRect;

BEGIN
 TWindow.Init(Bounds,ATitle,ANumber); { Call ancestor's constructor }
 { Call the Mortgage object's constructor using dialog data: }
 WITH InitMortgageData DO
 Mortgage.Init(PrincipalData,
 InterestData / 100,
 PeriodsData,
 12);

 { Here we set up a window with *two* interiors, one scrollable, one }
 { static. It's all in the way that you define the bounds, mostly: }
 GetClipRect(Bounds); { Get bounds for interior of view }
 Bounds.Grow(-1,-1); { Shrink those bounds by 1 for both X & Y }

 { Define a rectangle to embrace the upper of the two interiors: }
 R.Assign(Bounds.A.X,Bounds.A.Y,Bounds.B.X,Bounds.A.Y+4);
 TopInterior := New(PMortgageTopInterior,Init(R));
 TopInterior^.Mortgage := @Mortgage;
 Insert(TopInterior);


 { Define a rectangle to embrace the lower of two interiors: }
 R.Assign(Bounds.A.X,Bounds.A.Y+5,Bounds.B.X,Bounds.B.Y);

 { Create scroll bars for both mouse & keyboard input: }
 VScrollBar := StandardScrollBar(sbVertical + sbHandleKeyboard);
 { We have to adjust vertical bar to fit bottom interior: }
 VScrollBar^.Origin.Y := R.A.Y; { Adjust top Y value }
 VScrollBar^.Size.Y := R.B.Y - R.A.Y; { Adjust size }
 { The horizontal scroll bar, on the other hand, is standard: }
 HScrollBar := StandardScrollBar(sbHorizontal + sbHandleKeyboard);

 { Create bottom interior object with scroll bars: }
 BottomInterior :=
 New(PMortgageBottomInterior,Init(R,HScrollBar,VScrollBar));
 { Make copy of pointer to mortgage object: }
 BottomInterior^.Mortgage := @Mortgage;
 { Set the limits for the scroll bars: }
 BottomInterior^.SetLimit(80,InitMortgageData.PeriodsData);
 { Insert the interior into the window: }
 Insert(BottomInterior);
END;


PROCEDURE TMortgageView.HandleEvent(Var Event : TEvent);

BEGIN
 TWindow.HandleEvent(Event);
 IF Event.What = evCommand THEN
 BEGIN
 CASE Event.Command OF
 cmExtraPrin : ExtraPrincipal;
 cmPrintSummary : PrintSummary;
 ELSE
 Exit;
 END; { CASE }
 ClearEvent(Event);
 END
 ELSE
 IF Event.What = evBroadcast THEN
 CASE Event.Command OF
 cmCloseBC : Done
 END; { CASE }
END;


PROCEDURE TMortgageView.ExtraPrincipal;

VAR
 Control : Word;
 ExtraPrincipalData : ExtraPrincipalDialogData;

BEGIN
 { Execute the "extra principal" dialog box: }
 Control := Desktop^.ExecView(HouseCalc.ExtraDialog);
 IF Control <> cmCancel THEN { Update the active mortgage window: }
 BEGIN
 { Get data from the extra principal dialog: }
 HouseCalc.ExtraDialog^.GetData(ExtraPrincipalData);

 Mortgage.Payments^[ExtraPrincipalData.PaymentNumber].ExtraPrincipal :=
 ExtraPrincipalData.ExtraDollars;
 Mortgage.Recalc; { Recalculate the amortization table... }
 Redraw; { ...and redraw the mortgage window }
 END;
END;


PROCEDURE TMortgageView.PrintSummary;

BEGIN
END;


DESTRUCTOR TMortgageView.Done;

BEGIN
 Mortgage.Done; { Dispose of the mortgage object's memory }
 TWindow.Done; { Call parent's destructor to dispose of window }
END;



BEGIN
 HouseCalc.Init;
 HouseCalc.Run;
 HouseCalc.Done;
END.




[LISTING FOUR]

unit FInput;
{$X+}
{
 This unit implements a derivative of TInputLine that supports several
 data types dynamically. It also provides formatted input for all the
 numerical types, keystroke filtering and uppercase conversion, field
 justification, and range checking.

 When the field is initialized, filtering and uppercase conversions
 are applied as appropriate to the particular data type.

 The CheckRange and ErrorHandler methods should be overridden if the
 user wants to implement them.

 This is just an initial implementation and comments are welcome. You
 can contact me via Compuserve. (76066,3202)

 I am releasing this into the public domain and anyone can use or modify
 it for their own personal use.

 Copyright (c) 1990 by Allen Bauer (76066,3202)

 1.1 - fixed input validation functions

 This is version 1.2 - fixed DataSize method to include reals;
 fixed Draw method to not format the data while the view is selected.
}

interface
uses Objects, Drivers, Dialogs;

type
 VKeys = set of char;

 PFInputLine = ^TFInputLine;
 TFInputLine = object(TInputLine)
 ValidKeys : VKeys;
 DataType,Decimals : byte;
 imMode : word;
 Validated, ValidSent : boolean;
 constructor Init(var Bounds: TRect; AMaxLen: integer;
 ChrSet: VKeys;DType, Dec: byte);
 constructor Load(var S: TStream);
 procedure Store(var S: TStream);
 procedure HandleEvent(var Event: TEvent); virtual;
 procedure GetData(var Rec); virtual;
 procedure SetData(var Rec); virtual;
 function DataSize: word; virtual;
 procedure Draw; virtual;
 function CheckRange: boolean; virtual;
 procedure ErrorHandler; virtual;
 end;

const
 imLeftJustify = $0001;
 imRightJustify = $0002;
 imConvertUpper = $0004;

 DString = 0;
 DChar = 1;
 DReal = 2;
 DByte = 3;
 DShortInt = 4;
 DInteger = 5;
 DLongInt = 6;
 DWord = 7;
 DDate = 8;
 DTime = 9;

 DRealSet : VKeys = [#1..#31,'+','-','0'..'9','.','E','e'];
 DSignedSet : VKeys = [#1..#31,'+','-','0'..'9'];
 DUnSignedSet : VKeys = [#1..#31,'0'..'9'];
 DCharSet : VKeys = [#1..#31,' '..'~'];
 DUpperSet : VKeys = [#1..#31,' '..'`','{'..'~'];
 DAlphaSet : VKeys = [#1..#31,'A'..'Z','a'..'z'];
 DFileNameSet : VKeys =
[#1..#31,'!','#'..')','-'..'.','0'..'9','@'..'Z','^'..'{','}'..'~'];
 DPathSet : VKeys =
[#1..#31,'!','#'..')','-'..'.','0'..':','@'..'Z','^'..'{','}'..'~','\'];
 DFileMaskSet : VKeys =
[#1..#31,'!','#'..'*','-'..'.','0'..':','?'..'Z','^'..'{','}'..'~','\'];
 DDateSet : VKeys = [#1..#31,'0'..'9','/'];
 DTimeSet : VKeys = [#1..#31,'0'..'9',':'];

 cmValidateYourself = 5000;
 cmValidatedOK = 5001;


procedure RegisterFInputLine;

const
 RFInputLine : TStreamRec = (
 ObjType: 20000;
 VmtLink: Ofs(typeof(TFInputLine)^);
 Load: @TFInputLine.Load;
 Store: @TFinputLine.Store
 );

implementation

uses Views, MsgBox, StrFmt, Dos;

function CurrentDate : string;
var
 Year,Month,Day,DOW : word;
 DateStr : string[10];
begin
 GetDate(Year,Month,Day,DOW);
 DateStr := SFLongint(Month,2)+'/'
 +SFLongInt(Day,2)+'/'
 +SFLongInt(Year mod 100,2);
 for DOW := 1 to length(DateStr) do
 if DateStr[DOW] = ' ' then
 DateStr[DOW] := '0';
 CurrentDate := DateStr;
end;

function CurrentTime : string;
var
 Hour,Minute,Second,Sec100 : word;
 TimeStr : string[10];
begin
 GetTime(Hour,Minute,Second,Sec100);
 TimeStr := SFLongInt(Hour,2)+':'
 +SFLongInt(Minute,2)+':'
 +SFLongInt(Second,2);
 for Sec100 := 1 to length(TimeStr) do
 if TimeStr[Sec100] = ' ' then
 TimeStr[Sec100] := '0';
 CurrentTime := TimeStr;
end;

procedure RegisterFInputLine;
begin
 RegisterType(RFInputLine);
end;

constructor TFInputLine.Init(var Bounds: TRect; AMaxLen: integer;
 ChrSet: VKeys; DType, Dec: byte);
begin
 if (DType in [DDate,DTime]) and (AMaxLen < 8) then
 AMaxLen := 8;

 TInputLine.Init(Bounds,AMaxLen);

 ValidKeys:= ChrSet;

 DataType := DType;
 Decimals := Dec;
 Validated := true;
 ValidSent := false;
 case DataType of
 DReal,DByte,DLongInt,
 DShortInt,DWord : imMode := imRightJustify;

 DChar,DString,
 DDate,DTime : imMode := imLeftJustify;
 end;
 if ValidKeys = DUpperSet then
 imMode := imMode or imConvertUpper;
 EventMask := EventMask or evMessage;
end;

constructor TFInputLine.Load(var S: TStream);
begin
 TInputLine.Load(S);
 S.Read(ValidKeys, sizeof(VKeys));
 S.Read(DataType, sizeof(byte));
 S.Read(Decimals, sizeof(byte));
 S.Read(imMode, sizeof(word));
 S.Read(Validated, sizeof(boolean));
 S.Read(ValidSent, sizeof(boolean));
end;

procedure TFInputLine.Store(var S: TStream);
begin
 TInputLine.Store(S);
 S.Write(ValidKeys, sizeof(VKeys));
 S.Write(DataType, sizeof(byte));
 S.Write(Decimals, sizeof(byte));
 S.Write(imMode, sizeof(word));
 S.Write(Validated, sizeof(boolean));
 S.Write(ValidSent, sizeof(boolean));
end;

procedure TFInputLine.HandleEvent(var Event: TEvent);
var
 NewEvent: TEvent;
begin
 case Event.What of
 evKeyDown : begin
 if (imMode and imConvertUpper) <> 0 then
 Event.CharCode := upcase(Event.CharCode);
 if not(Event.CharCode in [#0..#31]) then
 begin
 Validated := false;
 ValidSent := false;
 end;
 if (Event.CharCode <> #0) and not(Event.CharCode in ValidKeys) then
 ClearEvent(Event);
 end;
 evBroadcast: begin
 if (Event.Command = cmReceivedFocus) and
 (Event.InfoPtr <> @Self) and
 ((Owner^.State and sfSelected) <> 0) and
 not(Validated) and not(ValidSent) then

 begin
 NewEvent.What := evBroadcast;
 NewEvent.InfoPtr := @Self;
 NewEvent.Command := cmValidateYourself;
 PutEvent(NewEvent);
 ValidSent := true;
 end;
 if (Event.Command = cmValidateYourself) and
 (Event.InfoPtr = @Self) then
 begin
 if not CheckRange then
 begin
 ErrorHandler;
 Select;
 end
 else
 begin
 NewEvent.What := evBroadCast;
 NewEvent.InfoPtr := @Self;
 NewEvent.Command := cmValidatedOK;
 PutEvent(NewEvent);
 Validated := true;
 end;
 ValidSent := false;
 ClearEvent(Event);
 end;
 end;
 end;
 TInputLine.HandleEvent(Event);
end;

procedure TFInputLine.GetData(var Rec);
var
 Code : integer;
begin
 case DataType of
 Dstring,
 DDate,
 DTime : TInputLine.GetData(Rec);
 DChar : char(Rec) := Data^[1];
 DReal : val(Data^, real(Rec) , Code);
 DByte : val(Data^, byte(Rec) , Code);
 DShortInt : val(Data^, shortint(Rec) , Code);
 DInteger : val(Data^, integer(Rec) , Code);
 DLongInt : val(Data^, longint(Rec) , Code);
 DWord : val(Data^, word(Rec) , Code);
 end;
end;

procedure TFInputLine.SetData(var Rec);
begin
 case DataType of
 DString,
 DDate,
 DTime : TInputLine.SetData(Rec);
 DChar : Data^ := char(Rec);
 DReal : Data^ := SFDReal(real(Rec),MaxLen,Decimals);
 DByte : Data^ := SFLongInt(byte(Rec),MaxLen);
 DShortInt : Data^ := SFLongInt(shortint(Rec),MaxLen);

 DInteger : Data^ := SFLongInt(integer(Rec),MaxLen);
 DLongInt : Data^ := SFLongInt(longint(Rec),MaxLen);
 DWord : Data^ := SFLongInt(word(Rec),MaxLen);
 end;
 SelectAll(true);
end;

function TFInputLine.DataSize: word;
begin
 case DataType of
 DString,
 DDate,
 DTime : DataSize := TInputLine.DataSize;
 DChar : DataSize := sizeof(char);
 DReal : DataSize := sizeof(real);
 DByte : DataSize := sizeof(byte);
 DShortInt : DataSize := sizeof(shortint);
 DInteger : DataSize := sizeof(integer);
 DLongInt : DataSize := sizeof(longint);
 DWord : DataSize := sizeof(word);
 else
 DataSize := TInputLine.DataSize;
 end;
end;

procedure TFInputLine.Draw;
var
 RD : real;
 Code : integer;
begin
 if not((State and sfSelected) <> 0) then
 case DataType of
 DReal : begin
 if Data^ = '' then
 Data^ := SFDReal(0.0,MaxLen,Decimals)
 else
 begin
 val(Data^, RD, Code);
 Data^ := SFDReal(RD,MaxLen,Decimals);
 end;
 end;

 DByte,
 DShortInt,
 DInteger,
 DLongInt,
 DWord : if Data^ = '' then Data^ := SFLongInt(0,MaxLen);

 DDate : if Data^ = '' then Data^ := CurrentDate;
 DTime : if Data^ = '' then Data^ := CurrentTime;

 end;

 if State and (sfFocused+sfSelected) <> 0 then
 begin
 if (imMode and imRightJustify) <> 0 then
 while (length(Data^) > 0) and (Data^[1] = ' ') do
 delete(Data^,1,1);
 end

 else
 begin
 if ((imMode and imRightJustify) <> 0) and (Data^ <> '') then
 while (length(Data^) < MaxLen) do
 insert(' ',Data^,1);
 if (imMode and imLeftJustify) <> 0 then
 while (length(Data^) > 0) and (Data^[1] = ' ') do
 delete(Data^,1,1);

 end;
 TInputLine.Draw;
end;

function TFInputLine.CheckRange: boolean;
var
 MH,DM,YS : longint;
 Code : integer;
 MHs,DMs,YSs : string[2];
 Delim : char;
 Ok : boolean;
begin
 Ok := true;
 case DataType of
 DDate,
 DTime : begin
 if DataType = DDate then Delim := '/' else Delim := ':';
 if pos(Delim,Data^) > 0 then
 begin
 MHs := copy(Data^,1,pos(Delim,Data^));
 DMs := copy(Data^,pos(Delim,Data^)+1,2);
 delete(Data^,pos(Delim,Data^),1);
 YSs := copy(Data^,pos(Delim,Data^)+1,2);
 if length(MHs) < 2 then MHs := '0' + MHs;
 if length(DMs) < 2 then DMs := '0' + DMs;
 if length(YSs) < 2 then YSs := '0' + YSs;
 Data^ := MHs + DMs + YSs;
 end;
 if (length(Data^) >= 6) and (pos(Delim,Data^) = 0) then
 begin
 val(copy(Data^,1,2), MH, Code);
 if Code <> 0 then MH := 0;
 val(copy(Data^,3,2), DM, Code);
 if Code <> 0 then DM := 0;
 val(copy(Data^,5,2), YS, Code);
 if Code <> 0 then YS := 0;
 if DataType = DDate then
 begin
 if (MH > 12) or (MH < 1) or
 (DM > 31) or (DM < 1) then Ok := false;
 end
 else
 begin
 if (MH > 23) or (MH < 0) or
 (DM > 59) or (DM < 0) or
 (YS > 59) or (YS < 0) then Ok := false;
 end;
 insert(Delim,Data^,5);
 insert(Delim,Data^,3);
 end

 else
 Ok := false;
 end;

 DByte : begin
 val(Data^, MH, Code);
 if (Code <> 0) or (MH > 255) or (MH < 0) then Ok := false;
 end;

 DShortint :
 begin
 val(Data^, MH, Code);
 if (Code <> 0) or (MH < -127) or (MH > 127) then Ok := false;
 end;

 DInteger :
 begin
 val(Data^, MH, Code);
 if (Code <> 0) or (MH < -32768) or (MH > 32767) then Ok := false;
 end;

 DWord : begin
 val(Data^, MH, Code);
 if (Code <> 0) or (MH < 0) or (MH > 65535) then Ok := false;
 end;
 end;
 CheckRange := Ok;
end;

procedure TFInputLine.ErrorHandler;
var
 MsgString : string[80];
 Params : array[0..1] of longint;
 Event: TEvent;
begin
 fillchar(Params,sizeof(params),#0);
 MsgString := '';
 case DataType of
 DDate : MsgString := ' Invalid Date Format! Enter Date as MM/DD/YY ';
 DTime : MsgString := ' Invalid Time Format! Enter Time as HH:MM:SS ';
 DByte,
 DShortInt,
 DInteger,
 DWord : begin
 MsgString := ' Number must be between %d and %d ';
 case DataType of
 DByte : Params[1] := 255;
 DShortInt : begin Params[0] := -128; Params[1] := 127; end;
 DInteger : begin Params[0] := -32768; Params[1] := 32767; end;
 DWord : Params[1] := 65535;
 end;
 end;
 end;
 MessageBox(MsgString, @Params, mfError + mfOkButton);
end;

end.


February, 1992
GRAPHICS PROGRAMMING


More 3-D Animation


 This article contains the following executables: 3D.ARC


Michael Abrash


As I'm fond of pointing out, computer animation isn't a matter of
mathematically exact modeling or raw technical prowess, but rather of fooling
the eye and the mind. That's especially true for 3-D animation, where we're
not only trying to convince viewers that they're seeing objects on a
screen--when in truth that screen contains no objects at all, only gaggles of
pixels--but we're also trying to create the illusion that the objects exist in
three-space, possessing four dimensions (counting movement over time) of their
own. To make this magic happen, we must provide cues for the eye not only to
pick out boundaries, but also to detect depth, orientation, and motion. This
involves perspective, shading, proper handling of hidden surfaces, and rapid
and smooth screen updates; the whole deal is considerably more difficult to
pull off on a PC than 2-D animation.
And yet, in some senses, 3-D animation is easier than 2-D. Because there's
more going on in 3-D animation, the eye and brain tend to make more
assumptions, and so are more apt to see what they expect to see, rather than
what's actually there. If you're piloting a (virtual) ship through a field of
thousands of asteroids at high speed, you're unlikely to notice if the more
distant asteroids occasionally seem to go right through each other, or if the
topographic detail on the asteroids' surfaces sometimes shifts about a bit.
You'll be busy viewing the asteroids in their primary role, as objects to be
navigated around, and the mere presence of topographic detail will suffice;
without being aware of it, you'll fill in the blanks. Your mind will see the
topography peripherally, recognize it for what it is supposed to be, and,
unless the landscape does something really obtrusive such as vanishing
altogether or suddenly shooting a spike miles into space, you will see what
you expect to see: a bunch of nicely detailed asteroids tumbling around you.
To what extent can you rely on the eye and mind to make up for imperfections
in the 3-D animation process? In some areas, hardly at all; for example,
jaggies crawling along edges stick out like red flags, and likewise for
flicker. In other areas, though, the human perceptual system is more forgiving
than you'd think. Consider this. At the end of Return of the Jedi, in the
battle to end all battles around the Death Star, there is a sequence of about
five seconds in which several spaceships are visible in the background. One of
those spaceships (and it's not very far in the background, either) looks a bit
unusual. What it looks like is a sneaker. In fact, it is a sneaker--but unless
you know to look for it, you'll never notice it, because your mind is busy
making simplifying assumptions about the complex scene it's seeing--and one of
those assumptions is that medium-sized objects floating in space are
spaceships, unless proven otherwise. (Thanks to Chris Hecker for pointing this
out. I'd never have noticed the sneaker, myself, without being tipped
off--which is, of course, the whole point.)
If it's good enough for George Lucas, it's good enough for us. And with that,
let's resume our quest for real-time 3-D animation on the PC.


One-sided Polygons: Backface Removal


Last month, we implemented the basic polygon drawing pipeline, transforming a
polygon all the way from its basic definition in object space, through the
shared 3-D world space, and into the 3-D space as seen from the viewpoint,
called "view space." From view space, we performed a perspective projection to
convert the polygon into screen space, then mapped the transformed and
projected vertices to the nearest screen coordinates and filled the polygon.
Armed with code that implemented this pipeline, we were able to watch as a
polygon rotated about its Y axis, and were able to move the polygon around in
space freely.
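The projection step of that pipeline can be sketched in a few lines of C. This is a minimal illustration, not the column's actual code; the projection distance and the mode 13h screen-center constants are assumptions chosen for the example.

```c
#include <assert.h>

/* Sketch of perspective projection from view space to screen space.
   The constants below are assumptions for illustration only. */
typedef struct { double x, y, z; } Point3;
typedef struct { int x, y; } Point2;

#define PROJ_DISTANCE 256.0  /* assumed distance from viewpoint to screen */
#define SCREEN_CX 160        /* assumed screen center, mode 13h (320x200) */
#define SCREEN_CY 100

/* Perspective-project a view-space point to screen coordinates.
   Dividing x and y by z makes distant points crowd toward the
   center of the screen, which is what creates the sense of depth. */
Point2 project(Point3 p)
{
    Point2 s;
    s.x = SCREEN_CX + (int)(PROJ_DISTANCE * p.x / p.z);
    s.y = SCREEN_CY - (int)(PROJ_DISTANCE * p.y / p.z); /* screen y grows downward */
    return s;
}
```

Mapping the projected vertices to the nearest integer screen coordinates, as the cast to int does here, is the final step before the polygon is filled.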
One of the drawbacks of last month's approach was that the polygon had two
visible sides. Why is that a drawback? Well, it isn't, necessarily, but in our
case we want to use polygons to build solid objects with continuous surfaces,
and in that context, only one side of a polygon is ever visible; the other
side always faces the inside of the object, and can never be seen. It would
save time and simplify the process of hidden surface removal if we could
quickly and easily determine whether the inside or outside face of each
polygon was facing us, so that we could draw each polygon only if it were
visible (that is, had the outside face pointing toward the viewer). On
average, half the polygons in an object could be instantly rejected by a test
of this sort. Such testing of polygon visibility goes by a number of names in
the literature, including backplane culling, backface removal, and assorted
variations thereon; I'll refer to it as backface removal.
For a single convex polyhedron, removal of polygons that aren't facing the
viewer would solve all hidden surface problems. In a convex polyhedron, any
polygon facing the viewer can never be obscured by any other polygon in that
polyhedron; this falls out of the definition of a convex polyhedron. Likewise,
any polygon facing away from the viewer can never be visible. Therefore, in
order to draw a convex polyhedron, if you draw all polygons facing toward the
viewer but none facing away from the viewer, everything will work out
properly, with no additional checking for overlap and hidden surfaces needed.
Unfortunately, backface removal completely solves the hidden surface problem
only for convex polyhedrons, and only if there's a single convex polyhedron
involved; when convex polyhedrons overlap, other methods must be used.
Nonetheless, backface removal does instantly halve the number of polygons to
be handled in rendering any particular scene. Backface removal can also speed
hidden-surface handling if objects are built out of convex polyhedrons, as
we'll see in a future column. This month, though, we have only one convex
polyhedron to deal with, so backface removal alone will do the trick.
Given that I've convinced you that backface removal would be a handy thing to
have, how do we actually do it? A logical approach, often implemented in PC
literature, would be to calculate the plane equation for the plane in which
the polygon lies, and see which way the normal (perpendicular) vector to the
plane points. That works, but there's a more efficient way to calculate the
normal to the polygon: as the cross-product of two of the polygon's edges.
The cross-product of two vectors is defined as the vector shown in Figure 1.
One interesting property of the cross-product vector is that it is
perpendicular to the plane in which the two original vectors lie. If we take
the cross-product of the vectors that form two edges of a polygon, the result
will be a vector perpendicular to the polygon; then, we'll know that the
polygon is visible if and only if the cross-product vector points toward the
viewer. We need one more thing to make the cross-product approach work,
though. The cross-product can actually point either way, depending on which
edges of the polygon we choose to work with and the order in which we evaluate
them, so we must establish some conventions for defining polygons and
evaluating the cross-product. We'll define only convex polygons, with the
vertices defined in clockwise order, as viewed from the outside; that is, if
you're looking at the visible side of the polygon, the vertices will appear in
the polygon definition in clockwise order. With those assumptions, the
cross-product becomes a quick and easy indicator of polygon orientation with
respect to the viewer; we'll calculate it as the cross-product of the first
and last vectors in a polygon, as shown in Figure 2, and if it's pointing
toward the viewer, we'll know that the polygon is visible. Actually, we don't
even have to calculate the entire cross-product vector, because the Z
component alone suffices to tell us which way the polygon is facing: positive
Z means visible, negative Z means not. The Z component can be calculated very
efficiently, with only two multiplies and a subtraction.
The question remains of the proper space in which to perform backface removal.
There's a temptation to perform it in view space, which is, after all, the
space defined with respect to the viewer, but view space is not a good choice.
Screen space--the space in which perspective projection has been performed--is
the best choice. The purpose of backface removal is to determine whether each
polygon is visible to the viewer, and, despite its name, view space does not
provide that information; unlike screen space, it does not reflect perspective
effects.
Backface removal may also be performed using the polygon vertices in screen
coordinates, which are integers. This is less accurate than using the screen
space coordinates, which are floating point, but is, by the same token,
faster. In Listing Three, backface removal is performed in screen coordinates
in the interests of speed.
Backface removal, as implemented in Listing Three, will not work reliably if
the polygon is not convex, if the vertices don't appear in counterclockwise
order, if either the first or last edge in a polygon has zero length, or if
the first and last edges are congruent. These latter two points are the reason
it's preferable to work in screen space rather than screen coordinates (which
suffer from rounding problems), speed considerations aside.


Backface Removal in Action


Listings One through Five together form a program that rotates a solid cube in
real time under user control. Listing One (page 147) is the main program;
Listing Two (page 147) performs transformation and projection; Listing Three
(page 147) performs backface removal and draws visible faces; Listing Four
(page 148) concatenates incremental rotations to the object-to-world
transformation matrix; Listing Five (page 150) is the general header file.
Also required from previous columns are: Listings One and Two from last month
(draw clipped line list, matrix math functions); Listings One and Six from
July 1991 (mode X mode set, rectangle fill); Listing Six from September 1991;
Listing Four from March 1991 (polygon edge scan); and the FillConvexPolygon()
function from Listing One from February 1991. All necessary modules, along
with a make file, will be available as part of the code from this issue.
The sample program places a cube, floating in three-space, under the complete
control of the user. The arrow keys may be used to move the cube left, right,
up, and down, and the A and T keys may be used to move the cube away from or
toward the viewer. The F1 and F2 keys perform rotation around the Z axis, the
axis running from the viewer straight into the screen. The 4 and 6 keys
perform rotation around the Y (vertical) axis, and the 2 and 8 keys perform
rotation around the X axis, which runs horizontally across the screen; the
latter four keys are most conveniently used by flipping the keypad to the
numeric state.
The demo involves six polygons, one for each side of the cube. Each of the
polygons must be transformed and projected, so it would seem that 24 vertices
(four for each polygon) must be handled, but some steps have been taken to
improve performance. All vertices for the object have been stored in a single
list; the definition of each face contains not the vertices for that face
themselves, but rather indexes into the object's vertex list, as shown in
Figure 3. This reduces the number of vertices to be manipulated from 24 to 8,
for there are, after all, only eight vertices in a cube, with three faces
sharing each vertex. In this way, the transformation burden is lightened by
two-thirds. Also, as mentioned earlier, backface removal is performed with
integers, in screen coordinates, rather than with floating-point values in
screen space. Finally, the RecalcXform flag is set whenever the user changes
the object-to-world transformation. Only when this flag is set is the full
object-to-view transformation recalculated and the object's vertices
transformed and projected again; otherwise, the values already stored within
the object are reused. In the sample application, this brings no visual
improvement, because there's only the one object, but the underlying mechanism
is sound: in a full-blown 3-D animation application, with multiple objects
moving about the screen, it would help a great deal to flag which of the
objects had moved with respect to the viewer, performing a new transformation
and projection only for those that had.
With the above optimizations, the sample program is certainly adequately
responsive on a 20-MHz 386 (sans 387; I'm sure it's wonderfully responsive
with a 387). Still, it couldn't quite keep up with the keyboard when I
modified it to read only one key each time through the loop--and we're talking
about only eight vertices here. This indicates that we're already near the
limit of animation complexity possible with our current approach. It's time to
start rethinking that approach; over two-thirds of the overall time is spent
in floating-point calculations, and it's there that we'll begin to attack the
performance bottleneck we find ourselves up against.


Incremental Transformation


Listing Four contains three functions; each concatenates an additional
rotation around one of the three axes to an existing rotation. In order to
improve performance, only the matrix entries that are affected in a rotation
around each particular axis are recalculated (all but four of the entries in a
single-axis rotation matrix are either 0 or 1, as shown last month). This cuts
the number of floating-point multiplies from the 64 required for the
multiplication of two 4x4 matrices to just 12, and the number of
floating-point adds from 48 to 6.
Be aware that Listing Four performs an incremental rotation on top of whatever
rotation is already in the matrix. The cube may already have been turned left,
right, up, down, and sideways; regardless, Listing Four just tacks the
specified rotation onto whatever already exists. In this way, the
object-to-world transformation matrix contains a history of all the rotations
ever specified by the user, concatenated one after another onto the original
matrix. Potential loss of precision is a problem associated with using such an
approach to represent a very long concatenation of transformations, especially
with fixed-point arithmetic; that's not a problem for us yet, but we'll run
into it eventually.


A Note on Rounding Negative Numbers


Last month, I added 0.5 and truncated in order to round values from
floating-point to integer format. In Listing Two this month, I've switched to
adding 0.5 and using the floor() function. For positive values, the two
approaches are equivalent; for negative values, only the floor() approach
works properly.


Object Representation



Each object consists of a list of vertices and a list of faces, with the
vertices of each face defined by pointers into the vertex list; this allows
each vertex to be transformed exactly once, even though several faces may
share a single vertex. Each object contains the vertices not only in their
original, untransformed state, but in three other forms as well: transformed
to view space, transformed and projected to screen space, and converted to
screen coordinates. Earlier, we saw that it can be convenient to store the
screen coordinates within the object, so that if the object hasn't moved with
respect to the viewer, it can be redrawn without the need for recalculation,
but why bother storing the view and screen space forms of the vertices as
well?
The screen space vertices are useful for some sorts of hidden surface removal.
For example, in order to determine whether two polygons overlap as seen by the
viewer, you must first know how they look to the viewer, accounting for
perspective; screen space provides that information. (So do the final screen
coordinates, but with less accuracy, and without any Z information.) The view
space vertices are useful for collision and proximity detection; screen space
can't be used here, because objects are distorted by the perspective
projection into screen space. World space would serve as well as view space
for collision detection, but because it's possible to transform directly from
object space to view space with a single matrix, it's often preferable to skip
over world space altogether. It's not mandatory that vertices be stored for
all these different spaces, but the coordinates in all those spaces have to be
calculated as intermediate steps anyway, so we might as well keep them around
for those occasions when they're needed.
Coming up: shading, hidden surfaces, and performance.

_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash




[LISTING ONE]


/* 3D animation program to view a cube as it rotates in mode X. The viewpoint
is fixed at the origin (0,0,0) of world space, looking in the direction of
increasingly negative Z. A right-handed coordinate system is used throughout.
All C code tested with Borland C++ 2.0 in C compilation mode */
#include <conio.h>
#include <dos.h>
#include <math.h>
#include "polygon.h"

#define ROTATION (M_PI / 30.0) /* rotate by 6 degrees at a time */

/* Base offset of page to which to draw */
unsigned int CurrentPageBase = 0;
/* Clip rectangle; clips to the screen */
int ClipMinX=0, ClipMinY=0;
int ClipMaxX=SCREEN_WIDTH, ClipMaxY=SCREEN_HEIGHT;
/* Rectangle specifying extent to be erased in each page */
struct Rect EraseRect[2] = { {0, 0, SCREEN_WIDTH, SCREEN_HEIGHT},
 {0, 0, SCREEN_WIDTH, SCREEN_HEIGHT} };
static unsigned int PageStartOffsets[2] =
 {PAGE0_START_OFFSET,PAGE1_START_OFFSET};
int DisplayedPage, NonDisplayedPage;
/* Transformation from cube's object space to world space. Initially
 set up to perform no rotation and to move the cube into world
 space -100 units away from the origin down the Z axis. Given the
 viewing point, -100 down the Z axis means 100 units away in the
 direction of view. The program dynamically changes both the
 translation and the rotation. */
static double CubeWorldXform[4][4] = {
 {1.0, 0.0, 0.0, 0.0},
 {0.0, 1.0, 0.0, 0.0},
 {0.0, 0.0, 1.0, -100.0},
 {0.0, 0.0, 0.0, 1.0} };
/* Transformation from world space into view space. Because in this
 application the view point is fixed at the origin of world space,
 looking down the Z axis in the direction of increasingly negative Z, view
 space is
 identical to world space, and this is the identity matrix */
static double WorldViewXform[4][4] = {
 {1.0, 0.0, 0.0, 0.0},
 {0.0, 1.0, 0.0, 0.0},
 {0.0, 0.0, 1.0, 0.0},
 {0.0, 0.0, 0.0, 1.0}
};
/* All vertices in the cube */
static struct Point3 CubeVerts[] = {
 {15,15,15,1},{15,15,-15,1},{15,-15,15,1},{15,-15,-15,1},
 {-15,15,15,1},{-15,15,-15,1},{-15,-15,15,1},{-15,-15,-15,1}};
/* Vertices after transformation */
static struct Point3
 XformedCubeVerts[sizeof(CubeVerts)/sizeof(struct Point3)];
/* Vertices after projection */
static struct Point3
 ProjectedCubeVerts[sizeof(CubeVerts)/sizeof(struct Point3)];
/* Vertices in screen coordinates */
static struct Point
 ScreenCubeVerts[sizeof(CubeVerts)/sizeof(struct Point3)];
/* Vertex indices for individual faces */
static int Face1[] = {1,3,2,0};
static int Face2[] = {5,7,3,1};
static int Face3[] = {4,5,1,0};
static int Face4[] = {3,7,6,2};
static int Face5[] = {5,4,6,7};
static int Face6[] = {0,2,6,4};
/* List of cube faces */
static struct Face CubeFaces[] = {{Face1,4,15},{Face2,4,14},
 {Face3,4,12},{Face4,4,11},{Face5,4,10},{Face6,4,9}};
/* Master description for cube */
static struct Object Cube = {sizeof(CubeVerts)/sizeof(struct Point3),
 CubeVerts, XformedCubeVerts, ProjectedCubeVerts, ScreenCubeVerts,
 sizeof(CubeFaces)/sizeof(struct Face), CubeFaces};

void main() {
 int Done = 0, RecalcXform = 1;
 double WorkingXform[4][4];
 union REGS regset;

 /* Set up the initial transformation */
 Set320x240Mode(); /* set the screen to mode X */
 ShowPage(PageStartOffsets[DisplayedPage = 0]);
 /* Keep transforming the cube, drawing it to the undisplayed page,
 and flipping the page to show it */
 do {
 /* Regenerate the object->view transformation and
 retransform/project if necessary */
 if (RecalcXform) {
 ConcatXforms(WorldViewXform, CubeWorldXform, WorkingXform);
 /* Transform and project all the vertices in the cube */
 XformAndProjectPoints(WorkingXform, &Cube);
 RecalcXform = 0;
 }
 CurrentPageBase = /* select other page for drawing to */
 PageStartOffsets[NonDisplayedPage = DisplayedPage ^ 1];
 /* Clear the portion of the non-displayed page that was drawn
 to last time, then reset the erase extent */
 FillRectangleX(EraseRect[NonDisplayedPage].Left,
 EraseRect[NonDisplayedPage].Top,
 EraseRect[NonDisplayedPage].Right,
 EraseRect[NonDisplayedPage].Bottom, CurrentPageBase, 0);
 EraseRect[NonDisplayedPage].Left =
 EraseRect[NonDisplayedPage].Top = 0x7FFF;
 EraseRect[NonDisplayedPage].Right =
 EraseRect[NonDisplayedPage].Bottom = 0;
 /* Draw all visible faces of the cube */
 DrawVisibleFaces(&Cube);
 /* Flip to display the page into which we just drew */
 ShowPage(PageStartOffsets[DisplayedPage = NonDisplayedPage]);
 while (kbhit()) {
 switch (getch()) {
 case 0x1B: /* Esc to exit */
 Done = 1; break;
 case 'A': case 'a': /* away (-Z) */
 CubeWorldXform[2][3] -= 3.0; RecalcXform = 1; break;
 case 'T': /* towards (+Z). Don't allow to get too */
 case 't': /* close, so Z clipping isn't needed */
 if (CubeWorldXform[2][3] < -40.0) {
 CubeWorldXform[2][3] += 3.0;
 RecalcXform = 1;
 }
 break;
 case '4': /* rotate clockwise around Y */
 AppendRotationY(CubeWorldXform, -ROTATION);
 RecalcXform=1; break;
 case '6': /* rotate counterclockwise around Y */
 AppendRotationY(CubeWorldXform, ROTATION);
 RecalcXform=1; break;
 case '8': /* rotate clockwise around X */
 AppendRotationX(CubeWorldXform, -ROTATION);
 RecalcXform=1; break;
 case '2': /* rotate counterclockwise around X */
 AppendRotationX(CubeWorldXform, ROTATION);
 RecalcXform=1; break;
 case 0: /* extended code */
 switch (getch()) {
 case 0x3B: /* rotate counterclockwise around Z */
 AppendRotationZ(CubeWorldXform, ROTATION);
 RecalcXform=1; break;
 case 0x3C: /* rotate clockwise around Z */
 AppendRotationZ(CubeWorldXform, -ROTATION);
 RecalcXform=1; break;
 case 0x4B: /* left (-X) */
 CubeWorldXform[0][3] -= 3.0; RecalcXform=1; break;
 case 0x4D: /* right (+X) */
 CubeWorldXform[0][3] += 3.0; RecalcXform=1; break;
 case 0x48: /* up (+Y) */
 CubeWorldXform[1][3] += 3.0; RecalcXform=1; break;
 case 0x50: /* down (-Y) */
 CubeWorldXform[1][3] -= 3.0; RecalcXform=1; break;
 default:
 break;
 }
 break;
 default: /* any other key to pause */
 getch(); break;
 }
 }
 } while (!Done);
 /* Return to text mode and exit */
 regset.x.ax = 0x0003; /* AL = 3 selects 80x25 text mode */
 int86(0x10, &regset, &regset);
}







[LISTING TWO]


/* Transforms all vertices in the specified object into view space, then
perspective projects them to screen space and maps them to screen coordinates,
storing the results in the object. */
#include <math.h>
#include "polygon.h"

void XformAndProjectPoints(double Xform[4][4],
 struct Object * ObjectToXform)
{
 int i, NumPoints = ObjectToXform->NumVerts;
 struct Point3 * Points = ObjectToXform->VertexList;
 struct Point3 * XformedPoints = ObjectToXform->XformedVertexList;
 struct Point3 * ProjectedPoints =
 ObjectToXform->ProjectedVertexList;
 struct Point * ScreenPoints = ObjectToXform->ScreenVertexList;

 for (i=0; i<NumPoints; i++, Points++, XformedPoints++,
 ProjectedPoints++, ScreenPoints++) {
 /* Transform to view space */
 XformVec(Xform, (double *)Points, (double *)XformedPoints);
 /* Perspective-project to screen space */
 ProjectedPoints->X = XformedPoints->X / XformedPoints->Z *
 PROJECTION_RATIO * (SCREEN_WIDTH / 2.0);
 ProjectedPoints->Y = XformedPoints->Y / XformedPoints->Z *
 PROJECTION_RATIO * (SCREEN_WIDTH / 2.0);
 ProjectedPoints->Z = XformedPoints->Z;
 /* Convert to screen coordinates. The Y coord is negated to
 flip from increasing Y being up to increasing Y being down,
 as expected by the polygon filler. Add in half the screen
 width and height to center on the screen */
 ScreenPoints->X = ((int) floor(ProjectedPoints->X + 0.5)) +
 SCREEN_WIDTH/2;
 ScreenPoints->Y = (-((int) floor(ProjectedPoints->Y + 0.5))) +
 SCREEN_HEIGHT/2;
 }
}






[LISTING THREE]


/* Draws all visible faces (faces pointing toward the viewer) in the specified
object. The object must have previously been transformed and projected, so
that the ScreenVertexList array is filled in. */
#include "polygon.h"

void DrawVisibleFaces(struct Object * ObjectToXform)
{
 int i, j, NumFaces = ObjectToXform->NumFaces, NumVertices;
 int * VertNumsPtr;

 struct Face * FacePtr = ObjectToXform->FaceList;
 struct Point * ScreenPoints = ObjectToXform->ScreenVertexList;
 long v1,v2,w1,w2;
 struct Point Vertices[MAX_POLY_LENGTH];
 struct PointListHeader Polygon;

 /* Draw each visible face (polygon) of the object in turn */
 for (i=0; i<NumFaces; i++, FacePtr++) {
 NumVertices = FacePtr->NumVerts;
 /* Copy over the face's vertices from the vertex list */
 for (j=0, VertNumsPtr=FacePtr->VertNums; j<NumVertices; j++)
 Vertices[j] = ScreenPoints[*VertNumsPtr++];
 /* Draw only if outside face showing (if the normal to the
 polygon points toward the viewer; that is, has a positive
 Z component) */
 v1 = Vertices[1].X - Vertices[0].X;
 w1 = Vertices[NumVertices-1].X - Vertices[0].X;
 v2 = Vertices[1].Y - Vertices[0].Y;
 w2 = Vertices[NumVertices-1].Y - Vertices[0].Y;
 if ((v1*w2 - v2*w1) > 0) {
 /* It is facing the screen, so draw */
 /* Appropriately adjust the extent of the rectangle used to
 erase this page later */
 for (j=0; j<NumVertices; j++) {
 if (Vertices[j].X > EraseRect[NonDisplayedPage].Right)
 if (Vertices[j].X < SCREEN_WIDTH)
 EraseRect[NonDisplayedPage].Right = Vertices[j].X;
 else EraseRect[NonDisplayedPage].Right = SCREEN_WIDTH;
 if (Vertices[j].Y > EraseRect[NonDisplayedPage].Bottom)
 if (Vertices[j].Y < SCREEN_HEIGHT)
 EraseRect[NonDisplayedPage].Bottom = Vertices[j].Y;
 else EraseRect[NonDisplayedPage].Bottom=SCREEN_HEIGHT;
 if (Vertices[j].X < EraseRect[NonDisplayedPage].Left)
 if (Vertices[j].X > 0)
 EraseRect[NonDisplayedPage].Left = Vertices[j].X;
 else EraseRect[NonDisplayedPage].Left = 0;
 if (Vertices[j].Y < EraseRect[NonDisplayedPage].Top)
 if (Vertices[j].Y > 0)
 EraseRect[NonDisplayedPage].Top = Vertices[j].Y;
 else EraseRect[NonDisplayedPage].Top = 0;
 }
 /* Draw the polygon */
 DRAW_POLYGON(Vertices, NumVertices, FacePtr->Color, 0, 0);
 }
 }
}






[LISTING FOUR]


/* Routines to perform incremental rotations around the three axes */
#include <math.h>
#include "polygon.h"


/* Concatenate a rotation by Angle around the X axis to the transformation in
XformToChange, placing result back in XformToChange. */
void AppendRotationX(double XformToChange[4][4], double Angle)
{
 double Temp10, Temp11, Temp12, Temp20, Temp21, Temp22;
 double CosTemp = cos(Angle), SinTemp = sin(Angle);

 /* Calculate the new values of the four affected matrix entries */
 Temp10 = CosTemp*XformToChange[1][0]+ -SinTemp*XformToChange[2][0];
 Temp11 = CosTemp*XformToChange[1][1]+ -SinTemp*XformToChange[2][1];
 Temp12 = CosTemp*XformToChange[1][2]+ -SinTemp*XformToChange[2][2];
 Temp20 = SinTemp*XformToChange[1][0]+ CosTemp*XformToChange[2][0];
 Temp21 = SinTemp*XformToChange[1][1]+ CosTemp*XformToChange[2][1];
 Temp22 = SinTemp*XformToChange[1][2]+ CosTemp*XformToChange[2][2];
 /* Put the results back into XformToChange */
 XformToChange[1][0] = Temp10; XformToChange[1][1] = Temp11;
 XformToChange[1][2] = Temp12; XformToChange[2][0] = Temp20;
 XformToChange[2][1] = Temp21; XformToChange[2][2] = Temp22;
}

/* Concatenate a rotation by Angle around the Y axis to the transformation in
XformToChange, placing result back in XformToChange. */
void AppendRotationY(double XformToChange[4][4], double Angle)
{
 double Temp00, Temp01, Temp02, Temp20, Temp21, Temp22;
 double CosTemp = cos(Angle), SinTemp = sin(Angle);

 /* Calculate the new values of the four affected matrix entries */
 Temp00 = CosTemp*XformToChange[0][0]+ SinTemp*XformToChange[2][0];
 Temp01 = CosTemp*XformToChange[0][1]+ SinTemp*XformToChange[2][1];
 Temp02 = CosTemp*XformToChange[0][2]+ SinTemp*XformToChange[2][2];
 Temp20 = -SinTemp*XformToChange[0][0]+ CosTemp*XformToChange[2][0];
 Temp21 = -SinTemp*XformToChange[0][1]+ CosTemp*XformToChange[2][1];
 Temp22 = -SinTemp*XformToChange[0][2]+ CosTemp*XformToChange[2][2];
 /* Put the results back into XformToChange */
 XformToChange[0][0] = Temp00; XformToChange[0][1] = Temp01;
 XformToChange[0][2] = Temp02; XformToChange[2][0] = Temp20;
 XformToChange[2][1] = Temp21; XformToChange[2][2] = Temp22;
}

/* Concatenate a rotation by Angle around the Z axis to the transformation in
XformToChange, placing result back in XformToChange. */
void AppendRotationZ(double XformToChange[4][4], double Angle)
{
 double Temp00, Temp01, Temp02, Temp10, Temp11, Temp12;
 double CosTemp = cos(Angle), SinTemp = sin(Angle);

 /* Calculate the new values of the four affected matrix entries */
 Temp00 = CosTemp*XformToChange[0][0]+ -SinTemp*XformToChange[1][0];
 Temp01 = CosTemp*XformToChange[0][1]+ -SinTemp*XformToChange[1][1];
 Temp02 = CosTemp*XformToChange[0][2]+ -SinTemp*XformToChange[1][2];
 Temp10 = SinTemp*XformToChange[0][0]+ CosTemp*XformToChange[1][0];
 Temp11 = SinTemp*XformToChange[0][1]+ CosTemp*XformToChange[1][1];
 Temp12 = SinTemp*XformToChange[0][2]+ CosTemp*XformToChange[1][2];
 /* Put the results back into XformToChange */
 XformToChange[0][0] = Temp00; XformToChange[0][1] = Temp01;
 XformToChange[0][2] = Temp02; XformToChange[1][0] = Temp10;
 XformToChange[1][1] = Temp11; XformToChange[1][2] = Temp12;
}







[LISTING FIVE]


/* POLYGON.H: Header file for polygon-filling code, also includes a number of
useful items for 3D animation. */

#define MAX_POLY_LENGTH 4 /* four vertices is the max per poly */
#define SCREEN_WIDTH 320
#define SCREEN_HEIGHT 240
#define PAGE0_START_OFFSET 0
#define PAGE1_START_OFFSET (((long)SCREEN_HEIGHT*SCREEN_WIDTH)/4)
/* Ratio: distance from viewpoint to projection plane / width of projection
plane. Defines the width of the field of view. Lower absolute values = wider
fields of view; higher values = narrower */
#define PROJECTION_RATIO -2.0 /* negative because visible Z
 coordinates are negative */
/* Draws the polygon described by the point list PointList in color Color with
all vertices offset by (X,Y) */
#define DRAW_POLYGON(PointList,NumPoints,Color,X,Y) \
 Polygon.Length = NumPoints; Polygon.PointPtr = PointList; \
 FillConvexPolygon(&Polygon, Color, X, Y);

/* Describes a single 2D point */
struct Point {
 int X; /* X coordinate */
 int Y; /* Y coordinate */
};
/* Describes a single 3D point in homogeneous coordinates */
struct Point3 {
 double X; /* X coordinate */
 double Y; /* Y coordinate */
 double Z; /* Z coordinate */
 double W;
};
/* Describes a series of points (used to store a list of vertices that
describe a polygon; each vertex is assumed to connect to the two adjacent
vertices, and the last vertex is assumed to connect to the first) */
struct PointListHeader {
 int Length; /* # of points */
 struct Point * PointPtr; /* pointer to list of points */
};
/* Describes beginning and ending X coordinates of a single horizontal line */
struct HLine {
 int XStart; /* X coordinate of leftmost pixel in line */
 int XEnd; /* X coordinate of rightmost pixel in line */
};
/* Describes a Length-long series of horizontal lines, all assumed to be on
contiguous scan lines starting at YStart and proceeding downward (describes
a scan-converted polygon to low-level hardware-dependent drawing code) */
struct HLineList {
 int Length; /* # of horizontal lines */
 int YStart; /* Y coordinate of topmost line */
 struct HLine * HLinePtr; /* pointer to list of horz lines */
};
struct Rect { int Left, Top, Right, Bottom; };
/* Structure describing one face of an object (one polygon) */
struct Face {
 int * VertNums; /* pointer to vertex ptrs */
 int NumVerts; /* # of vertices */
 int Color; /* polygon color */
};
/* Structure describing an object */
struct Object {
 int NumVerts;
 struct Point3 * VertexList;
 struct Point3 * XformedVertexList;
 struct Point3 * ProjectedVertexList;
 struct Point * ScreenVertexList;
 int NumFaces;
 struct Face * FaceList;
};




[LISTING SIX]

extern void XformVec(double Xform[4][4], double * SourceVec,
 double * DestVec);
extern void ConcatXforms(double SourceXform1[4][4],
 double SourceXform2[4][4], double DestXform[4][4]);
extern void XformAndProjectPoly(double Xform[4][4],
 struct Point3 * Poly, int PolyLength, int Color);
extern int FillConvexPolygon(struct PointListHeader *, int, int, int);
extern void Set320x240Mode(void);
extern void ShowPage(unsigned int StartOffset);
extern void FillRectangleX(int StartX, int StartY, int EndX,
 int EndY, unsigned int PageBase, int Color);
extern void XformAndProjectPoints(double Xform[4][4],
 struct Object * ObjectToXform);
extern void DrawVisibleFaces(struct Object * ObjectToXform);
extern void AppendRotationX(double XformToChange[4][4], double Angle);
extern void AppendRotationY(double XformToChange[4][4], double Angle);
extern void AppendRotationZ(double XformToChange[4][4], double Angle);
extern int DisplayedPage, NonDisplayedPage;
extern struct Rect EraseRect[];



















February, 1992
PROGRAMMER'S BOOKSHELF


10 lbs. of Data in a 5-lb. Bag




Andrew Schulman


About a year ago, I purchased a highly regarded textbook on text compression
and was pleased to find, among the mathematical formulas, some source code for
a data-compression program. While typing in the code, I marvelled at how
something so seemingly abstract as information theory could yield something as
tangible as reduced hard-disk consumption, shorter transmission times, and
lower online fees. The program was fairly small, and my speculations on the
almost unreasonable effectiveness of information theory were put on hold while
I compiled and ran the program.
Well, the resulting data-compression program turned out to have an interesting
property: Much as a failed alchemist might turn gold into lead, it made all
files larger; the accompanying decompression program (yeah, I typed that in
too) made them smaller. You can probably appreciate the fact that I have since
learned to be more cautious when purchasing books on this subject. Data
compression really does have a crucial theoretical foundation, making a
pleasant contrast to the ad hoc mess that exists in most other areas of
computing. But as the compression program that makes everything larger shows,
this theoretical foundation is not always exploited in the most practical or
productive way.


Mark Nelson to the Rescue!


Nelson's The Data Compression Book now stands as the best all-around book on
this subject, providing both the crucial mathematical foundations of
information theory and genuine, working C code that actually compresses.
Nelson, vice president of development for Greenleaf Software, takes the reader
through a variety of data compression techniques, following their historical
progression. First, the book covers statistics-based techniques, showing the
evolution from Huffman's 1952 landmark "method for the construction of
minimum-redundancy codes," to the need for adaptive coding, to arithmetic
coding, which answered the need for coding with a nonintegral number of bits.
The next three chapters deal with dictionary-based compression, which began
with Ziv and Lempel's universal algorithm paper from 1977, and some variety of
which is used in most contemporary compression software, such as PKZIP, LHArc,
and the mighty Stacker. Next are two chapters on "lossy" compression of sound
and graphics without dedicated hardware. Each technique is demonstrated with
a detailed walkthrough of working C code; the code is also provided on the two
accompanying disks.
One of the key points to emerge from the book is the importance of the model
used for compression. Data compression, Nelson points out, consists not of
mere "coding," that is, of deciding how best to represent symbols based on the
probability of their appearance in a stream of data, but, equally important,
of "modeling," which is what determines the probability in the first place. A
good example of what Nelson means by "model" is the difference between viewing
a file as a collection of characters on the one hand, and viewing it as a
collection of pairs or triplets of characters on the other.
As discussed in Kas Thomas's article, "Entropy" (DDJ, February 1991), the
minimum number of bits required to represent a symbol, based on the
probability of its appearance, is known as its entropy. But Nelson makes the
important point that
...unlike the thermodynamic measure of entropy, we can use no absolute number
for the information content of a given message. The problem is that when we
calculate entropy, we use a number that gives us the probability of a given
symbol. The probability figure we use is actually the probability for a given
model, not an absolute number. If we change the model, the probability will
change with it.... This seemingly unstable notion of a character's probability
proves troublesome to many people. They prefer that a character have a fixed
"true" probability.
But, in dictionary-based compression in particular, the model chosen is
all-important. Another interesting point is that the history of data compression
techniques relates closely to the hardware available at any given time.
Abstract algorithms were discovered at just about the time it became practical
to implement them, given typical memory availability and CPU speeds. The
genesis of dictionary-based techniques in 1977 is a good example.
While walking through the entire history of data compression, Nelson takes the
reader all the way to the present, covering the despised Unisys LZW patent,
ARC, PKZIP, Yoshi's LHArc, Stac Electronics' QIC-122 compression standard, and
so on. While some of this material has appeared previously in Nelson's DDJ
articles, almost all of it is new.
For the many programs that come with the book, Nelson has built an extremely
nice foundation library, into which he plugs the different data compression
and decompression routines. At a time when you see so much repetitive code in
programming books, with each program differing by only a few lines from the
preceding one (books on Windows programming are particularly bad in this way),
it's nice to see Nelson's book practicing "compression" at this conceptual
level. This is truly "minimum redundancy coding!"
I was a little surprised, though, by the book's lack of emphasis on speed of
compression and decompression. This seems an important subject. The run time
for some of the programs could have been halved simply by using the C
setvbuf() function to read and write more data at a time. Also, the BIT-IO
module seems somewhat inefficient; running the compression programs under a
profiler indicates that almost all time is spent here, rather than in the
compression routines themselves. BIT-IO uses a "rack" in which bits are stored
until a complete byte can be written out to a file; I suspect that writing out
a 4-byte long instead would improve the performance. But it is probably unfair
to complain about this, in what is really a wonderfully enjoyable book on
programming. Even someone who isn't a C programmer will probably get something
from this inside look at how programs such as PKZIP and Stacker operate.


Silicon Dreams


Nelson notes that "data compression is perhaps the fundamental expression of
Information Theory." There is something paradoxical about information theory,
because meaningless gibberish has greater entropy, that is, is worth more
bits, than the relatively structured gibberish that I'm writing now.
Information theory often comes under attack for neglecting "meaning." For
example, the brilliant art critic Rudolf Arnheim, in his Entropy and Art,
complains of "the information theorist, who persists in ignoring structure."
But the proof of the pudding is in the eating: Completely random banging on
keys does not compress as much as the text file I'm writing. The existence of
data compression programs, and of efficient programs that not only detect but
correct errors, shows that Claude Shannon was, to put it mildly, on to
something.
There are, of course, many good books on the subject of information theory.
Shannon's original paper from 1948, together with an essay by Warren Weaver,
is still in print as The Mathematical Theory of Communication, and is
amazingly readable and relevant after over 40 years.
A well-known popular account is John Pierce's Introduction to Information
Theory: Symbols, Signals, and Noise. A really popular account is Jeremy
Campbell's Grammatical Man: Information, Entropy, Language, and Life. The
problem with these popular surveys is that they bite off more than they can
chew: Much as critics of information theory demand that it somehow take into
account "meaning," the authors of these popular works on information theory
try to make it explain, in the course of 300 pages, life, the universe, and
everything. This in spite of Shannon's own warning, in a 1956 article titled
"The Bandwagon" (quoted in Campbell's Grammatical Man), that information
theory "has perhaps ballooned to an importance beyond its actual
accomplishments.... Seldom do more than a handful of nature's secrets give way
at one time."
Robert Lucky's recent popular explanation of information theory, Silicon
Dreams, does a wonderful job precisely because--like information theory
itself--it does not try to explain everything. The mere fact that data can be
compressed is, when you think about it, mysterious enough without also
dragging in any possible connection between this and the ultimate heat death
of the universe. In contrast to other books on information theory for a
general audience, Lucky sticks solidly to subjects such as Shannon's theorem,
the nature of text, text input and output, speech, speech processing,
graphics, and graphics processing.
Those, in fact, are the key chapters in Lucky's book. For example, there is a
solid 50-page chapter on text, which includes in-depth looks at the
statistical properties of English and at text compression. The chapter on text
input has a wonderful discussion of the nature of typing and of keyboards.
(The persistence of the QWERTY keyboard against innovations such as Dvorak's
might serve as a lesson for anyone who wants to replace an entrenched mess
with a brand-spanking-new innovation.) Another 50-page chapter discusses
speech processing, including coding, synthesis, and recognition. A chapter on
picture processing looks in depth at the storage, generation, processing, and
coding of graphics.
Most impressive in all this is that Lucky, executive director of research at
AT&T Bell Laboratories, has managed to discuss all this in a book that could
easily be understood by one's non-computer spouse. Basically, Silicon Dreams
is a very readable survey, for a general audience, of signal processing, the
Fourier transform, Lempel-Ziv, Huffman coding, linear predictive coding, and
other topics you probably wouldn't expect to find discussed outside a
for-professionals-only text. Anyone who wants to understand the theoretical
foundation that makes possible software such as PKZIP and Stacker will want to
read Silicon Dreams. Even professionals who already know about some of these
topics will benefit from reading the book, because it will probably teach them
some things they didn't already know, and it will certainly help put what they
do know into a wider context.
Lucky's chapters on sound and graphics--like the code-intensive chapters on
the subject in Nelson's data compression book--are important to anyone
interested in multimedia. I particularly recommend Lucky's account of the
failure of Picturephone, which seems to me to say something about multimedia
today.


The Entropy Problem


As I've said, Lucky concentrates on the mysterious-enough hard facts of
information theory, and leaves alone the more nebulous questions in which
others have become mired. This includes the ultimate question, which is what,
if anything, the entropy of information theory has to do with the entropy of
Clausius's second law of thermodynamics. Of course, they may have nothing to
do with each other; it is reported that John von Neumann suggested that
Shannon call his measure "entropy," because "no one knows what entropy is, so
in a debate you will always have the advantage."
In addition, Lucky notes that the striking similarity of the formulas involved
may reflect little more than the fact that "any analysis of order and disorder
would result in similar logarithmic equations." In any case, he writes, "such
philosophical arguments might be made appropriately late at night over a
bottle of wine." I've decided to take his advice literally. It's late at
night, and I've been into the wine, so I feel fully equipped to talk about our
final book this month, Maxwell's Demon. This is a collection of 25 papers on
the link between information theory and the second law of thermodynamics, as
they relate to a mythical being, Maxwell's demon, who uses observation alone
to defy the second law. The general idea is that, if the type of entropy one
might read about in a book on data compression is the same as the entropy that
Clausius said tends to a maximum, then the demon is consistent with the second
law.
Some of the flavor of these papers comes out in a remark in one of them,
Charles Bennett's "Notes on the history of reversible computation," which
refers to "the realization that one bit of information is somehow equivalent
to k ln 2 units of entropy, or about 2.3 x 10^-24 cal/Kelvin." This is a
remarkable realization!
Other papers included in Maxwell's Demon are Szilard's 1929 classic "On the
decrease of entropy in a thermodynamic system by the intervention of
intelligent beings," Landauer's "Computation: a fundamental physical view" and
"Irreversibility and heat generation in the computing process," and Bennett's
"The thermodynamics of computation."
Surprisingly, one viewpoint not well represented here is that of Edward
Fredkin, who is concerned, not with the physical basis of computation, but
with--get this--the computational basis of physical reality. The world, in
other words, is a computer, and reality is digital, not analog. (I wonder what
size hard disk it takes. Will it run Stacker?) Fredkin's reversible logic
gates and billiard-ball computer are discussed in some of the papers in
Maxwell's Demon, but for his wider views you might enjoy the biographical
sketch of Fredkin (who was reportedly the model for the character Professor
Steven Falken in the 1983 movie WarGames) in Robert Wright's Three Scientists
and Their Gods. It goes very nicely with the Chianti.











February, 1992
OF INTEREST





CodeTAP, a source-level, runtime debugger for embedded systems using Intel's
80C186/C188EA, XL, EB, and EC microprocessors, has been released by Applied
Microsystems. CodeTAP provides a transparent window into the internal
functioning of the processor for runtime debugging in the target environment.
It requires no target memory space, I/O ports, or interrupts, and quickly
plugs into the target.
The components that make up CodeTAP are a Target Access Probe, an RS232
Communications Adapter, and the VALIDATE/Soft-Scope III windowed source- and
assembly-level debugger. CodeTAP supports Intel, Microsoft, and Microtec
Research C compilers and provides access to high-level data structures,
arrays, and dynamic variables. It also provides eight hardware breakpoints,
and software breakpoints are unlimited. You can single-step or operate at full
clock speed up to 20 MHz with no wait states.
CodeTAP is available for PC hosts and costs $5995. Reader service no. 20.
Applied Microsystems Corp. 5020 148th Ave. NE Redmond, WA 98073-9702
800-426-3925 or 206-882-2000
Now available from DiagSoft is QAPlus/WIN, diagnostic software for systems
running Windows 3. QAPlus/WIN provides tools for analyzing, testing, and
tuning Windows configurations for optimum system performance.
You can view and edit files through more than 40 screens. The Windows section
provides Windows-related configuration and setup information and allows you to
edit specific files. QAPlus/WIN also includes a context-sensitive help system.
To test the entire system, you click on one of ten icons on a master test
screen, selecting the motherboard, multimedia components, fax/modem cards,
hard or floppy disk drives, keyboard, COM ports, printers, mouse, joystick,
and so on. QAPlus/WIN follows a default or user-scripted series of pass/fail
checks on each system component, then generates a summary of the errors
encountered.
QAPlus/WIN retails for $159.95. Reader service no. 21.
DiagSoft Inc. 5615 Scotts Valley Drive, #140 Scotts Valley, CA 95066
408-438-8247
Open has released Aspect, software that allows a single application source to
work with Motif, Open Look, Windows, and Macintosh. It also features a
windowing system for character terminals that emulates a graphical user
interface.
Aspect does not simulate look and feel, but instead works directly with the
native toolbox. It provides a C-callable Application Programming Interface and
an Interactive Design Tool (IDT) used to graphically design the user
interface. The IDT and its resource database also let you delay binding parts
of the user interface until run time, making it possible to customize the
application's user interface without recompilation.
We spoke with a company that is using Aspect to build an interface for their
UNIX product. They were impressed with the intuitive editor for the resource
definition language, which they claimed boosted programmer productivity
without compromising functionality. In addition, they believe Aspect has the
most refined event-driven, object-oriented architecture for an open, portable
API.
Aspect for Motif is shipping on several UNIX platforms; additional platforms
and models will follow soon. Prices start at $795. Reader service no. 22.
Open Inc. 655 Southpointe Ct., Suite 200 Colorado Springs, CO 80906
719-527-9700
Two new C++ class libraries are now available from Rogue Wave: Matrix.h++ and
Linpack.h++. They extend C++ to include numerical algorithms previously
available only in Fortran.
Both libraries are compatible with Rogue Wave's other C++ class libraries,
Tools.h++ and Math.h++. Matrix.h++ includes all the functionality of Math.h++:
general matrices, vectors, statistics, complex numbers, Fast Fourier
Transformation, and so on. It adds specialized matrix classes such as banded,
symmetric, positive-definite, Hermitian, and tridiagonal, while taking advantage
of Math.h++'s optimized low-level assembly routines.
Linpack.h++ includes all of Matrix.h++ and all the functionality of the
original Fortran version: solutions of systems of equations for a variety of
matrix types, solutions of over- and underdetermined systems of equations,
incremental least squares solvers, and more.
Prices start at $199 for Matrix.h++ and $299 for Linpack.h++. Reader service
no. 23.
Rogue Wave Software Inc. 1325 NW 9th St. Corvallis, OR 97330 503-757-2311
The Xenix 386 version of C4, Axon Development's object-oriented programming
environment, has just been released. C4 combines the fast application
development and prototyping facilities of a 4GL with the efficient packaging
and reusability of object-oriented code.
C4 can be used as a low-level, general-purpose language as well as a
high-level, application-oriented development system. Prebuilt building blocks
make it easy to include advanced user-interface features such as mouse
support, pop-up menus, dialog boxes, cut-and-paste, hypertext, file browsing,
and ad hoc reporting. C4 supports incremental development: You can quickly
build working prototypes, complete with menus, file access screens, reports,
and so on. Easy development is further enhanced by the interactive programming
environment, "instant" compiles, and flexible database features, including
variable-length untyped fields and the ability to add fields at any time.
The Xenix version is compatible with MS-DOS versions and costs $1295. Reader
service no. 24.
Axon Development Corp. 102-294 Venture Crescent Saskatoon, Saskatchewan S7K
6M1 Canada 306-652-8202
Franklin Software is shipping RTX51, a real-time, multitasking operating
system adapted to the I/O and memory characteristics of the 8051 family.
RTX51's functions include preemptive and non-preemptive task scheduling, task
prioritization, intertask communication, multiple-event management, internal
and external ROM and RAM management, and access to all 8051 hardware,
including timers and serial I/O.
RTX51 performs round-robin and preemptive task switching for up to 256 tasks
and up to four scheduling priorities. An OS Wait function suspends task
processing and waits for an interrupt, time-out, or message from a task or
interrupt. Communications capabilities include a BITBUS interface available as
an RTX51 task, allowing quick and easy design of microcontroller networks.
RTX51 uses Franklin C51 compiler features such as register parameter passing,
reentrancy, typed pointers, and mixed-memory model capabilities that reduce
code size and increase execution speed. RTX51 includes the BL51 Linker, which
checks boundary conditions and creates task description tables.
An abbreviated version, RTXTINY, is available to run on single-chip 8051s
without xdata memory requirements.
RTX51, including the BL51 Banking Linker and source code for RTXTINY, costs
$1995; RTXTINY alone sells for $995. Reader service no. 25.
Franklin Software Inc. 888 Saratoga Ave., #2 San Jose, CA 95129 408-296-8051
Tom Swan's latest book, Turbo Pascal for Windows 3.0 Programming, has been
published by Bantam. This is a guide to developing Windows 3 applications
using the latest version of the Turbo Pascal compiler and is endorsed by
Borland. The book includes detailed examples of event-driven programming code
and gives insight into accessing Windows 3 features from the Turbo Pascal
resource toolkit, using the Object Windows library, and controlling the
Graphics Device Interface. Turbo Pascal for Windows 3.0 Programming has 768
pages and costs $29.95 (ISBN 0-553-35293-8). Reader service no. 26.
Bantam Electronic Publishing 666 Fifth Ave. New York, NY 10103 212-492-9545
Interactive has released version 3.0 of its UNIX System V/386, Release 3.2
operating system. The new version has a redesigned user interface, additional
hardware support, performance improvements, and internationalization support
and retains compliance with ANSI C, USL SVID, POSIX 1003.1, and FIPS 151-1
standards.
The user interface features pull-down menus, pop-up dialog boxes and forms,
and context-sensitive help. It is consistent throughout system installation,
kernel configuration, disk copying, printer configuration, application
installation, and addition of users. Furthermore, the Kernel Configuration
package has been rewritten to be modular and extendable.
The International Supplement allows vendors to display their applications in
the appropriate language for each environment--there's no need for a separate
copy of the application for each language. The supplement contains over 30
popular UNIX system utilities so that users can, for example, sort text files
using their local dictionary. Also included is a guide with all reference
information on internationalization features.
The operating system has been optimized in the following manner: The High
Performance Device Driver supports up to six controllers/host-board adapters
and up to 32 devices connected to each controller/adapter; the Fast File
System provides performance improvement when mounting and unmounting large
file systems; the Very Fast File System has been added for high-speed
sequential reading and writing of very large files; and a CD-ROM File System
lets you read files on a CD-ROM as if they were part of the Interactive UNIX
file system.
Single-user systems cost $495; multi-user extensions are $400. Reader service
no. 27.
Interactive 2401 Colorado Ave., 3rd Floor Santa Monica, CA 90404 213-453-8649
The Modula Collection, a module library for Sun's Modula-2 compiler, is
available from Odegard Labs. It includes high-level modules with comprehensive
support for streams; multiprogramming, including lightweight processes;
networking; static- and dynamic-length strings; and abstract data types such
as bags, sets, stacks, queues, sequences, and dictionaries. There is a
comprehensive interface to UNIX system calls and data structures and a manual
that includes module dependency diagrams.
The Modula Collection runs under SunOS 4.1.1 and requires Sun Modula-2,
Version 2.3. $2000 buys a one- to three-user license; additional users cost $500
each. Reader service no. 28.
Odegard Labs Inc. 100 Bush St., Suite 625 San Francisco, CA 94104 415-434-4242
X-arRAY is the new Fortran-callable library for 386/486 PCs from Davis
Associates. With it, you can allocate, access, and manipulate up to a gigabyte
of extended memory using Microsoft Fortran under MS-DOS. X-arRAY does not
require DOS extenders, 32-bit compilers, MS-Windows, or flat memory model
operating systems.
X-arRAY allows you to move DOS files or data in conventional memory to and
from extended memory; pass data arrays between independent job steps through
extended memory; obtain extended memory allocation analysis using a new
external command; operate directly on data arrays in extended memory using
X-arRAY primitives; and create arbitrary functions using combinations of
X-arRAY primitives.
X-arRAY also affords automatic selection of an appropriate extended-memory
allocation method and keyboard interrupts for status reports during protected
mode operation.
Dr. Barton Lane of Plasma Dynamics in Belmont, Mass. told DDJ that he believes
X-arRAY to be the only way to efficiently access extended memory from
Microsoft Fortran. "It gives the clearest explanation of allocation methods
for extended memory that I've seen," he added.
X-arRAY's introductory price is $49.50; the regular price will be $99. Reader
service no. 29.
Davis Associates Inc. 43 Holden Road West Newton, MA 02165 617-244-1450
Driver488/WIN from IOtech is an IEEE 488 driver in the form of a dynamic link
library for integrating IEEE 488 instrument control into Windows applications.
Driver488/WIN allows multiple IEEE 488 tasks to simultaneously access the same
IEEE 488 interface board without interfering with one another's instruments.
Asynchronous events are dealt with according to Windows' event-handling
system.
Driver488/WIN supports Microsoft C, QuickC, and Visual Basic and Borland's C,
C++, and Pascal products. It includes WINtest, an application that creates
code and tests command lines and can run concurrently with other editors and
development systems. Driver488/WIN can be configured to control IOtech's 8-
and 16-bit IEEE 488.2 interface boards for PC, AT, and EISA bus computers.
Driver488/WIN is available with the Personal488/WIN package for $395; with
Personal488AT/WIN for $495; or separately, for $195. Reader service no. 30.
IOtech Inc. 25971 Cannon Road Cleveland, OH 44146 216-439-4091
Decos/Graphics is a graphical window management system in a C library from
Decos Software Engineering. Decos/Graphics features a complete window
management system; dialogs containing list boxes, push buttons, radio buttons,
edit fields, and scroll bars; icons; font control; a text editor to link to
your programs; drawing functions; and more.
Decos/Graphics is for DOS machines and costs $795. A separate font and icon
editor is $195. Reader service no. 31.

Decos Software Engineering Inc. 115 East Boca Raton Road Boca Raton, FL 33432
407-367-0407









February, 1992
SWAINE'S FLAMES


Usability Testing




Michael Swaine


Microsoft has a lab for it; so does Apple. IBM probably has several (one for
each of the breakaway republics). Lotus and Tandy and Borland send theirs out.
It's not fundamentally new, this business of testing software for usability.
Human-factors research is the basis of usability testing, and that's been
around longer than personal computers. Usability testing is human-factors
research swallowed up by the voracious software development cycle.
But it's not even new as a snack for that time- and energy-swallowing
appetite.


What It Was


The current trend is the second era of personal computer software usability
testing. There was an earlier, now legendary, era in personal computing when
usability testing was used to be sure that the target user could actually use
the product. That was back in the early days of personal computing, or
microcomputing, back when the user looked and talked and thought just like the
developer because the user was a developer. Or a hacker. Or a hobbyist. Or an
enthusiast. Or, like the customers I dealt with when I did service calls for a
computer store back in 1979, a pioneer who, for obscure reasons, was willing
to work with a few arrows sticking out of his or her back.
During that wild and innocent era, usability testing meant passing the product
around among your friends. It was beta testing, to give it its proper Greek
letter; there was no important difference then between beta testing, usability
testing, and showing the product to your pals (that is, to those pals who were
in the know).
That era ended in 1981, but for the most part the entire personal computer
software industry has continued to rely way too long on beta testing and the
feedback of insiders who are not the target audience for its products.


What It Is


Usability testing means letting the target users try the software, monitoring
their behavior, analyzing their comments, and setting them tasks that test
specific aspects of the ease of use and naturalness of the user interface.
Psychologists in lab coats watching their subjects through one-way glass.
But Apple's human interface guy, Bruce "Tog" Tognazzini, says it doesn't all
have to be that formal. In the October 1991 Apple developer newspaper, Apple
Direct, he presents a case study of a very casual usability test. Although Tog
has the lab-coated testers available to him, he knows that most developers
don't have Apple's resources. "I believe that you can do your own initial
testing if you can overcome your own biases, and if you can find a group of
dear friends who will support you." One bias that Tog may have had to
overcome: One of his dear friends pointed out that the Macintosh control Tog
had defined shouldn't be incompatible with the behavior of a similar control
in Microsoft Windows.
Informal usability testing is only part of the process, though. It can
identify problems, but it can't identify the absence of them. For that you
need a more formal testing procedure, and a representative sample of your
target users.


What It Ain't


Here's an example of what not to do with usability testing.
Give ten users several tasks to perform and five graphical user interfaces to
work in. Don't bother to make the tasks reasonable ones for the GUIs involved,
and be sure to define the tasks in computer terms, rather than work-related
terms: "Change the name of an icon" for example, or "eject a disk without
using the mouse," rather than "compute your Federal income tax."
Most readers don't read manuals anyway, so don't give them any.
Weight your user group heavily in favor of DOS users so that judgments
regarding usability can be confounded with perceptions of similarity to DOS.
Don't use the results to identify concrete problems with the GUIs, but collect
the users' gut reactions. You're looking for insights such as this one: "For
some reason I liked Windows, but I don't know if it was any easier."
Publish the results in a computer magazine, claiming to "find out--once and
for all--which graphical user interface is easiest to use."
The computer magazine that recently did all of the above probably should have
run the idea through a usability lab first.


















March, 1992
EDITORIAL


Catching Up is Hard to Do




Jonathan Erickson


Last month, I discussed two separate, but tightly coupled, telecommunications
issues: 1. What seemed to me to be Southwestern Bell's attempts to
differentiate--and charge accordingly--between voice and data traveling across
the phone lines; and 2. the phone company's efforts to charge BBS operators
business rates, not residential rates, for BBSs running out of their homes.
Southwestern Bell hoped to get its online cash registers ringing by filing a
petition with the Missouri Public Services Commission to phase out
"information terminal services," while phasing in more stringent
residential/business rate guidelines. I wasn't alone in interpreting the phone
company's action in this way. Martha Hogerty, public counsel representing
Missouri consumers, told the Kansas City Star, "This looks like anybody with a
modem would have to be on a business rate." A few days after we went to press
with our February issue, however, Southwestern Bell withdrew the petition,
citing consumer confusion; BBS operators say it was because of public outcry.
In any event, both issues have been put on hold, although they're far from
resolved.
In a related matter, the Regional Bell Operating Companies (RBOCs) have
launched newspaper and radio ads professing that if they can't pursue new
nonbasic telephone ventures (such as information services), the health and
well-being of children in rural Montana will be in jeopardy. (Okay, the gist
of the ad is that a naked baby, suffering from a rare blood disease, can't get
up-to-date medical care in Montana, but can be saved by an RBOC-developed
interactive information system that will enable specialists in Pittsburgh to
supervise the work of GPs in Montana. Of course, the ad ignores alternative
technological breakthroughs--such as airplanes that could fly the child to
Pittsburgh.)
If you really want evidence of the RBOCs' benevolence, check with St. Louis
resident Barbara Clements. Barbara, who has cerebral palsy, runs a BBS that's
become her chief means of communicating with the world beyond her walls. "Six
years ago, before I got my modem," she told the St. Louis Post-Dispatch, "I
was a total hermit." By charging business rates instead of residential rates
for her BBS, Southwestern Bell will force Barbara to shut down the system,
isolating her once again. (Note that the RBOC's Montana infant is allegorical;
Barbara isn't.)


The Language Without a Name Gets a Name


In his July and August 1991 "Programming Paradigms" columns, Michael Swaine
presented an interview with Bob Jervis, developer of Wizard C, the precursor
to Borland's Turbo C. In their conversation, Jervis briefly discussed his new
C-like language. Michael entitled the columns "The Language Without a Name"
because at the time, Jervis hadn't settled on a name for it. He now has.
Jervis calls the language "Parasol," short for "Parallel Systems Object
Language," and he's written about it in a multipart article in the December
1991 Journal of C Language Translation (2051 Swans Neck Way, Reston, VA 22091,
phone 703-860-0091 or Internet jct@aussie.com).


The FCC Makes its Proposal


In September 1991, I wrote about upcoming changes in the allocation of the
radio spectrum. (See also the letter on page 12 of this issue from Henry
Crawford, who attended the FCC hearings.)
The most recent development is that the FCC has proposed changes in how the
spectrum might be reallocated. In the proposal, new spectrum users would buy
spectrum space from existing users as well as paying the costs of existing
users moving to new frequencies or wired networks.
The $64 question is, which frequencies will be reallocated--and who gets them.
The FCC is proposing that the existing users be from the public service
sector--police and fire departments, electric utilities, railroads, and the
like. Presumably, these agencies would move to higher-frequency, shorter-range
spectrum areas.
In any event, the FCC will be receiving comments over the next few months
before drafting its final rule. As reader Crawford says, now is your chance to
have a say in shaping the future. The next opportunity may be a long time
coming.


Grace Hopper, R.I.P.


And finally, the New Year was scarcely underway when we got the sad news that
computer science pioneer Grace Hopper passed away. During her long career,
Hopper was known as "the first lady of software," "Amazing Grace," and (more
formally) "Rear Admiral Hopper."
Adm. Hopper, who is credited as being the codeveloper of COBOL, began her
programming career during World War II at Harvard, where she learned to code
on the Mark I, the first large-scale digital computer. She is also credited
with leading the team that coined the term "bug" to describe program bloopers.
(The first "bug" was, in fact, just that--a large moth that found its way into
the Mark I's circuitry.) The most recent of Hopper's many awards was a
National Medal of Technology bestowed on her by President Bush last fall.





















March, 1992
LETTERS







You Got It


Dear DDJ,
Thank you for printing long letters. Please continue.
Patrick J. Killips
Whitewater, Wisconsin


What Goes Around Might Come Around


Dear DDJ,
Jeff Duntemann's November 1991 "Structured Programming" column once again
demonstrates that those who ignore history are doomed to repeat it. In the
rush to espouse such fads as GUIs and OOP, it's too easy to forget that we
still lack even such amenities as well-constructed runtime libraries and
uniform subroutine calling conventions for the traditional languages.
As early as the mid-seventies, the DECsystem-10, under the TOPS-10 operating
system, offered uniform calling conventions. Parameters were passed by means
of an "argument block," the first entry of which was always the number of
parameters being passed. Each actual parameter was tagged with a type code. A
program written in Fortran, Cobol, compiling Basic, or assembly language could
therefore call a subroutine written in the same or any other of these
languages, and could examine the argument count and the type codes to
determine how it had been called. "Polymorphism" was therefore quite easy to
implement.
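For readers who never used a DEC-10, the scheme can be sketched in a modern language. The structure below is my own illustration of the idea, not DEC's actual layout:

```cpp
#include <cassert>

// Sketch of a TOPS-10-style argument block: the first entry is the
// parameter count, and each actual parameter carries a type code the
// callee can inspect at run time. (Names and layout are illustrative.)
enum TypeCode { T_INT, T_REAL };

struct Arg {
    TypeCode    type; // tag identifying the parameter's type
    const void *ptr;  // address of the actual datum
};

struct ArgBlock {
    int        count; // number of parameters -- always the first entry
    const Arg *args;  // the tagged parameters themselves
};

// A "polymorphic" callee: it examines the count and the type codes to
// determine how it was called, as the letter describes.
double sum_numeric(const ArgBlock &blk) {
    double total = 0.0;
    for (int i = 0; i < blk.count; ++i) {
        switch (blk.args[i].type) {
        case T_INT:  total += *static_cast<const int *>(blk.args[i].ptr);    break;
        case T_REAL: total += *static_cast<const double *>(blk.args[i].ptr); break;
        }
    }
    return total;
}
```

A Fortran caller and a Cobol caller could each build such a block; the callee needs no compile-time knowledge of either.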
In contrast, programmers of today's microcomputers have to worry about such
details as the difference between C and Pascal calling conventions and string
representations, not to mention big-endian vs. little-endian representation of
data.
The runtime libraries for the DEC-10 were equally well designed.
Interdependence of library routines was minimized, and each library was
prefaced with a table of contents to speed searches by the linker. In
contrast, as Jeff observed, today's programmers of microcomputers too often
get the whole gorilla when all they want is the banana. I sometimes use
Nantucket's Clipper database language even when no database is involved, to
take advantage of its ability to build a data entry screen with a minimum of
effort. But my enthusiasm for this approach is somewhat dampened by one small
disadvantage: Although the smallest possible program, the null program, will
"compile" on my machine in one or two seconds, linking the resulting object
file requires over 30 seconds, and the .EXE file occupies nearly 160K. That
hardly seems a reasonable amount of overhead for a program which does nothing.
It is true of the traditional programming languages, no less than of OOP
techniques, that we are unlikely to see much progress towards such desiderata
as reusable code and portability until the industry recognizes that
conventions for intermodule communication, like those for interprocess
communication, belong with the operating system rather than with the
individual programming languages. Nor will we see much progress in reducing
object code size or increasing programmer productivity until language
designers recognize that a well-designed runtime library is as important as
the syntax of the language. And if all this were accomplished, might we find
that the tools required to realize the putative advantages of OOP have been
lying there all along, just waiting for an environment in which to use them?
Arpad Elo, Jr.
St. Johnsbury, Vermont


Working Fast in Modula-2


Dear DDJ,
I feel compelled to take exception to something Jeff Duntemann says in his
otherwise excellent and enjoyable October "Structured Programming" column,
"Sympathy on the Loss of One of Your Legs."
In the section entitled "Working Fast," Jeff quite rightly states that C is
hopeless for what he calls "lightning development," and adds that until you
have accumulated a truly tremendous high-level toolkit, Pascal is not much
better--something that could well be true. But he then goes on to say
"Modula-2 is worse than both." This is simply not true. Nor is the reason that
he gives for saying so, namely that the tools aren't there and never will be.
In Europe, where working conditions are different and the demand for vertical
applications provides a living for numerous software developers, we have
members who are quite capable of turning out an intricate business-type
application in a week or two. Modula-2 has proved to be an excellent language
for lightning development and mad-dash gonzo programming, if only because it's
hard to make mistakes, bugs are usually quickly located, and reusable program
templates are easily established.
As for tools, I would be happy to ensure that any reader who writes to me
receives a copy of a catalog published in Britain which is entirely devoted to
Modula-2 tools.
As an example, I would mention that there is at least one application
generator, one I use myself, which allows a sophisticated application to be
built--most of it interactively--in less than a week, once you are up to
speed. Even adding on the cost of a compiler, it costs less than Clarion.
The choice of alternative tools may not be as great as it is with C, but they
are there. In any case, with Modula-2 it doesn't take long to write your own
tools as you need them!
Finally, I would like to mention that in his June 1991 article, "What's New
with Modula-2?" Kim King did not include our organization in the "Modula-2
Resource Guide: User Groups and Publications." So I would like to take this
opportunity to correct the omission and supply the missing details: BCS
Modula-2 Specialist Group, c/o The Secretary, 131 Carshalton Park Rd.,
Carshalton, Surrey, SM5 3SJ, United Kingdom.
Euan Hill
Surrey, England


Solving It Subconsciously


Dear DDJ,
In Ray Duncan's review of How to Solve It: A New Aspect of Mathematical Method
("Programmer's Bookshelf," January 1992) he brings up a most interesting
aspect of the mathematical (or any other) method when he raises the question
of the phenomenon of subconscious work. This is a topic that has intrigued
many. The talented and famous French mathematician Henri Poincaré was much
intrigued by this subject and wrote several essays on it. I am sure Polya must
have been familiar with these and I believe Ray is being unfair when he says,
"Polya opts out, however, when faced with one of the topics that most
intrigues me, the phenomenon of subconscious work." I do not believe Polya
opts out--rather, he believes he has no more to offer on this subject than
Poincaré.
There is, I am certain, extensive modern literature on this subject but I also
must admit to being largely ignorant of it. There may well be some relation to
the (perhaps simpler) problem of how we pull something out of our memory,
though Poincaré certainly would not have made such a connection. In any
event, it is indeed an intriguing question. But so is "Why am I me?"
Morton F. Kaplon
Bethlehem, Pennsylvania


The FCC, PCS, and You



Dear DDJ,
I'm writing in regard to the September 1991 Editorial, "Radio Days, or Making
Waves on the Airways."
The Federal Communications Commission will soon decide whether to allocate
radio spectrum space for data exchanges between computer users. Thus, as a
communications lawyer and software designer, it was with great interest that I
attended a recent hearing at the Federal Communications Commission involving a
new technology called "Personal Communications Systems" (PCS). Although the
outcome of this proceeding could shape the direction of personal computing for
years to come, I was surprised to find that only Apple Computer presented oral
testimony on behalf of the computer industry at the hearing.
At stake in the FCC proceeding is whether or not spectrum will be dedicated
for the use of wireless computer networks called "Data-PCS." The proposal
filed by Apple (which was supported by IBM in papers filed with the FCC) seeks
to use this spectrum to build wireless computer networks of a "local area"
nature, about 50 meters in scope. If the proposal is granted, Apple will be
able to sell wireless computer LANs right out of the box. Costs associated
with Data-PCS, such as relocating present spectrum users, would be added onto
the price of the computers at a cost of about $10 per unit. The whole scheme
would be essentially unregulated since Apple proposes using a model similar to
the FCC's Part 15, which governs potential frequency interference by consumer
electrical appliances.
Apple should be applauded for its foresight in building a working relationship
with the FCC. However, its views are essentially those of a hardware
manufacturer. Data-PCS offers the far greater possibility of creating
wide-area wireless networks which would free computer users from the present
tyranny of telephonic data communications with its arcane interface and
inefficient cost structure. Empowering computer users with a wireless network
on a city-wide basis in a free and open manner is possible. Instead of an
unregulated Part 15 approach, network managers could be licensed and regulated
in accordance with the public interest in the same way that the FCC currently
regulates radio and television broadcasters. Such a networking system could
result in greater competitiveness in the next century. At present, the major
players in PCS are equipment manufacturers and existing cellular telephone
companies that are trying to sell the FCC on the idea of personal telephones.
Since we already have a wired telephone network along with a cellular network
to provide voice communications, it would appear to be far more in the public
interest to establish a new service dedicated to providing personal computer
communications.
This FCC proceeding is something we, as computer users and software experts,
should become involved in. One of the panelists before the FCC suggested the
creation of a committee of users to communicate their needs and wants to the
FCC. Should such a committee be formed, computer users and software
specialists should become involved in this critical process. It is doubtful
that the FCC will ever again offer such an opportunity to the computer
community.
Henry E. Crawford
Washington, D.C.


The Clarion Man


Dear DDJ,
I read the October 1991 "Structured Programming" column on vertical markets
and was mentally suggesting the ideal tool as I read. Lo and behold, I turned
to page 152 and there it was: Clarion.
I have been a consultant in the industry for 17 years and am fluent in all
mainstream languages and database design. I have had my fair share of
Assembler, Cobol, C, Basic, Pascal, Modula-2, Actor, etc.--being basically a
tinkerer at heart. However, I fell over Clarion a couple of years back and
have had a successful love affair with it since that day. More especially so
in a "vertical market," where I have developed a package for bailiffs using 65
percent Designer, 35 percent sweat. It didn't all quite happen in your "two or
three days" but I would still be trying to decide whether to use Object
classes, Windows, TV, or develop it with JPI's Btree toolkit if Clarion hadn't
taken me roughly by the arm and thrust me into the land of actually earning $$
for my work.
I have had considerable experience with the package and agree with your
findings absolutely!! It is a pity that more people have not seen the light.
I have done quite a bit of "LEM" development in C and Assembler and (being a
user of JPI's products as well as Borland's) was thrilled to find out that
Release 3 of Clarion (due out in early 1992) will be using JPI's Topspeed
environment as its engine.
Given your comments, you will probably agree that the folks at Clarion
Software have come up with an awesome package. Tight compact code with smart
linking; able to access DOS and Windows DLLs; programming in the language of
your choice; preemptive multitasking; be Objective or be Obvious; ability to
use a whole bunch of off the shelf products. The mind boggles!
Brent Stock
Melbourne, Australia
Editor's note: Speaking of JPI and Clarion... The two companies recently
announced an intent to merge. In addition to the standard business coupling,
they intend to integrate JPI's optimizing code generator with Clarion's 4GL
language, claiming database applications will benefit from JPI's fast
execution and Clarion's small size and ease of maintenance.


Stringing You Along


Dear DDJ,
In Steve Teale's article, "Proposing a C++ String Class Standard" (October,
1991) he requests feedback on the program interface for a standard String
class. OK, here it is.
One of the strong points of C++ is the lessening of the global name space
pollution. Class member functions may overload a function name already defined
elsewhere in the system without fear of conflict. Which function is actually
called is based on the context of its usage (i.e., which types are being
operated upon). Unfortunately, the class names themselves occupy the global
name space. When you use an intuitive name to define a standard class, you run
the risk of collision with user-implemented classes that have already been
implemented with the same name. I would like to see a standard prefix on all
standard classes so that I could avoid those prefixes for any classes which I
define. As an example: CppString, or AnsiString.
Another technique that I and many others are using is to suffix type names
with _t, for example, the ANSI system type size_t. This helps considerably
with automated text searches using your favorite editor or utilities such as
grep. This would suggest that the String class be named something like
CppString_t. Yes, it's a mouthful to type, but the effort is worth it in the
long run. If you are really lazy, you can define a name substitution macro and
call it anything you like, such as String. Perhaps the standard should include
both a long form name and an include file that can optionally substitute a
standard short form name. (This would defeat the original intent.)
I have found that you need to be very careful when overloading operators. If
the use of an operator is not intuitive, it makes the reading of the code very
difficult. A typical bad example is the overloading of the * and % operators
to mean "dot product" and "cross product" for a vector class. In formulas that
involved mixed types, vectors, and scalars, for instance, the reader must
spend more time on knowing the types of each variable and mentally translating
the action of the operators than on following the flow of the algorithm. A
better approach is to implement dot() and cross() member functions.
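The point is easy to demonstrate with a toy class of my own (not taken from any proposal): the named member functions make a formula's intent plain where overloaded * and % would not.

```cpp
#include <cassert>

// A minimal vector class illustrating the letter's recommendation:
// spell out dot() and cross() rather than overloading * and %.
struct Vec3 {
    double x, y, z;

    double dot(const Vec3 &o) const {
        return x * o.x + y * o.y + z * o.z;
    }
    Vec3 cross(const Vec3 &o) const {
        return Vec3{y * o.z - z * o.y,
                    z * o.x - x * o.z,
                    x * o.y - y * o.x};
    }
};
```

An expression such as a.dot(b.cross(c)) reads unambiguously even when vectors and scalars are mixed in the same formula.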
In Steve's example
 String v = "abcd";
 v += 1; // result "bcde"
the action of the += operator is not intuitive. In the example
 String v = "1234567890";
 v <<= 1; // result "2345678901"
the <<= operator is being used to rotate the string left when the intuitive
use would be to shift the string left.
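The two candidate meanings are easy to confuse, so here is a sketch of the distinction using free functions on std::string (the names are my own, not part of the proposed class):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Shift" drops characters off the front; the length shrinks.
void shiftLeft(std::string &s, std::size_t n) {
    s.erase(0, n);
}

// "Rotate" wraps the front characters around to the back; this is the
// behavior shown for <<= in Steve's example.
void rotateLeft(std::string &s, std::size_t n) {
    if (s.empty()) return;
    n %= s.size();
    s = s.substr(n) + s.substr(0, n);
}
```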
I have also found the assignment operators to be very powerful for increasing
efficiency because you don't need to create a temporary variable to hold the
results. In the String class the use of += to concatenate to a string is a
good example.
Where this breaks down is operations that cannot be intuitively expressed
using symbolic operators. For vector classes, if you used the ~ operator to
express the computation of the unit vector, it is then easy and consistent to
define ~= as the computation of the unit vector in place, which is more
efficient. This does not work for operations defined using member functions.
If unit vector was implemented as the uvec() member function, there is no
direct translation to an assignment member function. A possible implementation
would be to create a uvecAssign() member function that returns a reference to
the object.
In Steve's proposed standard he suggests the implementation of member
functions upper() and lower(). These are implemented as copying the original
string and then converting it to upper case or lower case characters, and then
returning the copy. This is good and this is necessary, but not always
efficient. I would propose two additional member functions: upperAssign() and
lowerAssign() to convert the case of the string in place.
My use of the Assign suffix is just an example, and I feel it is a little too
verbose, but I would like to see a proposal for a standard naming convention
that takes additional assignment functions into account.
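As a sketch of what such a pair might look like (MyString is a stand-in of my own, not the proposed String class):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrates the copying form alongside the proposed in-place form.
class MyString {
    std::string rep;
public:
    explicit MyString(const char *s) : rep(s) {}

    // Copying form, as in the proposal: returns a converted copy,
    // leaving the original untouched.
    MyString upper() const {
        MyString tmp(*this);
        tmp.upperAssign();
        return tmp;
    }

    // In-place form: converts this object and returns a reference,
    // avoiding the temporary copy.
    MyString &upperAssign() {
        for (char &c : rep)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return *this;
    }

    const std::string &str() const { return rep; }
};
```

Returning a reference lets the in-place form be chained like the assignment operators it parallels.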
Many of the operators were declared as being friends of the String class, as
shown in Example 1. Example 2 shows how the first two operators could have
been implemented as member operators of class String. I was wondering at the
reasoning behind Steve's choice.
Example 1

 friend int operator+
 (const String&, const String&);
 friend int operator+
 (const String&, const char*);
 friend int operator+
 (const char*, const String&);

Example 2

 int operator+ (const String&);
 int operator+ (const char*);


The National Institutes of Health class library (see Data Abstraction and
Object-Oriented Programming in C++, by Gorlen, Orlow, and Plexico) includes
the definition of a SubString class. I have not looked closely at their
implementation and possible uses, but if we are going to define a standard
string class, and a substring class is useful, then they should be defined at
the same time.
I would like to thank Steve for an excellent article and for bringing proposed
standards into a public forum.
Carey Brown
Denver, Colorado


Getting Embed With C++


Dear DDJ,
I was happy to see the article "C++ for Embedded Systems," by Stuart Phillips
and Kevin Rowett in the October 1991 issue. I have been working on a similar
embedded project using C for some time now and was just starting to think about
how it might be implemented in C++. Since our system is not currently running
under DOS, we share many of the problems mentioned in the article, such as
having to modify the startup module, not being able to use any library
routines which call DOS, converting EXE files, and so on. The article and the
source listings were informative and helpful, with one glaring exception. All
of the software relating to the article which I downloaded was in C, not C++.
There are still some major unanswered questions which I have with regards to
moving to C++ for embedded software. One thing I would really like to know is
how to deal with dynamic memory because any calls to standard library
functions like malloc() and free() require DOS. In C we can get around this by
using only static memory. In C++, dynamic memory seems to be a necessity since
objects are created and destroyed at runtime. I eagerly anticipate any future
articles on C++ for embedded systems which might address this question.
Will Knight
Los Gatos, California
Stuart and Kevin respond: Thank you for the idea for a future article! You
raise an excellent point regarding C++ and its dependence on dynamic memory
allocation. Allocation requests are made whenever objects are created, with
corresponding release of memory being made when objects are deleted or fall
out of scope. In our article we recommended careful review before using any of
the standard library routines, either by inspecting their object code, using
the debugger, or purchasing the library source code from Borland.
Borland C++, Version 2.0 allocates dynamic memory from the heap. Heap
initialization is performed by the startup code contained in C.ASM and does
not require DOS support; we modified the startup code to set the heap for our
communications processor. You will need to adapt this for your environment.
Both malloc() and free() use memory from the heap in processing dynamic
allocation requests. The C++ operators 'new' and 'delete' call malloc() and
free(), respectively. The library versions of malloc() and free() may need to
be replaced for embedded system work if there is any possibility of interrupt
service routines creating objects or needing to allocate memory. Memory
allocation requests generally cause manipulation of linked lists or chained
memory blocks. Interrupts that require memory allocation requests should be
disabled while the linked lists are searched or altered. Failure to disable
interrupts in this scenario will result in memory leaks or worse!
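A guarded allocator might look like the following sketch. Here disable_interrupts() and restore_interrupts() stand in for target-specific primitives (on an 8086 they would save the flags and execute CLI, then restore them); they are stubbed out only so the example compiles on a desktop machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stubs standing in for target-specific primitives; on a real embedded
// target these would manipulate the processor's interrupt flag.
static unsigned disable_interrupts() { return 1; }
static void restore_interrupts(unsigned /*flags*/) {}

// Wrap the allocator's linked-list manipulation in a critical section
// so an interrupt service routine cannot observe a half-updated heap.
void *safe_malloc(std::size_t n) {
    unsigned flags = disable_interrupts();
    void *p = std::malloc(n);
    restore_interrupts(flags);
    return p;
}

void safe_free(void *p) {
    unsigned flags = disable_interrupts();
    std::free(p);
    restore_interrupts(flags);
}
```

Keeping the critical section this small matters, since interrupt latency often dominates embedded performance.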
Replacing malloc() and free() is not an overwhelming task; there are many
examples of memory management routines in books on C and C++. Many of these
routines can be adapted for embedded system use by examining their source code
for critical sections which must be protected from interruption. You must
exercise caution when deciding in which areas to disable interrupts since the
performance of embedded systems is generally dependent on interrupt latency.
We'll certainly address this issue in any future articles we write.












































March, 1992
THE UCR STANDARD ASSEMBLY LANGUAGE LIBRARY


For 80x86 assembly language programmers


 This article contains the following executables: UCRASM.ARC


Randall Hyde


Randy is the designer of numerous hardware and software projects, including
assemblers. In addition to consulting, he is currently an instructor in
computer science at California Polytechnic University, Pomona and UC
Riverside. He can be contacted at 16217 Sunset Trail, Riverside, CA 92506.


Given the obvious benefits of space and speed, it's sad that assembly language
has such a bad reputation. Some people cite portability, others complain about
how hard it is to learn, but the real reason people don't write more code in
assembly language can be summed up in one word: effort. Assembly language will
never gain ground against high-level languages unless we can make it easier to
program.
There are many reasons why people feel that programming in assembly language
is much more difficult than programming in a high-level language such as C. A
few points, however, top the list:
Assembly language programmers usually "start from scratch" on new projects.
Good assembly language programming requires a completely new mind-set, and
most programmers are uncomfortable with the steep learning curve.
There is no standardized set of library routines supplied with all assemblers
(such as the standard C library). Commercial packages are available, but cost
often prevents their universal acceptance.
The University of California Riverside (UCR) Standard Library for 80x86
assembly language programmers is an attempt to overcome these problems. First,
the library provides a slate of important and useful high-level routines --
perfect for starting a new project. Second, the UCR StdLib routines mimic many
of the C standard library routines, so anyone who has invested time learning
the C standard library will feel right at home with the UCR StdLib. Finally,
the library -- including source code -- is free to anyone who wants it.
I've had quite a bit of success teaching students assembly language at the
University of California Riverside using this library. It definitely eases the
transition from a high-level language to assembly. Of course, the best
assembly language programs are not "C programs written with MOV instructions,"
but my experience shows that getting students to write real programs in
assembly language as soon as possible improves the quality of their education.
For professional assembly language programmers, the UCR Standard Library
package provides the opportunity to write in assembly applications that would
be too kludgey in a high-level language (yes, assembly is more appropriate for
many applications) but that would otherwise demand too much hand-written
support code.


What's in the Library


While space restrictions prevent me from describing each entry in detail, a
few examples will give you an idea of the types of routines available in the
library. See Table 1, which lists the names and brief descriptions of all the
routines currently in the library. I must emphasize that current means as of
the writing of this article. There may be even more routines available by the
time you read this. The library is in a state of constant flux.
Table 1: A brief description of the routines available in the UCR StdLib
Package: The standard library routines are divided into nine distinct classes:
input, output, file I/O, conversions, utility, string, memory management,
character set, and floating point.

 Input Routines

 Getc Reads a single character from the (library) standard
 input.
 GetcStdIn Reads a single character from the DOS standard input.
 GetcBIOS Reads a single character from the keyboard by calling
 BIOS.

 SetInAdrs }
 GetInAdrs } Maintain pointers to the current input routine. By
 PushInAdrs } changing them around, you can redirect the input
 PopInAdrs } obtained through the Getc routine.

 Gets (m) Reads a string from the keyboard.
 Scanf Formatted input routine, similar to C.

 Output Routines

 Putc Writes a single character to the (library) standard
 output.
 PutCR Writes a CR/LF pair using Putc.
 PutcStdOut Calls DOS to print a character to the DOS standard
 output.
 PutcBIOS Calls BIOS to print a character to the display.


 GetOutAdrs }
 SetOutAdrs } Like the corresponding routines above, these routines
 PushOutAdrs } maintain the stdlib standard output pointers.
 PopOutAdrs }

 Puts Prints string pointed at by ES:DI.
 Puth Prints value in AL as two hex digits.
 Putw Prints value in AX as four hex digits.
 Puti Prints value in AX as signed decimal number.
 Putu Prints value in AX as unsigned decimal number.
 Putl Prints 32-bit value in DX:AX as signed decimal
 integer.
 Putul Prints 32-bit value in DX:AX as unsigned decimal
 integer.
 PutISize Prints integer value in AX using CX print positions.
 PutUSize Prints unsigned value in AX using CX print positions.
 PutLSize Prints long integer value in DX:AX using CX print
 positions.
 PutULSize Prints unsigned long value in DX:AX using CX print
 positions.
 Print Prints zero-terminated literal string constant
 following the call.
 Printf Similar to the corresponding C routine.

 File I/O Routines

 Fcreate Creates (and opens for writing) a new file.
 ES:DI points at the name.
 FOpen Opens an existing file for reading or writing.
 ES:DI points at the name.
 FClose Closes a file.

 FReadOn } Redirect the standard input routines so that they
 FReadOff } take their input from a file.
 FWriteOn } Redirect the standard output routines so that they
 FWriteOff } send their output to a file.

 FSeek Moves the file pointer around in the file.
 DOSHandle Converts STDLIB file handle to DOS file handle.
 FDelete Deletes a file.
 FRename Renames a file.

 Conversion Routines

 Atol (2) Converts ASCII string to long integer returned in
 DX:AX.
 Atoul (2) Converts ASCII string to unsigned 32-bit value in
 DX:AX.
 Atou (2) Converts ASCII string to 16-bit unsigned integer in
 AX.
 Atoh (2) Converts ASCII string of hex digits to 16-bit value in
 AX.
 Atolh (2) Converts ASCII string of hex digits to 32-bit value
 in DX:AX.
 Atoi (2) Converts ASCII string to 16-bit signed integer in AX.


 Itoa (2,m) Converts signed integer value in AX to string.
 Utoa (2,m) Converts unsigned value in AX to string, similar to
 Itoa/2.
 Htoa (2,m) Converts binary value in AL to exactly two hexadecimal
 characters.
 Wtoa (2,m) Converts binary value in AX to exactly four
 hexadecimal characters.
 Ltoa (2,m) Converts 32-bit signed value in DX:AX to string, like
 Itoa.
 Ultoa (2,m) Converts 32-bit unsigned value in DX:AX to string.
 Sprintf (m) Performs in-core formatting operation, like its C
 counterpart.
 Sscanf Like the C sscanf routine.

 ToLower If character in AL is upper case, converts it to lower
 case.
 ToUpper Converts lower-case chars in AL to upper case.

 Utility Routines

 ISize On input, AX contains signed decimal integer. On
 output,
 AX contains number of print positions this integer
 would require (including sign).
 USize Returns output size of unsigned value in AX.
 LSize Returns output size of 32-bit signed value in DX:AX.
 ULSize Returns output size of 32-bit unsigned value in DX:AX.
 IsAlNum Returns Z-flag set if character is alphanumeric.
 IsXDigit Returns Z-flag set if character is a hex digit.

 IsDigit Returns Z-flag set if character is a decimal digit.
 IsAlpha Returns Z-flag set if character is alphabetic.
 IsLower Returns Z-flag set if character is a lowercase
 alphabetic.
 IsUpper Returns Z-flag set if character is an uppercase
 alphabetic.

 String Routines

 Strcpy (I) Copies string from ES:DI to DX:SI (including 0 byte).
 StrDup (I) Copies string from ES:DI to newly allocated storage on
 heap. Returns the pointer to the new string in ES:DI.
 StrLen Computes the length of a string.
 Strcat (m, I, mI) Concatenates two strings.
 StrChr Searches for a character within a string.
 StrStr (I) Searches for a substring within a string.
 Strcmp (I) Compares two strings.
 Strupr (m) Converts all lowercase characters in string to upper
 case.
 Strlwr (m) Converts all uppercase characters in string to lower
 case.
 Strset (m) Overwrites all characters in string with character
 passed in AL.
 Strspan (I) Skips over characters in string which belong to a
 specified set. Returns number of skipped characters
 in CX.
 Strcspan (I) Like the routines above, but skips characters not in
 the set.
 Strins (m, I,mI) Inserts one string into another.

 Strdel (m) Deletes characters from a string.
 StrRev (m) Reverses characters in string (submitted by Mike
 Blaszczak).

 Memory Management Routines

 MemInit Initializes the memory manager.
 Malloc Allocates blocks of memory.
 Realloc Resizes an allocated block.
 Free Deallocates a block allocated via malloc.
 Dupptr Informs the memory manager that two different pointers
 refer to the same block. Free will not deallocate
 the block until you call it once for each duplicated
 pointer.
 IsInHeap Returns true if a pointer points anywhere within the
 heap area.
 IsPtr Returns true if ES:DI points at a currently allocated
 object.

 Character Set Routines

 Set Macro which lets you define up to eight character sets.
 CreateSets Allocates storage and initializes up to eight sets on
 the heap.
 EmptySet Sets a character set to the empty set.
 RangeSet Builds character set from lower-bound and upper-bound
 characters.
 AddStr (I) Adds the characters in a string to a character set.
 RmvStr (I) Removes characters specified in string from a character
 set.
 AddChar Adds a single character (in AL) to a character set.
 RmvChar Removes a single character from a character set.
 Member Sets Z-flag if character in AL is a member of specified
 character set.
 CopySet Copies the bits from one set to another.
 SetUnion Computes the union of two character sets.
 SetIntersect Computes the intersection of two character sets.
 SetDifference Computes the set difference of two character sets.
 NextItem Returns the ASCII code of the first item found in the
 set.
 RmvItem Removes next available item from set and returns this
 value.

 Floating-Point Routines

 lsfpa Loads floating-point accumulator (fpacc) with 32-bit
 single precision value.
 ssfpa Stores fpacc into a single precision variable.
 ldfpa Loads fpacc from a 64-bit double precision variable.
 sdfpa Stores fpacc into a double precision variable.
 lefpa (I) Loads fpacc from an 80-bit extended precision variable.
 sefpa Stores fpacc into an extended precision variable.
 lsfpo Loads floating-point operand (fpop) from single
 precision variable.
 ldfpo Loads fpop from a double precision variable.
 lefpo (l) Loads fpop from an extended precision variable.
 itof Converts 16-bit integer to floating-point value in
 fpacc.
 utof Converts unsigned 16-bit value to floating point.

 ultof Converts unsigned 32-bit value to floating point.
 ltof Converts signed 32-bit value to floating point.
 fpadd Adds fpop to fpacc.
 fpsub Subtracts fpop from fpacc.
 fpcmp Compares fpacc to fpop.
 fpmul Multiplies fpacc by fpop.
 fpdiv Divides fpacc by fpop.
 ftoa Converts fpacc to a decimal string.
 etoa Converts fpacc to ASCII string in exponent form.

Notes: A "2" suffix means that the routine does not preserve the DI register;
instead, it returns DI pointing at the first character beyond the string.
Routines without the "2" suffix preserve the ES:DI registers.
The "m" suffix implies that the routines automatically allocate storage for
their string results on the heap by calling malloc. These routines return a
pointer to the new string in ES:DI. Routines without the "m" suffix expect you
to pass the address of a sufficiently large buffer in the ES:DI register pair.
Routines with the "l" suffix accept literal string constants in the code
stream (that is, the string constant appears after the call to the specified
routine).
If the routine you want isn't in the library, write it up, send it in, and
we'll make sure it appears in the next release!
The library itself is currently divided into nine categories: input, output,
file I/O, conversions, utilities, string operations, memory management,
character-set operations, and floating-point operations. Most of these
routines are high level; that is, they are nontrivial and typically require
many lines of code and considerable knowledge to implement.

More Details

The printf routine is a good example of a high-level routine. It's something C
programmers take for granted, but you rarely see it in typical assembly
language programs. A typical call to the printf routine is shown in Example 1.
As you can see, this code is close to that used by C and much more convenient
than the usual methods assembly language programmers employ. Except for the DB
and DD directives, this probably doesn't look a whole lot like assembly
language, though. The printf statement invokes a macro found in the
distribution stdlib.a file. Among other things, this macro issues a call to
the sl_printf subroutine in the library. I use macros to invoke the library
routines, not because I'm trying to be cute, but as a way to provide "smart
linking" facilities in MASM 5.1. See Listing One (page 80) for a sample macro
definition from the library.
Example 1: A typical call to the UCR printf routine

 printf
 db "The value of I is %3d, string s is %s\n",0
 dd I, S

Not all routines in the UCR StdLib completely mimic their C counterparts. For
example, itoa accepts an integer value in AX and converts it to a
zero-terminated string of characters, storing the string starting at ES:DI.
As in C, ES:DI must point at an array large enough to hold the result, or bad
things may happen. However, it's often inconvenient to declare an array and
load its address into ES:DI prior to calling itoa. Therefore, a second
routine, itoam, is available to simplify the calling sequence. You pass itoam
an integer value in AX and it returns a pointer to the converted string in
ES:DI. You do not have to preallocate storage for the call; itoam does this
automatically by calling malloc. At other times, it would be nice if itoa
returned a pointer to the end of the converted string rather than to the
beginning of it. (This is useful, for example, when building long strings via
successive calls to routines such as itoa.) A third routine, itoa2, provides
this facility. Having different routines such as itoa, itoa2, and itoam
reduces the size of a program because you don't need as many "glue"
instructions between library calls to massage the parameters.
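
The trade-off among the three variants is easiest to see in a quick sketch.
The Python below is only shorthand for the three calling conventions (a list
stands in for the ES:DI buffer, and the names mirror the assembly routines);
the real routines, of course, work on raw memory:

```python
# Sketch of the three calling conventions. "buf"/"pos" stand in for the
# ES:DI buffer pointer; the actual routines manipulate raw memory.
def itoa(value, buf, pos):
    """Caller-supplied buffer, like the base routine."""
    s = str(value)
    buf[pos:pos + len(s)] = list(s)
    return pos                      # itoa preserves the pointer

def itoa2(value, buf, pos):
    """Same, but returns the position just past the string."""
    s = str(value)
    buf[pos:pos + len(s)] = list(s)
    return pos + len(s)             # handy for building long strings

def itoam(value):
    """Allocates the result itself, like the malloc-ing variant."""
    return str(value)

buf = [" "] * 16
end = itoa2(12, buf, 0)
end = itoa2(34, buf, end)           # successive calls append naturally
assert "".join(buf[:end]) == "1234"
assert itoam(99) == "99"
```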
The string routines should look familiar to C programmers. Almost all C
string-handling routines (and then some!) are present in the UCR StdLib. Like
itoa, there are various forms of some of the string functions, mainly for
convenience. For example, strcat takes four different forms: strcat, strcatl,
strcatm, and strcatml. These variants provide different ways of passing
parameters to and from the strcat routine. strcat concatenates two strings: It
copies the string pointed at by DX:SI to the end of the string pointed at by
ES:DI. The destination string (ES:DI) must have sufficient memory allocated to
hold the concatenation. strcatm also concatenates the string pointed at by
ES:DI and DX:SI, but it does not treat the string pointed at by ES:DI as the
target location. Instead, it (essentially) does what's shown in Example 2.
strcatl links a literal string constant (the "l" suffix stands for "literal")
to the end of the string referenced by ES:DI. Likewise, strcatml copies the
source string (ES:DI) onto the heap, appends the literal string constant
following the call, and returns the address of this new string in ES:DI. For
example, if filename points at a DOS filename and you want to append ".EXE" to
the end of it, you could use the code in Example 3.
Example 2: strcatm concatenates the string pointed at by ES:DI and DX:SI, but
it does not treat the string pointed at by ES:DI as the target location.

 temp = malloc( strlen(es:di) + strlen(dx:si) +1);
 strcpy(temp, es:di);
 strcpy(temp + strlen(es:di), dx:si);
 es:di = temp;

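Example 2's pseudocode can be rendered as a runnable sketch. Here Python
strings and a dictionary stand in for heap blocks and pointers, and malloc_str
is an illustrative helper, not a library routine:

```python
# Strings and a dictionary stand in for heap blocks and pointers.
heap = {}                 # "address" -> string contents
next_addr = [0x2000]      # next free address, in a list so helpers update it

def malloc_str(s):
    addr = next_addr[0]
    next_addr[0] += len(s) + 1   # + 1 for the zero terminator
    heap[addr] = s
    return addr

def strcatm(dest_addr, src_addr):
    # Example 2 in miniature: allocate a new block, copy both strings
    # into it, and return the new pointer; the originals are untouched.
    return malloc_str(heap[dest_addr] + heap[src_addr])

a = malloc_str("FILE")
b = malloc_str(".EXE")
c = strcatm(a, b)
assert heap[c] == "FILE.EXE"
assert heap[a] == "FILE"         # unlike strcat, the destination survives
```
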
Example 3: Use this code to append .EXE to the end of filename.

 les di, filename
 strcatml
 db ".EXE", 0
 mov word ptr FullFileName, di
 mov word ptr FullFileName+2, es

Listing Two (page 80) provides a flavor of UCR StdLib routine use. This is a
short program that computes various statistics about the information in a
text file and prints the results. The distribution package also includes
test.asm, a file that demonstrates the use of each routine in the library.
Hopefully, Listing Two will show you that it's worthwhile to obtain a copy of
the library, and the test file will show you how to call each routine.


Conclusion


To keep the library growing, I assign my assembly language students at UC
Riverside a final project of adding one new, useful routine to the library. My
Commercial Software Development class gets charged with testing, documenting,
and otherwise cleaning up the work of the assembly language students. As such,
new routines appear in the library on a periodic basis. The text box entitled
"Obtaining and Licensing the UCR Standard Library" provides details on
obtaining the latest version of the library.
The UCR Standard Library has proven to be of considerable aid in getting
students up to speed with 80x86 assembly language. Beginners are finding
assembly language easier to deal with than ever before, particularly when
combined with MASM 6.0's high-level constructs. Advanced assembly language
programmers use the library to help them quickly develop new programs in
record time. Given that the library is free for all to use, it is definitely
worth adding to your personal development suite if you use or are interested
in learning how to use 80x86 assembly language.


Obtaining and Licensing the UCR Standard Library


There are two "official" distribution points for the UCR StdLib distribution
package: BIX and Internet. If you have access to the BIX telecommunications
service, you may obtain a copy of "UCRSTDLB.ZIP" from the IBM.PC/LISTINGS
area. If you have ftp access to the Internet, you may obtain a copy of
UCRSTDLB.ZIP via anonymous ftp to ucrmath.ucr.edu. After logging on, switch to
the "PC" subdirectory (this is UNIX, so case counts!) and the rest should be
fairly obvious.
In addition to the two official distribution points, the UCR StdLib routines
are also available from many other telecommunications services, BBSs, and
shareware distributors, including M&T Online and the DDJ Forum on CompuServe.
However, there is a version control problem and I cannot guarantee that any
service other than the ones mentioned above will have the latest version of
the software. Given the way shareware and freeware software propagate around
the world, I do not see a reasonable solution to this problem.
All replies and comments should be sent to the following address:
Randall Hyde
Department of Computer Science
UC Riverside, Riverside, CA 92521

Internet e-mail: rhyde@ucrmath.ucr.edu
BIXmail: rhyde

_THE UCR STANDARD ASSEMBLY LANGUAGE LIBRARY_
by Randall Hyde



[LISTING ONE]

Printf macro
; The following extrn declaration only occurs if this is the first time
; printf appears in the source file.
;
 ifndef sl_printf
stdlib segment para public 'slcode'
 extrn sl_printf:far
stdlib ends
 endif
;
; Perform the call to the actual sl_printf routine:
 call far ptr sl_printf
 endm
;



[LISTING TWO]

;****************************************************************************
; TS: A "Text Statistics" package which demonstrates the use of the UCR
; Standard Library Package.
; Note: The purpose of this program is not to demonstrate blazing fast
; assembly code (it's not particularly fast) but rather to demonstrate how
; easy it is to write assembly code using the standard library and MASM 6.0.
; Randall Hyde
;***************************************************************************
; The following include must be outside any segment and before the
; ZZZZZZSEG segment. It includes all the macro definitions for the
; UCR Standard Library.
 include stdlib.a ;Links into the UCR Standard
 includelib stdlib.lib ; Library package.
;
dseg segment para public 'data'
;
WordCount dw 0 ;Holds file word count value
LineCnt dw 0 ;Holds # of lines in file
ControlCnt dw 0 ;Counts # of control characters
Punct dw 0 ;Counts # of punctuation characters
AlphaCnt dw 0 ;Counts # of alphabetic characters
NumericCnt dw 0 ;Counts numeric digits in file
Other dw 0 ;Counts other chars in file
MemorySize dw 0 ;# of paragraphs of free memory
Chars dw 0 ;Total number of chars in file
TotalChars dq 0.0 ;FP version of the above
FileHandle dw 0 ;STDLIB file handle
Const100 dq 100.0
;
; Create some sets to use in this program:

 set CharSet,Alphabetic,Punctuation,Control
;
; Character Counter array. CharCnt [ch] contains the number of "ch"
; characters appearing in the file.
CharCnt dw 256 dup (0)
;
; Boolean flag to denote in/not in a word:
InWord db 0
;
dseg ends
;
cseg segment para public 'code'
 assume cs:cseg, ds:dseg
;
; LESI is a macro which loads a 32-bit immediate value into ES:DI.
lesi macro adrs
 mov di, seg adrs
 mov es, di
 lea di, adrs
 endm
;
; ldxi loads a 32-bit immediate value into dx:si:
ldxi macro adrs
 mov dx, seg adrs
 lea si, adrs
 endm
;
; Some useful constants-
cr equ 13
lf equ 10
EOFError equ 8
;
 public PSP ;DOS Program Segment Prefix
PSP dw ? ;Needed by StdLib MemInit routine
;
; Okay, here's the main program which does the job:
TS proc
 mov cs:PSP, es ;Save pgm seg prefix
 mov ax, seg dseg ;Set up the segment registers
 mov ds, ax
 mov es, ax
;
; Initialize the memory manager, giving all free memory to the heap.
 mov dx, 0
 meminit
 mov MemorySize, cx ;Save # of available paragraphs.
;
; Set up the character sets:
; First, build the Alphabetic set:
 mov al, "A"
 mov ah, "Z"
 lesi Alphabetic
 RangeSet
 AddStrL
 db "abcdefghijklmnopqrstuvwxyz",0
;
; Create the set with the punctuation characters:
 lesi Punctuation
 AddStrL

 db "!@#$%^&*()_-+={[}]\':;<,>.?/~`", '"', 0
;
; Create the control character set:
 lesi Control
 mov al,0
 mov ah, 1fh
 RangeSet
 mov al, 7fh
 AddChar
;
; Print the amount of available memory and prompt the user to enter a
; filename.
 printf
 db "Text Statistics Program",cr,lf
 db "There are %d paragraphs of memory available",cr,lf
 db lf
 db "Enter a filename:",0
 dd MemorySize
;
 getsm ;Get the filename.
;
; Open the file.
 mov al, 0 ;Open for reading.
 fopen ;Open the file.
 jnc GoodOpen
;
; If the carry flag comes back set, we've got an error, print an appropriate
; message and quit:
 print
 db "DOS error #",0
 puti ;Error code is in AX.
 putcr
 jmp Return2DOS
;
; If the carry flag comes back clear, we've successfully opened the file. AX
; contains the STDLIB filehandle, ES:DI still points at the filename
; allocated on the heap:
GoodOpen: mov FileHandle, AX ;Save STDLIB file handle.
 print
 db "Computing text statistics for ",0
 puts ;Print filename
 free ;Dispose of space on heap
 putcr
 putcr
;
; The following loops check for transitions between words and delimiters. Each
; time we go from "not a word" -> "word" this code bumps up word count by one.
 mov ax, FileHandle
 fReadOn
;
TSLoop: getc
 jnc NoError
 jmp ReadError
;
; See if the character is alphabetic
NoError: lesi Alphabetic ;Set contains A-Z, a-z
 Member
 jz NotAlphabetic
 inc AlphaCnt
 jmp StatDone

;
; See if the character is a digit:
NotAlphabetic: cmp al, "0"
 jb NotNumeric
 cmp al, "9"
 ja NotNumeric
 inc NumericCnt
 jmp StatDone
;
; See if the character is a punctuation character
NotNumeric: lesi Punctuation
 Member
 jz NotPunctuation
 inc Punct
 jmp StatDone
;
; See if this is a control character:
NotPunctuation: lesi Control
 Member
 jz NotControl
 inc ControlCnt
 jmp StatDone
;
NotControl: inc Other
StatDone: mov bl, al ;Use char as index into CharCnt
 mov bh, 0
 shl bx, 1 ;Convert word index to byte index
 inc CharCnt [bx]
;
; Count lines and characters here:
 cmp al, lf
 jne NotNewLine
 inc LineCnt
;
NotNewLine: inc Chars
; Count words down here
 cmp InWord, 0 ;See if we're in a word.
 je NotInAWord
 cmp al, " "
 ja WCDone
 mov InWord, 0 ;Just left a word
 jmp WCDone
;
NotInAWord: cmp al, " "
 jbe WCDone
 mov InWord, 1 ;Just entered a word
 inc WordCount
;
WCDone:
;
; Okay, or the current character into the character set so we can keep
; track of the characters which appear in this file.
 lesi CharSet
 AddChar
 jmp TSLoop
;
; Come down here on EOF or other read error.
ReadError: cmp ax, EOFError
 je Quit

 print
 db "DOS Error ",0
 puti
 putcr
 jmp Return2DOS
;
; Return to DOS.
Quit: freadoff
 mov ax, FileHandle
 fclose
 printf
 db cr,lf,lf
 db "Number of words in this file is %d",cr,lf
 db "Number of lines in this file is %d",cr,lf
 db "Number of control characters is %d",cr,lf
 db "Number of punctuation characters is %d",cr,lf
 db "Number of alphabetic characters is %d",cr,lf
 db "Number of numeric characters is %d",cr,lf
 db "Number of other characters is %d",cr,lf
 db "Total number of characters is %d",cr,lf
 db lf, 0
 dd WordCount,LineCnt,ControlCnt,Punct
 dd AlphaCnt,NumericCnt,Other,Chars
;
; Print the characters that actually appeared in the file.
 lesi CharSet
EC64: mov cx, 64 ;Chars/line on output.
EachChar: RmvItem
 cmp al, 0
 je CSDone
 cmp al, " "
 jbe EachChar
 putc
 loop EachChar
 putcr
 jmp EC64
;
CSDone: print
 db cr,lf,lf
 db "Press any key to continue:",0
 getc
 putcr
 putcr
;
; Now print the statistics for each character:
 mov ax, Chars ;Get character count,
 utof ; convert it to a floating
 lesi TotalChars ; point value, and save this
 sdfpa ; value in "TotalChars".
;
; Print out each character, the number of occurrences, and the ratio of
; this character's count to the total number of characters.
 mov bx, " "*2 ;Start output with spaces.
ComputeRatios: cmp CharCnt[bx], 0
 je SkipThisChar
 mov ax, bx
 shr ax, 1 ;Convert index to character
 putc ; and print it.
 print

 db " = ",0
 mov ax, CharCnt [bx]
 mov cx, 4
 putisize
 print
 db " Percentage of total is ",0
;
 utof
;
; Divide by the total number of characters in the file:
 lesi TotalChars
 ldfpo
 fpdiv
;
; Multiply by 100 to get a percentage
 lesi Const100
 ldfpo
 fpmul
;
; Print the ratio:
 mov al, 7
 mov ah, 3
 ftoam
 puts
 free
 print
 db "%",cr,lf,0
;
SkipThisChar: inc bx
 inc bx
 cmp bx, 200h
 jb ComputeRatios
 putcr
;
Return2DOS: mov ah, 4ch
 int 21h
;
TS endp
;
cseg ends
;
; Allocate a reasonable amount of space for the stack (2k).
sseg segment para stack 'stack'
stk db 256 dup ("stack ")
sseg ends
;
; zzzzzzseg must be the last segment that gets loaded into memory!
; The UCR Standard Library package uses this segment to determine where
; the end of the program lies.
zzzzzzseg segment para public 'zzzzzz'
LastBytes db 16 dup (?)
zzzzzzseg ends
 end TS


March, 1992
AN OBJECT-ORIENTED ASSEMBLY LANGUAGE MACRO LIBRARY


ASM macros with an OOP slant




Donald J. McSwain


Donald is a programmer for Digital Alchemy Inc., a start-up firm specializing
in process control and communications software. His experience includes
systems and applications software development for Asymetrix Corp., MITRE
Corp., Computer Sciences Corp., and the Kingdom of Saudi Arabia. He has been
involved in object-oriented programming for the past four years and can be
reached at Digital Alchemy, P.O. Box 254801, Sacramento, CA 95865, fax:
916-481-6467.


Object-oriented programming techniques allow you to develop reusable,
maintainable code by providing mechanisms for inheritance, data abstraction,
and encapsulation--features which C++, CLOS, and Smalltalk programmers
currently enjoy. But 80x86 assembly language programmers can also use
object-oriented programming techniques. For example, I've used the techniques
described in this article to develop an object-oriented assembly language
macro library that provides windows, pop-up menus, mouse support, horizontal
and vertical scroll bars, sound support, and the like.
In object-oriented systems, programs are organized as a collection of objects
related to one another through inheritance. An object is the embodiment of a
local state. It contains a description of some data and the procedures that operate
upon it. A message is the generic name for a set of procedures or methods.
An object may use methods from other objects through inheritance. For the
purpose of this discussion, objects that inherit methods from others are
derived objects, and those that do not are base objects. An ancestor is any
object that bestows methods. In this object-oriented programming scheme,
ancestral objects may contribute up to two methods per message.
A combined method is the runtime version of a message. Methods are assembled
into a combined method in a well-defined manner based on an object's ancestor
list. However, only objects directly sent as messages may have combined
methods.
Method combining is done by first finding all of an object's ancestors, and
then collecting methods from each ancestor grouped by message. Object
collection starts with an object, its ancestors, their ancestors, and so on
recursively. Once this depth-first search is completed, duplicate objects are
removed to prevent duplication of effort. This object collection becomes the
basis of a search for method addresses grouped under the given message name.
In this scheme, there may be up to three methods per message per object. This
makes the combining of methods nontrivial. One of the many ways to combine
methods is through daemon combination, which specifies how these three-element
sets of methods may be ordered to form a combined method.


Method Combining


The three methods that make up an object's local message are known as Before,
Primary, and After. Before methods execute "before" and After methods execute
"after" the Primary. When a combined method is assembled, only the local (that
is, uninherited) Primary method is included. In other words, ancestors can
contribute only Before and After methods to a combined method.
Combined methods are created only for objects which receive directly sent
messages. Methods received indirectly through inheritance will not have
combined methods. This optimizes message passing so that runtime searching is
not required to resolve a message pass into its associated set of methods.
Combined methods make message passing a simple matter of fetching and
executing method addresses.
The daemon combination scheme specifies method combination in the following
manner: the local Before, ancestral Befores (in depth-first order), the local
Primary, ancestral Afters (in the same depth-first order), and the local After
method.
To reiterate, combined method construction involves two steps. First, objects
found in a depth-first search of an object's ancestor list are placed onto a
list with duplicates removed. Second, this list is used to create an ordered
list of methods based on the daemon combination scheme.
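
A sketch of the two steps (the object records below are hypothetical, not the
macro library's actual tables; note that the ancestral Afters here appear in
the same depth-first order as the Befores, which is the order the worked
Screen-Refresh examples later in this article produce):

```python
# Hypothetical object records: each holds an ancestor list plus its
# (Before, Primary, After) methods for one message, here Refresh.
objects = {
    "Window": {"ancestors": [], "before": "clrWin",
               "primary": None, "after": None},
    "Border": {"ancestors": [], "before": None,
               "primary": None, "after": "drawBdr"},
    "Screen": {"ancestors": ["Window", "Border"], "before": None,
               "primary": "drawBackDrop", "after": "drawLabel"},
}

def collect_ancestors(name):
    """Step 1: depth-first walk of the ancestor list, duplicates removed."""
    found = []
    def walk(obj):
        for anc in objects[obj]["ancestors"]:
            if anc not in found:
                found.append(anc)
            walk(anc)
    walk(name)
    return found

def combine(name):
    """Step 2: daemon combination of the collected methods."""
    ancestors = collect_ancestors(name)
    local = objects[name]
    parts = [local["before"]]                           # local Before
    parts += [objects[a]["before"] for a in ancestors]  # ancestral Befores
    parts.append(local["primary"])                      # only the local Primary
    parts += [objects[a]["after"] for a in ancestors]   # ancestral Afters
    parts.append(local["after"])                        # local After
    return [m for m in parts if m]                      # drop empty slots

assert combine("Screen") == ["clrWin", "drawBackDrop", "drawBdr", "drawLabel"]
```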


Message Passing


Message passing becomes possible after combined methods are constructed.
Thereafter, when an object is sent a message, it responds by executing the
corresponding combined method. This involves locating a pointer to a combined
method table and then fetching and executing the address in that table.
An object is known to ancestral objects through the object variable Self and
to other objects by name. Self provides a means for anonymous message passing
and access to instance data. This makes code generalization possible by
providing a way to easily share code and data.
Message arguments are passed by way of the stack to all methods in a combined
method. Therefore, methods must be stack neutral--they must not increase or
decrease the stack depth. If a method is to return a value on the stack, space
must be allocated prior to the message pass.
Example 1 demonstrates message passing with the send macro. send takes an
object name, a message name, and an optional message argument list. In the
first example, Self is set to Screen, the constant DoubleBdr is pushed
onto the stack, and the combined Screen-Refresh method is called. Upon return,
DoubleBdr is removed from the stack.
Example 1: Message passing using the send macro

 send Screen, Refresh, DoubleBdr ;Send Screen a Refresh msg
 send Self, Read ;Send Self a Read msg
 send Self, <WORD PTR[bx]> ;Send Self msg pointed to by BX register

Listing One (page 84) is the source code for the send macro. It pushes
arguments onto the stack, moves an object address into a register, moves the
message number into a register, calls the sendMsg procedure, and pops message
arguments off the stack.
sendMsg assigns Self, searching the object's message list for a matching
message number. If a match isn't found, it returns. Otherwise, it gets a
pointer to the combined method table, and selects a method count. Using the
method count as a loop counter, it then fetches and executes method addresses
located in the method table.
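
Stripped of register details, sendMsg's dispatch is just a table search plus a
loop over method addresses. A sketch (the message numbers, method names, and
dictionary layout are illustrative; the real tables are assembled by the
defObj and defMsg macros):

```python
# The message table maps message numbers to a combined-method list; send
# mirrors sendMsg: assign Self, search for the message, then fetch and
# execute each method address. All names and numbers are illustrative.
Self = None
trace = []                         # records which methods ran, for the demo

def clr_win():  trace.append("clrWin")
def draw_bdr(): trace.append("drawBdr")

screen = {"messages": {1: [clr_win, draw_bdr]}}   # message 1 = Refresh

def send(obj, msg_number):
    global Self
    Self = obj                     # methods reach instance data through Self
    table = obj["messages"].get(msg_number)
    if table is None:
        return                     # no match: sendMsg simply returns
    for method in table:           # fetch and execute each address in turn
        method()

send(screen, 1)
assert trace == ["clrWin", "drawBdr"]
send(screen, 99)                   # undeclared message is silently ignored
assert trace == ["clrWin", "drawBdr"]
```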


Object Definition


Using Microsoft MASM 5.1 conventions, source files consist of code and
data segments. For our purposes, the code segment will contain methods and
procedures, while the data segment holds object and message definitions along
with other data.
Example 2 shows the use of the defObj macro for object definition. defObj
takes an object name, an ancestor list, an instance variable list, and a
message list. The object name may be any valid variable name. The ancestor
list may be empty, as in the case of base objects, or may contain the names of
objects to inherit from, as in the case of derived objects. The instance
variable list may be empty, or may contain three element entries of instance
variable name, size, and initial value. The last argument, the message list,
contains the names of messages which the object will respond to.
Example 2: Using the defObj macro for object definition

 defObj Window,\ ;Define Window object
 <>,\ ;As a base object
 <>,\ ;With no inst vars
 <Refresh> ;Responds to Refresh msg

 defObj Border,\ ;Define Border object
 <>,\ ;As a base object
 <>,\ ;With no inst vars
 <Refresh> ;Responds to Refresh msg

 defObj Screen,\ ;Define Screen object
 <Window, Border>,\ ;As a derived object
 <Row1, 1, 1,\ ;With these inst vars
 Col1, 1, 0,\
 Row2, 1, 23,\
 Col2, 1, 79,\
 Color, 1, 34h,\
 BdrColor, 1, 30h,\
 MemSeg, 2, Nil>,\
 <Refresh> ;Responds to Refresh msg

The Screen object inherits some of its methods from Window and Border. Object
and message names are public symbols, and are the only data visible to
nonancestral objects. Ancestors, however, have access to all instance data
through the object variable Self.
Listing Two (page 84) shows the source code for the defObj macro. It assembles
ancestor, instance variable, and message tables in memory. The _Object
structure is used by defObj to assemble pointers to these tables.


Message Definition


A message describes a set of operations on some data. To associate operations
to a message name, the defMsg macro is used. Example 3 demonstrates how you
can use defMsg to define messages. defMsg takes an object name, message name,
and a method list. The method list may contain up to three method names
representing the Before, Primary, and After methods.
Example 3: Using defMsg to define messages

 defMsg Window,\ ;Define for Window object
 Refresh,\ ;The Refresh msg
 <clrWin,,> ;To clear window

 defMsg Border,\ ;Define for Border object
 Refresh,\ ;The Refresh msg
 <,,drawBdr> ;To draw border

 defMsg Screen,\ ;Define for Screen object
 Refresh,\ ;The Refresh msg
 <, drawBackDrop, drawLabel> ;To draw back drop and label

Many objects respond to the same message, but each will use a different set of
methods. Recall that this set may contain local and inherited methods. Thus
the combined Screen-Refresh method contains: clrWin, drawBackDrop, drawBdr,
and drawLabel.
Listing Three (page 85) shows source code for the defMsg macro. Using the
_Message structure, it assembles three entries containing a method address, or
null pointer. In turn, this table is pointed to by the concatenated
object-message name (that is, ScreenRefresh). This name is used to locate
local methods for method combining at initialization time.
Example 4 shows how you might generalize window labeling by creating a Label
object to handle the specifics. Label must be declared a Screen ancestor after
Border. drawLabel is declared under Label's Refresh message as an After
method. This produces the same combined method as before, but affords a
greater degree of modularity that makes your code easier to enhance and
maintain.
Example 4: Generalizing window labeling by creating a Label object to handle
the specifics

 defMsg Label,\ ;Define for Label object
 Refresh,\ ;The Refresh msg
 <, ,drawLabel> ;To draw label

 defObj Label,\ ;Define Label object
 <>,\ ;As a base object
 <>,\ ;With no inst vars
 <Refresh> ;Responds to Refresh msg

 defMsg Screen,\ ;Define for Screen object
 Refresh,\ ;The Refresh msg
 <,drawBackDrop,> ;To draw back drop


 defObj Screen,\ ;Define Screen object
 <Window, Border, Label>,\ ;As a derived object
 <Row1, 1, 1,\ ;With these inst vars
 Col1, 1, 0,\
 Row2, 1, 23,\
 Col2, 1, 79,\
 Color, 1, 34h,\
 BdrColor, 1, 30h,\
 MemSeg, 2, Nil>,\
 <Refresh> ;Responds to Refresh msg

Ancestor lists determine who contributes code, and in what order they
contribute it. By changing the order of objects on an ancestor list, you alter
an object's behavior. For example, if Screen's ancestor list was changed to
Window, Label, Border, the combined Screen-Refresh message would instead
become clrWin, drawBackDrop, drawLabel, and drawBdr. Consequently, the label
would have been drawn prior to the border, thus overwriting it.


Object Initialization


Object initialization is a runtime activity invoked with the initObj macro. It
transforms an object's ancestor list into a table where duplicates have been
removed. It also creates combined methods for each of an object's declared
messages.
If an object is not initialized, combined methods will not be created for it.
This is desirable for objects such as Window, Border, and Label which are
never directly sent messages, but which receive them only through the
inheritance mechanism.
Example 5 shows how you initialize an object. Initialization order is
significant. An object must be initialized before its ancestors so that method
pointer information can be accessed before being overwritten.
Example 5: Initializing an object

 initObj Screen ;Combine methods for Screen

Listing Four (page 85) is the source code for the initObj macro. It moves an
object address into a register, and calls the initObject procedure. initObject
performs a depth-first search of the ancestor list to assemble a temporary
table of ancestor pointers, then builds combined method tables for each
declared message. initObject then replaces the local method list pointers in
the message table (assembled by defMsg) with pointers to combined methods.


Using Instance Data


Instance variable values may be retrieved with the getInst macro, and changed
with the setInst macro. Example 6 demonstrates usage of these macros. getInst
takes a destination register, an instance variable name, and an optional
object name. setInst takes an instance variable name, a source register, an
optional object name, and an optional variable size. The optional object name
specifies the source of the instance data. If not provided, it is assumed that
the SI register already contains the address of the source object. This would
be the case after one use of the getInst or setInst macro that included an
object name. Listing Five (page 86) is the source code for this macro.
Example 6: Retrieving instance variable values using the getInst macro

 getInst bl, Color, Screen ;Fetch Screen color
 setInst BdrColor, bl ;Copy it to BdrColor
 setInst Color, bl, Self ;And Self's color

getInst assembles instructions to place the object address in a register, and
based on the register size, moves instructions to copy data from the variable
to the register. setInst assembles code to place the object address into a
register, and move instructions to copy data from the register to memory. If
the move is from memory to memory then the optional size argument must also be
provided.
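
In higher-level terms, both macros reduce to a field access through an object
pointer. A sketch (dictionaries stand in for instance-variable storage, and
defaulting to Self approximates the SI-register behavior described above):

```python
# Dictionaries stand in for instance-variable storage; when no object is
# named, the source defaults to Self (approximating the SI-register rule).
screen = {"Color": 0x34, "BdrColor": 0x30}
Self = screen

def get_inst(var, obj=None):
    source = obj if obj is not None else Self
    return source[var]

def set_inst(var, value, obj=None):
    target = obj if obj is not None else Self
    target[var] = value

bl = get_inst("Color", screen)    # fetch Screen's color...
set_inst("BdrColor", bl)          # ...and copy it to BdrColor via Self
assert screen["BdrColor"] == 0x34
```
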
The getInst$ macro allows source object specification through an instance
variable instead of by name or by the object variable Self. getInst$ and
setInst$ work the same as getInst and setInst except the specified instance
variable points to the source object. It is assumed that Self points to the
object supplying this instance data.
Example 7 and Listing Six (page 87) show how you might use these macros.
Master is one of Self's instance variables and points to some object. This
allows the Color instance variable of any object pointed to by Master to be
accessed. Local object variables just provide another mechanism for code
generalization.
Example 7: Using the getInst$ and setInst$ macros

 getInst$ bl, Color, Master ;Fetch color from object pointed to by Master
 setInst Color, bl, Self ;Copy it to Self's color



An Example


As stated, the code presented in this article is part of a larger assembly
language macro library implemented using object-oriented programming schemes.
The program, supporting object-oriented concepts such as multiple inheritance,
was developed using Microsoft's MASM 5.1 and provides windows, pop-up menus,
mouse support, horizontal and vertical scroll bars, sound support, and the
like. Because of space constraints, the complete source code for this example
is available electronically.


Limitations


The use of object-oriented programming techniques with assembly language
allows for the development of highly modular code. Thus, reusability and ease
of maintenance of assembly code improves. However, taking advantage of these
features requires some trade-offs.
Object initialization must be done prior to message passing. This slows
program start-up, and adds a move and call instruction for every initialized
object. To overcome this limitation, you could add code to write your
initialized program to an executable file, possibly as a final step before
software delivery. Once initialization is done, however, message passing
becomes very efficient.

Another trade-off arises with message look-up. When a message is passed to an
object, its message table is searched to locate a pointer to a combined
method. Some search code optimizations could be made. For example, the test
for null pointers could be removed, but program corruption may occur when an
undeclared message is passed to an object. However, the most significant
optimization you can make is through intelligent object class design, which
suggests that you make complete use of inheritance, Before and After methods,
and generic objects.
Although no formal comparisons with other object-oriented programming
languages have been done, practical experience with this system has shown it
to be robust. In addition, programming productivity increases were very
noticeable after the system was learned.


References


Barstow, David R., et al. Interactive Programming Environments. New York, N.Y.:
McGraw-Hill, 1984.
Bobrow, Daniel G., et al. Common Lisp Object System Specification. X3J13
Document 88-002R.
Cannon, Howard I. Flavors: A Non-Hierarchical Approach to Object-Oriented
Programming. Unpublished paper, 1983.
Cox, Brad J. Object-Oriented Programming: An Evolutionary Approach. Reading,
Mass.: Addison-Wesley, 1986.
Dorfman, Len. Object-Oriented Assembly Language. Blue Ridge Summit, Penn.:
Windcrest, 1990.
Duncan, Ray. Advanced MS-DOS. Redmond, Wash.: Microsoft Press, 1986.
Hyde, Randall L. "Object-Oriented Programming in Assembly Language." Dr.
Dobb's Journal (March 1990).
Moon, David. "Object-Oriented Programming with Flavors." Proceedings of OOPSLA
'86.
Toutonghi, Michael. "21st Century Assembler." Computer Language (June 1990).
Wegner, P., ed. The Object-Oriented Classification Paradigm: Research
Directions in Object-Oriented Programming. Cambridge, Mass.: MIT Press, 1987.
Wyatt, Allen. Using Assembly Language. Carmel, Ind.: Que Corp., 1987.


_AN OBJECT-ORIENTED ASSEMBLY LANGUAGE MACRO LIBRARY_
by Donald J. McSwain



[LISTING ONE]

Macro File: objects.mac

COMMENT % ===============================================================
Sets up stack, SI with object pointer, DX with message number, and calls
sendMsg procedure.
Passed: Obj - Name of receiving object; Msg - Message number
=========================================================================%
send MACRO Obj,Msg,ArgList
 pushArgs ArgList ;Push arguments onto stack
 IFIDN <Obj>,<Self> ;If object is Self
 mov si,Wptr[Self] ;Get object ptr from it
 ELSE
 IFDIF <Obj>,<si> ;If object ptr not in SI
 lea si,Obj ;Load SI with ptr to object
 ENDIF
 ENDIF

 IFDIF <Msg>,<dx> ;If msg number not in DX
 mov dx,Msg ;Put it in DX
 ENDIF

 call sendMsg ;Send message
 IFNB <ArgList> ;If arguments
 X = 0 ;Init stack depth counter
 IRP Arg,<ArgList> ;For every arg on stack
 X = X+2 ;Increment depth counter
 ENDM
 add sp,X ;Reset stack pointer
 ENDIF
 ENDM

COMMENT % ===============================================================

Pushes up to ten arguments onto the stack.
=========================================================================%
pushArgs MACRO A0,A1,A2,A3,A4,A5,A6,A7,A8,A9
 IFB <A0> ;If no more arguments
 EXITM ;Exit macro
 ENDIF

 IFIDN <A0>,<ax> ;If arg in AX
 push ax ;Push AX
 ELSE
 IFIDN <A0>,<bx> ;If arg in BX
 push bx ;Push BX
 ELSE
 IFIDN <A0>,<cx> ;If arg in CX
 push cx ;Push CX
 ELSE
 IFIDN <A0>,<dx> ;If arg in DX
 push dx ;Push DX
 ELSE
 mov bx,A0 ;Else move into BX
 push bx ;Push BX
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 pushArgs A1,A2,A3,A4,A5,A6,A7,A8,A9
 ENDM

COMMENT % =============================================================
Finds the specified message for the specified object.
Passed: Msg - Message number; Obj - Addr ptr to object structure
Passes: si - Pointer to combined method pointer
=========================================================================%
findMsg MACRO Obj,Msg,Lbl
 LOCAL fdmg1,fdmg2
 IFDIF <Obj>,<si> ;If object ptr is not in SI
 mov si,Obj ;Put it there
 ENDIF
 mov di,Wptr[si].Instances
 ;Addr of msg tbl end
 mov si,Wptr[si].Messages
 ;Addr of msg tbl beginning
fdmg1: lodsb ;Fetch msg number
 eq al,Msg,fdmg2 ;Exit if message is found
 add si,2 ;Else point to next message
 cmp si,di ;More messages?
 jb fdmg1 ;If so continue search

 IFNB <Lbl> ;If label provided
 jmp Lbl ;Jump to it upon failure
 ENDIF
fdmg2:
 ENDM

Source File: objects.asm
 PUBLIC sendMsg
COMMENT % ===================================================================
Sends the specified object the given message. This causes the execution of
the combined method for the object.

Passed: dx - Message number; si - Combined method ptr
=============================================================================%
sendMsg PROC NEAR
 findMsg si,dl,smg2 ;Search for message
 mov si,Wptr[si] ;Get method addr
 mov cx,Wptr[si] ;Get method count
smg1: add si,2 ;Point to method
 pushData <cx,si> ;Save loop cnt, addr ptr
 call Wptr[si] ;Execute method
 popData <si,cx> ;Restore addr ptr, loop cnt
 loop smg1 ;Loop
smg2: ret
sendMsg ENDP






[LISTING TWO]


Include File: objects.inc

COMMENT % ==================================================================
Data structure used to hold pointers to an object's ancestors, messages, and
instance variables.
===========================================================================%
_Object STRUC
 Objects DW Nil
 Messages DW Nil
 Instances DW Nil
_Object ENDS

Macro File: objects.mac

COMMENT % ===================================================================
Defines an object.
Passed: Obj - Object name; Objs - Ancestor list; Instances - Instance
variable list; Messages - Message list
=============================================================================%
defObj MACRO Obj,Objs,Instances,Messages
 LOCAL ObjTbl,MsgTbl,InstTbl

 ObjTbl LABEL WORD
 objsDef Obj,<Objs>

 MsgTbl LABEL WORD
 msgsDef Obj,<Messages>
 InstTbl LABEL WORD
 instDef <Instances>
 ALIGN 2
 PUBLIC Obj
 Obj _Object <ObjTbl,MsgTbl,InstTbl>
 ENDM

COMMENT % ===================================================================
Defines objects.
Passed: Obj - Object name; Objs - Ancestor list

=============================================================================%
objsDef MACRO Obj,Objs
 DW Obj
 IRP Obj,<Objs>
 DW Obj
 ENDM
 ENDM

COMMENT % ====================================================================
Defines messages.
Passed: Obj - Object name; Messages - Message list
=============================================================================%
msgsDef MACRO Obj,Messages
 IRP Msg,<Messages>
 DB Msg ;Msg# identifies msg
 IFNDEF Obj&&Msg
 DW Nil ;Obj has no local methods
 ELSE
 DW Obj&&Msg ;Obj has local methods
 ENDIF
 ENDM
 ENDM

COMMENT % ===================================================================
Defines instance variables.
Passed: Instances - Instance variable list
=============================================================================%
instDef MACRO Instances
 X = 0
 Y = 0
 IRP Inst,<Instances>
 defInst Inst,%X,%Y
 ENDM
 ENDM

COMMENT % ====================================================================
Defines an instance variable.
Passed: Inst - Instance variable name; Cnt - Instance variable field number;
Size - Size of instance variable
=============================================================================%
defInst MACRO Inst,Cnt,Size
 IFIDN <Cnt>,<0>
 X = X+1
 ELSE
 IFIDN <Cnt>,<1>
 X = X+1
 Y = Inst
 ELSE
 X = 0
 defVar Size,Inst
 ENDIF
 ENDIF
 ENDM

COMMENT % ====================================================================
Defines a data item.
Passed: Size - Size of data in bytes; Value - Value of data item
=============================================================================%
defVar MACRO Size,Value

 IFIDN <Size>,<1>
 DB Value
 ELSE
 IFIDN <Size>,<2>
 DW Value
 ELSE
 IFIDN <Size>,<4>
 DD Value
 ELSE
 IFIDN <Size>,<8>
 DQ Value
 ELSE
 IFIDN <Size>,<10>
 DT Value
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDM






[LISTING THREE]

Include File: objects.inc

COMMENT % ==================================================================
Data structure used to hold pointers to a message's Before, Primary, and
After methods.
===========================================================================%
_Message STRUC
 Before DW Nil
 Primary DW Nil
 After DW Nil
_Message ENDS

Macro File: objects.mac

COMMENT % ====================================================================
Defines a message.
Passed: Obj - Object name; Msg - Message name; Methods - Method list
=============================================================================%
defMsg MACRO Obj,Msg,Methods
 ALIGN 2
 Obj&Msg _Message <Methods>
 ENDM






[LISTING FOUR]

Macro File: objects.mac


COMMENT % ====================================================================
Sets up SI to point to object, and calls initObject procedure.
=============================================================================%
initObj MACRO Obj
 lea si,Obj ;Pass object ptr
 call initObject ;Find all ancestors
 ENDM

Source File: objects.asm

 PUBLIC initObject
COMMENT % ====================================================================
Initializes an object by flattening its inheritance lattice to create
combined methods for its messages.
Passed: si - Addr ptr to object structure
=============================================================================%
initObject PROC NEAR
 lea di,Buffer ;Get buffer addr
 call findAncestors ;Find/Save all ancestors
 call evalMsgs ;Evaluate messages
 ret
initObject ENDP

COMMENT % ====================================================================
Finds all of an object's ancestors and saves them for use by the message
evaluator.
Passed: bx - Addr ptr to message table (end of object table); di - Addr ptr
to temporary object table; si - Addr ptr to object structure
=============================================================================%
findAncestors PROC NEAR
 pushData <bx,si> ;Save obj ptr
 mov bx,Wptr[si].Messages ;Get addr ptr to msg tbl
 mov si,Wptr[si].Objects ;Get addr of object tbl
 movsw ;Move obj ptr
fas1: eq bx,si,fas2 ;Exit if end of tbl
 push si
 mov si,Wptr[si] ;Get next object
 call findAncestors ;Find others
 pop si
 add si,2
 jmp fas1 ;More in tbl - Loop
fas2: mov Wptr[di],Nil ;Mark end of list
 popData <si,bx> ;Restore obj ptr
 ret
findAncestors ENDP

COMMENT % ====================================================================
Creates combined methods for all of an object's messages.
Passed: si - Addr ptr to object structure
=============================================================================%
evalMsgs PROC NEAR
 mov bx,Wptr[si].Messages ;Get addr of message tbl
 mov cx,Wptr[si].Instances ;Get addr of instance tbl
ems1: mov dl,Bptr[bx] ;Get msg number
 xor dh,dh
 call combineMethods ;Combine methods
 add bx,3 ;Point to next tbl entry
 neq bx,cx,ems1 ;More in tbl? - loop

 ret
evalMsgs ENDP

COMMENT % ====================================================================
Combines methods for all included objects.
Passed: dx - Message number; si - Addr ptr to object structure
=============================================================================%
combineMethods PROC NEAR
 push bx
 mov ?Compiled,Nil ;Clear compiled flag
 mov bx,Wptr[CompileStart] ;Get start of combined mthd
 mov Wptr[CompilePtr],bx ;Init location ptr
 mov di,Nil ;Zero count word
 call saveMethodAddr ;Save value
 call saveBefores ;Save Before methods
 mov bx,Primary ;Select Primary method type
 lea di,Buffer ;Get addr of tmp object tbl
 mov di,Wptr[di] ;Get tbl entry
 call saveMethod ;Save method
 call saveAfters ;Save After methods
 null ?Compiled,cms1 ;Nothing compiled? - Exit
 call updatePtrs ;Update message, location ptrs
cms1: pop bx
 ret
combineMethods ENDP

COMMENT % ====================================================================
Updates the message and location pointers.
Passed: dx - Message number; si - Addr ptr to object structure
=============================================================================%
updatePtrs PROC NEAR
 push si
 findMsg si,dl ;Find message
 mov di,Wptr[CompileStart] ;Get ptr to combined method
 mov Wptr[si],di ;Change message ptr

 mov bx,Wptr[CompilePtr] ;Get current compile location
 mov Wptr[CompileStart],bx ;Reset start of combined mthd
 pop si
 ret
updatePtrs ENDP

COMMENT % ====================================================================
Save the Before method type for the specified object.
Passed: dx - Message number
=============================================================================%
saveBefores PROC NEAR
 push si
 mov bx,Before ;Select Before method type
 lea si,Buffer ;Get addr of tmp object tbl
 mov di,Wptr[si] ;Get tbl entry
sbs1: call saveMethod ;Save method
 add si,2 ;Point to next tbl entry
 mov di,Wptr[si] ;Get next tbl entry
 identity di,sbs1 ;More in table? - loop
 pop si
 ret
saveBefores ENDP


COMMENT % ===================================================================
Save the After method type for the specified object.
Passed: dx - Message number
=============================================================================%
saveAfters PROC NEAR
 pushData <cx,si>
 mov bx,After ;Select After method type
 lea si,Buffer ;Get addr of tmp object tbl
 mov cx,si ;Save addr of object tbl
sas1: mov ax,Wptr[si] ;Get tbl entry
 null ax,sas2 ;Null? - End of tbl, exit
 add si,2 ;Point to next tbl entry
 jmp sas1 ;Loop
sas2: sub si,2 ;Point to previous tbl entry
 mov di,Wptr[si] ;Get next tbl entry
 call saveMethod ;Save method
 neq si,cx,sas2
 popData <si,cx>
 ret
saveAfters ENDP

COMMENT % ====================================================================
Saves the specified method for the specified object.
Passed: bx-Method type; di-Addr ptr to object structure; dx-Message number
=============================================================================%
saveMethod PROC NEAR
 pushData <bx,di,si>
 findMsg di,dl,svm3 ;Find message
 mov di,Wptr[si] ;Get method tbl addr ptr
 null di,svm3 ;Exit if no local methods
 mov di,Wptr[di+bx] ;Get method addr ptr
 null di,svm3 ;Exit if no message
 mov bx,Wptr[CompileStart] ;Get start of combined mthd
svm1: eq bx,Wptr[CompilePtr],svm2
 eq di,Wptr[bx],svm3 ;Exit if duplicate method
 add bx,2 ;Point to next addr
 jmp svm1 ;Check next addr
svm2: call saveMethodAddr ;Save method addr
svm3: popData <si,di,bx>
 ret
saveMethod ENDP

COMMENT % ====================================================================
Saves the value at the current compile location and increments the location
pointer.
Passed: di - Value to store
=============================================================================%
saveMethodAddr PROC NEAR
 mov ?Compiled,1 ;Set compiled flag
 mov bx,Wptr[CompilePtr] ;Get ptr to combined mthd end
 mov Wptr[bx],di ;Save value
 add bx,2 ;Point to next location
 mov Wptr[CompilePtr],bx ;Reset location ptr
 mov bx,Wptr[CompileStart] ;Get ptr mthd count
 mov di,Wptr[bx] ;Get mthd count
 inc di ;Increments mthd count
 mov Wptr[bx],di ;Save value
 ret
saveMethodAddr ENDP







[LISTING FIVE]

Macro File: objects.mac

COMMENT % ====================================================================
Gets an object's instance variable.
Passed: Dest- Destination register; Var - Instance variable name;
Obj - Source object
=============================================================================%
getInst MACRO Dest,Var,Obj
 IFNB <Obj>
 IFIDN <Obj>,<Self>
 mov si,WORD PTR[Self]
 mov si,WORD PTR[si].Instances
 ELSE
 IFIDN <si>,<Obj>
 mov si,WORD PTR[si].Instances
 ELSE
 mov si,Obj&.Instances
 ENDIF
 ENDIF
 ENDIF
 mov Dest,[si+Var]
 ENDM

COMMENT % ====================================================================
Sets an object's instance variable.
Passed: Var - Instance variable name; Source - Source register; Obj - Source
object; Size - Size of data
=============================================================================%
setInst MACRO Var,Source,Obj,Size
 IFNB <Obj>
 IFIDN <Obj>,<Self>
 mov si,WORD PTR[Self]
 mov si,WORD PTR[si].Instances
 ELSE
 mov si,Obj&.Instances
 ENDIF
 ENDIF
 setInst_ Var,Source,Size
 ENDM

COMMENT % ====================================================================
Assembles move instruction based on source register.
=============================================================================%
setInst_ MACRO Var,Source,Size
 IFIDN <Source>,<al>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<ah>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<bl>
 mov BYTE PTR[si+Var],Source

 ELSE
 IFIDN <Source>,<bh>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<cl>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<ch>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<dl>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<dh>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<ax>
 mov WORD PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<bx>
 mov WORD PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<cx>
 mov WORD PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<dx>
 mov WORD PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<di>
 mov WORD PTR[si+Var],Source
 ELSE
 IFIDN <Source>,<si>
 mov WORD PTR[si+Var],Source
 ELSE
 IFIDN <Size>,<1>
 mov BYTE PTR[si+Var],Source
 ELSE
 IFIDN <Size>,<2>
 mov WORD PTR[si+Var],Source
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDIF
 ENDM







[LISTING SIX]

Macro File: objects.mac

COMMENT % ====================================================================
Gets an object's instance variable, but object is pointed to by one of
Self's instance variables.
Passed: Dest- Destination register; Var - Instance variable name;
Obj - Source object instance variable
=============================================================================%
getInst$ MACRO Dest,Var,Obj
 mov si,WORD PTR[Self]
 mov si,WORD PTR[si].Instances
 mov si,[si+Obj]
 mov si,WORD PTR[si].Instances
 mov Dest,[si+Var]
 ENDM

COMMENT % ====================================================================
Sets an object's instance variable, but object is pointed to by one of
Self's instance variables.
Passed: Var - Instance variable name; Source - Source register; Obj - Source
object instance variable; Size - Size of data
=============================================================================%
setInst$ MACRO Var,Source,Obj,Size
 mov si,WORD PTR[Self]
 mov si,WORD PTR[si].Instances
 mov si,[si+Obj]
 mov si,WORD PTR[si].Instances
 setInst_ Var,Source,Size
 ENDM




[Example 1]

send Screen,Refresh,DoubleBdr ;Send Screen a Refresh msg
send Self,Read ;Send Self a Read msg
send Self,<WORD PTR[bx]> ;Send Self msg pointed to by
 ; BX register


[Example 2]

defObj Window,\ ;Define Window object
 <>,\ ;As a base object
 <>,\ ;With no inst vars
 <Refresh> ;Responds to Refresh msg

defObj Border,\ ;Define Border object
 <>,\ ;As a base object
 <>,\ ;With no inst vars
 <Refresh> ;Responds to Refresh msg

defObj Screen,\ ;Define Screen object

 <Window,Border>,\ ;As a derived object
 <Row1,1,1,\ ;With these inst vars
 Col1,1,0,\
 Row2,1,23,\
 Col2,1,79,\
 Color,1,34h,\
 BdrColor,1,30h,\
 MemSeg,2,Nil>,\
 <Refresh> ;Responds to Refresh msg



[Example 3]

defMsg Window,\ ;Define for Window object
 Refresh,\ ;The Refresh msg
 <clrWin,,> ;To clear window

defMsg Border,\ ;Define for Border object
 Refresh,\ ;The Refresh msg
 <,,drawBdr> ;To draw border

defMsg Screen,\ ;Define for Screen object
 Refresh,\ ;The Refresh msg
 <,drawBackDrop,drawLabel>
 ;To draw back drop and label


[Example 4]

defMsg Label,\ ;Define for Label object
 Refresh,\ ;The Refresh msg
 <,,drawLabel> ;To draw label

defObj label,\ ;Define Label object
 <>,\ ;As a base object
 <>,\ ;With no inst vars
 <Refresh> ;Responds to Refresh msg

defMsg Screen,\ ;Define for Screen object
 Refresh,\ ;The Refresh msg
 <,drawBackDrop,> ;To draw back drop

defObj Screen,\ ;Define Screen object
 <Window,Border,Label>,\
 ;As a derived object
 <Row1,1,1,\ ;With these inst vars
 Col1,1,0,\
 Row2,1,23,\
 Col2,1,79,\
 Color,1,34h,\
 BdrColor,1,30h,\
 MemSeg,2,Nil>,\
 <Refresh> ;Responds to Refresh msg




[Example 5]


initObj Screen ;Combine methods for Screen


[Example 6]

getInst bl,Color,Screen ;Fetch Screen color
setInst BdrColor,bl ;Copy it to BdrColor
setInst Color,bl,Self ;And Self's color



[Example 7]

getInst$ bl,Color,Master ;Fetch color from object
 ; pointed to by Master
setInst Color,bl,Self ;Copy it to Self's color













































March, 1992
ASSEMBLY LANGUAGE PROGRAMMING FOR THE 80X87


Floating point from the inside out


 This article contains the following executables: ASMX87.ARC


Nicholas Wilt


Nicholas is a student at Dartmouth College, Hanover, New Hampshire. His
interests include computer graphics, C++, and assembler programming. He can be
reached through the editorial offices at DDJ.


Conventional wisdom holds that programming numeric coprocessors in assembler
is not worthwhile. After all, so the story goes, the 80x87 architecture is so
esoteric and complicated, and compilers do such a good job utilizing them,
that it's not worth the effort to program them in assembler.
Nothing could be further from the truth. The 80x87 coprocessor has a quirky
architecture, but no more so than its integer-based siblings. And we shall see
that coprocessor code is just as susceptible to optimization as integer-based
code, if not more so. This article is for assembler programmers not familiar
with the 80x87 math coprocessors. It highlights features of the 80x87 most
useful to the applications programmer, rather than attempting an exhaustive
treatise.


80x87 Registers


The 80x87 contains eight 80-bit floating-point registers. This 80-bit
"temporary real" format, intended for internal use by the 80x87, is the only
numeric format the chip can directly manipulate. It can load and store
integers and standard IEEE floating-point numbers, but they must be converted
to and from the temporary real format.
The integer and floating-point formats supported by the 80x87 are listed in
Table 1, along with their size specifiers. The integer formats are the same as
those supported by the 80x86, except that the 80x87 can directly load and
manipulate quadwords (64-bit integers) as well as words and doublewords.
Table 1: Integer and floating-point formats supported by the 80x87

 Format Operand Load
 type with
 -----------------------------------

 32-bit float dword ptr FLD
 64-bit double qword ptr FLD
 80-bit long double tbyte ptr FLD
 16-bit integer word ptr FILD
 32-bit integer dword ptr FILD
 64-bit integer qword ptr FILD

The temporary real format has a 64-bit mantissa, so 64-bit integers can be
represented without loss of precision.
The eight FPU registers are organized as a stack. Numbers can be loaded into
and stored from the top of the stack only. The top register on the stack is
called ST or ST(0). Registers are numbered by increasing depth from ST; the
first register from the top is ST(1), the second is ST(2), and so on. This
stack-oriented architecture is the strangest aspect of programming the 80x87;
floating-point units for other architectures are organized more like integer
units, with constant register names.
On the 80x87, every time you load something into the chip, all the register
names for the chip's contents change -- what used to be the stack top is now
ST(1), ST(1) becomes ST(2), and so on down the line. The comments in 80x87
assembler code almost always concentrate on keeping the stack contents
straight.
Like the integer processor, the 80x87 has special registers that contain flags
and control bits. The CW (Control Word) of the 80x87 has bits that control
such things as how to round numbers during conversions. The SW (Status Word)
has bits that describe the state of the machine (such as the result of a
comparison). These words can be accessed through the FLDCW (load control
word), FSTCW (store control word), and FSTSW (store status word) instructions.
Fortunately, we don't often have to worry about the control or status words.
The default values in the control word are reasonable for most applications
(after all, they are the same for us as they are for the compiler), and the
status word is useful to applications programmers mostly for doing
floating-point comparisons.


80x87 Instructions


The most important instructions on the 80x87 are the load and store
instructions. FLD pushes an operand onto the stack; the loaded operand becomes
ST. FILD, the integer version of FLD, pushes integer operands onto the stack.
Executing FLD on a 32-bit (dword-sized) operand converts a 32-bit,
floating-point number to temporary real and pushes it onto the register stack.
Any effective address can be used for the operand -- for example, to load a
32-bit local variable, you might write:

 FLD dword ptr [bp-4] ;Load local

while to load a 64-bit value pointed to by DS:SI, you would write:

 FLD qword ptr [si] ;Load next value in array

To load a 16-bit integer, you use the FILD instruction:

 FILD word ptr [bp+6] ;Load first parameter
Besides FLD and FILD, there are a number of special FLD instructions to load
special constants onto the stack. These instructions don't take any operands;
they just push a value onto the stack. FLDZ, for example, pushes 0.0 onto the
stack. Table 2 has a complete list of these special load instructions. These
instructions are useful not only because they are faster than loading
constants from memory, but also because they guarantee full 80-bit precision
for the constants loaded.
Table 2: Special FLD instructions to load special constants onto the stack

 Instruction Loads
 ----------------------------

 fldlg2 Log 2 (base 10)
 fldln2 Log 2 (base e)
 fldl2e Log e (base 2)

 fldl2t Log 10 (base 2)
 fldpi Pi (3.14159...)
 fldz Zero (0.0)
 fld1 One (1.0)

As with loading operands, storing operands can only be done from ST, the stack
top. The FST and FIST instructions store floating-point and integer formats.
To store the stack top in the form of a 64-bit, double-precision number, write
something such as: FST qword ptr [bp-8]; Store temp.
Often, you want to store the stack top and pop the stack in one operation.
This is when you won't be needing the contents of the stack top after storing
them. Operations that pop the stack after completion have a P appended onto
the instruction name. To store the stack top and then pop it off the register
stack, you use the FSTP and FISTP instructions in place of FST and FIST.
The instruction mnemonics for the 80x87 were chosen deliberately to help
understand what the instruction does. For instance, instructions for the
coprocessor always begin with the letter F; this distinguishes them as 80x87
instructions, because no integer instructions begin with F. Instructions that
begin with FI perform integer versions of the operation. The 80x87 needs
separate instructions to perform many operations on integers because the
operand size (for example, 32 bits or dword) doesn't always imply the operand
type. A 32-bit number can be a float or a long. So when you write FLD dword
ptr [bp+6], this loads a 32-bit floating-point number onto the stack. To load
a 32-bit integer onto the stack, you write: FILD dword ptr [bp+6].
The two instructions both load an operand onto the 80x87 stack, but they use
different conversions to arrive at the 80-bit internal representation.
Another convention is that instructions with a P appended to them pop the
stack after the operation. This is useful when used in conjunction with binary
operations such as addition, as well as with FSTP.


80x87 Arithmetic Instructions


There are several classes of arithmetic instruction on the 80x87. The simplest
class, unary instructions, perform an operation on a single operand. Absolute
value, negation, and square root all require only one operand. Unary
operations are predefined to operate on the stack top. For example, FSQRT
replaces ST with its square root. Operations are listed in Table 3.
Table 3: 80x87 unary instructions: For fsin, fcos, and fsincos, ST must have a
magnitude of less than 2^63.

 Instruction Operation it performs
 -------------------------------------------

 f2xm1 Computes 2^ST - 1.
 fabs Takes absolute value of ST.
 fchs Negates ST.
 frndint Rounds ST to integer.
 fsqrt Takes square root of ST.
 ftst Compares ST against 0
 and sets FPU flags in
 status word.

 387/486 only
 fcos Replaces ST with its
 cosine.
 fsin Replaces ST with its sine.
 fsincos Takes sine and cosine of ST,
 leaves cosine in ST, sine
 in ST(1).

Unary operations only operate on ST, so it is often useful to swap registers
around inside the 80x87. The FXCH instruction exchanges the stack top with the
register operand specified. For example:
 FXCH ; Exchange ST and ST(1)
 FXCH ST(2); Exchange ST and ST(2)
Table 4 lists the two-operand operations available on the 80x87. These are the
most general, useful operations -- add, subtract, divide, and multiply.
Because they are used so often, Intel made them very powerful; there are many
variations of each instruction.
Table 4: Two-operand operations available on the 80x87

 Instruction Operation it performs
 ---------------------------------------

 fadd Add
 fcom Compare
 fdiv Compute destination/source
 fdivr Compute source/destination
 fmul Multiply
 fsub Compute destination-source
 fsubr Compute source-destination

All two-operand instructions can be performed between registers on the FPU.
These must involve the stack top, however. You can write: FADD ST(1), ST;
ST(1)+=ST, but not FADD ST(2), ST(3); ST(2)+=ST(3). You can append a P to pop
the stack after performing the operation, as follows: FADDP ST(3), ST; Add ST
to ST(3), then pop.
In fact, one form of this operation is so common that it is implicit. If you
write simply: FMUL; Multiply ST(1) by ST, pop, it is the same as: FMULP ST(1),
ST; Multiply ST(1) by ST, pop.
You can also use memory operands with these instructions. As with
register-register operations, they must involve the stack top. Memory operands
are more restrictive, however -- the stack top is always the destination.
Also, only 32-bit and 64-bit memory operands are supported. You use a memory
operand as follows: FADD dword ptr es:[di]. This adds the 32-bit floating-
point number located at ES:DI to ST. (Note: Using FADDP with a memory operand
doesn't make much sense -- you would add a number to the stack top, then pop
the result just computed!)
These memory-source instructions have 16- and 32-bit integer forms, as well.
If the number at ES:DI is a 32-bit integer, not a 32-bit floating-point
number, you can write: FIADD dword ptr es:[di].



FSUB and FDIV


I should say a few words here about FSUB and FDIV, the results of which depend
on the operand's order. By default, the source operand is subtracted from (or
divided into) the destination operand. Thus: FSUB ST(1), ST; ST(1) = ST(1)-ST
subtracts ST from ST(1). What if we want the destination to be subtracted from
the source? Then we append an R (for "reverse operands") to the instruction
mnemonic. While FSUB qword ptr [bp-8]; ST=ST-[bp-8] subtracts the 64-bit
floating-point number at [BP-8] from the stack top, to swap the operands you
write FSUBR qword ptr [bp-8]; ST=[bp-8]-ST. Instructions that have reversed
operations and also pop the stack end in RP (for example, FDIVRP and FSUBRP).


Comparisons


The 80x87 lets you compare floating-point numbers in a roundabout fashion. The
FCOM instruction compares ST to a given operand. For example, FCOM ST(2)
compares ST to ST(2) and sets the condition code bits to reflect the results
of the comparison.
FCOM has several forms. The memory form of the instruction compares ST to a
32- or 64-bit memory operand. FCOMP compares ST to the source operand and then
pops ST off the register stack. FICOM takes a 16- or 32-bit integer memory
operand. FCOM with no operands is the same as FCOM ST(1). Finally, FCOMPP
compares ST to ST(1) and pops both off the register stack.
You examine the comparison results by storing the status word to a memory
operand with FSTSW, then loading it into AX and performing an SAHF
instruction. Intel designed the 80x87 so that the condition code bits in the
80x87 map directly to the carry and zero bits in the 80x86. (Some other bits
from the status word get mapped to flags as well, but they are not relevant to
comparisons.) Then, because the comparison results are in the carry and zero
flags, the unsigned conditional jump instructions can be used in mnemonic
fashion: JA jumps if ST is greater than the source operand, JB jumps if it is
less than the source operand, and JE jumps if it is equal to the source
operand. solve_quadratic() has an example of a floating-point comparison to
avoid taking the square root of a negative number. Here is an example of how
to jump if ST>ST(1):
 FCOM ST(1)    ; Compare ST to ST(1)
 FSTSW [bp-2]  ; Store SW to local
 MOV ax,[bp-2] ;
 SAHF          ; Flags<-AH
 JA Greater    ; Jump if ST>ST(1)
Note that on machines running the 287 and better, you can write the status
word directly to AX with FSTSW AX.
Figure 1 is a graphical depiction of 80x87 assembler in action. The state of
the coprocessor stack is shown after each instruction in the sequence:
 fld A
 fabs
 fld B
 fadd C
 fmul
 fstp D
This sequence computes |A|*(B+C) and stores it in D, clearing the register
stack in the process. Actual code would probably have size specifiers and
addresses instead of capital letters, of course.
I haven't covered all the subtleties of 80x87 programming here. Writing a sine
or cosine routine on pre-387 coprocessors is not nearly as straightforward as
any of the applications illustrated in Listings One through Three (page 88).
But not covering the partial arctangent, partial tangent, and similar esoteric
instructions is no great loss; these operations are exactly where it's not
worthwhile to write in assembler. My own sine and cosine routines, which run
on all coprocessors, are only 12 percent faster than the library routines for
my compiler -- and the library routines know enough to use FSIN or FCOS when
available (on the 387 and 486). The performance hit is almost entirely due to
the library routine checking whether it can use the faster instruction.
So rewriting library functions isn't usually very helpful; it's when you know
something about the data that speed-ups are possible. For example, in C you
can take numbers to a power with pow(). But if you know the power is an
integer, you can get a significant speed-up.


Integer Powers


Listing One shows intpow(), a C-callable function that takes a
double-precision number x and an unsigned integer y; the function returns x^y
50-90 percent faster than my compiler's pow() library function. I have
examined the library function's code; although it can tell when y is integral,
it takes a while to figure that out. Our function assumes y is integral, and
so can compute the power more quickly. In addition to Listing One, I've
developed a test program that's available electronically; see page 3. This
program is provided in both executable (TESTPOW.EXE) and source (TESTPOW.C)
form. I have also provided include files and a makefile for Borland C++.


Summing Arrays


Another useful function, sumarray(), is shown in Listing Two. It sums an array
of floating-point values. sumarray() works with an array of floats, but it
could easily be modified to work with doubles or 10-byte temporary reals.
(Just change the dword ptr qualifiers to qword ptr or tbyte ptr, and change
the size of the pointer increment to 8 or 10.)
To evaluate sumarray()'s performance, I raced it against a C loop that summed
the same array. I compared the performance of this loop, which did not contain
any function calls or parameter passing, against calls to sumarray(). For
arrays of size 1024, sumarray() is about four times as fast as C code. In
fact, sumarray() is as fast as C even when the array is only of size 2! This
reflects the difficulty compilers have when generating code for the 80x87. The
stack-based architecture is difficult to model, and variables that belong in
80x87 registers often end up in local variables instead.
Another nice thing about sumarray() is that it preserves greater precision
than the equivalent loop in C. While the C loop repeatedly truncates the
subtotal to 64 bits, sumarray() keeps it in an 80-bit register. This could be
particularly important in an operation such as summation, where small errors
get larger as more additions are performed.
In addition to Listing Two, I've developed a test program that's available
electronically; see page 3. This program is provided in both executable
(TESTSUM.EXE) and source (TESTSUM.C) form. I've also provided include files
and a makefile for Borland C++.


Quadratic Formula


The third, most complicated routine, solve_quadratic(), illustrates a few
things that intpow() and sumarray() did not. For one thing, solve_quadratic
demonstrates how to perform a floating-point comparison.
The other key feature of solve_quadratic is that it passes its results back
via pointers, rather than as a return value. The return value of
solve_quadratic is used to indicate whether roots were found. If no roots are
found (if the discriminant of the quadratic equation is negative),
solve_quadratic returns 0; if it finds a pair of roots, it writes them to the
two pointers passed to it and returns 1.
I haven't measured the performance of solve_quadratic against compiled code; I
didn't need to. Compilers never generate code that looks like solve_quadratic.
If you don't believe me, have your compiler emit the assembly for a few
floating-point-intensive functions. You will see some fine examples of how to
keep the register stack as empty as possible when working with the 80x87. Some
compilers are better than others, but they are no match for a determined
human. This is not the compilers' fault; the 80x87 architecture is simply
difficult to generate code for.
Floating-point calling conventions vary from compiler to compiler. The
functions here assume that floating-point parameters are passed as double, and
that floating-point return values are returned in ST. This is the calling
convention for Borland C++ and its predecessors. Calling conventions vary,
however; some compilers may pass floats as 32-bit values rather than promoting
them to double, for instance. Check your manual for your compiler's
conventions.
As with the other programs, I've written a test program that's available
electronically; see page 3. This program is provided in both executable
(TESTQUAD.EXE) and source (TESTQUAD.C) form. I've also provided include files
and a makefile for Borland C++.


Conclusion


If you want to learn more about 80x87 programming, or you want to clarify what
we have covered here, try altering the routines presented here to work
differently. Make sumarray() sum an array of doubles or integers; make
solve_quadratic pass back floats or long doubles instead of doubles. Make the
routines work with a different calling convention, such as that of Microsoft
C. When you can easily make simple changes such as these, you will be equipped
to address your own floating-point applications in assembler.
Well, we've gone over the architecture and instruction set of the 80x87, and
hopefully you're convinced that the math coprocessors are worth talking to
occasionally. This article has only scratched the surface of 80x87
programming, but it should contain enough background material for you to start
learning on your own.


_ASSEMBLY LANGUAGE PROGRAMMING FOR THE 80X87_
by Nicholas Wilt



[LISTING ONE]

; pow.asm: integer power function callable from Borland C++.
; Copyright (C) 1991 by Nicholas Wilt. All rights reserved.

.MODEL LARGE,C

.CODE

; double intpow(double x, unsigned int y);
; Returns x^y.

 PUBLIC intpow

intpow PROC X:QWORD,Y:WORD
 fld1 ; Load 1 into the 80x87
 mov cx,Y ; Get y
 fld X ; Load x into the 80x87

 jcxz Return ; If y already zero, return

TestY: test cx,1 ; Is the LSB of y set?
 jz NextIteration ; Jump if no
 fmul st(1),st ; ret *= x
NextIteration:
 fmul st,st(0) ; Square x
 shr cx,1 ; y >>= 1
 jnz TestY ; Continue if nonzero
Return: fstp st ; Pop stack. Return value is
 ; now in ST(0).
 ret
intpow ENDP

 END







[LISTING TWO]

; sumarray(), a Borland C++-callable function to sum the values
; in an array of floats.

; Copyright (C) 1991 by Nicholas Wilt. All rights reserved.

.MODEL LARGE,C

.CODE

; double sumarray(float *arr, int n);
; Returns sum of the n elements in arr.

 PUBLIC sumarray

sumarray PROC ARR:DWORD,N:WORD
 les bx,ARR ; ES:BX <- pointer to arr

 mov cx,N ; Get number of elements
 fldz ; Load zero
 jcxz LeaveSum ; Leave if count is 0.
DoSum: fadd dword ptr es:[bx] ; Add value in array to ST(0).
 add bx,4 ; Point to next value in array.
 loop DoSum ; Loop until done.
LeaveSum: ; Return value in ST(0).
 ret
sumarray ENDP

 END






[LISTING THREE]

; quad.asm: integer power function callable from Borland C++.
; Copyright (C) 1991 by Nicholas Wilt. All rights reserved.

.MODEL LARGE,C

.CODE

; int solve_quadratic(double a, double b, double c, double *x1, double *x2);
; solve_quadratic takes the coefficients of a quadratic polynomial
; and finds the roots of that polynomial. If there are two real
; roots, it writes them back to x1 and x2 and returns 1. If the
; discriminant b^2-4*a*c is less than 0, the roots are not real
; and solve_quadratic returns 0.
; The quadratic formula is as follows:
; (-b +/- sqrt(b*b-4*a*c)) / 2*a

 PUBLIC solve_quadratic

solve_quadratic PROC A:QWORD,B:QWORD,C:QWORD,X1:DWORD,X2:DWORD
 ; Comments show stack contents
 ; separated by signs
 ; Stack top is at left
 fld A ; a Here, ST(0) contains a.
 fld B ; b a Now ST(1) has a. Etc.

 ; Negate b -- squaring it is sign-independent, and we need b
 ; negated in all the other instances in the formula.
 fchs

 fld C ; c b a
 fld st(1) ; b c b a
 fmul st,st(0) ; b*b c b a

 ; Just plain fxch has an implicit operand of ST(1).

 fxch ; c b*b b a
 fadd st,st(0) ; 2*c b*b b a
 fadd st,st(0) ; 4*c b*b b a
 fmul st,st(3) ; 4*a*c b*b b a
 fsubp st(1), st ; b*b-4*a*c b a

 ftst ; Compare against 0

 ; We've computed b*b-4*a*c. If negative, we return 0.
 ; To do the comparison, we have to store the 80x87 status
 ; word and use sahf to store it into the flags. Once it's
 ; in the flags, we can jump on above or equal (jae) to jump if the
 ; number tested is greater than 0 after the FTST instruction above.
 sub sp,2 ; Quick, allocate a local
 mov bx,sp ; Point BX at it
 fstsw ss:[bx] ; Store the 80x87's status word there
 mov ax,ss:[bx] ; AX <- status word
 add sp,2 ; Deallocate the local
 sahf ; Get SW into flags
 jae ComputeResults ; Jump if positive
 fstp st ; This instruction clears the stack
 fstp st ; we've got three values on the stack
 fstp st ; to clear
 xor ax,ax ; Return 0 - no roots found.
 jmp short LeaveQuadratic
ComputeResults:
 fsqrt ; Find square root of discriminant
 fxch st(2) ; a b sqrt(b*b-4*a*c)
 fadd st,st(0) ; 2*a b sqrt(b*b-4*a*c)
 fld st(1) ; b 2*a b sqrt(b*b-4*a*c)
 fadd st,st(3) ; b+sqrt(b*b-4*a*c) 2*a b sq..
 fdiv st,st(1) ; x1 2*a b sqrt(b*b-4*a*c)
 les bx,X1 ;
 fstp qword ptr es:[bx] ; 2*a b sqrt(b*b-4*a*c)
 fxch ; b 2*a sqrt(b*b-4*a*c)
 fsubrp st(2),st ; 2*a -b - sqrt(b*b-4*a*c)
 fdiv ; x2
 les bx,X2 ; Store x2
 fstp qword ptr es:[bx]
 mov ax,1 ; Return 1
LeaveQuadratic:
 ret ; Return
solve_quadratic ENDP

 END























March, 1992
PORTING UNIX TO THE 386: DEVICE DRIVERS


Entering, exiting, and masking processor interrupts




William Frederick Jolitz and Lynne Greer Jolitz


Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect
of National Semiconductor's GENIX project, the first virtual memory
microprocessor-based UNIX system. Prior to establishing TeleMuse, a market
research firm, Lynne was vice president of marketing at Symmetric Computer
Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions
or comments to lynne@berkeley.edu. (c) 1992 TeleMuse.


Many devices work on an "interrupt-driven" basis. That is to say, they signal
an asynchronous event by generating an exception, which tells the processor to
come and service them. To support this need, we must have the ability to
enter, exit, and mask various processor interrupts. This is fairly complex,
and merits detailed discussion.
In the original Bell Labs PDP-11/45 UNIX system, classes of devices shared a
particular level of interrupt priority (although each had a distinct interrupt
vector). For example, priority level 5 was associated with terminal
controllers, and priority 6 was associated with mass storage and timeslice
clock devices. In addition, the PDP-11/45 had an spl ("set processor level")
instruction that forced the processor itself to the given level on demand.
This spl mechanism was used to temporarily raise the priority level when
critical sections of code would be executed. These critical sections would
alter the variables or hardware devices upon which the interrupt service
routines depended, so the affected interrupts would themselves be temporarily
"locked out" while the world was changed. This procedure prevented
inconsistencies arising from a race condition between the interrupt code and
the critical sections. A typical case is shown in Example 1.
Example 1: The spl function

 ...
 saved_priority = spl5();
 <critical code goes here>;
 splx (saved_priority);
 ...

All spl() functions returned the previous or "old" priority level so that it
could be restored by a subsequent call to the splx() function, which would
then force the priority indicated by its function parameter.
Most ports of UNIX attempted to preserve or emulate this mechanism, frequently
by calling the routines that manage spl(). In 4.3BSD, the routines were
separated out into more descriptive and less enigmatic names. Thus,
splclock(), for example, defeated all timeslice clock interrupts, and splbio()
defeated mass storage devices that depended on the block I/O (or bio for
short) mechanism.


386 ISA Interrupt Mechanism in Detail


Our 386 ISA bus machine uses a different arrangement from those outlined
above; see Figure 1. Unlike the PDP-11, VAX, and 68000, the interrupt priority
control is not part of the processor itself. Instead, it's actually part of
the ISA bus. The 386 relies on two 8259 Interrupt Control Units (ICUs)
attached to the processor, just like an I/O device. So, instead of having
special instructions or registers in the processor to manage interrupts, the
386 uses I/O instructions on dedicated I/O ports. Additional signals to the
microprocessor allow the ICUs to indicate the presence of an interrupt and
pass back information on which interrupt to dispatch to. These ICUs function
as a programmable filter of device interrupts, and the 386 can process these
interrupts as presented or ignore them as a whole.
The ICUs are attached in a cascaded arrangement, with the master ICU directly
connected to the 386 and the slave ICU connected to one of the eight interrupt
input lines that each ICU possesses. Because of this layout, although we have
two ICUs with eight lines apiece, only 15 interrupts can actually be generated
(see Listing One, page 90), as the third interrupt (IRQ2) is not allowed. Even
more confusing is the arrangement of relative priority: the interrupt
priorities for the slave interrupts (IRQ8-15) are jammed in between IRQ1 and
IRQ3. And finally, to maintain compatibility with the original PC, what used
to be IRQ2 is now attached to the slave's IRQ9, with the newer interrupt
signals on the slave ICU (other than IRQ9) available only to AT-style, 16-bit
wide cards.
Each ICU has three registers full of bits (a bit corresponds to each
interrupt) representing its device interrupt lines: request, "in service," and
mask. Sampled interrupts are present in the request register, and interrupts
that have been signaled to the processor are recorded in the "in service"
register. The mask register is managed by the software to selectively disable
interrupts as needed -- to take them out of the running for interrupting the
processor.
Unlike the PDP-11, each interrupt vector has its own interrupt priority level,
discrete from all others. Also, on the ISA there is no partitioning of
priority levels and class of devices. (For example, mass storage devices,
terminals, and network devices are all spread out and interspersed.) The
choice of interrupt vector frequently is a function of the given combination
of cards present in a system.


Interrupt Group Masks


The ICUs provide for quite a variety of programmable modes of operation, as
befits a part primarily designed for real-time process control applications.
We chose to use it in "fixed" priority mode and implement our own priority
mechanism in software. For this purpose, we created bit mask values that we
could shove into the ICU mask registers to defeat interrupts associated with
each group. Thus, for example, when spltty() is called to block the class of
terminal device interrupts, the group mask __ttymask__, which has a bit set
for each terminal device interrupt to be defeated, is placed in the ICU
interrupt mask register. This is quite different from the original PDP-11
arrangement!
A nested condition, where multiple different spls could be called, is a
possibility. So, the semantics of the spl routines have been altered slightly
to be the OR of all masks posted instead of assignment. Thus, if a sequence of
code called both spltty() and splbio(), the effect would be to defeat both
groups of devices inclusively. There are two exceptions to this: first, when
we wish to unmask all valid interrupts with splnone() (that is, block none of
the interrupts); and second, when we wish to restore the previous state of the
interrupt control unit's masks with splx(oldmask). Both of these exceptions
need to assign the mask, instead of just ORing in a value, to obtain the
desired effect.
These group masks (see Listing Two, page 90) must be constructed from the
interrupts the devices found during the autoconfiguration phase. (See DDJ,
November 1991.) Lists of devices to be configured for each group (such as the
tty) are collected. If the given device is found and attached, the interrupt
determined to be associated with this device (for the tty group, __ttymask__,
for example) is added to the appropriate group mask. Similarly, a mask of all
valid interrupt devices is maintained, so that unused interrupts are never
enabled.
To manipulate the interrupt masks (see Listing Three, page 90) we use the
spl() functions described earlier. They are implemented in the GCC inline
assembler macro fashion, so that they may be called with the smallest
additional overhead. As you might guess from this, the spl() functions are
called frequently, and shortening execution is of value. Please note that two
different versions of these macros are provided, for use with assembler files
and C files, with different calling conventions -- again to minimize overhead.
These functions disable the processor from handling interrupts (cli) so that
the mask can appear to change "atomically." Also, in deference to high-speed
386 systems, what appear to be spurious input instructions are added after
updating the ICU mask register. These instructions do a read of a known
"nonexistent" port. It is known that no data will be forced on the bus, and
that any outstanding output operations on the ISA bus will have been written
out before this instruction finishes. This code is necessary to mitigate the
sins of "clever" hardware that holds port output contents in a "write buffer"
so that output instructions can overlap execution (and it's still quite slow).
If this code is not inserted, when we turn on the processor's interrupt
processing again (sti), the new mask won't have made it to the ICU and we will
be running at the "old" priority for "a while." If this happens at an
inconvenient time (during an interrupt processing routine, for instance), we
might endlessly recurse and overrun the processor. The 386 is very unforgiving
in this regard--the processor will shut down and spontaneously reset itself.
Equally cleverly, the BIOS will then scrub the screen buffer and memory clean
of all evidence of what led to the disaster--a debugging nightmare.
One technique commonly used by PC software developers is to attempt to
memorize the contents of the screen prior to a shutdown. It's amazing what one
can remember during the 100 milliseconds or so that the chip is off doing
diagnostics. During particularly baffling sessions, different colors on
console messages and speaker tones were used to convey the content of
debugging information, because one could more quickly grasp a sequence of
tones or colors than the text of a message. This technique, while many times a
lifesaver, was somewhat difficult to explain to a passerby when the system
would hit one of these debugging sequences. To the uninitiated, the computer
would seem to have gone mad, pouring out messages and tones at a frightful
rate. In addition, such a person would never quite believe that a conclusive
answer could be obtained from such an outpouring, because the text would
usually be incomprehensible, although the arrangement of color and tones would
be unique enough to finger the guilty party. One might consider it a "poor
man's" in-circuit emulator of sorts, although perfect pitch is not usually
considered a requirement for systems programming!


Wiring Interrupts to Drivers


Our operating system drivers are written for the most part in C, thus
leveraging all the benefits of programming in a high-level language, such as
portability and readability. However, we need to somehow connect a device
driver's interrupt function (written in C) to the corresponding hardware
interrupt. This correspondence might change depending on the whims of
configuration, so this process is tied up (like the Gordian Knot) to
autoconfiguration.
On the 386, we use a two-step operation. In processing an interrupt, the 386
indexes the IDT (Interrupt Descriptor Table) to obtain the address of the
interrupt entry stub routine. It then executes the stub to gain entry to the
given driver. We call this routine a "stub" because its whole existence is
dedicated to providing a rendezvous point between the machine's concept of an
interrupt and our operating system kernel's higher-level concept of an
interrupt (for example, an exception handled by a C function).
One way to understand this process is to view it more conceptually. The 386
processes an interrupt by inserting a "hidden" call instruction before the
next consecutive instruction it would have processed otherwise. This hidden
call is an lcall or intersegment call to the function described by the
interrupt's IDT entry. The stub routine adds the necessary semantics (or
"glue") to expand this hidden call into a hidden driver-interrupt call, and
allows the effect to be reversed and the stack cleaned off when the driver
interrupt routine returns.
To allow the 386 processor to find the interrupt stubs, we need to fill out
the Interrupt Descriptor Table entry associated with this. The first 32
entries are used by the processor to deal with its own exceptions, so when we
initialize the ICUs, we have them relocate up their exception index
appropriately. So IRQ0-7 from the first ICU correspond to IDT entries 32-39,
and IRQ8-15 from the second ICU correspond to IDT entries 40-47. (Note: We
don't receive an interrupt on the third interrupt of the primary ICU because
it is used as the cascade input by the second ICU. However, it still requires
an interrupt table entry [34] as a placeholder, even though it is never used.)
The function setidt() (see Listing Four, page 91) fills out an IDT entry with
the required details.



The Interrupt Descriptor Table Entry


Interrupts on the 386 are extremely CISC-ish; they can be directed to
different ring levels (including user process ring 3), and can even context
switch to another process! No other processor we know of has such an elaborate
mechanism for interrupt entry. All details needed in the contents of the
interrupt descriptor (see Listing Five, page 91) reflect this wealth of
flexibility built into the 386. The interrupt descriptor can be a task gate,
an interrupt gate, or a trap gate. The entry can be directed to a separate
process, an interrupt entry stub (with interrupts disabled), or a trap entry
stub (with interrupt status unchanged). Because all these choices are gates
(MULTICS terminology), embedded in the interrupt descriptor is the selector of
another segment, described in the GDT/LDT tables. Along with the selector,
there is an offset into the selector, which locates the entry point within the
selector's segment entry stub. For our needs, the selector is always the
kernel's code selector, because with our current arrangement the only place we
can deal with interrupts is in the kernel. The offset corresponds to the
kernel virtual address of the interrupt (or trap) entry stub.
An additional field to note in the interrupt descriptor entry is the
descriptor priority level. It is used to allow or disallow 386 INT (software
interrupt) instructions to gain entry to this interrupt descriptor, as if an
interrupt had been processed from hardware. On the earlier 8088/8086 (the
original PC), these software interrupts were indistinguishable from hardware
interrupts. In fact, many of the interrupts that MS-DOS uses conflict with 386
and ICU IDT assignments. (Luckily for MS-DOS, this is only a problem for
protected mode and paging.) To prevent software interrupt instructions from
user mode processes from executing device driver code, the description
priority level (dpl) field in the IDT sets the maximum ring level allowed to
execute this IDT entry with a given INT instruction.


_PORTING UNIX TO THE 386: DEVICE DRIVERS_
by William Jolitz and Lynne Jolitz



[LISTING ONE]

/* [Excerpted from /sys/i386/isa/icu.h] */
 ...
/* Interrupt enable bits -- in order of priority */
#define IRQ0 0x0001 /* highest priority - timer */
#define IRQ1 0x0002
#define IRQ_SLAVE 0x0004
#define IRQ8 0x0100
#define IRQ9 0x0200
#define IRQ2 IRQ9
#define IRQ10 0x0400
#define IRQ11 0x0800
#define IRQ12 0x1000
#define IRQ13 0x2000
#define IRQ14 0x4000
#define IRQ15 0x8000
#define IRQ3 0x0008
#define IRQ4 0x0010
#define IRQ5 0x0020
#define IRQ6 0x0040
#define IRQ7 0x0080 /* lowest - parallel printer */
 ...






[LISTING TWO]

/* [Excerpted from /sys/i386/isa/icu.h] */
 ...
/* Interrupt "level" mechanism variables, masks, and macros */

#define INTREN(s) __nonemask__ &= ~(s)
#define INTRDIS(s) __nonemask__ |= (s)
#define INTRMASK(msk,s) msk |= (s)
 ...
/* [Excerpted from /sys/i386/isa/isa.c] */
 ...
/* Configure all ISA devices */
isa_configure() {
 struct isa_device *dvp;
 struct isa_driver *dp;
 splhigh();
 INTREN(IRQ_SLAVE);

 /* configure devices, constructing group masks as we go */
 for (dvp = isa_devtab_bio; config_isadev(dvp,&__biomask__); dvp++);
 for (dvp = isa_devtab_tty; config_isadev(dvp,&__ttymask__); dvp++);
 for (dvp = isa_devtab_net; config_isadev(dvp,&__netmask__); dvp++);
 for (dvp = isa_devtab_null; config_isadev(dvp,0); dvp++);
/* if we support slip, then any tty interrupt is a potential net intr */
#include "sl.h"
#if NSL > 0
 __netmask__ |= __ttymask__;
 __ttymask__ |= __netmask__;
#endif

 /* if not enabled, don't allow in ANY mask to become enabled */
 __biomask__ |= __nonemask__;
 __ttymask__ |= __nonemask__;
 __netmask__ |= __nonemask__;
 __protomask__ |= __nonemask__;
 splnone();
}
/* Configure an ISA device. */
config_isadev(isdp, mp)
 struct isa_device *isdp;
 int *mp;
{
 struct isa_driver *dp;
 if (dp = isdp->id_driver) {
 /* does this device have any I/O shared memory? */
 if (isdp->id_maddr) {
 extern int atdevbase[];
 /* convert from PC absolute physical to virtual */
 isdp->id_maddr -= IOM_BEGIN;
 isdp->id_maddr += (int)&atdevbase;
 }
 /* "Is there anyone on board?" -- Star Trek */
 isdp->id_alive = (*dp->probe)(isdp);
 if (isdp->id_alive) {
 printf("%s%d", dp->name, isdp->id_unit);
 (*dp->attach)(isdp);
 printf(" at 0x%x ", isdp->id_iobase);
 /* have we got an interrupt to wire down? */
 if(isdp->id_irq) {
 int intrno;
 intrno = ffs(isdp->id_irq)-1;
 printf("irq %d ", intrno);
 INTREN(isdp->id_irq);
 /* add to a group mask?? */
 if(mp)INTRMASK(*mp,isdp->id_irq);
 /* wire interrupt */
 setidt(ICU_OFFSET+intrno, isdp->id_intr,
 SDT_SYS386IGT, SEL_KPL);
 }
 /* perhaps a DMA channel request as well? */
 if (isdp->id_drq != -1) printf("drq %d ", isdp->id_drq);

 printf("on isa\n");
 }
 return (1);
 } else return(0);
}

...






[LISTING THREE]

/* [Excerpted from /sys/i386/include/param.h] */

 ...
#ifndef __ORPL__
/* Interrupt Group Masks */
extern u_short __highmask__; /* interrupts masked with splhigh() */
extern u_short __ttymask__; /* interrupts masked with spltty() */
extern u_short __biomask__; /* interrupts masked with splbio() */
extern u_short __netmask__; /* interrupts masked with splimp() */
extern u_short __protomask__; /* interrupts masked with splnet() */
extern u_short __nonemask__; /* interrupts masked with splnone() */

asm(" .set IO_ICU1, 0x20 ; .set IO_ICU2, 0xa0 ");

/* adjust priority level to disable a group of interrupts */
#define __ORPL__(m) ({ u_short oldpl, msk; \
 msk = (m); \
 asm volatile (" \
 cli ; /* modify interrupts atomically */ \
 movw %1, %%dx ; /* get mask to OR in */ \
 inb $ IO_ICU1+1, %%al ; /* get low order mask */ \
 xchgb %%dl, %%al ; /* switch the old with the new */ \
 orb %%dl, %%al ; /* finally, OR both it in! */ \
 outb %%al, $ IO_ICU1+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 inb $ IO_ICU2+1, %%al ; /* next, get high order mask */ \
 xchgb %%dh, %%al ; /* switch the old with the new */ \
 orb %%dh, %%al ; /* finally, or it in! */ \
 outb %%al, $ IO_ICU2+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 movw %%dx, %0 ; /* return old mask */ \
 sti /* allow interrupts again */ " \
 : "&=g" (oldpl) /* return values */ \
 : "g" ((m)) /* arguments */ \
 : "ax", "dx" /* registers used */ \
 ); \
 oldpl; /* return the "old" value */ \
})
/* force priority mask to a set value */
#define __SETPL__(m) ({ u_short oldpl, msk; \
 msk = (m); \
 asm volatile (" \
 cli ; /* modify interrupts atomically */ \
 movw %1, %%dx ; /* get mask to OR in */ \
 inb $ IO_ICU1+1, %%al ; /* get low order mask */ \
 xchgb %%dl, %%al ; /* switch the old with the new */ \
 outb %%al, $ IO_ICU1+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 inb $ IO_ICU2+1, %%al ; /* next, get high order mask */ \
 xchgb %%dh, %%al ; /* switch the old with the new */ \

 outb %%al, $ IO_ICU2+1 ; /* and stuff it back where it came */ \
 inb $ 0x84, %%al ; /* post it & handle write recovery */ \
 movw %%dx, %0 ; /* return old mask */ \
 sti /* allow interrupts again */ " \
 : "&=g" (oldpl) /* return values */ \
 : "g" ((m)) /* arguments */ \
 : "ax", "dx" /* registers used */ \
 ); \
 oldpl; /* return the "old" value */ \
})
#define splhigh() __ORPL__(__highmask__)
#define spltty() __ORPL__(__ttymask__)
#define splbio() __ORPL__(__biomask__)
#define splimp() __ORPL__(__netmask__)
#define splnet() __ORPL__(__protomask__)
#define splsoftclock() __ORPL__(__protomask__)
#define splx(v) ({ u_short val; \
 val = (v); \
 if (val == __nonemask__) (void) spl0(); /* zero is special */ \
 else (void) __SETPL__(val); \
})
#endif __ORPL__
 ...






[LISTING FOUR]

/* [Excerpted from /sys/i386/i386/machdep.c] */

 ...
/* Fill out a gate descriptor used as an interrupt vector. */
setidt(idx, func, typ, dpl) char *func; {
 struct gate_descriptor *ip = idt + idx;
 ip->gd_looffset = (int)func;
 ip->gd_selector = GSEL(GCODE_SEL,SEL_KPL);
 ip->gd_stkcpy = 0;
 ip->gd_xx = 0;
 ip->gd_type = typ;
 ip->gd_dpl = dpl; /* can we allow INT's to this IDT index? */
 ip->gd_p = 1; /* yup, we're here, don't segment fault me */
 ip->gd_hioffset = ((int)func)>>16 ;
}
 ...






[LISTING FIVE]

/* [Excerpted from /sys/i386/include/segments.h] */

 ...
/* Gate descriptors (e.g. indirect descriptors)

 * [The IDT is made up of these as well.] */
struct gate_descriptor {
 unsigned gd_looffset:16 ; /* gate offset (lsb) */
 unsigned gd_selector:16 ; /* gate segment selector */
 unsigned gd_stkcpy:5 ; /* number of stack wds to cpy */
 unsigned gd_xx:3 ; /* unused */
 unsigned gd_type:5 ; /* segment type */
 unsigned gd_dpl:2 ; /* segment descriptor privilege level */
 unsigned gd_p:1 ; /* segment descriptor present */
 unsigned gd_hioffset:16 ; /* gate offset (msb) */
} ;
 ...






March, 1992
DEVICE DRIVER MONITORING


A tool for debugging device drivers


 This article contains the following executables: DRVMONIT.ARC


Rick Knoblaugh


Rick is a software engineer specializing in systems programming and is the
coauthor of Screen Machine, a screen design/prototyping/code-generation
utility. He can be reached at P.O. Box 1109, Half Moon Bay, CA 94019.


Many times when developing device drivers, I've needed a program for
monitoring and logging requests made to a given device driver. Such a
monitoring program would be helpful for tracking the sequence of commands
which lead up to a system crash or other problem. It could also be used for
inspecting various parameters and buffer contents passed to and from a device
driver.
When investigating a problem, a program such as this could help eliminate the
tedious task of setting a breakpoint at the driver interrupt routine, dumping
the request header, inspecting the command code, and looking up the command
code in a manual. It could also be useful in dumping other memory areas
depending on the type of command (that is, dumping using a pointer to a packet
or transfer area), breaking at the completion of the interrupt routine,
inspecting data buffers, and checking the error code in the request header
status field (possibly looking that up in the manual).
Clearly, this was a tool I needed. Consequently, I've developed the device
driver monitoring software described in this article. Benefits of this tool
are the ability to log driver requests and the subsequent I/O that is
performed. Thus, an expected results file can be created, providing the means
to compare the results of an existing driver with one that is currently under
development. Once created, these expected results files can be quite useful in
regression testing.


The Monitor


The complete device driver monitor program, DRVMONIT.SYS, is made up of
several modules: the device driver initialization code, a dispatcher to the
driver functions, the command processor routines, equates and structures, and
data for the program. In all, DRVMONIT.SYS consists of nearly 3000 lines of
code, more than can be presented in this article. However, it is all available
electronically.
DRVMONIT.SYS is itself an MS-DOS device driver. It must be an entry in your
CONFIG.SYS file located anywhere before the entry for the driver you wish to
monitor. This allows DRVMONIT to be available to log the earliest requests
(such as INIT) made to the device driver being monitored.
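For example, the relevant CONFIG.SYS entries might look like this (MYDRIVER.SYS and the paths are illustrative stand-ins for whatever driver you're monitoring):

```
REM DRVMONIT must be loaded before the driver it monitors
DEVICE=C:\DRVMONIT.SYS
DEVICE=C:\MYDRIVER.SYS
```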
DRVMONIT provides a log of all driver requests showing the request header
contents, I/O buffers, and actual textual descriptions of the commands and
completion status.


Configuring Logging Options


DRVMONIT is highly configurable in terms of where, when, which, and how much
data is logged. The available configuration options are shown in Figure 1.
Figure 1: List of configuration options available in the device driver monitor

 Log data to printer, screen, or memory

 Display read and/or write buffers (the
 maximum number of blocks to be
 displayed can be configured for each)

 Show request header upon entry to and/
 or exit from the driver interrupt routine

 Start/stop logging after a specified
 number of driver requests

 Log only certain commands

 Optionally stop and wait for a key when
 an error is reported (useful for breaking
 in with your debugger for inspecting
 intermittent problems)

To select various configuration options, change the values in the
configuration data area of the DRVMONIT source code. Alter the config_dat
structure members to reflect the desired logging options. To enable/disable
the logging of individual commands, a log flag located in structures of
command information, cmd_table (described under "Logging Commands") must be
altered. The ability to disable the logging of a command really becomes useful
when monitoring a driver which is constantly issuing a certain command (for
example, a console driver which is forever polling the keyboard). As an aside,
it would be useful to add to DRVMONIT a configuration screen where these
values could be changed interactively.
Notice that output can only be directed to the screen, printer, or memory.
Performing I/O using DOS functions is not possible while DOS is processing
requests to device drivers. Direct sector I/O via the BIOS is available, but
managing all of that seemed ugly. The output to memory option can be used by
external programs to write logged data to disk. Such programs can communicate
with DRVMONIT via IOCTL Input and Output commands supported by the DRVMONIT
device driver.
The IOCTL Output command provides DRVMONIT with a far pointer to the user data
area where data may be logged. In response to the IOCTL Input command,
DRVMONIT provides the caller with pointers to its configuration information.
The external program can use the pointers to change configuration information
at any time. In fact, via these pointers, the external program could provide
that interactive configuration facility just described.



Taking Control


DRVMONIT utilizes user software interrupt 60h to get control of a given device
driver. The driver to be monitored must have Int 60h as the first instruction
in its strategy routine. If it is inconvenient for you to actually place this
instruction in your source code, the Int 60h instruction (0CD60h) can be
patched into the first 2 bytes of the strategy routine of your device driver.
You then indicate that you have done this by supplying the /P switch on the
command line. Following the /P switch, specify the hexadecimal values of the 2
bytes of code that were originally at offset zero of your strategy routine
(for example, /P2e89).
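The patch itself is simple enough to sketch in C. This is an illustrative model, not DRVMONIT's actual code: the strategy routine is represented by a byte buffer, and the two displaced bytes are saved so they can be supplied back (the article's /P2e89 example):

```c
#include <assert.h>

/* Bytes displaced from offset zero of the strategy routine; in DRVMONIT
   these are what you pass back with the /P switch. */
static unsigned char saved[2];

/* Overwrite the first two bytes of a strategy routine with INT 60h
   (opcode 0CDh, interrupt number 60h), remembering the originals. */
void patch_strategy(unsigned char *strat)
{
    saved[0] = strat[0];
    saved[1] = strat[1];
    strat[0] = 0xCD;   /* INT opcode */
    strat[1] = 0x60;   /* interrupt number 60h */
}
```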
With this one hook into a given device driver, DRVMONIT gets initial control.
Then, by patching the device driver's code and manipulating stack values,
DRVMONIT also gets control upon entry into the driver's interrupt routine and
again when the interrupt routine returns to the caller.


Int 60h Processing


The processing of these three monitored areas (strategy, interrupt, and return
from interrupt) is handled via a state machine. When an Int 60h occurs, the
monitor_process routine calls an appropriate processing routine based on the
area from which an interrupt is expected.
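A minimal sketch of that state machine follows. The routine names echo those used in the article (monitor_process, process_int, process_ret); the bodies here are placeholders for the real logging work:

```c
#include <assert.h>

/* The three places DRVMONIT expects the next Int 60h to come from. */
enum mon_state { EXPECT_STRATEGY, EXPECT_INTERRUPT, EXPECT_RETURN };

static enum mon_state state = EXPECT_STRATEGY;

static void process_strat(void) { state = EXPECT_INTERRUPT; } /* log request */
static void process_int(void)   { state = EXPECT_RETURN; }    /* log on entry */
static void process_ret(void)   { state = EXPECT_STRATEGY; }  /* log status */

/* Dispatch each Int 60h to the handler for the area it came from. */
void monitor_process(void)
{
    switch (state) {
    case EXPECT_STRATEGY:  process_strat(); break;
    case EXPECT_INTERRUPT: process_int();   break;
    case EXPECT_RETURN:    process_ret();   break;
    }
}
```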


Logging Commands


Logging of individual commands is performed by the record_cmd routine. It is
called from process_int and/or process_ret depending on whether DRVMONIT has
been configured to log request-header contents on entry to the driver
interrupt routine, exit from the driver interrupt routine, or both. Record_cmd
calls find_func, which does a table lookup on the command code in the request
header.
The tables provide key information about each command. In order to effectively
log a command, DRVMONIT needs to know the textual description of the command,
whether the command is an I/O command, whether the command supports multiple
functions which have their own packet information (which needs to be logged),
and whether the user wants to log this particular command. The data that
provides this information is found in the cmd_table structure.
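A stripped-down version of such a table and its find_func-style lookup might look like the following. The member names ri_log_flag, ri_iotbl_ptr, and ri_func_ptr come from the article; the rest of the layout, and the table contents, are illustrative (command codes 0, 2, and 4 are the standard DOS driver INIT, BUILD BPB, and INPUT commands):

```c
#include <stddef.h>
#include <assert.h>

struct cmd_info {
    int         ri_cmd;        /* request-header command code */
    const char *ri_desc;       /* textual description for the log */
    int         ri_log_flag;   /* nonzero: log this command */
    const void *ri_iotbl_ptr;  /* non-NULL if the command performs I/O */
    const void *ri_func_ptr;   /* non-NULL if it has sub-functions */
};

static struct cmd_info cmd_table[] = {
    { 0, "INIT",      1, NULL, NULL },
    { 2, "BUILD BPB", 1, NULL, NULL },
    { 4, "INPUT",     1, NULL, NULL },
};

/* Table lookup on the command code from the request header. */
const struct cmd_info *find_func(int cmd)
{
    size_t i;
    for (i = 0; i < sizeof cmd_table / sizeof cmd_table[0]; i++)
        if (cmd_table[i].ri_cmd == cmd)
            return &cmd_table[i];
    return NULL;
}
```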
To enable/disable the logging of individual commands, change the structure
member, ri_log_flag, to either TRUE or FALSE. To enable/disable individual
functions within a command, change fd_log_flag in the cmd_f_detail structure.
If a command is an I/O command, an offset is provided in element ri_iotbl_ptr,
which points to a table of I/O information. This data is located at
io_cmd_table and follows the format defined in the io_table structure. Using this
information, DRVMONIT can find I/O data to be displayed.
If a command supports multiple functions, an offset is provided in element
ri_func_ptr, which points to a table of function information. This data is
located in the cmd_func_tbl structure. If find_func detects that this pointer
is not NULL, the routine find_sub_func is called to retrieve the following: a
description of the command function; an indication of whether the function
codes are located in the main request block or are contained in a packet area
to which a pointer has been provided; the length of the packet area; and an
indication of whether the function performs I/O, in which case a pointer to
the I/O table information is included.
This table of information is used to handle two types of commands which have
multiple functions within them: the generic IOCTL variety; and the IOCTL
input/output commands used with MSCDEX (the Microsoft MS-DOS CD-ROM
extensions). Note that if you are monitoring a device driver to be used with
MSCDEX, assemble DRVMONIT with msc_flag defined. This will cause CDLOGDAT.INC,
the include file which contains these special tables, to be used.
In the case of the generic IOCTL, the category and function codes are included
in the request header, and a pointer is provided to a packet area. In the
MSCDEX driver IOCTL input/output commands, the function code is not included
in the request header, but is the first byte of a control block area pointed
to by the transfer address pointer. These differences are handled by the
information in the structures.


Final Steps


When DRVMONIT receives control at the return from a driver's interrupt
routine, it finishes its eavesdropping with the following steps: First, it
must always check for the INIT or BUILD BPB commands. This allows it to
determine and store the sector size being used -- a good thing to know when
you're dumping I/O buffers.
The last task is to inspect the error code in the status field of the request
header and print a description. (For example, if the error code is 2, print
"Drive not ready.") Also, if DRVMONIT has been configured to wait for a key in
the event of an error, print a message and wait for any key to be pressed.
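The status-field lookup can be sketched as a simple table indexed by error code. The descriptions below are the standard MS-DOS device driver error codes (code 2, as the article notes, is "Drive not ready"); the function name is illustrative:

```c
#include <string.h>
#include <assert.h>

static const char *err_desc[] = {
    "Write protect violation",      /* 0 */
    "Unknown unit",                 /* 1 */
    "Drive not ready",              /* 2 */
    "Unknown command",              /* 3 */
    "CRC error",                    /* 4 */
    "Bad request structure length", /* 5 */
    "Seek error",                   /* 6 */
    "Unknown media",                /* 7 */
    "Sector not found",             /* 8 */
    "Printer out of paper",         /* 9 */
    "Write fault",                  /* 0Ah */
    "Read fault",                   /* 0Bh */
    "General failure",              /* 0Ch */
};

/* Map a request-header error code to its textual description. */
const char *describe_error(unsigned code)
{
    return code < sizeof err_desc / sizeof err_desc[0]
         ? err_desc[code] : "(unknown error code)";
}
```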


Future Enhancements


Enhancements which could be made to DRVMONIT might include a timing function
which would report how long the driver took to process certain commands.
Additional logging output options could be added, such as directing the logged
data to a COM port.
In the future, I may attempt to provide a DRVMONIT which sets up an 80386 V86
monitor. This would enable DRVMONIT to monitor and log the values input and
output to I/O addresses (as specified in the Task State Segment I/O permission
bit map). Thus, for example, you could inspect the particular commands
generated to a controller as a result of your driver receiving a particular
request.
As you use DRVMONIT, you'll discover other uses for it. It might, for example,
be useful for analyzing performance issues. Or, you may simply enjoy watching
the DRVMONIT output. It can sometimes be entertaining as well as practical.





March, 1992
THE AM29000 AS AN EMBEDDED CONTROLLER


Programming a RISC processor




Bob Lowell


Bob is an engineer for Doctor Design and can be contacted at 5415 Oberlin
Drive, San Diego, CA 92121.


One of the more interesting trends in embedded systems development is the
proliferation of Reduced Instruction Set Computer (RISC) processors. This
trend is especially evident in the laser printer market where there is a need
to control the increasingly complex graphics interpretation tasks required by
today's page description languages.
RISC chips, available in volume for under $50, boost graphics processing
performance up to 20 times that of the Motorola 68000 used in the Hewlett
Packard Laserjet and Apple Laserwriter printers. Consequently, graphics that
recently took five minutes or more to print out on inexpensive printers can
now print at an effectively instantaneous rate. This rate is limited only by
the print engine speed. Advanced Micro Devices' Am29000 is a good chip for
applications such as these because it can achieve very high performance
without greatly impacting the hardware component cost of the printer
controller board. Board cost considerations should be made relative to boards
currently designed to run Adobe Postscript or Hewlett Packard's PCL5, the most
popular page description languages today. The specific topics I discuss in
this article relate to printer controller board design; however, the general
concepts are important for the design of just about any 29000-based embedded
system project.


Unique Architectural Features of the Am29000


While there are many available RISC processors with somewhat similar features
and performance claims, the Am29000 has several architectural features that
distinguish it from others. The Am29000 designers focused their implementation
on the paramount factor for peak performance in a RISC microprocessor: a high
bandwidth memory interface. The unique architectural features of the Am29000
that help achieve this high-memory bandwidth, or transfer rate, are the
separate instruction (I) and data operand (D) buses and the branch target
cache. Like many of the other RISC chips available today, the Am29000 can
execute all of its instructions in a single clock. Single-cycle execution is
difficult to achieve, though, because it necessitates use of a single-cycle
memory subsystem. Until recently, memory subsystems designed to provide data
at this high rate were very expensive, typically employing static RAM chips.
And some RISC processors other than the Am29000 are not optimally tuned for
inexpensive (slow) memories; they end up running below their peak rate of one
instruction per clock because instructions and data cannot be transferred
rapidly enough.
The Am29000's memory interface achieves the highest bandwidth possible from
inexpensive DRAM and EPROM/mask ROM memories, making the execution rate of one
instruction per clock realizable. The separate instruction and data bus
interface on the Am29000, commonly referred to as the "three-bus interface,"
should be taken into account by any engineer evaluating a new microprocessor
such as the Am29000 for use in a laser printer controller design.


Practical RISC Memory Design Techniques for a Laser Printer


RISC microprocessors (and some new CISC chips, too) have a bus interface
designed to transfer one 32-bit data word every clock. This is necessary to
maintain the peak execution rate of one instruction per clock. This is
accomplished by sending out a single address for at least four sequential data
words. After the first cycle is run to processor memory, optimized hardware on
the board uses the stored address to access the subsequent data words at one
clock per word, a technique commonly known as "bursting." The memory design
techniques that allow single-clock accesses to sequential memory addresses are
dependent on the type of memory used. Dynamic RAM can use page mode cycles for
sequential accesses at single-clock rates. At higher frequencies,
"interleaving" is necessary to maintain this transfer rate. (Interleaving
involves having multiple banks of memory with separate control signals: while
one memory bank is accessed, one or more other banks are being prepared for
subsequent accesses.) For ROM/EPROM memory, an interleaving technique is
always necessary to support single-clock accesses at all but the lowest
operating frequencies.
Designing a single-cycle access instruction memory, generally with interleaved
ROM or EPROM, is more practical than it may sound. Most Postscript or PCL5
controller boards utilize a large amount of nonvolatile memory to store fonts
and program routines. It typically runs between 1 and 2 megabytes. If the
controller board manufacturer uses EPROMs for this memory, it turns out to be
the highest cost item on the board. If there are 16 1-Mbit EPROMs on a board,
they can be interleaved as four memory banks. This enables implementing a
2-Mbyte instruction/font memory with single-clock access capability in a cost
effective way. The EPROMs are already part of the design, whether you support
single-cycle access or not (open up an HP Series III or a Postscript printer
if you're not convinced); the logic to implement interleaving is relatively
minuscule in cost. The argument that interleaving adds little to the
controller board cost is still true when you go to high-volume production. The
EPROMs are replaced by lower-cost mask ROMs and the discrete logic to control
the interleaving is usually put into an Application-Specific Integrated
Circuit (ASIC). At worst, a few extra memory chips and the board space they
require are what it costs to interleave. The Am29000 was designed with this in
mind. No high-performance laser printer controller design using the Am29000
should exclude it.
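The bank-selection arithmetic behind four-way interleaving is worth making concrete. Assuming 32-bit instruction words and four banks selected by the low two word-address bits (a common arrangement, not necessarily any particular board's), consecutive words of a burst land in consecutive banks, giving each bank four clocks between its own accesses:

```c
#include <assert.h>

/* Which of four interleaved banks holds the instruction word at this
   byte address? (Illustrative: 32-bit words, four-way interleave.) */
unsigned bank_of(unsigned byte_addr)
{
    unsigned word_addr = byte_addr >> 2; /* 4 bytes per instruction word */
    return word_addr & 3;                /* low two word-address bits */
}
```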


Cache Replacement Algorithm: Don't Cache What You Don't Need


The Am29000 cache memory stores instructions for full-speed execution, like
other RISC microprocessors. The size of this memory is only 512 bytes. That's
only enough space for 128 instructions; the choice of which instructions to
cache must be made carefully for highest overall performance. The logic that
determines when an instruction fetched from external memory will be stored in
the cache for possible future use was designed according to the "cache
replacement algorithm." The Am29000's cache replacement algorithm is vastly
different from those used in all the other RISC microprocessors. So are the
assumptions behind it.
Most RISC microprocessors cache all instruction accesses from main memory any
time they're made, as long as they're not already in the cache. The assumption
behind this is that the memory interface is too slow to keep up with the
processor's peak execution rate in most cases. This is fine for tight loops
that fit inside the small cache, but not very good as repetitive program
segments become larger and less localized in memory. Cached instruction
sequences are often overwritten by other instruction sequences before the
processor gets around to needing them again.
The assumption behind the Am29000 cache replacement algorithm is that the
memory interface has high enough bandwidth to maintain peak execution rates as
long as program execution is sequential. Once the Am29000 has established an
instruction burst, hardware on the board latches/increments the address and
controls all bus cycles for the subsequent instruction fetches. Data operand
accesses to external memory don't slow the instruction fetches down because
they occur on separate buses. The data accesses use the address bus, which is
freed up after the instruction burst is established, and the D bus.
Instruction accesses occur over the local code address bus, driven by the
latch/burst address counter, and the I bus. The data and code accesses occur
over separate buses during sequential program execution, so the instruction
prefetcher can run at full rate without being slowed down by data accesses.
The peak execution rate is fairly easy to maintain, assuming that each
instruction access occurs in a single clock. There is no point in caching
these instructions because the processor can read them the clock before
they're needed. So instructions that can be accessed in a burst read are not
cached.
When program execution branches, the Am29000 cannot maintain the peak transfer
rate of one instruction per clock. This is because it must drive a new address
out onto the bus, which must then be decoded, latched, and applied to the
memory chips for their access period before an instruction can be read. There
are several other more complex latency factors which may increase the amount
of time it takes to fetch the first instruction after execution has branched.
The Am29000 only caches the first four instructions fetched when a branch to
an uncached address is taken. These four instructions are called a "branch
target." Subsequent branches to a cached branch target will start fetching the
instruction immediately past the branch target as the first instruction in the
branch target executes from cache. This gives the board ample time to provide
the Am29000 the first instruction it needs without slowing it down. What the
cache stores are branch targets. Only 32 branch targets can reside in the
cache at once. This is typically many more branch targets than would reside in
a cache of the same size with a conventional replacement algorithm. AMD claims
a cache hit rate (percentage of time needed instructions are executed from
cache) of 65 percent for most software applications. This doesn't mean that 65
percent of the instructions execute at peak rate, as it would on most
processors. It means that 65 percent of branches execute at peak rate.
Sequential (nonbranch) instructions should execute very close to 100 percent
of the peak rate.
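The cache capacity figures above reduce to simple arithmetic: 512 bytes at 4 bytes per instruction gives 128 instructions, and at 4 instructions per branch target, 32 targets. As a quick check:

```c
#include <assert.h>

/* Capacity of the Am29000 branch target cache, from the figures above. */
int cache_insns(void)   { return 512 / 4; }           /* 4-byte instructions */
int cache_targets(void) { return cache_insns() / 4; } /* 4 insns per target */
```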


Memory Latency: How Long Should it Be?


The cache replacement algorithm and the optimal instruction memory latency are
intimately related. While the instruction memory should be capable of
delivering one instruction per clock once a burst has been started, the
initial instruction fetch cycle after a branch will take longer to complete,
as described earlier. The branch target cache stores the first four
instructions at the branch address, so four clocks sounds like the right
amount of time for the maximum initial cycle length. It is, but it's a bit
more complicated than that.
The Am29000 uses a technique called "delayed branching" to reduce the time it
spends frozen doing nothing while it fetches an instruction at a branch target
that isn't in the cache when the branch executes. A delayed branch actually
executes the instruction immediately following the branch before transferring
control to the branch address. The instruction following the branch is
referred to as being in the "delay slot." When the Am29000 is executing the
instruction in the delay slot, it starts the bus cycle to read in the
instruction at the branch address. It would seem that the memory subsystem has
the delay-slot clock plus the four clocks the Am29000 takes executing the
instructions in the branch target cache to return the first instruction
without slowing the system down. That would be five clocks. But to maintain
peak execution rate, the first instruction fetched from memory must be decoded
by the Am29000 in the same clock that the last (fourth) instruction is being
executed from the branch target cache. So the initial access time, or latency,
the instruction memory is designed to have should be held to four clocks.
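The clock accounting in that paragraph can be written out explicitly. One delay-slot clock plus four clocks of branch-target execution would suggest five, but the fetched instruction must decode during the fourth branch-target clock, so one clock overlaps:

```c
#include <assert.h>

/* Latency budget for the first instruction fetch after a branch,
   per the discussion above. */
int branch_latency_budget(void)
{
    int delay_slot = 1; /* one instruction executes in the delay slot */
    int btc_insns  = 4; /* four instructions run from the branch target cache */
    /* The fetched instruction decodes during the fourth BTC clock, so the
       budget is 1 + 4 - 1 = 4 clocks. */
    return delay_slot + btc_insns - 1;
}
```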


Don't Waste the Bus Cycle's First Clock Just Decoding Addresses


Many high-speed microprocessors, RISC or CISC, take nearly a full processor
clock to drive a valid address onto the bus, in the worst case. The control
signals to the memory chips often cannot be asserted until the second clock of
a bus cycle. That means that nonbursted accesses to memory, typically data,
may end up being one clock longer than they need to be. The Am29000 address
drivers help alleviate this. A 20-MHz Am29000 guarantees the address to be
valid 16 nanoseconds from the beginning of the bus cycle, tested with an
80-picofarad load. To achieve this short delay, the Am29000 address driver
circuits utilize a strong driver that's on in phase one of the first clock of
a bus cycle in parallel with a weak driver that's on for the rest of the
cycle.
It's common to start the memory cycle asynchronously when the system clock
(SYSCLK) falls, starting phase 2. If this is done with a high-speed logic
device, almost all of the second phase of the bus cycle's first clock can be
included in the memory access cycle. By not waiting until the beginning of the
second clock to start the memory cycle, you save 25 nanoseconds on a bus cycle
to memory at 20 MHz. Those 25 nanoseconds may allow a memory cycle to run with
one less processor waitstate; or they may allow using slower memory chips
without adding processor waitstates.
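The 25-nanosecond figure follows directly from the clock period: at 20 MHz a clock is 50 ns, and phase 2 is half of it. As a worked check:

```c
#include <assert.h>

/* Clock period in nanoseconds for a given frequency in MHz. */
double clock_ns(double mhz)  { return 1000.0 / mhz; }

/* Time recovered by starting the cycle on SYSCLK's falling edge:
   half the first clock (phase 2). */
double phase2_ns(double mhz) { return clock_ns(mhz) / 2.0; }
```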
The ideal memory latency for data accesses wasn't mentioned in the previous
discussion on memory latency because it's much more complicated than
instruction memory latency. It's clearly less than four clocks. It depends on
factors such as how well the compiler can schedule the Am29000 load and store
instructions relative to when they're needed in execution. It's best to keep
data memory latency as short as possible. The short address valid delay helps
in this respect. The other significant advantage the Am29000 has is, again,
the separate bus for data operand accesses. Consider for a moment the higher
latency an internal data request from the processor would see if: 1.
Instruction and data accesses shared the same bus to external memory; and 2.
An instruction access was already in process when the data request was posted.
To be fair, the competing chips use on-chip write buffering and load
scheduling in the compiler to offset these problems somewhat, but a dedicated
data operand bus is a better solution. Whether the graphics data to be
processed is in an intermediate or final (bitmap) form, a low-latency data bus
increases performance. Certain routines can be programmed for burst access if
it's deemed optimal for their data. The Am29000 supports that, too.


Difficulty #1: Correctly Handling the Bus Invalid Signal



The Am29000 asserts a signal called Bus Invalid (BINV*) to indicate that a bus
cycle it has started must be aborted. In a workstation design, an assertion of
BINV* might serve other purposes, but in a laser printer controller it can't
usefully mean anything else. This signal comes out in phase
two of the first clock of the bus cycle. For performance reasons, it's
advocated (and proven in a number of our designs) to start a cycle at the
beginning of phase 2 of this first clock. This means the memory control
signals (say RAS) will already be asserted by the time BINV can reliably be
sensed. The designer must gracefully abort the cycle. If it's a cycle to DRAM,
RAS cannot be pulled away immediately. Instead, the Bus Invalid signal must be
latched so that the control logic remembers that this is an aborted cycle. The
latched BINV signal is used to disable CAS, and RAS terminates when it would
in a normal DRAM cycle without BINV asserted. For EPROMs, the cycle can be
aborted as soon as convenient. It's often convenient to latch a signal like
BINV in the PAL that generates the memory control signals or the state
information for generating them. Unfortunately, the long setup times of
inexpensive 15-nanosecond clocked PALs preclude their use for state machines
in this application at 25-MHz operation and above.


Difficulty #2: The I/D Bus Contention Problem


In most cases, a practical design ends up having "swap buffers" connecting the
I and D buses so that code can run in data memory and data accesses can be
made from code memory. This runs contrary to the design philosophy of the
Am29000 and reduces its performance by precluding parallel operation when the
swap buffers are on. There are a number of situations where it's useful,
however. Most printers take font or emulation cartridges in the same socket.
The 2-Mbyte code/font memory advocated earlier in this article assumes data
accesses will be made to the interleaved code memory to get outline font
information. The fonts cannot practically be stored in chips separate from the
code. In fact, they take up more space than the code. But because both PCL5
and Postscript employ font caching in DRAM, font data accesses to the code
memory will be limited, and there is little performance penalty for putting
them together.
Hooking the two buses up does cause problems, though. If the Am29000 is
reading data on the D bus and it needs to write data out on the same bus, it
waits one clock to start the write cycle. This gives the memory subsystem on
the board time to turn off drivers from the read cycle and avoid contention.
But if the Am29000 is executing out of data memory, say DRAM, and it also
tries to write to that memory over the D bus, which is possible, a contention
will result unless an extra set of buffers (beside the swap buffers) is used.
This set of buffers goes between the D bus pins on the Am29000 and the D bus
connections on the rest of the board.





March, 1992
THE LOTUS OPEN MESSAGE INTERFACE


A platform-independent method for electronic mail




Al Stevens


Al is a contributing editor for DDJ and can be contacted at 411 Borel Ave.,
San Mateo, CA 94402.


There is a new API that will soon allow you to add electronic mail to your
applications with less work and more portability. That's because last October,
Lotus Development Corporation released the first draft of the Open Messaging
Interface API Functional Specification. OMI defines a standard,
platform-independent method for applications to exchange electronic mail. IBM
and Apple collaborated with Lotus in the definition of the specification.
Lotus is committing its own applications -- AmiPro, 1-2-3, and Freelance
Graphics -- to the OMI API, and they hope the rest of the industry will accept
and endorse it as well. At the time of this writing, the OMI API specification
is still in draft form, but the final specification should be available by the
time you're reading this article. Software mail drivers will soon follow to
implement the API on the various platforms where applications run.
The appearance of a standard API for electronic mail is a significant event
for applications developers. As you'll see later in this article, desktop
applications need to communicate among users, and until now there has been no
standard way to do that. OMI is reminiscent of the Lotus-Intel-Microsoft
Expanded Memory Specification (EMS) and the Microsoft, Intel, AST Research,
and Lotus eXtended Memory Specification (XMS) in that all three APIs define
standard ways for applications developers to access common system resources,
and the standards result from the cooperative efforts of multiple industry
giants. Third-party developers provide the necessary device drivers that
implement the APIs, and that will be true of OMI, too.
But OMI goes a step beyond EMS and XMS. It defines an API for the functions of
a complete electronic mail software process rather than for a hardware
architecture. For OMI to succeed, the vendors of operating systems and
networks must support it with API software libraries. Significantly absent
from the OMI specification designers is Microsoft. Perhaps without Microsoft,
OMI will test how well a PC API standard will fare without the support of the
number one software giant.


E-Mail Applications


We usually think of electronic mail as a stand-alone application where users
write messages to one another. The messages are like letters, and, in most
mail applications, the messages consist of human-readable text, can include
non-text files as attachments, can have several "carbon copy" recipients, and
will wind up in the receiver's message databases to be recalled and reread.
Such applications abound, but often users who need to communicate with one
another cannot because they use incompatible systems. Furthermore, many
non-mail applications need to exchange messages and files beyond the
boundaries of one user's workstation or a LAN. These two circumstances create
the need for a standard way for remote applications to exchange data.
What other kinds of applications require mail capabilities? We could dream up
many vertical applications that need to exchange text and data files. Some of
the best examples, however, are horizontal, and the first one that comes to
mind is the spreadsheet. (No wonder Lotus is committed to this project.)
Suppose you are a user who has just completed a complex spreadsheet. It is
time to disseminate the data to the other members of the project. What are
your options without electronic mail? If you are on a network, you can copy
the spreadsheet file to a public network directory and call everyone and tell
them to get the file. If you are not on a network, your communications program
can call the group members one-by-one and upload the file. Or you can copy and
distribute diskettes. Of course, it's easier if you and your coworkers have
compatible conventional mail systems. You note the spreadsheet's filename,
exit the spreadsheet application, enter the mail application, and send the
file to everyone. With luck, your electronic mail program resides in memory or
runs in a multitasker's window or task swapper's partition, and you don't have
to exit the spreadsheet program. But you still need to deal with three
possibly different user interfaces: the spreadsheet program, the operating
system to switch to the mail program, and the mail application itself.


Mail-Aware Applications


Suppose, instead, that your spreadsheet application itself is "mail aware." A
mail-aware program can send and receive electronic mail in a standard format
through a standard delivery system without involving a separate electronic
mail program. You simply tell the spreadsheet program to mail the file to the
group. You do not need to exit to a separate mail application. You can even
send a descriptive note along with the file so the receiver knows what you are
sending. If most of the group is using the spreadsheet application when their
mail arrives, they receive the descriptive note in a message window, and the
new file pops into their spreadsheet screen. Mission accomplished. But suppose
that George is not in the spreadsheet program; he is writing a report when the
mail shows up. If George has a mail-aware word processor with a compatible
mail delivery system, then the word processor notifies him that the mail is
in, and he reads the descriptive note from within the word processor. Of
course, a word processor can't do much with a spreadsheet file, but the word
processor's mail system can deliver the note to George and store the
spreadsheet file where it ought to go. George knows that the file has arrived,
and the next time he runs his spreadsheet application, there the file is.


Delivery Systems


Until now, such integration was possible only when the two applications came
from the same vendor, or when the two vendors collaborated on the interface
(which rarely happens) or independently developed their programs to operate
within a defined message delivery environment. There are such environments.
For example, most NetWare electronic mail applications use the NetWare Message
Handling Service (MHS) to send and receive mail between remote users. MHS is a
message delivery system. If your application builds a message in a precise
format and writes it into the appropriate subdirectory on the file server,
then MHS will deliver the message. If you and your mail application are
properly registered, MHS will collect messages addressed to you and deposit
them into the right place on the file server. Your mail application must
observe the arrival of the message, retrieve it, and interpret its contents.
MHS runs in a NetWare network and on stand-alone workstations and exchanges
messages with other MHS systems. By itself, it will not exchange messages with
other delivery systems. For that, you need a gateway.
A gateway converts and copies messages from the format and location of one
system to those of another. If your mail application deposits a message to a
user in another system supported by the gateway, your message delivery system
gives the message to the gateway. The gateway knows the formats of the two
delivery systems it connects. The gateway translates the message and sends it
to the other system. When that system sends messages back, the gateway
translates them into the format of the local system and delivers them. In such
an environment, you develop your mail application to run with a chosen
delivery system (MHS, for example) and hope that gateways exist to connect you
to any other systems that your users might want to talk to.


The Parts of E-Mail


A typical electronic mail application consists of four parts: the user
interface, the address book, the message database, and the message delivery
system. Without OMI, every application must manage the first three and tightly
bind to the fourth. Many applications do their own message delivery for local
users and depend on gateways and message delivery systems to exchange mail
with remote users. That dependence requires applications not only to implement
their own local delivery strategies but also to conform to the conventions of
the message delivery system. The OMI API removes that dependence by
providing a common interface between the application and the delivery of all
messages, whether the senders and receivers are local, or whether the exchange
involves a message delivery system.


The OMI API


The OMI API is a common programming interface to all but the user interface
functions of electronic mail. It does not eliminate gateways because it does
not prescribe file formats, storage locations, or transmission media and
protocols, but it does eliminate the need for applications to be aware -- at
the source code level -- of the conventions associated with a particular
message system. And it makes an OMI-conforming application compatible with all
OMI environments.
An application conforms to OMI by calling functions that establish sessions,
send and receive messages, read and update user address books, and maintain a
message database. The application supplies the user interface and has no
concern for the formats and locations of the data files. The underlying OMI
engine takes care of that, thus resolving the question of which message
delivery system the application should support. If the industry embraces OMI
as Lotus expects, all popular platforms will have OMI libraries for developers
to use. In the worst case, if you wanted your OMI-conforming application to
run with some obscure message delivery system that had no OMI library, you
could develop the library yourself. The effort would be no more difficult than
the conventional route of binding your application directly to the message
delivery system. Once you finished the library, you'd have another product to
sell -- a new OMI engine for the heretofore neglected platform.
So, how does OMI promote portability for your application? Suppose you write
an OMI-conforming mail application that runs on PCs. If the NetWare network
has an OMI engine, you can build a NetWare version of your program simply by
linking your program modules with the NetWare OMI function library. If the
3Com network has a similar OMI engine, you can as easily build a 3Com
version of your program. Now, if either NetWare or 3Com has a gateway to the
other, your NetWare users and your 3Com users can communicate. Furthermore,
if the systems have gateways to online services such as CompuServe mail, the
users can exchange messages on the service by using your application.
Your application can also talk to other conforming applications where either
one has a gateway. And that goes for applications running in totally different
host systems, too. DOS users can communicate with Macintosh users, and they
both can communicate with mainframe users, and so on, as long as everyone has
OMI-conforming applications with OMI engines installed and the necessary
gateways. Additionally, your application can communicate with other
applications on the same platform, provided those applications are also mail
aware and OMI-conforming.
Many OMI API implementations will be function libraries. Programs will include
the library when they link. Others will be implemented as Dynamic Link
Libraries. For example, a Windows OMI API will work through calls to an OMI
DLL. Any OMI-conforming Windows application will execute properly no matter
what underlying message delivery system is in place. You won't need to relink
the application to a different OMI library, either. The user's Windows
installation would load the correct copy of the OMI DLL, and your application
would work the same no matter which one Windows loads.


OMI Functions



There are about 50 OMI functions divided into these seven categories: the
Standard Send function, Session Management functions, Message Creation and
Submission functions, Message Store functions, Message Access and Attribute
functions, Address Book functions, and Common Object functions. The OMI API
specification describes functions, data types, constants, and error codes in a
generic C-language context. It does not say whether the function definitions
have such things as far or pascal specifiers, no doubt leaving such
platform-dependent details to the implementer.


The Standard Send Function


The Standard Send function is one of the most interesting parts of the OMI
specification. It is a boon for applications developers and an albatross for
library implementers. With it, any application can be mail aware with little
more than an extra line of code. It works on the sending side of electronic
mail and includes in one function call everything an application needs to send
electronic mail.
To send a message, an application calls the Standard Send function. The
parameters include a list of recipients, an attachment file specification, and
text for the message and its subject. Here's where the magic comes in. If
those parameters are NULL, the OMI library takes over by prompting the user
for whatever is missing. This means that an implementation of OMI must include
the ability to pop up windows in which the user can type message text,
select recipients, and supply file specifications for attachments. This is a
significant feature if you are an application developer and you want to add
electronic mail to your application. To gain that e-mail check mark on the
bullet lists of magazine reviews, you simply add an option that calls the
Standard Send function with NULL parameters. Voila. E-Mail. Your application
will be able to send user-composed mail messages with user-specified
attachments to user-selected recipients. Although the user cannot receive any
mail, your application qualifies nonetheless as being mail aware. That's why
the Standard Send function is a boon to applications developers. Upon close
inspection, however, one might conclude that the capability has no more power
than a memory-resident mail program that pops up over the application. In my
opinion, the OMI specification could do as well without the user-prompting
requirement. Here's why.
If the Standard Send function is gravy to the application developer, it is a
burden to OMI library developers. They have to write the user interface code
for the selection list boxes, a text editor, and video window pop-ups over the
application, code that many applications will never use. It is apparent that
the OMI designers targeted environments such as Windows and the Macintosh
where the GUI manages a common user interface. But library developers for
text-mode environments such as DOS will have to handle the user interface
without help from the operating system. Those processes will add to the size
and complexity of the OMI library, and their look and feel will seldom match
that of the applications they support.
Do not assume, however, that the Standard Send function has no value. The
strength of the Standard Send function is not in its ability to wedge an
electronic mail function into just any application but in its support for
nonmail applications to send data files with no other electronic mail
requirements. Such an application would develop the file and text parameters
for the Standard Send function based on its knowledge of the data. The
function presents several options. The application
might omit the recipient parameters and let OMI prompt the user for that
information, or it could use its own user interface along with the OMI Address
Book functions to build a recipient list. A closed system might use embedded
recipient data, and the Standard Send function would be the application's only
interface to OMI.
The Standard Send function is the only part of OMI that requires a user
interface. The OMI specification does not say what that interface should look
like, only what it should do.
Although the OMI specification does not address the issue of partial
implementations, a minimal OMI library implementation might be no more than a
Standard Send function without the user interface. This implementation would
support those applications that provide all the parameters. A second-level OMI
library might eliminate the Standard Send function altogether and support
those applications that use their own user interface and the balance of the
OMI functions to manage everything else. Of course, a complete OMI library
would include everything.


Session Management Functions


To log into OMI, an application opens a session. You tell OMI where the
message store database is, supply the user's name and password, and specify
which character set to use. Each session has a handle, which the mail program uses
in subsequent calls to OMI. This handle allows a single copy of OMI to support
multiple sessions in a multitasking environment, so theoretically you can have
your mail-aware spreadsheet and word processor programs running at the same
time.
The Session Management functions perform processes related to the user and the
specific OMI session. One of the functions tells the mail program about the
operating parameters of the particular OMI implementation. This allows a mail
program to adapt itself to the local environment. The operating parameters
include the maximum size of a message's text, the text format that is
supported, character sets, and file attachment types. Other Session Management
functions return the user's name, log the user into the OMI system, and
validate the password. If your program uses only the Standard Send function,
you do not need to open a session. The Standard Send function takes care of
the session for you.


Message Creation and Submission Functions


You use these functions to construct a mail message and deposit it into the
OMI system to be delivered. You must have opened a session and interacted with
your user to gather all the components of the mail message. A mail message
consists of recipient lists (original, carbon copy, and blind carbon copy),
text and graphics, a file attachment, a subject, and a delivery priority,
plus options that govern whether the sender requires delivery notices or
return receipts, whether the message is to be encrypted, whether it should
carry a signature stamp, and whether it will be saved in the sender's
message archive. The Message Creation and Submission functions allow
you to identify each of these components and options to OMI for each message.
You submit a message as an original, as a reply to a message you received, or
as a forward to other recipients.


Message Store Functions


The message store is the message database that the OMI system maintains. The
API specification does not dictate the formats or retrieval techniques that
the database will use. The message database, also called the "mailbox,"
contains the incoming mail in its inbox and messages that the user has filed
in its archive.
The Message Store functions support the organization of messages into
user-defined categories, also called "folders" in some systems. There are
functions that move, copy, and delete messages among the categories, that
enumerate the messages and categories in a mailbox, and that query the message
store for unread messages.


Message Access and Attribute Functions


These functions retrieve messages and their attributes from the message store.
The application uses these functions to read incoming mail and to reread mail
that the user previously filed in the archive.
An OMI application can use a message signature stamp, which is a binary copy
of something that uniquely identifies the sender. The Message Access and
Attribute functions support the signature with a verification function. OMI
does not define the details of the signature, only the method used to request
its verification.


Address Book Functions


Most electronic mail systems support at least two kinds of address books: the
public address book that defines all the possible recipients for messages that
the system originates, and a private address book for each user who originates
messages. Many systems support distribution lists as well, where the
originating user specifies the name of the list, and the system delivers a
copy of the message to each of the users in the list. Systems that connect to
outside mail applications will associate remote users with the necessary
routing information so that the delivery system or gateway can properly
deliver the message. Some systems include a default destination for
unidentified users so that a remote delivery system can attempt the delivery.
OMI supports the interface with the message delivery system's addressing
capability with functions that retrieve group and individual addressee
information and add, change, and delete entries in address books. An
application can support multiple books, and OMI will report the names of those
books to the application.


Common Object Functions


The last category of functions in the OMI API supports common operations across
the other categories. There are functions to close sessions, get object
attributes, retrieve the names of objects, and locate entries in list objects.


Summary



Any programmer who has worked on electronic mail applications will appreciate
the significance of the OMI announcement. For the first time, applications of
all kinds will be able to exchange data across computer, operating system, and
language boundaries without needing to integrate themselves into complex,
multiple-delivery architectures. Applications that run on several platforms
will have one less platform dependency to manage. Programmers will learn only
one API in order to build mail-aware applications in many environments. And
our software industry will have taken one more step forward in the advance of
standard solutions to common problems.





























































March, 1992
 PROGRAMMING WITH COMMUNICATION PROTOCOL STACKS


Communication protocol stacks enable a high-level interface




Gordon Free


Gordon, a software architect with Traveling Software, can be reached at
Traveling Software Inc., 18702 North Creek Parkway, Bothell, WA 98011; or
through MCI mail as Gordon Free EMS: 402-5717 MBX:HAL/MISSLINKS/GordonF.


In case you haven't noticed, a quiet revolution has been taking place.
Programs that once lived in isolation are now using technologies such as DDE
to talk to other programs. PCs that once sat lonely on the desktop are now
networked together, and systems now come with modems as standard equipment. In
this "age of communications," users routinely expect a high degree of
intercommunication, which has placed additional demands on software
developers. Writing reliable communications software is a demanding task and
can eat up a huge chunk of a project's resources. Although there are many
communications libraries available to ease the burden, most operate at only
the lowest level and still require the application writer to know a great deal
about the underlying hardware with which the application communicates. Ideally, the
application developer should have to concentrate only on the task of
interfacing with the remote application and not worry about how data gets
there. This requires a fairly high-level interface that is independent of the
underlying transmission media.
The ISO reference model specifies just such an interface, and Traveling
Software has developed a communications engine based on that specification.
This article explores that engine and presents a file transfer program that
uses it.


The ISO Reference Model


The ISO model specifies seven layers of abstraction, each successive one more
removed from the underlying hardware than its predecessor. These seven layers,
shown in Table 1, are referred to as a "protocol stack." Note that each layer
can be completely replaced with a new implementation without affecting other
layers. Typically, the ISO model refers to communications over LANs and WANs,
but it can be extended to almost any connection medium.
Table 1: The seven layers of abstraction as specified in the ISO model

 Layer Description
 ----------------------------------------------------------------------

 Physical Defines how data is electrically moved between two
 physically connected machines.

 Datalink Breaks data up into manageable packets and ensures that
 they arrive intact and in order.

 Network Determines route data will take to reach target machine.

 Transport Maintains virtual connections between local and remote
 machine.

 Session Maintains communication sessions and may include ability to
 reestablish broken connections automatically.

 Presentation Specifies the format in which data will be represented.
 This is especially important when communicating across
 different platforms.

 Application The user application level which determines the contents
 and meaning of data.

A protocol stack built on the ISO model provides an application with a reliable link to a
remote application. Data is guaranteed to arrive intact and in order. Once the
application submits data for transmission, it can assume the data will arrive
without further intervention.


Blackbird


Blackbird, a communications engine based on the ISO reference model
architecture, was developed by Traveling Software for use in its file transfer
software. The Blackbird library provides the application with a reliable
connection to remote services, regardless of the transmission media connecting
the two machines. Server applications register a service name with Blackbird
which remote clients can query. A client requests a connection to a remote
service and receives a Virtual Connection (VC) handle similar to a file
handle. Communications can then proceed with read and write operations that
use the VC handle. It's called a "virtual connection" because multiple
connections can time-slice the transmission medium simultaneously. It appears
to the application, however, that it has exclusive use of the communications
channel. Internally, Blackbird implements the four lowest layers of the ISO
model. Blackbird handles initialization of the low-level hardware, splitting
of application data into manageable pieces (packets), and retransmission of
data corrupted in transit. Table 2 provides a list of the
available Blackbird API functions.
Table 2: Blackbird API routines


 API Function Description
 -------------------------------------------------------------------

 bbAppPoll Give a slice of CPU time to aid Blackbird.
 bbAppPollStatus Set or clear exclusive application polling.
 sGetRemSvc Get remote service information.
 vcConnect Connect to remote service.
 vcDisconnect Disconnect from remote service.
 vcEnd Uninstall Blackbird.
 vcGetPortHandle Get the handle of the port associated with a VC.
 vcInit Initialize Blackbird.
 vcListen Broadcast a new remote service.
 vcModemAbort Abort a modem call in progress.
 vcModemDial Dial out over a modem.
 vcModemHangup Hang up a modem.
 vcPortDetect Detect and initialize available H/W ports.
 vcPortQuery Request communication port configuration info.
 vcPortSet Set communications port configuration.
 vcPortVCDisconnect Disconnect all virtual connections over a port.
 vcQuery Query for list of remote service handles.
 vcRead Read data over a virtual connection.
 vcReadTerminate Terminate a pending read operation.
 vcRemsvcUpdate Request notification of new remote services.
 vcSelfTest Perform a selftest on specified port.
 vcVersion Obtain the Blackbird version string.
 vcWrite Write data over a virtual connection.
 vcWriteTerminate Terminate a pending write operation.

One of Blackbird's strengths is the variety of transmission media it supports.
All this diversity is hidden from the application (because it deals only with
remote services and VC handles). Any number of transmission methods may be
simultaneously managed by Blackbird. For example, Blackbird supports
communications between two computers connected through serial ports using a
null modem cable. These ports can be any of COM1 through COM4. The
application, and hence the user, need not be concerned about which ports are
used (although Blackbird does provide the ability for direct manipulation of
the ports). A proprietary protocol at the physical layer allows a serial
communications rate in excess of 200 KBaud.
Additionally, Blackbird provides even higher transmission rates for machines
connected through parallel ports (LPT1 through LPT3). Because data can be
transmitted over as many as eight wires at once, much higher transfer rates
are possible. The exact rate depends on the CPU and I/O speed of the two
machines. Blackbird also supports communications between two distant machines
connected via modems with data rates between 1200 and 38,400 baud. This mode
of communication does require some additional support from the application (in
particular, modem initialization and dialing).


Function Callback


Depending on the bandwidth of the communications channel used, read/write
operations may not complete in a timely fashion. It is therefore desirable for
the application to continue processing while communications proceed in the
background. At some point, the application will need to be notified that the
operation has completed. This is done through the use of callback routines
which the application provides to Blackbird as a parameter to the read and
write functions.
This approach works especially well with applications structured around an
event-driven architecture (such as Windows, Turbo Vision, or D-Flat). In this
environment, the callback routine can simply post an appropriate message to
the object or window responsible for handling the event.
The callback paradigm works fine for standard procedural architectures as
well. In this case, additional considerations must be taken into account. The
callback is an asynchronous event which can occur at any time. It will
typically occur as a result of a hardware interrupt, and as such, the
application is limited to tasks it may perform inside a callback routine. Of
course, the golden rule for any interrupt routine is to minimize the amount of
time it spends processing. Because the callback routine is part of the
interrupt handler, throughput is maximized by keeping callback processing
short and to the point.
Interrupts can occur at any time; therefore, applications should not make
assumptions about which stack is in use at the time of a callback--it's
unlikely to be the application's own stack. As such, the application should
not make assumptions about how much free space is left on the stack and should
minimize the amount of local variable space used. With large-model
applications it may be safest not to assume that DS will equal SS during
execution of callback routines.
Most C compilers generate code to test for stack overruns at the start of each
subroutine. Because the current stack in use at the time of callback may not
belong to the application, this test will likely fail, and an erroneous stack
overrun may be reported. This can be fixed by disabling the compiler from
generating the stack-checking code for callback routines. This is usually done
as a command-line option, but has the disadvantage of disabling it for all
routines. A #pragma can be used to disable stack checking for those routines
that don't need it. Remember, stack checking must also be disabled for any
routines called by the callback routine. This includes runtime library
routines. Your compiler documentation should indicate which routines are
available without stack checking. The Microsoft C runtime library routines
which have stack checking enabled include execvp, execvpe, fprintf, fscanf,
printf, scanf, spawnvp, spawnvpe, sprintf, sscanf, system, vprintf, and write.
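Under Microsoft C, for example, stack checking can be switched off around just the callback with the check_stack pragma; other compilers ignore unknown pragmas, so the fragment stays portable. The callback body here is illustrative:

```c
/* Disable Microsoft C stack probes for the callback only, instead of
   turning them off for the whole module from the command line. */
#pragma check_stack(off)

volatile int transfer_done;     /* set by the callback, polled by main line */

void read_callback(int status)
{
    /* No stack checking, no DOS calls: just record what happened.
       (Under DOS this routine would also carry a far declarator.) */
    transfer_done = status;
}

#pragma check_stack(on)         /* restore checking for everything else */
```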
Callbacks occur asynchronously, so the application has no way of knowing what
operation was in process at the time of the interrupt. It could likely be a
DOS operation such as file I/O or a keyboard input routine. Because DOS is not
reentrant, callback routines should not make DOS calls unless they can ensure
that it is safe to do so. (This same method is used for TSRs.) Note too that
callback routines are themselves subject to interruption and should therefore
be written to be reentrant. Interrupts should be disabled to protect any
critical sections of code.
It is entirely possible for an operation to complete before the initial
request returns to the application. This would result in the application's
callback routine being called before the original read or write call returns!
Because of this, the application should initialize any information needed by
the callback prior to submitting the request. Given all these restrictions, it
is safest to limit callback routines to setting global variables and then
setting a semaphore which the main-line code can poll on a periodic basis.


Applications


Blackbird can be used to support almost any application requiring
machine-to-machine communication. One of the most common applications is to
transfer files from machine to machine. Not surprisingly, Blackbird was
designed with this in mind and provides additional features to support it. In
particular, applications can specify "breathe" routines, which get called
between each packet. This allows timely reporting of progress (perhaps
updating a bar graph) as well as fairly quick aborting of operations.
Another common use for remote communications is electronic mail and messaging.
While this can be accomplished using standard network support, a peer-to-peer
method such as Blackbird allows instantaneous notification and does not
require a dedicated server machine. Also, remote control is an increasingly
popular type of remote application. It allows a user to access a machine via
modem or network. Data traffic tends to occur in bursts and in smaller amounts
than with file transfer. Blackbird provides mechanisms for handling small
amounts of data with less overhead than that required for larger data buffers.
Finally, a peer-to-peer library such as Blackbird could be used to extend
Dynamic Data Exchange (such as found in Windows) to include applications
running on remote machines.


A File Transfer Program


To illustrate how you write programs using the Blackbird engine, I wrote a
sample file transfer application, called "FT," which uses Blackbird to
exchange files between two machines. The two machines may be connected via
serial or parallel cables, network, or modem. In the interest of clarity, FT
only transfers files smaller than 64K and does minimal error checking. The
user interface is also rather primitive.
One of the more interesting aspects of FT is the fact that one module serves
as both a server and a client application. While it would be possible to write
separate server and client applications, the chosen approach allows a machine
to act as a server while simultaneously accessing another as a client. This
creates little overhead to the application because Blackbird handles all
routing internally.
Listing One, page 92, presents ft.h. The main portion of FT (see Listing Two,
page 92) simply consists of a small initialization section and a loop which
alternates between checking for user requests from the keyboard and completed
Blackbird events. As with most communications libraries, the first step to
using Blackbird is to initialize its internal structures and hook any
necessary interrupt vectors. This is done by calling vcInit with a machine and
group name. FT gets the machine name from the command line. If a machine name
is provided, FT goes on to register a file transfer service, the availability
of which will be broadcast to all remote machines. This is done by calling
vcListen with the name of the service (FXFR), a group name, and two callback
routines. The first callback routine will be called by Blackbird when a remote
application requests a virtual connection. The second specifies a routine to
be called anytime the connection gets broken.
Applications may query Blackbird for a list of available remote services at
any time. The application calls vcQuery to get a list of remote service
handles, then calls sGetRemSvc for each handle; see DisplayServices in Listing
Two. FT
takes advantage of a feature in Blackbird which allows automatic notification
anytime a new service appears or goes away. This is done by calling
vcRemsvcUpdate immediately after calling vcInit. FT provides Blackbird with a
callback routine (in this case cbRemService) which Blackbird will call
whenever the remote service list changes. cbRemService simply sets a global
flag, fUpdateServices, which tells CheckEvents that it needs to display the
list of remote services.



Establishing a Virtual Connection


The user requests FT to establish a virtual connection with a remote machine
by pressing the handle number associated with the remote service. FT allows
only one outgoing virtual connection at a time. It will support any number of
simultaneous remote clients, however (within the limits supported by
Blackbird, which is currently about five).
When a virtual connection is established, each side allocates a buffer for use
in transferring data and stores pertinent information into the svc structure.
A vcRead is then posted to allow reception of remote requests and responses.


Processing User Requests


The user requests a file to be sent to or from the remote server by pressing
the S or R keys, respectively. If the request is for a send operation,
DoUserRequest will call SendFile to read the contents of the file and post a
vcWrite with the target filename and the contents of the file. In the case of
a receive operation, only the filename is sent to the remote.
Blackbird actually sends two buffers of data with each write and read
operation. One is the data buffer provided by the application and the other is
a header packet used for internal routing and sequence checking. Blackbird
allows applications to tack their own data onto the header packet, provided it
does not exceed 220 bytes in length. FT uses this additional space to send the
filename along with its contents all in one operation. In the interest of
keeping things simple, the remote does not report whether the file was
successfully saved or not. Obviously, this would be a desirable enhancement.
Another possible enhancement would provide the ability to query the remote
server for a list of files. This would give FT capabilities similar to those
found in DR-DOS.


Processing Blackbird Events


As discussed previously, Blackbird informs the application of operation
completions through the use of callback routines. Listing Three (page 96)
shows the callback routines used by FT. Notice that a single callback routine
is used for each type of callback.
When a Blackbird event completes, the appropriate callback routine is invoked
and provided with the VC handle corresponding to the completed operation. The
callback routine uses this information to set a bit in the global flags
indicating a completed operation. If Blackbird provided the reason for the
callback in addition to the VC handle, we could get by with even fewer
callback routines.
Back in the main-line section of code, CheckEvents loops through each virtual
connection, looking for Blackbird events. When it finds an event on a
particular VC, it determines the type of event and takes appropriate action.
Notice that it disables interrupts while it makes a copy of the event flags
and then clears them. This ensures that we don't miss an event that might
occur between these two steps.
The only Blackbird event that involves significant action is the case in which
a read operation has completed. This indicates that the remote has responded
to our request for a file or is making a request of its own. CheckEvents
simply looks at the command byte in the header buffer to determine whether it
needs to write the data buffer to disk or satisfy the remote's request for the
contents of a file. The details of these two operations are handled by
SaveFile and SendFile, respectively.
Disconnecting is straightforward, and either side may break a virtual
connection. This is done by calling vcDisconnect with the appropriate VC
handle.


Conclusion


While FT is written to be used with Blackbird, it could very easily be
modified to use a NetBIOS or IPX/SPX protocol. Alternatively, a Blackbird-like
interface could be written to access the various functions of these protocols,
allowing FT to be used as is.


_PROGRAMMING WITH COMMUNICATION PROTOCOL STACKS_
by Gordon Free


[LISTING ONE]

#ifndef FT_H
#define FT_H

#define HVC_FIRST 2
#define HVC_LAST 5
#define MAX_VCS HVC_LAST-HVC_FIRST+1
#define HDRSIZE MAX_APPHDRDATA+sizeof(BBHDR)

/* Message command values */
#define FT2_SEND 3
#define FT2_RECEIVE 4

/* Bit fields for Blackbird events */
typedef union {
 struct {
 unsigned ListenCallback : 1;
 unsigned ConnectCallback : 1;
 unsigned ReadBreathe : 1;
 unsigned ReadCallback : 1;
 unsigned WriteBreathe : 1;
 unsigned WriteCallback : 1;
 unsigned DisconnectCallback : 1;

 } flags;
 int all;
} EVENT_T;

/* Structure for info on currently open file */
typedef struct _FILEDATA {
 FILE *hStream; /* stream handle */
 char *pszName; /* name of file */
 char *pszOpenMode; /* mode file is opened in "r", "w", etc. */
 long lSize; /* size of file in bytes */
 void *pvBuffer; /* ptr to buffer of file contents */
} FILEDATA_T;

/* Structure for application service, one per VC */
typedef struct _SVCDATA {
 FILEDATA_T fdCurFile; /* current file info */
 char bbhXmit[HDRSIZE]; /* BB hdr for sending */
 char bbhRcv[HDRSIZE]; /* BB hdr for rcving */
 char *pszRemoteName; /* name of remote machine */
 long lBytesSoFar; /* num bytes xferred */
 int fActive; /* TRUE if VC is active */
 short sStatus; /* status of last BB event */
 EVENT_T fEvent; /* BB event flags */
} SVCDATA_T;

/*----- Callback routine prototypes ------*/
void cbRemService (
 unsigned usNumSvcs /* Number of remote services */
);
SHORT cbListen (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
);
SHORT cbConnect (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
);
SHORT cbDisConnect (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
);
SHORT cbRead (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Number of bytes read */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
);
SHORT cbWrite (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Number of bytes sent */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
);
#endif








[LISTING TWO]

/*====================================================================
 (c) Copyright 1991, Gordon Free
 All Rights Reserved.
--------------------------------------------------------------------
 Filename: FT.C
 Project: DDJ Article
 $Author$
 $Revision$
 $Date$
 Purpose: Sample file transfer program written for Blackbird API.
====================================================================*/

/*----- Includes ------*/
#include <stdio.h>
#include <conio.h>
#include <malloc.h>
#include <string.h>
#include <dos.h>
#include <fcntl.h>
#include <io.h>
#include <sys\types.h>
#include <sys\stat.h>
#include "vcapi.h"
#include "ft.h"

/*----- Local Defines and Typedefs ------ */
#define ESC 0x1B
#define BUFF_SIZE (64*1024-1)

/*------ Global Variables -----*/
int fUpdateServices=FALSE; /* remote service list has changed */
HANDLE hvcServer=HVC_NONE; /* vc handle of remote server */
SVCDATA_T svc[MAX_VCS]; /* array of app service info */
char acMachineName[MAX_MACHID] = "Unknown";

/*---- DisplayServices: prints a report of all available remote services
----*/
void DisplayServices(void)
{
 int nServices; /* num of services reported */
 REMSERVICES remserv; /* service info struct */
 static HANDLE ahRS[10]; /* array of remote svc handles */
 int i;

 puts("\nList of Remote Machines ...");
 puts("---------------------------");

 /* Fill in array of remote service handles */
 nServices = vcQuery(ahRS, HBOUND(ahRS));

 /* Report service handle and machine name for each remote service */
 for (i=0; i<nServices; i++) {
 sGetRemSvc(ahRS[i], &remserv);
 printf(" %d) %s on %s %c\n", ahRS[i], remserv.acMachineName
 , remserv.acPortName

 , remserv.fInUse ? '*' : ' ');
 }
}

/*-- SendFile: read specified file and hand it to Blackbird for delivery --*/
size_t SendFile (
 HANDLE vch, /* handle of virtual connection */
 char *name /* name of file */
)
{
 int size=0; /* number of bytes read */
 int fileOut; /* file stream to read from */
 PBBHDR pbbhXmit; /* ptr to BB header */
 SVCDATA_T *psvc; /* ptr to service struct for this vc */

 printf("Sending %s ...\n", name);
 fileOut = open(name, O_RDONLY | O_BINARY);
 if (fileOut > 0) {
 psvc = &svc[vch-HVC_FIRST];

 /* Read contents of file into buffer */
 size = read(fileOut, psvc->fdCurFile.pvBuffer, BUFF_SIZE);
 close(fileOut);

 /* Put name into packet header */
 pbbhXmit = (PBBHDR)&(psvc->bbhXmit[0]);
 strcpy(&(pbbhXmit->bAppData[1]), name);

 /* Indicate what remote is to do with this data */
 pbbhXmit->bAppData[0] = FT2_RECEIVE;
 pbbhXmit->ubBytesToFollow = strlen(name)+2;

 vcWrite(vch,pbbhXmit, psvc->fdCurFile.pvBuffer, size
 , NULL, cbWrite, TWO_SECS);

 } else {
 perror("Error opening file");
 }
 return size;
}

/*----SaveFile: write contents of buffer into file with specified name----*/
int SaveFile (
 char *name,
 long lsize,
 void far *buffer
)
{
 int size;
 int fileIn;
 printf("Saved file = %s\n", name);
 fileIn = open(name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IREAD | S_IWRITE);
 if (fileIn > 0) {
 size = (size_t) lsize;
 write(fileIn, buffer, size);
 close(fileIn);
 } else {
 perror("Error saving file");
 }

 return 0;
}

/*---CheckEvents: polls each virtual connection looking for events that
 have been flagged by the various callback routines.---*/
void CheckEvents ( void )
{
 EVENT_T evt;
 SVCDATA_T *psvc;
 PBBHDR pbbhRcv;
 HANDLE vch;
 void *buff;
 int i;
 /* Loop through all possible virtual connections */
 for (i=0, psvc=svc; i<MAX_VCS; i++, psvc++) {
 /* Check to see if anything happened */
 if (psvc->fEvent.all) {
 /* Get copy of event flags and then reset them. This is done with */
 /* interrupts disabled so that we don't miss any events. */
 _disable();
 evt = psvc->fEvent;
 psvc->fEvent.all = FALSE;
 _enable();

 /* ------------------- Listen --------------------------- */
 /* Somebody connected to us, allocate storage and issue a */
 /* read to get file transfer requests. */
 if (evt.flags.ListenCallback) {
 puts("#Listen Callback");
 pbbhRcv = (PBBHDR)&(psvc->bbhRcv[0]);
 if (psvc->fdCurFile.pvBuffer != NULL) {
 vcRead(HVC_FIRST+i,pbbhRcv, psvc->fdCurFile.pvBuffer
 , BUFF_SIZE, NULL, cbRead, TWO_SECS);
 } else {
 puts("Error allocating memory");
 }
 /* Register a new service for use by other clients */
 buff = malloc(BUFF_SIZE);
 if (buff != NULL) {
 vch = vcListen ("FXFR","DDJ", cbListen, cbDisConnect);
 svc[vch-HVC_FIRST].fdCurFile.pvBuffer = buff;
 } else {
 puts("Error allocating memory");
 }
 }

 /* ------------------- Connect -------------------------- */
 /* We've successfully connected to a remote server */
 if (evt.flags.ConnectCallback) {
 puts("#Connect Callback");
 hvcServer = HVC_FIRST+i;
 }

 /* ------------------- Read --------------------------- */
 /* We've gotten a request, so process it! */
 if ((evt.flags.ReadCallback) && (psvc->sStatus == 0)) {

 pbbhRcv = (PBBHDR)&(psvc->bbhRcv[0]);
 /* Is it a request to receive a file? */

 if (pbbhRcv->bAppData[0] == FT2_RECEIVE) {
 SaveFile(&(pbbhRcv->bAppData[1]), psvc->lBytesSoFar
 , psvc->fdCurFile.pvBuffer);
 /* How about send a file? */
 } else if (pbbhRcv->bAppData[0] == FT2_SEND) {
 SendFile(i+HVC_FIRST, &(pbbhRcv->bAppData[1]));
 /* No comprende */
 } else {
 puts("Huh?");
 }
 vcRead(HVC_FIRST+i,pbbhRcv, psvc->fdCurFile.pvBuffer
 , BUFF_SIZE, NULL, cbRead, TWO_SECS);
 }

 /* ------------------- Write --------------------------- */
 if ((evt.flags.WriteCallback) && (psvc->sStatus == 0)) {
 puts("#Write Complete");
 }

 /* ------------------- Disconnect ---------------------- */
 /* Remote went byebye */
 if (evt.flags.DisconnectCallback) {
 puts("#Disconnect Callback");
 free(psvc->fdCurFile.pvBuffer);
 if (i+HVC_FIRST == hvcServer)
 hvcServer = HVC_NONE;
 }
 }
 }
 if (fUpdateServices) {
 fUpdateServices = FALSE;
 DisplayServices();
 }
}

/*---DoUserRequests: check keyboard for activity and process user's
 command. Returns TRUE if user has not requested to exit.---*/
int DoUserRequests ( void )
{
 HANDLE vch;
 SVCDATA_T *psvc;
 PBBHDR pbbhXmit, pbbhRcv;
 int fContinue=TRUE;
 char filename[80];
 void *buff;
 int c;

 if ( kbhit() ) {
 switch (c = getch()) {

 /* ------------------- Exit ----------------------------- */
 case ESC:
 fContinue = FALSE;
 break;

 /* ------------------- Connect -------------------------- */
 case '0':
 case '1':
 case '2':

 case '3':
 case '4':
 case '5':
 case '6':
 case '7':
 case '8':
 case '9':

 /* Disconnect from current server, if any. */
 /* Buffer will be released when disconnect completes */
 if (hvcServer != HVC_NONE) {
 vcDisconnect(hvcServer);
 }
 /* Allocate buffer for new service */
 buff = malloc(BUFF_SIZE);
 if (buff != NULL) {
 /* Connect to remote server */
 vch = vcConnect(c-'0', cbConnect, cbDisConnect, TWO_SECS);
 svc[vch-HVC_FIRST].fdCurFile.pvBuffer = buff;
 } else {
 puts("Error allocating memory");
 }
 break;

 /* ------------------- Send File ------------------------ */
 case 's':
 case 'S':
 if (hvcServer != HVC_NONE) {
 psvc = &svc[hvcServer-HVC_FIRST];
 printf("\nEnter name of file to send >");
 scanf("%s", filename);
 SendFile(hvcServer, filename);
 } else {
 puts("You must establish a connection first!");
 }
 break;

 /* ------------------- Receive File --------------------- */
 case 'r':
 case 'R':
 if (hvcServer != HVC_NONE) {
 psvc = &svc[hvcServer-HVC_FIRST];

 /* Set up ptrs to BB hdrs for receive and send */
 pbbhXmit = (PBBHDR)&(psvc->bbhXmit[0]);
 pbbhRcv = (PBBHDR)&(psvc->bbhRcv[0]);

 /* Instruct remote server to send specified file */
 printf("\nEnter name of file to receive >");
 scanf("%s", &(pbbhXmit->bAppData[1]));
 pbbhXmit->bAppData[0] = FT2_SEND;

 pbbhXmit->ubBytesToFollow
 = strlen(&(pbbhXmit->bAppData[1]))+2;
 /* Post read before sending request so that we are ready */
 /* for response */
 vcRead(hvcServer,pbbhRcv, psvc->fdCurFile.pvBuffer
 , BUFF_SIZE, NULL, cbRead, TWO_SECS);
 /* Send request for file */

 vcWrite(hvcServer,pbbhXmit, NULL, 0, NULL, cbWrite, TWO_SECS);
 } else {
 puts("You must establish a connection first!");
 }
 break;
 default:
 break;
 }

 }
 return fContinue;
}

/*----main: initialize Blackbird and alternate between processing Blackbird
 events and user requests. ----*/
int main (
 int argc,
 char **argv
)
{
 HANDLE vch;
 void *buff;
 puts("Blackbird File Transfer Sample Program ver 1.0");
 puts(" (c) Copyright 1991, Gordon Free.");
 puts(" All Rights Reserved.\n");
 puts(vcVersion());

 /* Parse machine name off command line */
 if (argc >= 2) {
 strncpy(acMachineName, argv[1], sizeof(acMachineName)-1);
 acMachineName[sizeof(acMachineName)-1] = '\0';
 }

 /* Initialize Blackbird and request notification of remote services */
 vcInit(acMachineName,"DDJ");
 vcRemsvcUpdate(cbRemService);

 /* Register as a server only if user gave a machine name */
 if (argc >= 2) {
 buff = malloc(BUFF_SIZE);
 if (buff != NULL) {
 vch = vcListen ("FXFR","DDJ", cbListen, cbDisConnect);
 svc[vch-HVC_FIRST].fdCurFile.pvBuffer = buff;
 } else {
 puts("Error allocating memory");
 goto finish;
 }
 }

 /* Alternate between checking for Blackbird events and processing
 user requests. */
 do
 {
 CheckEvents();
 } while (DoUserRequests());
finish:
 puts("Exiting ...");
 vcEnd();
 return 0;
}








[LISTING THREE]

/*--------------------------------------------------------------------
 (c) Copyright 1991, Gordon Free
 All Rights Reserved.
====================================================================
 Filename: FT_CB.C
 Project: DDJ Article
 $Author$
 $Revision$
 $Date$
 Purpose: Callback routines for Blackbird events
====================================================================*/

/*-----Includes-----*/
#include <stdio.h>
#include "vcapi.h"
#include "ft.h"

/*------Global Variables ------*/
extern SVCDATA_T svc[MAX_VCS]; /* service info for each VC */
extern int fUpdateServices; /* set TRUE to display remotes */

/* These routines are called at interrupt time, so no stack checking! */
#pragma check_stack(off)

/* Called whenever remote service list changes */
void cbRemService (
 unsigned usNumSvcs /* Number of remote services */
)
{
 fUpdateServices = TRUE;
}
/* Called when remote machine connects to us as a client */
SHORT cbListen (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
)
{
 svc[hVC-HVC_FIRST].fEvent.flags.ListenCallback = TRUE;
 svc[hVC-HVC_FIRST].sStatus = SHORT1FROMULONG(ul2);
 svc[hVC-HVC_FIRST].fActive = TRUE;
 return (FALSE);
}
/* Called when remote server acknowledges our connection request */
SHORT cbConnect (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
)
{

 svc[hVC-HVC_FIRST].fEvent.flags.ConnectCallback = TRUE;
 svc[hVC-HVC_FIRST].sStatus = SHORT1FROMULONG(ul2);
 svc[hVC-HVC_FIRST].fActive = TRUE;
 return (FALSE);
}
/* Called anytime a VC is broken */
SHORT cbDisConnect (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
)
{
 svc[hVC-HVC_FIRST].fEvent.flags.DisconnectCallback = TRUE;
 svc[hVC-HVC_FIRST].fActive = FALSE;
 return (FALSE);
}
/* Called when we've received a data buffer from remote */
SHORT cbRead (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Number of bytes received */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
)
{
 svc[hVC-HVC_FIRST].fEvent.flags.ReadCallback = TRUE;
 svc[hVC-HVC_FIRST].lBytesSoFar = ul1;
 svc[hVC-HVC_FIRST].sStatus = SHORT1FROMULONG(ul2);

 return (FALSE);
}
/* Called when data has been successfully sent to remote */
SHORT cbWrite (
 HANDLE hVC, /* Virtual connection handle */
 ULONG ul1, /* Nothing at this time */
 ULONG ul2 /* Status = SHORT1FROMUL( ul2 ) */
)
{
 svc[hVC-HVC_FIRST].fEvent.flags.WriteCallback = TRUE;
 svc[hVC-HVC_FIRST].sStatus = SHORT1FROMULONG(ul2);

 return (FALSE);
}
#pragma check_stack()




















March, 1992
PROGRAMMING PARADIGMS


Everything Becomes Irrelevant




Michael Swaine


What a boon to those of us who write about technology a good exploratory data
analysis package would be. It's not that charting trends involves EDA; time
series analysis is, I guess, the relevant statistical tool for trend-watching.
But just spotting trends is relatively easy, and we writers don't feel that
we're doing the job unless we find the megatrends behind the trends, discern
the pattern formed by the tangled threads, and extract the big picture.
The human faculty psychologists of the early part of this century were drawn
to the lure of the single measure that captured all that was important to know
about a person, sometimes calling it IQ, sometimes factor G (for Grail?). They
ended up inventing factor analysis and the whole idea of exploratory data
analysis. Sometimes, like them, we technology watchers wish for a machine that
we could pour all the time-series data of product announcements and
technological innovation into and that would grind out the single-summary,
meaning-of-it-all megatrend.
Because we don't have one of those machines, we strike one flinty observation
against another and look for sparks. It's a living. It's also a vigilance
task, such as watching a radar screen, and after a long stretch of peering one
can start to see bogies that aren't there. Lately I've been noticing a number
of trends that seem to be connected, or maybe they are aspects of a single
trend. Inasmuch as I can't put a name to this megatrend, it probably doesn't
exist. Let me just say of the following items, then, "Here are some things
that are going on. If you think like James Burke, maybe you can find the
underlying theme that connects them." Here's one thought: Distinctions that
once mattered are becoming matters of indifference.


The Universal Document Format


Adobe has this plan. John Warnock, the Xerox PARC alumnus who cofounded Adobe
and invented the PostScript page description language, wants there to be a
universal document format. Any document of any kind that can be printed or
displayed on any device should be printable on every printer and displayable
on every device, and it should be the same document in all essentials,
regardless of the resolution of the device, the availability of fonts, or the
hardware or operating system platform. And Adobe should define the format.
Adobe is well on the way to producing such a format and the products required
to make it work. The project is code-named "Carousel," and is scheduled for
release this year, which of course means next year.
Most of the press materials on Carousel emphasize the publishing implications:
true electronic publishing, magazines on disk. But because every application
displays information on screen and most also print, we're really talking about
every application domain there is. Anybody involved in application development
should know about Carousel.
Carousel is an extension of PostScript, but it goes well beyond PostScript.
PostScript versions for different machines aren't identical, but Carousel's
format is intended to be universal. Only PostScript printers can print
PostScript documents, while Carousel documents will be printable even on
dot-matrix printers. PostScript doesn't know about documents; it creates a
page, rather than a document. Things such as margins, column widths, and tab
settings are not part of the PostScript page description. They will be part of
the Carousel document description.
If you don't have the fonts used to create the document, you won't see the
document properly even if you have PostScript and the creating application.
Using Carousel, you don't need the original fonts. Carousel will support
Adobe's new Multiple Master Fonts, so that all important font attributes get
carried along with the document. On display or printing, you may get a
simulated font, but it will match the original closely enough that character
widths and line breaks, for example, will be preserved at any resolution.
What Carousel needs to succeed is user demand. Adobe will sell users a
viewer, a conversion utility (for the PostScript output of existing
applications), and a set of Multiple Master Fonts. To developers, it will
license the technology.
If it has a prayer of making this universal, Adobe needs to price the user and
developer packages low. Well, the user package, anyway. Even if the developer
price is daunting, demand from users can nudge developers along, of course.
And users are being nudged right now by an article that is scheduled to appear
in this month's MacUser magazine.


The Increasing Irrelevance of Hardware


Wouldn't it be a comfort to be able to say, like some user, "I'll believe it
when I see it," or "I'll cross that bridge when I come to it"? But developers
and writers on technology, if they want to stay on top of developments, are
forced to believe in things that don't exist and to cross bridges while still
on dry land.
One such landlocked bridge is Taligent, the Apple-IBM joint venture and its
currently eponymous and nonexistent but nevertheless important operating
system. The Taligent folks are serious about the job of creating a seminal
instance of the next generation of personal computer operating systems, and
they'll do it. Assuming that political considerations don't intervene, which
is a big assumption, but one we sort of have to make.
On that assumption, the Taligent story includes this promise: Existing
applications will run on hardware running the Taligent operating system.
Taligent, an object-oriented system, will support objects called
"personalities," one for each supported operating system. These personalities
will be operating system emulators, and the promised emulations are the Mac
operating system, AIX-A/UX or whatever UNIX Taligent recognizes at that time,
and OS/2.
Taligent itself is intended to run on a variety of hardware platforms,
including 680x0, 80x86, and RISC hardware. And it's supposed to be scalable, a
word that promises so much, so it will run not only on different processor
families but also on hardware of differing speeds and capacities. Faster
hardware will be preferable, but any hardware meeting minimal criteria should
be OK. So the hardware doesn't define the system.
Emulation of other machines and operating systems is not a new idea. If it's
practical today, it's because of the power of the hardware. There is some
irony in the fact that the speed and power of the hardware makes the choice of
hardware somewhat irrelevant. But the irrelevance is only technical.
Politically, it is not clear that all emulators will be available for all
hardware platforms.


The Outsourcing of Technology


Michael Kei Stewart has ceased publishing his Developers' Insight newsletter,
closing it out with grace by paying back what he owes his subscribers. His
final issue includes an article on just-in-time software technology, written
by one Sandy Ruby.
Ruby argues that the pace of software development is driving the big software
houses to buy rather than build. Microsoft buys DOS 5.0 utilities from Central
Point Software. Borland buys a Windows interface builder from the Whitewater
Group. Lotus buys word processing technology from Samna.
Traditionally, it's been the smaller companies that buy technology to keep up
with the biggies that can afford to build their own. But more and more often,
the large companies are looking outside for technology. The forces driving
this move, Ruby says, are risk and speed. Speed to be the first company in a
market, or at least to reduce the time lag between your competitor's Excel and
your 1-2-3 for the Mac. The risk is the risk of committing internal resources
to a path that turns out to be the wrong one.
There may be other ways to speed up product development. Maybe OOP, done
right, or some variation on or combination of software engineering
methodologies, will make it possible for big companies to get products to
market fast. OOP seems to be helping Borland get updates cranked out fast, but
that's not the same as turning out new products based on new technology, which
is what keeps the industry (as opposed to a company) alive. But buying is what
the big companies are doing now.
This trend is not bad news for the small developer. The buyers and sellers
both stand to benefit in outsourcing, and the sellers tend to be smaller, more
focused companies.
Smaller companies can benefit from the trend in the role of purchasers, too.
When a small company buys technology from another small company, it may be
able to compete with a larger company that goes it alone. Ruby cites the
example of Clarion licensing JPI's TopSpeed code generator. TopSpeed can
compete technically with the big guys, so Clarion has less to fear from their
future database offerings. We know that, in this rapidly-changing industry,
small companies can compete with large ones. Here Ruby suggests one way.
Although Ruby doesn't use the analogy and I'm not sure it works, isn't there a
parallel between small companies banding together in this way and the
acknowledged superiority of almost any user-selected suite of focused
applications over an integrated package? The big company building it all is
saddled with the problem that the product is only as good as its weakest
development team, just as an integrated package is only as good as its
weakest component. The small firm can choose technologies rather than teams,
eliminating uncertainty as well as delay.


The Outsourcing of Marketing Savvy


In the same issue of Developers' Insight, Bill Lohse talks about small
software developers pooling their resources. Lohse has been involved in sales
and marketing in one area or another of the personal computer field since
1978. He got his start in the industry at IMSI, moving from there to MicroPro
and IUS, going on to found Breakthrough Software, and in recent years working
in magazine publishing as the publisher of several Ziff-Davis publications and
president of Ziff-Davis. His latest venture is Software Venture Partners, a
venture capital company with an interesting premise.
Software Venture Partners invests in startup companies, and requires these
things: that they have great products, that they put together a business deal
with Lohse's company, and that they want him involved in their company one day
a week. Lohse is quite serious about this last item. Software Venture Partners
is limited to backing five companies at a time, so that Lohse can actually
spend one day a week with each company.
What Software Venture Partners offers to entrepreneurs is marketing savvy and,
eventually, a cooperative pool of resources in the areas of marketing, sales,
and tech support.
Lohse knows that software entrepreneurs are reluctant to give up any control
in their businesses. That's why he's skeptical about the idea of a software
cooperative. In such a venture, he says, "it's going to be difficult to
engender the kind of consensus needed. You're going to have to get three or
four companies to act together, without the guidance of someone who can say,
'You ought to listen to me, because I've done this before successfully'."
Software Venture Partners is not a coop, but a way for entrepreneurs to get
capital and savvy marketing help at a time when they need it most without
giving away the store. I first met Lohse when he was with MicroPro almost a
decade ago, and I know he knows what he's doing. His venture is too new,
though, to have a track record. He welcomes calls from developers at
702-588-3171.

Ruby's view of small companies licensing each other's technology does, I
think, suggest a level at which new and small companies can work together
without giving anything up.
Of course, anything a small company can do, a well-managed large company can
also do. But remember, the bigger they get, the better is the chance that,
rather than trying to beat you at your own game, they will hire incompetent
executives at astronomical salaries, blame the Japanese for their failures,
and demand that the Federal government bail them out when they can't compete.




























































March, 1992
C PROGRAMMING


D-Flat Lists and Logs


 This article contains the following executables: DFLT11.ARC D11TXT.ARC


Al Stevens


This month continues the D-Flat saga by describing the LISTBOX window class. A
list box is a window that contains a list of one-line entries, and it is the
base class for the D-Flat pop-down menus. You will also find list boxes as
controls on dialog boxes, and it is possible to create a document window that
is itself a list box. The D-Flat message log is a debugging aid that the
Memopad example program uses, but it also serves as an example of a dialog box
that has a list-box control with multiple-line selection. This column
addresses the LISTBOX class and describes the message log.


The LISTBOX Window Class


A list box is a window of or derived from the LISTBOX class. The user selects
an item from a list box with the mouse or by using the up and down arrow keys
to move the bar cursor to the desired item and pressing the Enter key. When
you use pop-down menus, for example, you are using a list box.
A list box can have more items than the window will display, in which case the
user can scroll through them with the keyboard or with scroll bars. The
LISTBOX class is derived from the TEXTBOX class, so it inherits most of the
text viewing features and messages from the text box and adds a few of its
own.


Extended Selections


A list box can have the MULTILINE attribute, which means that the user can
select multiple entries from the list at one time. This process is called
extended selection in CUA lingo. You can select individual items and groups of
items. An extended-selection list box can have many groups of selected items
marked for selection before the user is ready to process the list. Here's how
it works.
A program adds items to a list box by using the ADDTEXT message. Each of the
text entries in an extended-selection list box will have a space in the first
character. That position is reserved for a mark that identifies the item as
being selected. The mark character is defined in dflat.h by the LISTSELECTOR
global symbol, and it displays as a small diamond character. You could predefine
a list-box item by putting the LISTSELECTOR character in the first position of
the item's text entry before you add it to the list box with the ADDTEXT
message.
A user selects items in a group from the extended-selection list box by
putting the bar cursor on the first item, holding the Shift key down, and
moving the cursor to the last item in the group. You can move up and down this
way, changing the range of the group in either direction. The program will
mark each selection in the group with the diamond so you can see how the
selection is progressing. When the group is marked the way you want it, press
the Enter key. The program retrieves the selected items and processes them. If
you release the Shift key and move the cursor, the selected group is
deselected, the diamonds are erased, and the procedure starts over.
To select a group with the mouse, click on the first item, hold the Shift key
down, and click on the last item. You can drag the mouse up and down with the
Shift key held down, and you can use the scroll bar to scroll to the last item
before clicking it.


Add Mode


The extended-selection procedures just described work with one group at a
time. There will be occasions when the user needs to select several groups or
scattered individual selections from the list. To do this, you must enter the
list box's add mode. The Shift+F8 key toggles the add mode. When a list box is
in add mode, the user can make persistent individual selections by moving the
cursor bar and pressing the space bar. The space bar also deselects a selected
item, so you can think of it as a toggle. Moving the cursor bar does not clear
existing selected groups, so the user can select multiple groups as well by
moving to the first item in the new group, selecting the item with the space
bar, and then holding down the Shift key to move to the last item in the
group.
To select multiple items and groups with the mouse, hold down the Ctrl key
while you select items and groups. Release the Ctrl key and click an item to
erase the existing groups.
The user needs to know when a list box is in add mode and when it is not. When
a list box goes into add mode, it tells its parent to display the "Add Mode"
text string in its status bar. Most D-Flat applications will have a status bar
at the bottom of the application window. The D-Flat ADDSTATUS message ripples
upward from parent to parent until a window intercepts and processes it, so
any window can send the ADDSTATUS message to its parent, figuring that the
message will find its way to the top.
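The ripple can be sketched without the D-Flat machinery. The WINDOW struct and send_addstatus function below are simplified inventions for illustration; in D-Flat the equivalent walk happens through GetParent and SendMessage:

```c
#include <stddef.h>

/* Simplified stand-in for a D-Flat window: a parent link and a
   flag saying whether this window owns a status bar */
typedef struct window {
    struct window *parent;
    int has_status_bar;
    const char *status_text;   /* last text "displayed", for illustration */
} WINDOW;

/* Walk from parent to parent until some ancestor can display the text.
   Returns 1 if an ancestor handled it, 0 if it fell off the top. */
static int send_addstatus(WINDOW *wnd, const char *text)
{
    for (WINDOW *w = wnd->parent; w != NULL; w = w->parent) {
        if (w->has_status_bar) {
            w->status_text = text;
            return 1;
        }
    }
    return 0;
}
```

The sender never needs to know which ancestor owns the status bar; a list box inside a dialog inside the application window works the same as one parented directly to the application window.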
You can use the Log Messages dialog box on the Options menu of the Memopad
example program to observe the behavior of an extended-selection list box.


The LISTBOX Source Code


Listing One, page 130, is listbox.c, the source file that implements the list
box within D-Flat. The ListBoxProc function is the window-processing module
for list boxes. For the CREATE_WINDOW message, it sets the selection and
AnchorPoint variables to -1, which is their initial and null value. The
ListBoxProc function processes messages in a switch statement except when the
message needs more than a few lines of code. Then, the message processing for
that message has its own function named for the message. For example, the
KEYBOARD message is processed by the KeyboardMsg function, which is called by
the KEYBOARD case of the switch statement. The KeyboardMsg function itself is
divided into calls to lower functions to process each of the keystroke values.


List-box Keystrokes


Depending on what key the user presses, the KeyboardMsg function calls other
functions to process the keys. The Shift+F8 key toggles the add mode of an
extended-selection list box. When the list box is in this mode, the user can
preserve existing selections while moving the cursor. The Up, Down, PgUp,
PgDn, Home, and End keys move the list-box cursor. Each of these keys calls
the TestExtended function. If the list box is not in add mode and there are
existing selections, this function clears the existing selections before the
cursor is moved.
Moving the cursor is done by the SCROLL, HORIZSCROLL, SCROLLPAGE, HORIZPAGE,
and SCROLLDOC messages. Most of the processing of these messages is handled by
the window-processing module of the base TEXTBOX class. The Up and Down keys
work differently from their counterparts in a text box, however. The text box
uses the keys to scroll the text within the window. A list box uses the keys
to move the bar cursor up and down, scrolling the text only when the cursor is
on the top or bottom line of the window.
Keys that move the bar cursor also post the LB_SELECTION message to the
window. That message tells the window that the selection cursor has changed.
The space-bar key toggles selections in add mode. If the extended-selection
anchor point is not set, the space-bar key sets it at the current line. This
is the point from which extended-selection groups begin. Then, if the space
bar toggled the selection on, the ExtendSelections function extends the
selected group from the anchor-point line to the current line.
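The anchor-to-cursor extension can be shown with a plain array of selection flags. This mirrors the arithmetic of ExtendSelections in Listing One; the marks array is an illustrative stand-in for the per-line mark characters:

```c
/* Mark every item between the anchor line and the current line,
   in either direction, and return the size of the extension */
static int extend_selections(int marks[], int anchor, int current)
{
    int lo = anchor, hi = current;
    if (lo > hi) {          /* the group may extend upward or downward */
        int t = lo;
        lo = hi;
        hi = t;
    }
    for (int i = lo; i <= hi; i++)
        marks[i] = 1;
    return hi - lo;
}
```

Swapping the endpoints first is what lets the user drag the group upward as freely as downward.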
The Enter key sends the LB_SELECTION and LB_CHOOSE messages to the window.
While the LB_SELECTION tells the window that its selection cursor is now on
another line, the LB_CHOOSE message tells the window that the user has chosen
the selected line from the list box to be processed by the application.
Other keystrokes are tested to see if they are the first character of one of
the list-box entries that occur past the current selection. If so, the user is
selecting the next entry with the matching character. That way, if you have
the four names Jones, Smith, Brown, and Green in a list box, the user can
quickly go to the Brown entry by pressing the B key.
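That search can be sketched in isolation. The match_next function below is an illustrative stand-in mirroring the loop in KeyPress from Listing One, minus the mark column and the directory-bracket handling:

```c
#include <ctype.h>

/* Find the next entry past the current selection whose first character
   matches the key; return the current selection if none does */
static int match_next(char *items[], int count, int current, int key)
{
    for (int sel = current + 1; sel < count; sel++)
        if (tolower((unsigned char)items[sel][0]) == tolower(key))
            return sel;
    return current;
}
```

Because the scan starts one past the current selection, pressing the same key repeatedly walks through successive matching entries rather than sticking on the first one.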



Messages from the Mouse


When the user presses the left mouse button in a list box, the LEFT_BUTTON
message calls the LeftButtonMsg function. LEFT_BUTTON messages continue to
come as long as the user holds the mouse button down, so the function ignores
all but the first one as long as the y coordinate does not change. When a new
y-coordinate value arrives, the function checks to see if the user has a shift
key down. If so, an extended selection is underway. If not, the program clears
all existing selections unless the Ctrl key is down, which means the user is
selecting individual and multiple items with single button presses. The
function finishes by sending the LB_SELECTION message to the window to tell it the
user selected an item.
The DOUBLE_CLICK message occurs when the user double-clicks the left mouse
button on a list box. This means the user is choosing that item to be
processed, and so the program sends the LB_CHOOSE message to the window.
The BUTTON_RELEASED message arrives when the user releases the left mouse
button. It resets the previous mouse y-coordinate variable to its -1 null
value.


Adding, Reading, and Displaying List-box Text


The ADDTEXT message sets the current selection to the first item in the list
box if no item is currently selected. This lets the list box start out with an
initial selected item. If the first character of the text is the LISTSELECTOR
character, the program increments the SelectCount variable, which counts the
number of selected items in the list box.
The LB_GETTEXT message retrieves the line of text relative to the line number
specified in the second parameter and copies the line to the address specified
in the first parameter. It copies up to and not including the newline
character, and null terminates the receiving field.
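The copy rule can be shown on its own. The copy_line function here is an illustrative helper; GetTextMsg in Listing One applies the same loop after locating the line's address with TextLine:

```c
/* Copy src into dest up to, but not including, the newline,
   then null-terminate the receiving field */
static void copy_line(char *dest, const char *src)
{
    while (src && *src && *src != '\n')
        *dest++ = *src++;
    *dest = '\0';
}
```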
The CLEARTEXT message resets the anchor point, current selection, and
extended-selection counter.
The PAINT, paging, and scrolling messages call their counterparts in the base
window class's window-processing module and then call the WriteSelection
function to display the currently selected list entry with the selector bar
cursor colors turned on.


Selecting and Choosing


In CUA parlance, "selecting" is moving the list-box cursor to an item or
marking one or more items in an extended selection, whereas "choosing" is
telling the application to process the current selection. This is an important
distinction. For example, a pop-down menu is a list-box derivative. When you
move a pop-down menu's cursor, you are changing its selection. When you press
the Enter key to execute the menu's command, you are choosing the list-box
item. The program sometimes needs to know about both events, so there are
LB_CHOOSE and LB_SELECTION messages. The list box sends both messages to its
parent window as well. This allows an application window, dialog box, or
menu bar to do something meaningful when the user takes an action on the list
box. The list-box window itself uses the LB_SELECTION message to change the
current selection to the one specified in the first parameter. The
LB_CURRENTSELECTION message returns the current selection line number to the
sender of the message. The LB_SETSELECTION message lets the sender position the
selection to a specified line number.


The Message Log


The Memopad example program uses a D-Flat debugging technique to log D-Flat
messages as they occur. This feature lets a programmer view the messages that
passed through the system during a test. There are always a lot of messages
flying around in an event-driven system, and you might not want to look at all
of them, so the message log uses an extended-selection list box from which you
can select the messages you want to log.
Listing Two, page 133, is log.c, the source file that implements the message
log. An application calls the MessageLog function to let you turn logging on
and off and select the messages you want to log. The Memopad application has a
Log Messages command on its Option menu that calls the MessageLog function.
The function executes the modal Log dialog box, which is defined in dialogs.c,
a source file that we discussed in the September 1991 column. The dialog box
has an extended-selection list box to display the messages, a check box to
turn logging on and off, and the usual OK, Cancel, and Help pushbuttons. The
message array holds the display text of the messages. The first character position of each
entry is blank at first. When that position is changed to contain the
LISTSELECTOR character, the corresponding message is selected to be logged.
When the ID_LOGGING check box is on, logging is underway. The
message-dispatching code in message.c calls the LogMessages function every
time a message gets sent. When logging is turned on and the most recent
message is selected in the list, the program writes the message's particulars
to the log file, DFLAT.LOG, which you can view with any text editor, including
one of the document windows of the Memopad program.
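The trick log.c uses to build the message array is worth isolating: each message identifier in dflatmsg.h expands through the DFlatMsg macro into a string with a spare first column for the mark. The three-entry MESSAGE_LIST below is a made-up stand-in for dflatmsg.h:

```c
#include <stddef.h>

/* Stand-in for dflatmsg.h, which invokes DFlatMsg once per message */
#define MESSAGE_LIST \
    DFlatMsg(KEYBOARD) \
    DFlatMsg(LEFT_BUTTON) \
    DFlatMsg(PAINT)

/* Each identifier becomes " IDENTIFIER": a blank mark column, then the
   message name produced by the # stringizing operator */
#define DFlatMsg(m) " " #m,
static char *message[] = {
    MESSAGE_LIST
    NULL
};
#undef DFlatMsg
```

One list of identifiers thus serves double duty: the same header that declares the messages also names them for the log, so the two can never drift apart.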


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of Dr.
Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you the latest
version of D-Flat. The software is free, but if you care to, stuff a dollar
bill in the mailer for the Brevard County Food Bank. They help homeless and
hungry families here in my home town. We've collected over $1000 so far from
generous D-Flat "careware" users. If you want to discuss D-Flat with me, use
CompuServe. My CompuServe ID is 71101,1262, and I monitor the DDJ Forum daily.
Next month we'll discuss the D-Flat menu system, including the menu bar,
pop-down menus, and the system menu.


OOPS Unraveled


A lot of programmers have trouble getting their first handle on
object-oriented programming. It's not that OOP is hard to understand; it's
that OOP is hard to explain. There seems to be no better way to learn it than
to get hold of a C++ compiler and design and program some systems. That's fine
if you have time to burn getting up the curve, but most programmers have
deadlines and bosses who get testy if the programmers spend too much time
learning and not enough time coding.
Your urgency to learn notwithstanding, OOP seems to be one of those subjects,
like flying an airplane or catching a fish, that need hands-on experience
before you can say you know how to do them. You just can't get it all from a
book. You should be able to get a reasonable introduction, though, and I've
found a book that does a good job of explaining OOP to programmers. It's
called Object-Oriented Technology: A Manager's Guide, by David Taylor
(Addison-Wesley, 1990), and it will get you started. You can tell from the
title that the book is not aimed at programmers, and that is why it is so good
at explaining OOP to programmers -- it uses plain English and believable
examples. This approach attempts to explain a complex technical methodology to
nontechnical managers -- a futile effort at best -- and in doing so, puts it
in a language that technicians can embrace and understand. Too bad the book
will not serve the audience it targets and does not target the audience it
serves. It's a small book, about the size of K&R, easy to read, with a lot of
illustrations, and without the usual nonsense about families, genera, and
species of fruit and animals that some texts and teachers use when they try to
explain object-oriented class hierarchies. The examples always use classes of
things that you could imagine yourself writing programs about. Good stuff.
Recommended.


_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]

/* ------------- listbox.c ------------ */

#include "dflat.h"

#ifdef INCLUDE_EXTENDEDSELECTIONS
static int ExtendSelections(WINDOW, int, int);
static void TestExtended(WINDOW, PARAM);
static void ClearAllSelections(WINDOW);
static void SetSelection(WINDOW, int);

static void FlipSelection(WINDOW, int);
static void ClearSelection(WINDOW, int);
#else
#define TestExtended(w,p) /**/
#endif
static void near ChangeSelection(WINDOW, int, int);
static void near WriteSelection(WINDOW, int, int, RECT *);
static int near SelectionInWindow(WINDOW, int);

static int py = -1; /* the previous y mouse coordinate */

#ifdef INCLUDE_EXTENDEDSELECTIONS
/* --------- SHIFT_F8 Key ------------ */
static void AddModeKey(WINDOW wnd)
{
 if (isMultiLine(wnd)) {
 wnd->AddMode ^= TRUE;
 SendMessage(GetParent(wnd), ADDSTATUS,
 wnd->AddMode ? ((PARAM) "Add Mode") : 0, 0);
 }
}
#endif

/* --------- UP (Up Arrow) Key ------------ */
static void UpKey(WINDOW wnd, PARAM p2)
{
 if (wnd->selection > 0) {
 if (wnd->selection == wnd->wtop) {
 BaseWndProc(LISTBOX, wnd, KEYBOARD, UP, p2);
 PostMessage(wnd, LB_SELECTION, wnd->selection-1,
 isMultiLine(wnd) ? p2 : FALSE);
 }
 else {
 int newsel = wnd->selection-1;
 if (wnd->wlines == ClientHeight(wnd))
 while (*TextLine(wnd, newsel) == LINE)
 --newsel;
 PostMessage(wnd, LB_SELECTION, newsel,
#ifdef INCLUDE_EXTENDEDSELECTIONS
 isMultiLine(wnd) ? p2 :
#endif
 FALSE);
 }
 }
}

/* --------- DN (Down Arrow) Key ------------ */
static void DnKey(WINDOW wnd, PARAM p2)
{
 if (wnd->selection < wnd->wlines-1) {
 if (wnd->selection == wnd->wtop+ClientHeight(wnd)-1) {
 BaseWndProc(LISTBOX, wnd, KEYBOARD, DN, p2);
 PostMessage(wnd, LB_SELECTION, wnd->selection+1,
 isMultiLine(wnd) ? p2 : FALSE);
 }
 else {
 int newsel = wnd->selection+1;
 if (wnd->wlines == ClientHeight(wnd))
 while (*TextLine(wnd, newsel) == LINE)
 newsel++;
 PostMessage(wnd, LB_SELECTION, newsel,
#ifdef INCLUDE_EXTENDEDSELECTIONS
 isMultiLine(wnd) ? p2 :
#endif
 FALSE);
 }
 }
}

/* --------- HOME and PGUP Keys ------------ */
static void HomePgUpKey(WINDOW wnd, PARAM p1, PARAM p2)
{
 BaseWndProc(LISTBOX, wnd, KEYBOARD, p1, p2);
 PostMessage(wnd, LB_SELECTION, wnd->wtop,
#ifdef INCLUDE_EXTENDEDSELECTIONS
 isMultiLine(wnd) ? p2 :
#endif
 FALSE);
}

/* --------- END and PGDN Keys ------------ */
static void EndPgDnKey(WINDOW wnd, PARAM p1, PARAM p2)
{
 int bot;
 BaseWndProc(LISTBOX, wnd, KEYBOARD, p1, p2);
 bot = wnd->wtop+ClientHeight(wnd)-1;
 if (bot > wnd->wlines-1)
 bot = wnd->wlines-1;
 PostMessage(wnd, LB_SELECTION, bot,
#ifdef INCLUDE_EXTENDEDSELECTIONS
 isMultiLine(wnd) ? p2 :
#endif
 FALSE);
}

#ifdef INCLUDE_EXTENDEDSELECTIONS
/* --------- Space Bar Key ------------ */
static void SpacebarKey(WINDOW wnd, PARAM p2)
{
 if (isMultiLine(wnd)) {
 int sel = SendMessage(wnd, LB_CURRENTSELECTION, 0, 0);
 if (sel != -1) {
 if (wnd->AddMode)
 FlipSelection(wnd, sel);
 if (ItemSelected(wnd, sel)) {
 if (!((int) p2 & (LEFTSHIFT | RIGHTSHIFT)))
 wnd->AnchorPoint = sel;
 ExtendSelections(wnd, sel, (int) p2);
 }
 else
 wnd->AnchorPoint = -1;
 SendMessage(wnd, PAINT, 0, 0);
 }
 }
}
#endif

/* --------- Enter ('\r') Key ------------ */

static void EnterKey(WINDOW wnd)
{
 if (wnd->selection != -1) {
 SendMessage(wnd, LB_SELECTION, wnd->selection, TRUE);
 SendMessage(wnd, LB_CHOOSE, wnd->selection, 0);
 }
}

/* --------- All Other Key Presses ------------ */
static void KeyPress(WINDOW wnd, PARAM p1, PARAM p2)
{
 int sel = wnd->selection+1;
 while (sel < wnd->wlines) {
 char *cp = TextLine(wnd, sel);
 if (cp == NULL)
 break;
#ifdef INCLUDE_EXTENDEDSELECTIONS
 if (isMultiLine(wnd))
 cp++;
#endif
 /* --- special for directory list box --- */
 if (*cp == '[')
 cp++;
 if (tolower(*cp) == (int)p1) {
 SendMessage(wnd, LB_SELECTION, sel,
 isMultiLine(wnd) ? p2 : FALSE);
 if (!SelectionInWindow(wnd, sel)) {
 wnd->wtop = sel-ClientHeight(wnd)+1;
 SendMessage(wnd, PAINT, 0, 0);
 }
 break;
 }
 sel++;
 }
}

/* --------- KEYBOARD Message ------------ */
static int KeyboardMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 switch ((int) p1) {
#ifdef INCLUDE_EXTENDEDSELECTIONS
 case SHIFT_F8:
 AddModeKey(wnd);
 return TRUE;
#endif
 case UP:
 TestExtended(wnd, p2);
 UpKey(wnd, p2);
 return TRUE;
 case DN:
 TestExtended(wnd, p2);
 DnKey(wnd, p2);
 return TRUE;
 case PGUP:
 case HOME:
 TestExtended(wnd, p2);
 HomePgUpKey(wnd, p1, p2);
 return TRUE;
 case PGDN:
 case END:
 TestExtended(wnd, p2);
 EndPgDnKey(wnd, p1, p2);
 return TRUE;
#ifdef INCLUDE_EXTENDEDSELECTIONS
 case ' ':
 SpacebarKey(wnd, p2);
 break;
#endif
 case '\r':
 EnterKey(wnd);
 return TRUE;
 default:
 KeyPress(wnd, p1, p2);
 break;
 }
 return FALSE;
}

/* ------- LEFT_BUTTON Message -------- */
static int LeftButtonMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int my = (int) p2 - GetTop(wnd);
 if (my >= wnd->wlines-wnd->wtop)
 my = wnd->wlines - wnd->wtop;

 if (WindowMoving | WindowSizing)
 return FALSE;
 if (!InsideRect(p1, p2, ClientRect(wnd)))
 return FALSE;
 if (wnd->wlines && my != py) {
 int sel = wnd->wtop+my-1;
#ifdef INCLUDE_EXTENDEDSELECTIONS
 int sh = getshift();
 if (!(sh & (LEFTSHIFT | RIGHTSHIFT))) {
 if (!(sh & CTRLKEY))
 ClearAllSelections(wnd);
 wnd->AnchorPoint = sel;
 SendMessage(wnd, PAINT, 0, 0);
 }
#endif
 SendMessage(wnd, LB_SELECTION, sel, TRUE);
 py = my;
 }
 return TRUE;
}

/* ------------- DOUBLE_CLICK Message ------------ */
static int DoubleClickMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 if (WindowMoving | WindowSizing)
 return FALSE;
 if (wnd->wlines) {
 RECT rc = ClientRect(wnd);
 BaseWndProc(LISTBOX, wnd, DOUBLE_CLICK, p1, p2);
 if (InsideRect(p1, p2, rc))
 SendMessage(wnd, LB_CHOOSE, wnd->selection, 0);
 }
 return TRUE;

}

/* ------------ ADDTEXT Message -------------- */
static int AddTextMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int rtn = BaseWndProc(LISTBOX, wnd, ADDTEXT, p1, p2);
 if (wnd->selection == -1)
 SendMessage(wnd, LB_SETSELECTION, 0, 0);
#ifdef INCLUDE_EXTENDEDSELECTIONS
 if (*(char *)p1 == LISTSELECTOR)
 wnd->SelectCount++;
#endif
 return rtn;
}

/* --------- GETTEXT Message ------------ */
static void GetTextMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 if ((int)p2 != -1) {
 char *cp1 = (char *)p1;
 char *cp2 = TextLine(wnd, (int)p2);
 while (cp2 && *cp2 && *cp2 != '\n')
 *cp1++ = *cp2++;
 *cp1 = '\0';
 }
}

/* --------- LISTBOX Window Processing Module ------------ */
int ListBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 BaseWndProc(LISTBOX, wnd, msg, p1, p2);
 wnd->selection = -1;
#ifdef INCLUDE_EXTENDEDSELECTIONS
 wnd->AnchorPoint = -1;
#endif
 return TRUE;
 case KEYBOARD:
 if (WindowMoving | WindowSizing)
 break;
 if (KeyboardMsg(wnd, p1, p2))
 return TRUE;
 break;
 case LEFT_BUTTON:
 if (LeftButtonMsg(wnd, p1, p2) == TRUE)
 return TRUE;
 break;
 case DOUBLE_CLICK:
 if (DoubleClickMsg(wnd, p1, p2))
 return TRUE;
 break;
 case BUTTON_RELEASED:
 py = -1;
 return TRUE;
 case ADDTEXT:
 return AddTextMsg(wnd, p1, p2);
 case LB_GETTEXT:
 GetTextMsg(wnd, p1, p2);
 return TRUE;
 case CLEARTEXT:
 wnd->selection = -1;
#ifdef INCLUDE_EXTENDEDSELECTIONS
 wnd->AnchorPoint = -1;
#endif
 wnd->SelectCount = 0;
 break;
 case PAINT:
 BaseWndProc(LISTBOX, wnd, msg, p1, p2);
 WriteSelection(wnd, wnd->selection, TRUE, (RECT *)p1);
 return TRUE;
 case SCROLL:
 case HORIZSCROLL:
 case SCROLLPAGE:
 case HORIZPAGE:
 case SCROLLDOC:
 BaseWndProc(LISTBOX, wnd, msg, p1, p2);
 WriteSelection(wnd,wnd->selection,TRUE,NULL);
 return TRUE;
 case LB_CHOOSE:
 SendMessage(GetParent(wnd), LB_CHOOSE, p1, p2);
 return TRUE;
 case LB_SELECTION:
 ChangeSelection(wnd, (int) p1, (int) p2);
 SendMessage(GetParent(wnd), LB_SELECTION,
 wnd->selection, 0);
 return TRUE;
 case LB_CURRENTSELECTION:
 return wnd->selection;
 case LB_SETSELECTION:
 ChangeSelection(wnd, (int) p1, 0);
 return TRUE;
#ifdef INCLUDE_EXTENDEDSELECTIONS
 case CLOSE_WINDOW:
 if (isMultiLine(wnd) && wnd->AddMode) {
 wnd->AddMode = FALSE;
 SendMessage(GetParent(wnd), ADDSTATUS, 0, 0);
 }
 break;
#endif
 default:
 break;
 }
 return BaseWndProc(LISTBOX, wnd, msg, p1, p2);
}

static int near SelectionInWindow(WINDOW wnd, int sel)
{
 return (wnd->wlines && sel >= wnd->wtop &&
 sel < wnd->wtop+ClientHeight(wnd));
}

static void near WriteSelection(WINDOW wnd, int sel,
 int reverse, RECT *rc)
{
 if (isVisible(wnd))
 if (SelectionInWindow(wnd, sel))
 WriteTextLine(wnd, rc, sel, reverse);

}

#ifdef INCLUDE_EXTENDEDSELECTIONS
/* ----- Test for extended selections in the listbox ----- */
static void TestExtended(WINDOW wnd, PARAM p2)
{
 if (isMultiLine(wnd) && !wnd->AddMode &&
 !((int) p2 & (LEFTSHIFT | RIGHTSHIFT))) {
 if (wnd->SelectCount > 1) {
 ClearAllSelections(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 }
 }
}

/* ----- Clear selections in the listbox ----- */
static void ClearAllSelections(WINDOW wnd)
{
 if (isMultiLine(wnd) && wnd->SelectCount > 0) {
 int sel;
 for (sel = 0; sel < wnd->wlines; sel++)
 ClearSelection(wnd, sel);
 }
}

/* ----- Invert a selection in the listbox ----- */
static void FlipSelection(WINDOW wnd, int sel)
{
 if (isMultiLine(wnd)) {
 if (ItemSelected(wnd, sel))
 ClearSelection(wnd, sel);
 else
 SetSelection(wnd, sel);
 }
}

static int ExtendSelections(WINDOW wnd, int sel, int shift)
{
 if (shift & (LEFTSHIFT | RIGHTSHIFT) &&
 wnd->AnchorPoint != -1) {
 int i = sel;
 int j = wnd->AnchorPoint;
 int rtn;
 if (j > i)
 swap(i,j);
 rtn = i - j;
 while (j <= i)
 SetSelection(wnd, j++);
 return rtn;
 }
 return 0;
}

static void SetSelection(WINDOW wnd, int sel)
{
 if (isMultiLine(wnd) && !ItemSelected(wnd, sel)) {
 char *lp = TextLine(wnd, sel);
 *lp = LISTSELECTOR;
 wnd->SelectCount++;

 }
}

static void ClearSelection(WINDOW wnd, int sel)
{
 if (isMultiLine(wnd) && ItemSelected(wnd, sel)) {
 char *lp = TextLine(wnd, sel);
 *lp = ' ';
 --wnd->SelectCount;
 }
}

int ItemSelected(WINDOW wnd, int sel)
{
 if (isMultiLine(wnd) && sel < wnd->wlines) {
 char *cp = TextLine(wnd, sel);
 return (int)((*cp) & 255) == LISTSELECTOR;
 }
 return FALSE;
}
#endif

static void near ChangeSelection(WINDOW wnd,int sel,int shift)
{
 if (sel != wnd->selection) {
#ifdef INCLUDE_EXTENDEDSELECTIONS
 if (isMultiLine(wnd)) {
 int sels;
 if (!wnd->AddMode)
 ClearAllSelections(wnd);
 sels = ExtendSelections(wnd, sel, shift);
 if (sels > 1)
 SendMessage(wnd, PAINT, 0, 0);
 if (sels == 0 && !wnd->AddMode) {
 ClearSelection(wnd, wnd->selection);
 SetSelection(wnd, sel);
 wnd->AnchorPoint = sel;
 }
 }
#endif
 WriteSelection(wnd, wnd->selection, FALSE, NULL);
 wnd->selection = sel;
 WriteSelection(wnd, sel, TRUE, NULL);
 }
}






[LISTING TWO]

/* ------------ log.c ------------ */

#include "dflat.h"

#ifdef INCLUDE_LOGGING


static char *message[] = {
 #undef DFlatMsg
 #define DFlatMsg(m) " " #m,
 #include "dflatmsg.h"
 NULL
};

static FILE *log = NULL;
extern DBOX Log;

void LogMessages (WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 if (log != NULL && message[msg][0] != ' ')
 fprintf(log,
 "%-20.20s %-12.12s %-20.20s, %5.5ld, %5.5ld\n",
 wnd ? (GetTitle(wnd) ? GetTitle(wnd) : "") : "",
 wnd ? ClassNames[GetClass(wnd)] : "",
 message[msg]+1, p1, p2);
}

static int LogProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 WINDOW cwnd = ControlWindow(&Log, ID_LOGLIST);
 char **mn = message;
 switch (msg) {
 case INITIATE_DIALOG:
 AddAttribute(cwnd, MULTILINE | VSCROLLBAR);
 while (*mn) {
 SendMessage(cwnd, ADDTEXT, (PARAM) (*mn), 0);
 mn++;
 }
 SendMessage(cwnd, SHOW_WINDOW, 0, 0);
 break;
 case COMMAND:
 if ((int) p1 == ID_OK) {
 int item;
 int tl = GetTextLines(cwnd);
 for (item = 0; item < tl; item++)
 if (ItemSelected(cwnd, item))
 mn[item][0] = LISTSELECTOR;
 }
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}

void MessageLog(WINDOW wnd)
{
 if (DialogBox(wnd, &Log, TRUE, LogProc)) {
 if (CheckBoxSetting(&Log, ID_LOGGING)) {
 log = fopen("DFLAT.LOG", "wt");
 SetCommandToggle(&MainMenu, ID_LOG);
 }
 else if (log != NULL) {
 fclose(log);
 log = NULL;
 ClearCommandToggle(&MainMenu, ID_LOG);

 }
 }
}

#endif

























































March, 1992
STRUCTURED PROGRAMMING


Vuja De




Jeff Duntemann, KG7JF


Wasn't it George Carlin who defined "vuja de" as the strange feeling you get
when you know you've never been somewhere before? I had a serious case of vuja
de recently when I went back to my favorite Hallmark card shop, looking for a
sympathy card for a friend whose front porch had collapsed.
I didn't find the card. But lord almighty, I did discover something else:
Hallmark had bought a copy of Card Shark! People who read this column
regularly may recall my October 1991 musings on a "personal vertical
application" called Card Shark, that would make customized greeting cards on a
typical PC system. I suggested that computer stores were not the places to
sell such a product, but that Hallmark Card Shops would eat it up.
They did--just not quite as I had predicted. There on the wall was a PC in a
pastel plywood coffin beside a rack of "blank inside" cards of various
designs. For a not-so-nominal fee, shoppers could create a customized message
(such as "Sympathy on the collapse of your front porch") and have it
laserprinted on the inside of a card with a picture of a front porch
collapsing.
Well, not quite. The art on the cards isn't customizable yet. I suspect,
however, that they're working on it.


Right Answer, Wrong Question


I find it interesting that Hallmark decided to sell cards rather than
card-creation software. The system is very much what I had envisioned (and I
honestly, truly had never seen the Hallmark system when I wrote October's
column last summer!) except that it hadn't been taken quite as far. I was
right that Hallmark liked the system--so much so that they decided to keep it
to themselves. Was this smart? We'll find out when hordes of hungry
programmers decide to clone the system for the home marketplace. (You get one
guess which side I'm betting on....)
Hallmark's somewhat shortsighted action isn't difficult to understand when you
remind yourself that Hallmark is in the card business, not the software
business. The Hallmark honchos use computers to do things. The granularity
level of their computer thinking is the system; that is, a machine with
software to do a specific thing. To them, buying a whole bunch of computers to
create cards in their stores is a simple extension of the ordinary business
practices they had been using all along. Cutting a deal with a programmer to
resell a software package that creates cards on a home computer would just
plain smell wrong.
The failure of Hallmark Cards is not in their software design, which (from
what I saw while making a card) is quite nice. Their failure lies in the
analysis that produced the system. It was plainly the right answer to the
wrong question. So I think it's time we went back to the issues of software
design for a while; for now, to the difference between analysis and design.


The Inkblot Effect


The difference between analysis and design is a lot like those inkblots that
look like John F. Kennedy. Until some right-brain insight pops in your head,
they're just inkblots. But once you see (or once someone points out) the
president's face in the inkblots, you can never understand how other people
don't see it immediately.
It's this simple: Analysis is the process of describing the problem to be
solved. Design is the process of describing the solution. Programming (the
necessary third leg of a three-legged stool) is the process of implementing
that solution.
Self-taught programmers have a lot of trouble with this triad, in part because
they have no one to point out the faces in the inkblots, and in part because
of an insidious sort of square-cubed law of program complexity. When you write
your first useful programs, they're often little utilities that do one thing
and one thing only. The problem to be solved can be stated in one sentence.
The design of the program is a one-page sort-of-a-flowchart in Burnt Sienna
crayon. The implementation is three hundred lines of Turbo Pascal.
As programming projects grow more ambitious, the three legs of the stool grow
unevenly. The design grows faster than the number of lines of code, because
smart programmers learn early how to create general-purpose software tools
that can contribute to many aspects of an application's functionality.
However, the leg that grows the fastest is neither the code nor the design,
but the statement of the problem. The interconnectedness of the problem's
elements grows as the cube of the number of problem elements, and the
assumptions underlying that interconnectedness are part of the problem.
It took me years to figure this one out.


The Nature of the Question


Simply put (and in most consultant-style programming projects), it's the
analyst's job to describe the way things are done now in the area to be
automated (that is, without taking any future automation into account). This
includes obvious things such as the nature of the information that passes from
hand to hand in the course of getting things done. But it has to include a
whole lot of less obvious things, such as why things are done the way they're
done. This "why" includes all of the subtle assumptions that people on the
inside often take for granted--if they truly understand them at all. Getting
everything is the essence of analysis--and it is murderously difficult.
On the flip side, the analyst has to avoid "optimizing" current processes
consciously or subconsciously. In any process there may be activities that
don't seem necessary or connected in any way with anything else. A hasty,
egotistical, or insufficiently observant analyst may assume that these
puzzling activities are unnecessary or else part of some unrelated process,
and simply write them out of the picture.
The end result of an analysis is a document that will allow an intelligent
outsider to obtain a correct understanding of the current state of a process.
As I'll say again later, it's a how-to-do-it book for the business being
automated.


The Toughest Part of the Job


Analysis is plainly the toughest part of software development. It is the least
amenable to automation and provides very little feedback or self-validation.
You can totally blow an analysis, and nobody will suspect until the company
goes belly-up. It's often very hard to see the wrongness of the question
beyond the glow of a correctly implemented answer.
Programmers are supposed to make lousy analysts. This is true only in that
design and programming are more fun, and we'd rather be doing that (or
probably anything) than analysis. Apart from this very human failing, however,
programmers make ideal analysts, for these reasons:
Programmers are detail oriented. They have to be. Missing one pointer
initialization can blow your session to kingdom come, and this tends to get
your attention. B.S. and hand-waving are instantly detected and punished.
Programmers learn quickly. Again, this is a survival skill. Our industry
evolves at such a breakneck pace that we are forever climbing one silly
learning curve or another--often several at once. A process or system that
sits still long enough to be studied in detail would almost seem an
extravagance to most programmers.
Programmers are structured thinkers. To be rational and graspable, an analysis
must be every bit as structured as a design or a piece of code. Programmers
are used to seeing the structure in code, and are very comfortable imposing a
structure on a software design. Seeing the structure in an existing process or
situation is a similar skill.
Finally, programmers are professional outsiders. One common failing of naive
or badly trained analysts is simply being too close to the process being
studied to take note of all significant elements. The best analysts do not
come from the department being analyzed. Almost by definition (given the job
they do) programmers are outside of virtually every department in a corporate
hierarchy. Many programmers are so wrapped up in their current tasks that they
are generally outsiders even in their own departments. This detachment is a
valuable and little-appreciated asset, if you can get a programmer to set his
or her programming biases aside (that is, if you can get a programmer to
genuinely enjoy analysis enough to put their full mental muscles behind it and
do it well).


Analysis Methodologies



So. Do I recommend analysis methods such as Structured Analysis (Yourdon) or
Object-Oriented Analysis (Yourdon/Coad)? Emphatically not. My reason is that
both methods end up imposing sets of computer-scented biases on the
description of a process or a situation. The problem with centering an
analysis around data flow diagrams is the unstated assumption that data is the
predominant and most important element being described. Worse, those vague,
subtle, human-y interactions that can make or break an automation project
don't fit into bubbles very well. Human systems are best described in wholly
human terms.
This problem exists in Yourdon-style Structured Analysis, but given some care
and restraint it can be dealt with. Not so with Object-Oriented Analysis,
which is as wrong-minded an analysis philosophy as I've ever seen. Right from
the start, OOA imposes a template on a description that assumes a particular
software design methodology, and even a programming paradigm and coding
scheme. Sheesh, OOA encourages analysts to impose classes on a system
description, and insists that everything in a system be assigned to a
computer-friendly cubbyhole such as data or method. Danger, danger, Will
Robinson! Anything that doesn't look like data or methods will either be
pounded into one or the other or else swept behind a file cabinet, to emerge
months later with fangs and brass knuckles.
The whole purpose of OOA (as the authors hint in their book of the same name)
is to fold the analysis stage into the design stage in the interest of
productivity; plainly (if not explicitly) to do away with it altogether. OOA
is an impatient method, and impatience is its own punishment.
Finally, 200 pages of bubble charts can hide a lot of incompetence. They can
be an excellent smokescreen, and to many phony analysts, cranking out endless
incomprehensible charts becomes a substitute for observing objectively,
probing insightfully, and describing things well.


The Duntemann Analysis Method


I'll accept nastygrams on the above subject gracefully, but I'm unlikely to be
swayed. I have my own method, and while I can't present it rigorously in a
single magazine column, I'll try to state it in broad if informal terms.
My method rests on two fundamental principles. Here's Principle #1: Write an
analysis as though it were a book. That is, make your description a text
description. Keep the diagrams to no more than 20-25 percent of the bulk of
the document. Writing up an analysis as a text description means you have the
medium to convey all elements of the process being described, whether they
have computer analogs or not.
Important corollary: If you can't write clearly, you have no damned business
being an analyst. Analysis is fundamentally a communications skill.
Programmers can make terrific analysts, but only if they know how to write in
English as well as C.
And Principle #2: See only the computers that are already there. Where
analysts get in trouble is where they begin mixing analysis and design. The
interface between analysis and design is a tricky one, to which I'll return at
the end of the column. There is a place in an analysis for the analyst to
suggest the shape of a solution, but that can't be done until the problem has
been separately and completely described.
A subtler hazard is to subconsciously ignore aspects of an analysis that don't
plug well into a computer framework. The Updates Clerk may tell you, "We batch
updates for phone verification on Fridays, because on Fridays the salesmen are
in the weekly sales meeting and we can always get an outside line." You may
laugh in sympathy, shake your head, and ignore that vital piece of
information. After the new office automation system is in place, you discover
that there aren't enough phone lines to make it happen, except (sometimes) on
Fridays. You missed a clue because it wasn't really a data flow, and it wasn't
really a process....


The Analysis Document


A good analysis document has these parts: an overview, a structured
description, a recommendations summary, a warnings summary, and a glossary. I
separate the document into these parts for a number of reasons, but perhaps
the least obvious is that when the project as a whole is designed and
implemented, the overview, structured description, and glossary can be readily
edited into the system documentation.
The overview is just that: Look at the problem from a height, give some
history of the evolution of things up to the current day, and describe in the
broadest possible terms what the process being described involves. This is
"orientation" for the outsider. Make it plain that there is a glossary and
that any jargon in the overview will be covered in the glossary. Then
enumerate the several (rule of thumb: no more than ten) largest elements of
the process being described, and the broad relationships between them.
Considered as a process to be analyzed, a small magazine publishing company
might have the following major elements: ad sales, circulation marketing,
circulation fulfillment, editorial planning, art and production, office
management, personnel, and accounting. Your overview would explain briefly
what each of these elements is, how they are different, and how they relate to
one another. Circulation marketing, for example, is the process of gathering
subscribers, whereas circulation fulfillment is the process of getting
magazines into their hands.
Many magazine people lump these two areas together, but I consider them
separate because whereas all magazines are distributed in essentially the same
way, there are radically different kinds of magazine circulation (paid and
controlled, primarily) that require very different mechanisms. This is the
sort of thing you would learn while doing your analysis. Needless to say, you
can't write the overview until you've done a great deal of looking, listening,
and probing.


The Structured Description


The structured description is the largest single part of the analysis
document. Here, you break down each of the major elements of the process into
smaller elements, in hierarchical fashion, describing in text with figures only
when necessary.
Your skills at structuring program code can come in handy here, as long as you
keep in mind that what you are structuring is human activity and not program
code. And as always in analysis, you must resist the temptation to design the
system while you're analyzing the problem.
A lot of the descriptions will deal with inputs and outputs and processes.
Describing these should be straightforward.
That's not enough, however: You must also state why things are done the way
they are. The "why" material is in many respects the real value-added of an
analysis--it contains the constraints that will critically shape the design
later on.
A somewhat simpleminded example involves the shipping class of magazines
versus other mailed materials. Bills and renewal notices are mailed first
class, magazines are mailed second class, and subscription offers are mailed
third class. A naive system designer might think that these divisions exist
strictly for cost reasons, when in fact second class postage is limited to
magazines alone, and magazines with a certain minimum number of editorial
pages to boot. He might include a menu option to mail subscription offers
second class, when in fact the Post Office would forbid the mailing. Postal
rules, not simply mailing costs, dictate the division of mailing materials by
postal class.
Another thing to watch out for in terms of "why" material is
"industry-customary" things that may not be matters of law but are still
outside the control of the process being analyzed. For example, in analyzing a
small magazine publishing company, you might note that newsstand
distributors are given dollar-for-dollar credit on covers torn from unsold
magazines and returned. A naive system designer might consider this primitive,
and may put together a form to be filled out by the newsstand distributor in
lieu of returning actual torn covers. Some distributors might comply and some
might not--but the important fact is that newsstand distributors have their
ways of doing things and they do not generally take orders from small
publishers!
A good outline processor works very well in creating the structured
description, although in a large analysis you may have to break the file into
a couple of chunks. The mental process I use is an observational sort of
"stepwise refinement" not unlike that championed by Niklaus Wirth. You begin
with one of the major functional areas of the process being described, and
make it a major heading. Then you discern all of the next-level components
within it, and make those minor headings. Then you begin with the first of the
minor headings and discern all of its next-level components, and so on.
Under each heading, describe the nature of the element being described, at its
level only. In other words, until you hit the "bottom" level in the outline,
you're going to be writing summary information of some kind. My rule is that a
header with subheads should describe only those things held in common by all
of the subheads. Details always sink to the bottom.
Some analysts are pathologically afraid of duplicating description. You'll
find that there are common threads in many elements of a process, and
sometimes it seems you're covering ground you've covered before. And why not?
If a subprocess happens as a separate entity, describe it--even if it's
largely identical to numerous other processes. Identify what differences there
are, but describe the whole thing.


The Recommendations Summary


An analyst's job is to describe the current problem, not to design a solution.
On the other hand, once the analysis is done, the analyst has a pretty fair
grasp of things, and may well have had some inspirations on what's to be done.
If the analyst is a programmer, he or she may well be itching for the chance
to suggest things, and this part of the document exists for that
purpose.
I insist on this section of the document because it short-circuits the
temptation to design within the structured description, which can be fatal.
The recommendations summary can be as freeform as the analyst wishes. It acts
mostly as an idea cauldron, and gives the designer some seeds to crystallize a
design around.


The Warnings Summary


The warnings summary is a lot more important. In my design philosophy, a
system design is shaped most potently by constraints; that is, things that
cannot be done. Major constraints are often obvious, but minor ones can hide
very well. The ones that hide the best are human-founded constraints, often
imposed informally, sometimes imposed solely by coincidence. Infrastructure
constraints are very important (phone lines, power service, noise, network
access) and are often ignored because they're "outside the bounds of the
system." This is exactly why they are constraints.
Constraints should be mentioned in the structured description, but I feel they
are important enough to be gathered together in a separate portion of the
document. Organize them as you like. A simple list may be enough.


The Glossary


Most processes have acronyms and insider jargon, and the more corporate the
culture, the more acronyms and jargon you're going to find. Back at Xerox we
had TRICCs, TRDRs, RDCs, IMOs, FWSSes, LRSs, and lord knows what else. Beware
of insider biases: Even jargon that might seem obvious within an industry
should be defined for the sake of outsiders who might have a hand in designing
and implementing the system. Every magazine person knows what CPM means. (Cost
Per Thousand.) In a physical therapy office, however, CPM is just as
well-known but stands for Continuous Passive Motion. And if I recall there was
once an operating system...
As you encounter them, place acronyms and jargon in a file, along with a short
description of each. Each time you add some explanatory material on a phrase
or acronym to the structured description, add a pointer to the appropriate
glossary entry that points back to the subsection number containing the
explanatory material:
CPM: Cost Per Thousand. A measure of relative value of advertising media or
mailing lists, given as the cost in dollars per thousand readers or list
names. See 9.2.4.12 and 4.7.7.

If I were called upon to do another analysis, I think I would keep the
glossary in a text database of some sort, and handle it the same way I would
handle the index of a book. After all, this is a book!


The Three Skills of Analysis


That's the nature of the document. Now, how do you create it? There are three
major skills involved:
Observe objectively. In Stranger in a Strange Land, Robert Heinlein gave us
the concept of a Fair Witness. As he put it, a Fair Witness would look at a
red barn and say, "The side of the barn facing me is painted red. I cannot
comment on the color of the other three sides." In other words, as you observe
a process, see what's there. Don't make too many assumptions, and when you do,
verify those assumptions by probing; in other words, when you can't see
clearly, go take a walk around to the other side of the barn.
Probe insightfully. The only dumb question is the one you didn't ask. Still, a
clumsy questioner will receive clumsy answers, or, worse, answers to questions
that weren't asked. Frame questions in terms of things you already know, in
the hope that the answer will extend the set of facts you already have. Don't
jump around. Lord knows, take good notes.
Also, have some empathy for the people who provide you with information. The
people at the bottom of a corporate hierarchy are generally overworked,
underpaid, harassed, and without authority or sufficient time to get their
jobs done. Try to get in their way as little as possible, keeping in mind that
they're the only ones in a company who really know how anything works.
Describe things well. Writing well is essential. Don't try to make the
analysis sound weighty or important. Just make it clear. Write as though you
were describing something to a client across your desk. Tell it straight.
Leave out the legalese and academic weasel-talk. Keep a light heart if it
won't get you fired. (And if it does, you were too good to be working there!)
I can't tell you precisely how to put all these elements together. You have to
gather information, organize it, and write it down. The only process that will
work is the one that mirrors the way you think, organize, and express.


Market Analysis


My earlier criticism of Hallmark is in fact a little unfair. What they had to
do before they created their system was market analysis, which is something
like the process analysis I've described here, except that the process doesn't
exist yet. Market analysis requires market research, which is something I've
never had to do.
But at the core of it, I feel that Hallmark's market analysis failed by being
insufficiently detached from their current way of operating. It's a little
like the difference between being a railroad company and a transportation
company. The key is getting stuff from here to there, not the shape of the
thing that carries it.
In the card business, the key is getting a card into the hands of the
consumer. You can sell them a finished card, or else a cardmaker and supplies.
Selling cards may be more profitable now--but you can never discount the
possibility that somebody else will begin selling cardmakers down the road.
Hallmark's in-store card customizer system is an interesting answer. But it's
only one answer, shaped by the choice of the question. That's what analysis
is: choosing the question. We'll soon see how well they chose.



March, 1992
GRAPHICS PROGRAMMING


Fast 3-D Animation: Meet X-Sharp


 This article contains the following executables: XSHARP.ARC


Michael Abrash


Across the lake from Vermont, a few miles into upstate New York, the Ausable
River has carved out a fairly impressive gorge known as "Ausable Chasm."
Impressive for the East, anyway; you might think of it as the poor man's Grand
Canyon. This summer, I did the tour with my wife and five-year-old, and it was
fun, although I confess that I didn't loosen my grip on my daughter's hand
until we were on the bus and headed for home; that gorge is deep, and the
railings tend to be of the single-bar, rusted-out variety.
New Yorkers can drive straight to this wonder of nature, but Vermonters must
take their cars across on the ferry; the alternative is driving three hours
around the south end of Lake Champlain. No problem; the ferry ride is an hour
well spent on a beautiful lake. Or, rather, no problem -- once you're on the
ferry. Getting to New York is easy, but, as we found out, the line of cars
waiting to come back from Ausable Chasm gets lengthy about mid-afternoon. The
ferry can hold only so many cars, and we wound up spending an unexpected hour
exploring the wonders of the ferry docks. Not a big deal, with a good-natured
kid and an entertaining mom; we got ice cream, explored the beach, looked
through binoculars, and told stories. It was a fun break, actually, and before
we knew it, the ferry was steaming back to pick us up.
A friend of mine, an elementary-school teacher, helped take 65 sixth graders
to Ausable Chasm. Never mind the potential for trouble with 65 kids loose on a
ferry. Never mind what it was like trying to herd that group around a gorge
that looks like it was designed to swallow children and small animals without
a trace. The hard part was getting back to the docks and finding they'd have
to wait an hour for the next ferry. As my friend put it, "Let me tell you, an
hour is an eternity with 65 sixth graders screaming the song 'You Are My
Sunshine.'"
Apart from reminding you how lucky you are to be working in a quiet,
air-conditioned room, in front of a gently humming computer, free to think deep
thoughts and eat Cheetos to your heart's content, this story provides a useful
perspective on the malleable nature of time. An hour isn't just an hour--it
can be forever, or it can be the wink of an eye. Just think of the last hour
you spent working under a deadline; I bet it went past in a flash. Which is
not to say, mind you, that I recommend working in a bus full of screaming kids
in order to make time pass more slowly; there are quality issues here, as
well.
In our 3-D animation work so far, we've used floating-point arithmetic.
Floating-point arithmetic, even with a floating-point processor but especially
without, is the microcomputer animation equivalent of working in a school bus:
It takes forever to do anything, and you just know you're never going to
accomplish as much as you want to. This month, it's time for fixed-point
arithmetic, which will give us an instant order-of-magnitude performance
boost. We'll also give our 3-D animation code a much more powerful and
extensible framework, making it easy to add new and different sorts of
objects. Taken together, these alterations will let us start to do some really
interesting animation. Unfortunately, they take a lot of code, so I'll have to
keep the text short. Therefore, without further ado, I give you real real-time
animation.


Fixed Point, Native 386, and More


As of last month, we were at the point where we could rotate, move, and draw a
solid cube in real time. This month's program, shown in Listings One through
Ten (pages 134 through 138), goes a bit further, rotating 12 solid cubes at an
update rate of about 15 frames per second (fps) on a 20-MHz 386 with a slow
VGA. That's 12 transformation matrices, 72 polygons, and 96 vertices being
handled in real time; not Star Wars, granted, but a giant step beyond a single
cube. Run the program if you get a chance; you may be surprised at just how
effective this level of animation is. I'd like to point out, in case anyone
missed it, that this is fully general 3-D. I'm not using any shortcuts or
tricks, like prestoring coordinates or pregenerating bitmaps; if you were to
feed in different rotations or vertices, the animation would change
accordingly.
The keys to this month's performance are three. The first key is fixed-point
arithmetic. In the last two months, we've worked with floating-point
coordinates and transformation matrices. Those values are now stored as 32-bit
fixed-point numbers, in the form 16.16 (16 bits of whole number, 16 bits of
fraction). 32-bit fixed-point numbers allow sufficient precision for 3-D
animation, but can be manipulated with fast integer operations, rather than
slow floating-point processor operations or excruciatingly slow floating-point
emulator operations. Although the speed advantage of fixed-point varies
depending on the operation, the processor, and whether a coprocessor is
present, fixed-point multiplication can be as much as 100 times faster than
the emulated floating-point equivalent. (I'd like to take a moment to thank
Chris Hecker for his invaluable input in this area.)
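For illustration, here's what the 16.16 format looks like in C. This is a sketch only; the macro names are hypothetical, not necessarily those used in the listings, and a modern `int32_t` stands in for the period's 32-bit `long`.

```c
#include <stdint.h>

/* 16.16 fixed-point: the high 16 bits hold the whole part, the low 16
   bits the fraction (so 1.0 is 0x10000 and 0.5 is 0x8000). Names here
   are illustrative. */
typedef int32_t Fixedpoint;

#define INT_TO_FIXED(i)    ((Fixedpoint)(i) << 16)
#define DOUBLE_TO_FIXED(d) ((Fixedpoint)((d) * 65536.0))
#define FIXED_TO_DOUBLE(f) ((double)(f) / 65536.0)

/* Addition and subtraction of 16.16 values are ordinary integer ops;
   only multiplication and division need special handling. */
```

Because the binary point sits at a fixed position, adding `INT_TO_FIXED(3)` and `DOUBLE_TO_FIXED(0.5)` with a plain integer `+` yields exactly 3.5 in 16.16 form.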
The second performance key is the use of the 386's native 32-bit multiply and
divide instructions. Real-mode C compilers, such as Borland C++ and Microsoft
C, call library routines to perform multiplications and divisions involving
32-bit values, and those library functions are fairly slow, especially for
division. On a 386, 32-bit multiplication and division can be handled with the
bit of code in Listing Nine--and most of even that code is only for rounding.
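Listing Nine does this with the 386's 32-bit IMUL and IDIV; a portable C sketch of the same idea, assuming a 64-bit intermediate type (which the real-mode compilers of the day lacked, hence the assembly), looks like this:

```c
#include <stdint.h>

typedef int32_t Fixedpoint;  /* 16.16 */

/* 16.16 multiply: the full 32x32 product is a 32.32 number; keep the
   middle 32 bits, rounding on the highest discarded fraction bit. The
   int64_t product stands in for the 386's EDX:EAX register pair. */
Fixedpoint FixedMul(Fixedpoint a, Fixedpoint b)
{
    int64_t product = (int64_t)a * b;   /* 32.32 intermediate */
    product += (int64_t)1 << 15;        /* round */
    return (Fixedpoint)(product >> 16);
}

/* 16.16 divide: widen the dividend to 48.16 first, so the quotient
   comes out in 16.16 form. */
Fixedpoint FixedDiv(Fixedpoint a, Fixedpoint b)
{
    return (Fixedpoint)(((int64_t)a << 16) / b);
}
```

Most of the work, as the text notes, is the rounding; the multiply and divide themselves are single operations on a 386.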
The third performance key is maintaining and operating on only the relevant
portions of transformation matrices and coordinates. The bottom row of every
transformation matrix we'll use (at least for the near future) is [0 0 0 1],
so why bother using or recalculating it when concatenating transforms and
transforming points? Likewise for the fourth element of a 3-D vector in
homogeneous coordinates, which is always 1. Basically, transformation matrices
are treated as consisting of a 3 x 3 rotation matrix and a 3 x 1 translation
vector, and coordinates are treated as 3 x 1 vectors. This saves a great many
multiplications in the course of transforming each point.
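As a sketch of that savings (the `Xform` layout and function name here are illustrative, not necessarily those in the listings):

```c
#include <stdint.h>

typedef int32_t Fixedpoint;  /* 16.16 */

/* A transform stored as a 3x3 rotation plus a translation column; the
   implied bottom row [0 0 0 1] is never stored or computed. */
typedef Fixedpoint Xform[3][4];

static Fixedpoint FixedMul(Fixedpoint a, Fixedpoint b)
{
    return (Fixedpoint)((((int64_t)a * b) + ((int64_t)1 << 15)) >> 16);
}

/* Transform one point: 9 multiplies and 9 adds, versus 16 multiplies
   and 12 adds for a full 4x4 homogeneous transform. Because the
   point's fourth coordinate is always 1, the translation column
   contributes a plain add, not a multiply. */
void XformVec(Xform m, Fixedpoint *src, Fixedpoint *dst)
{
    int i;
    for (i = 0; i < 3; i++)
        dst[i] = FixedMul(m[i][0], src[0]) +
                 FixedMul(m[i][1], src[1]) +
                 FixedMul(m[i][2], src[2]) +
                 m[i][3];   /* translation needs only an add */
}
```

With 96 vertices per frame, trimming seven multiplies per point adds up quickly when each multiply goes through a fixed-point routine.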
Just for fun, I reimplemented the animation of Listings One through Ten with
floating-point instructions. Together, the preceding optimizations improve
the performance of the entire animation--including drawing time and overhead,
not just math--by more than ten times over the code that uses the
floating-point emulator. Amazing what one can accomplish with a few dozen
lines of assembler and a switch in number format, isn't it? Note that no
assembly code other than the native 386 multiply and divide is used in
Listings One through Ten, although the polygon fill code is of course mostly
in assembler; we've achieved 12 cubes animated at 15 fps while doing the 3-D
work almost entirely in Borland C++--and we're still doing sine and cosine via
the floating-point emulator. Happily, we're still nowhere near the upper limit
on the animation potential of the PC.


Drawbacks


The techniques we've used to turbocharge 3-D animation are very powerful, but
there's a dark side to them as well. Obviously, native 386 instructions won't
work on 8088 and 286 machines. That's rectifiable; equivalent multiplication
and division routines could be implemented for real mode (and I may just do
that one of these months, especially if enough of you give me a hard time for
taking the easy way out with 386 instructions), and performance would still be
reasonable. It sure is nice to be able to plug in a 32-bit IMUL or DIV and be
done with it, though. More importantly, 32-bit fixed-point arithmetic has
limitations in range and accuracy. Points outside a 64Kx64Kx64K space can't be
handled, imprecision tends to creep in over the course of multiple matrix
concatenations, and it's quite possible to generate the dreaded divide by 0
interrupt if Z coordinates with absolute values less than one are used. I
don't have space to discuss these issues in detail now, but here are some
brief thoughts. The working 64Kx64Kx64K fixed-point space can be paged into a
larger virtual space. Imprecision of a pixel or two rarely matters in terms of
display quality, and deterioration of concatenated rotations can be corrected
by restoring orthogonality, for example by periodically calculating one row of
the matrix as the cross-product of the other two (forcing it to be
perpendicular to both). 3-D clipping with a front clip plane of -1 or less can
prevent divide overflow.
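The cross-product fix-up mentioned above can be sketched as follows. This is a hypothetical helper, shown in doubles for clarity; the same arithmetic applies to the 16.16 rotation part of a transform.

```c
/* Restore orthogonality after many matrix concatenations: recompute
   row 2 of the 3x3 rotation part as the cross product of rows 0 and 1,
   forcing it perpendicular to both. (Renormalizing the rows to unit
   length would complete the cleanup.) */
void FixupRotation(double m[3][3])
{
    m[2][0] = m[0][1] * m[1][2] - m[0][2] * m[1][1];
    m[2][1] = m[0][2] * m[1][0] - m[0][0] * m[1][2];
    m[2][2] = m[0][0] * m[1][1] - m[0][1] * m[1][0];
}
```

Run periodically (say, every few hundred concatenations), this keeps accumulated rounding error from visibly skewing rotated objects.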


A New Animation Framework


Listings One through Ten represent not merely faster animation, but also a
mostly complete, extensible, data-driven animation framework. Where earlier
animation code was hardwired to demonstrate certain concepts, this month's
code is intended to serve as the basis for a solid animation package. Objects
are stored, in their entirety, in customizable structures; new structures can
be devised for new sorts of objects. Drawing, preparing for drawing, and
moving are all vectored functions, so that variations such as shading or
texturing, or even radically different sorts of graphics objects, such as
scaled bitmaps, could be supported. The cube initialization is entirely data
driven; more or different cubes, or other sorts of convex polyhedrons, could
be added by simply changing the initialization data in Listing Eight.
The animation framework is not yet complete. Movement is supported only along
the Z axis, and then in a non-general fashion. More interesting movement isn't
supported at this point because of the one gaping hole in the package:
hidden-surface removal. Until this is implemented--and it will be,
soon--nothing can safely overlap. It would actually be easy enough to perform
hidden-surface removal by keeping the cubes in different Z bands and drawing
them back to front, but this gets into sorting and list issues, and is not a
complete solution--and I've crammed as much as will fit into this month's
code, anyway.


Where the Time Goes


The distribution of execution time in the animation code is no longer wildly
biased toward transformation, but sine and cosine are certainly still sucking
up cycles. Likewise, the overhead in the calls to FixedMul() and FixedDiv()
is costly. Much of this is correctable with a little carefully crafted
assembly language and a lookup table; expect that soon. When all that is
firmly in place, we'll take a look at the number of pixels being drawn versus
the bandwidth of display memory; that'll give us an idea of how close we are
to the theoretical limit of VGA animation. Probably not too close, even with
those optimizations; a faster 2-D clipping approach and still faster
polygon-fill code will most likely be in order. (Yes, Virginia, there is an
even faster way to fill polygons!)
Regardless, this month we have made the critical jump to a usable level of
performance and a serviceable framework. From here on out, it's the fun stuff.


X-Sharp


Three-dimensional animation is a complicated business, and it takes an
astonishing amount of functionality just to get off the launching pad: page
flipping, polygon filling, clipping, transformations, list management, and so
forth. There's no way all of that could fit in a single column, and in fact
I've been building toward a critical mass of animation functionality since my
very first column in DDJ. This month builds on code from no less than five
previous columns. The code that's required in order to link this month's
animation program is: Listing One from January (draw clipped line list);
Listings One and Six from July 1991 (mode X mode set, rectangle fill); Listing
Six from September 1991; Listing Four in March 1991 (polygon edge scan); and
the FillConvexPolygon( ) function from Listing One from February 1991. The
struct keywords in FillConvexPolygon( ) must be removed to reflect the switch
to typedefs in the animation header file.
This is the last time I'm going to list all the code needed to build the
animation package, and I am not going to print every change to every module
from now on. The code is simply getting too large to show every bit of it, and
the scope is only going to grow as I add functions such as shading, 3-D
clipping, and hidden surfaces; also, I'm going to be tinkering with stuff such
as converting key code to assembler and modifying functions slightly as
structures change, and I hate to take up valuable space in DDJ for what's
basically fine-tuning and housekeeping.
I think we've reached the point where we can call this an ongoing project and
give it a name. In the spirit of Al Stevens' wonderful D-Flat, I hereby dub
the animation package X-Sharp. (X for mode X, sharp because who wants a flat
animation package?) From now on, I will make the full source for X-Sharp
available, complete with make files, online, and otherwise. It will be
available in the file XSHARPn.ARC in the DDJ Forum on CompuServe and on M&T
Online. Alternatively, you can send me a 360K or 720K formatted diskette and
an addressed, stamped diskette mailer, care of DDJ (411 Borel Ave., San Mateo,
CA 94403-3522), and I'll send you the latest copy of X-Sharp. There's no
charge, but, in the spirit of Al Stevens's "careware," it'd be very much
appreciated if you'd slip in a dollar or so for the folks at the Vermont
Association for the Blind and Visually Impaired, who help the visually
impaired build productive, self-sufficient lives, and are amazingly successful
at it. Imagine for a moment trying to do your work if you lost your sight
(and it can be done; I got a request for the text of my book Zen of Assembly
Language in computer-readable form from a blind programmer the other
day)--heck, imagine trying to cross the street--and I suspect you'll
understand why it matters so much. As Al says, it's purely voluntary, but
both you and I will feel good.
I'm available on a daily basis to discuss X-Sharp on M&T Online and BIX
(user name MABRASH in both cases).



_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash


[LISTING ONE]


/* 3D animation program to rotate 12 cubes. Uses fixed point. All C code
tested with Borland C++ 3.0 in C compilation mode and the small model. */
#include <conio.h>
#include <dos.h>
#include <stdlib.h> /* for exit() */
#include "polygon.h"

/* Base offset of page to which to draw */
unsigned int CurrentPageBase = 0;
/* Clip rectangle; clips to the screen */
int ClipMinX = 0, ClipMinY = 0;
int ClipMaxX = SCREEN_WIDTH, ClipMaxY = SCREEN_HEIGHT;
static unsigned int PageStartOffsets[2] =
 {PAGE0_START_OFFSET,PAGE1_START_OFFSET};
int DisplayedPage, NonDisplayedPage;
int RecalcAllXforms = 1, NumObjects = 0;
Xform WorldViewXform; /* initialized from floats */
/* Pointers to objects */
Object *ObjectList[MAX_OBJECTS];

void main() {
 int Done = 0, i;
 Object *ObjectPtr;
 union REGS regset;

 InitializeFixedPoint(); /* set up fixed-point data */
 InitializeCubes(); /* set up cubes and add them to object list; other
 objects would be initialized now, if there were any */
 Set320x240Mode(); /* set the screen to mode X */
 ShowPage(PageStartOffsets[DisplayedPage = 0]);
 /* Keep transforming the cube, drawing it to the undisplayed page,
 and flipping the page to show it */
 do {
 /* For each object, regenerate viewing info, if necessary */
 for (i=0; i<NumObjects; i++) {
 if ((ObjectPtr = ObjectList[i])->RecalcXform ||
 RecalcAllXforms) {
 ObjectPtr->RecalcFunc(ObjectPtr);
 ObjectPtr->RecalcXform = 0;
 }
 }
 RecalcAllXforms = 0;
 CurrentPageBase = /* select other page for drawing to */
 PageStartOffsets[NonDisplayedPage = DisplayedPage ^ 1];
 /* For each object, clear the portion of the non-displayed page
 that was drawn to last time, then reset the erase extent */
 for (i=0; i<NumObjects; i++) {
 ObjectPtr = ObjectList[i];
 FillRectangleX(ObjectPtr->EraseRect[NonDisplayedPage].Left,
 ObjectPtr->EraseRect[NonDisplayedPage].Top,
 ObjectPtr->EraseRect[NonDisplayedPage].Right,
 ObjectPtr->EraseRect[NonDisplayedPage].Bottom,
 CurrentPageBase, 0);
 ObjectPtr->EraseRect[NonDisplayedPage].Left =
 ObjectPtr->EraseRect[NonDisplayedPage].Top = 0x7FFF;
 ObjectPtr->EraseRect[NonDisplayedPage].Right =
 ObjectPtr->EraseRect[NonDisplayedPage].Bottom = 0;
 }
 /* Draw all objects */
 for (i=0; i<NumObjects; i++)
 ObjectList[i]->DrawFunc(ObjectList[i]);
 /* Flip to display the page into which we just drew */
 ShowPage(PageStartOffsets[DisplayedPage = NonDisplayedPage]);
 /* Move and reorient each object */
 for (i=0; i<NumObjects; i++)
 ObjectList[i]->MoveFunc(ObjectList[i]);
 if (kbhit())
 if (getch() == 0x1B) Done = 1; /* Esc to exit */
 } while (!Done);
 /* Return to text mode and exit */
 regset.x.ax = 0x0003; /* AL = 3 selects 80x25 text mode */
 int86(0x10, &regset, &regset);
 exit(1);
}






[LISTING TWO]

/* Transforms all vertices in the specified polygon-based object into view
space, then perspective projects them to screen space and maps them to
screen coordinates, storing the results in the object. Recalculates the
object->view transformation first; this function is called only when that
transformation has changed, so the vertices always need retransforming. */
#include <math.h>
#include "polygon.h"

void XformAndProjectPObject(PObject * ObjectToXform)
{
 int i, NumPoints = ObjectToXform->NumVerts;
 Point3 * Points = ObjectToXform->VertexList;
 Point3 * XformedPoints = ObjectToXform->XformedVertexList;
 Point3 * ProjectedPoints = ObjectToXform->ProjectedVertexList;
 Point * ScreenPoints = ObjectToXform->ScreenVertexList;

 /* Recalculate the object->view transform */
 ConcatXforms(WorldViewXform, ObjectToXform->XformToWorld,
 ObjectToXform->XformToView);
 /* Apply that new transformation and project the points */
 for (i=0; i<NumPoints; i++, Points++, XformedPoints++,
 ProjectedPoints++, ScreenPoints++) {
 /* Transform to view space */
 XformVec(ObjectToXform->XformToView, (Fixedpoint *) Points,
 (Fixedpoint *) XformedPoints);
 /* Perspective-project to screen space */
 ProjectedPoints->X =
 FixedMul(FixedDiv(XformedPoints->X, XformedPoints->Z),
 DOUBLE_TO_FIXED(PROJECTION_RATIO * (SCREEN_WIDTH/2)));

 ProjectedPoints->Y =
 FixedMul(FixedDiv(XformedPoints->Y, XformedPoints->Z),
 DOUBLE_TO_FIXED(PROJECTION_RATIO * (SCREEN_WIDTH/2)));
 ProjectedPoints->Z = XformedPoints->Z;
 /* Convert to screen coordinates. The Y coord is negated to flip from
 increasing Y being up to increasing Y being down, as expected by polygon
 filler. Add in half the screen width and height to center on screen */
 ScreenPoints->X = ((int) ((ProjectedPoints->X +
 DOUBLE_TO_FIXED(0.5)) >> 16)) + SCREEN_WIDTH/2;
 ScreenPoints->Y = (-((int) ((ProjectedPoints->Y +
 DOUBLE_TO_FIXED(0.5)) >> 16))) + SCREEN_HEIGHT/2;
 }
}






[LISTING THREE]

/* Routines to perform incremental rotations around the three axes. */
#include <math.h>
#include "polygon.h"

/* Concatenate a rotation by Angle around the X axis to transformation in
 XformToChange, placing the result back into XformToChange. */
void AppendRotationX(Xform XformToChange, double Angle)
{
 Fixedpoint Temp10, Temp11, Temp12, Temp20, Temp21, Temp22;
 Fixedpoint CosTemp = DOUBLE_TO_FIXED(cos(Angle));
 Fixedpoint SinTemp = DOUBLE_TO_FIXED(sin(Angle));

 /* Calculate the new values of the six affected matrix entries */
 Temp10 = FixedMul(CosTemp, XformToChange[1][0]) +
 FixedMul(-SinTemp, XformToChange[2][0]);
 Temp11 = FixedMul(CosTemp, XformToChange[1][1]) +
 FixedMul(-SinTemp, XformToChange[2][1]);
 Temp12 = FixedMul(CosTemp, XformToChange[1][2]) +
 FixedMul(-SinTemp, XformToChange[2][2]);
 Temp20 = FixedMul(SinTemp, XformToChange[1][0]) +
 FixedMul(CosTemp, XformToChange[2][0]);
 Temp21 = FixedMul(SinTemp, XformToChange[1][1]) +
 FixedMul(CosTemp, XformToChange[2][1]);
 Temp22 = FixedMul(SinTemp, XformToChange[1][2]) +
 FixedMul(CosTemp, XformToChange[2][2]);
 /* Put the results back into XformToChange */
 XformToChange[1][0] = Temp10; XformToChange[1][1] = Temp11;
 XformToChange[1][2] = Temp12; XformToChange[2][0] = Temp20;
 XformToChange[2][1] = Temp21; XformToChange[2][2] = Temp22;
}
/* Concatenate a rotation by Angle around the Y axis to transformation in
 XformToChange, placing the result back into XformToChange. */
void AppendRotationY(Xform XformToChange, double Angle)
{
 Fixedpoint Temp00, Temp01, Temp02, Temp20, Temp21, Temp22;
 Fixedpoint CosTemp = DOUBLE_TO_FIXED(cos(Angle));
 Fixedpoint SinTemp = DOUBLE_TO_FIXED(sin(Angle));


 /* Calculate the new values of the six affected matrix entries */
 Temp00 = FixedMul(CosTemp, XformToChange[0][0]) +
 FixedMul(SinTemp, XformToChange[2][0]);
 Temp01 = FixedMul(CosTemp, XformToChange[0][1]) +
 FixedMul(SinTemp, XformToChange[2][1]);
 Temp02 = FixedMul(CosTemp, XformToChange[0][2]) +
 FixedMul(SinTemp, XformToChange[2][2]);
 Temp20 = FixedMul(-SinTemp, XformToChange[0][0]) +
 FixedMul( CosTemp, XformToChange[2][0]);
 Temp21 = FixedMul(-SinTemp, XformToChange[0][1]) +
 FixedMul(CosTemp, XformToChange[2][1]);
 Temp22 = FixedMul(-SinTemp, XformToChange[0][2]) +
 FixedMul(CosTemp, XformToChange[2][2]);
 /* Put the results back into XformToChange */
 XformToChange[0][0] = Temp00; XformToChange[0][1] = Temp01;
 XformToChange[0][2] = Temp02; XformToChange[2][0] = Temp20;
 XformToChange[2][1] = Temp21; XformToChange[2][2] = Temp22;
}

/* Concatenate a rotation by Angle around the Z axis to transformation in
 XformToChange, placing the result back into XformToChange. */
void AppendRotationZ(Xform XformToChange, double Angle)
{
 Fixedpoint Temp00, Temp01, Temp02, Temp10, Temp11, Temp12;
 Fixedpoint CosTemp = DOUBLE_TO_FIXED(cos(Angle));
 Fixedpoint SinTemp = DOUBLE_TO_FIXED(sin(Angle));

 /* Calculate the new values of the six affected matrix entries */
 Temp00 = FixedMul(CosTemp, XformToChange[0][0]) +
 FixedMul(-SinTemp, XformToChange[1][0]);
 Temp01 = FixedMul(CosTemp, XformToChange[0][1]) +
 FixedMul(-SinTemp, XformToChange[1][1]);
 Temp02 = FixedMul(CosTemp, XformToChange[0][2]) +
 FixedMul(-SinTemp, XformToChange[1][2]);
 Temp10 = FixedMul(SinTemp, XformToChange[0][0]) +
 FixedMul(CosTemp, XformToChange[1][0]);
 Temp11 = FixedMul(SinTemp, XformToChange[0][1]) +
 FixedMul(CosTemp, XformToChange[1][1]);
 Temp12 = FixedMul(SinTemp, XformToChange[0][2]) +
 FixedMul(CosTemp, XformToChange[1][2]);
 /* Put the results back into XformToChange */
 XformToChange[0][0] = Temp00; XformToChange[0][1] = Temp01;
 XformToChange[0][2] = Temp02; XformToChange[1][0] = Temp10;
 XformToChange[1][1] = Temp11; XformToChange[1][2] = Temp12;
}






[LISTING FOUR]

/* Fixed point matrix arithmetic functions */
#include "polygon.h"

/* Matrix multiplies Xform by SourceVec, and stores the result in DestVec.
Multiplies a 4x4 matrix times a 4x1 matrix; the result is a 4x1 matrix.
Cheats by assuming the W coord is 1 and the bottom row of the matrix is
0 0 0 1, and doesn't bother to set the W coordinate of the destination */
void XformVec(Xform WorkingXform, Fixedpoint *SourceVec,
 Fixedpoint *DestVec)
{
 int i;

 for (i=0; i<3; i++)
 DestVec[i] = FixedMul(WorkingXform[i][0], SourceVec[0]) +
 FixedMul(WorkingXform[i][1], SourceVec[1]) +
 FixedMul(WorkingXform[i][2], SourceVec[2]) +
 WorkingXform[i][3]; /* no need to multiply by W = 1 */
}

/* Matrix multiplies SourceXform1 by SourceXform2 and stores the result in
 DestXform. Multiplies a 4x4 matrix times a 4x4 matrix; the result is a
 4x4 matrix. Cheats by assuming the bottom row of each matrix is 0 0 0 1,
 and doesn't bother to set the bottom row of the destination */
void ConcatXforms(Xform SourceXform1, Xform SourceXform2,
 Xform DestXform)
{
 int i, j;

 for (i=0; i<3; i++) {
 for (j=0; j<3; j++)
 DestXform[i][j] =
 FixedMul(SourceXform1[i][0], SourceXform2[0][j]) +
 FixedMul(SourceXform1[i][1], SourceXform2[1][j]) +
 FixedMul(SourceXform1[i][2], SourceXform2[2][j]);
 /* the implied 0 0 0 1 bottom row of SourceXform2 contributes the
 translation term of SourceXform1 only to the fourth column */
 DestXform[i][3] =
 FixedMul(SourceXform1[i][0], SourceXform2[0][3]) +
 FixedMul(SourceXform1[i][1], SourceXform2[1][3]) +
 FixedMul(SourceXform1[i][2], SourceXform2[2][3]) +
 SourceXform1[i][3];
 }
}






[LISTING FIVE]

/* Set up basic data that needs to be in fixed point, to avoid data
 definition hassles. */
#include "polygon.h"

/* All vertices in the basic cube */
static IntPoint3 IntCubeVerts[NUM_CUBE_VERTS] = {
 {15,15,15},{15,15,-15},{15,-15,15},{15,-15,-15},
 {-15,15,15},{-15,15,-15},{-15,-15,15},{-15,-15,-15} };
/* Transformation from world space into view space (no transformation,
 currently) */
static int IntWorldViewXform[3][4] = {
 {1,0,0,0}, {0,1,0,0}, {0,0,1,0}};

void InitializeFixedPoint()
{
 int i, j;

 for (i=0; i<3; i++)
 for (j=0; j<4; j++)
 WorldViewXform[i][j] = INT_TO_FIXED(IntWorldViewXform[i][j]);

 for (i=0; i<NUM_CUBE_VERTS; i++) {
 CubeVerts[i].X = INT_TO_FIXED(IntCubeVerts[i].X);
 CubeVerts[i].Y = INT_TO_FIXED(IntCubeVerts[i].Y);
 CubeVerts[i].Z = INT_TO_FIXED(IntCubeVerts[i].Z);
 }
}






[LISTING SIX]

/* Rotates and moves a polygon-based object around the three axes.
 Movement is implemented only along the Z axis currently. */
#include "polygon.h"

void RotateAndMovePObject(PObject * ObjectToMove)
{
 if (--ObjectToMove->RDelayCount == 0) { /* rotate */
 ObjectToMove->RDelayCount = ObjectToMove->RDelayCountBase;
 if (ObjectToMove->Rotate.RotateX != 0.0)
 AppendRotationX(ObjectToMove->XformToWorld,
 ObjectToMove->Rotate.RotateX);
 if (ObjectToMove->Rotate.RotateY != 0.0)
 AppendRotationY(ObjectToMove->XformToWorld,
 ObjectToMove->Rotate.RotateY);
 if (ObjectToMove->Rotate.RotateZ != 0.0)
 AppendRotationZ(ObjectToMove->XformToWorld,
 ObjectToMove->Rotate.RotateZ);
 ObjectToMove->RecalcXform = 1;
 }
 /* Move in Z, checking for bouncing and stopping */
 if (--ObjectToMove->MDelayCount == 0) {
 ObjectToMove->MDelayCount = ObjectToMove->MDelayCountBase;
 ObjectToMove->XformToWorld[2][3] += ObjectToMove->Move.MoveZ;
 if (ObjectToMove->XformToWorld[2][3]>ObjectToMove->Move.MaxZ)
 ObjectToMove->Move.MoveZ = 0; /* stop if close enough */
 ObjectToMove->RecalcXform = 1;
 }
}






[LISTING SEVEN]

/* Draws all visible faces in specified polygon-based object. Object must have
previously been transformed and projected, so that ScreenVertexList array is
filled in. */
#include "polygon.h"

void DrawPObject(PObject * ObjectToXform)
{
 int i, j, NumFaces = ObjectToXform->NumFaces, NumVertices;
 int * VertNumsPtr;

 Face * FacePtr = ObjectToXform->FaceList;
 Point * ScreenPoints = ObjectToXform->ScreenVertexList;
 long v1, v2, w1, w2;
 Point Vertices[MAX_POLY_LENGTH];
 PointListHeader Polygon;

 /* Draw each visible face (polygon) of the object in turn */
 for (i=0; i<NumFaces; i++, FacePtr++) {
 NumVertices = FacePtr->NumVerts;
 /* Copy over the face's vertices from the vertex list */
 for (j=0, VertNumsPtr=FacePtr->VertNums; j<NumVertices; j++)
 Vertices[j] = ScreenPoints[*VertNumsPtr++];
 /* Draw only if outside face showing (if the normal to the
 polygon points toward viewer; that is, has a positive Z component) */
 v1 = Vertices[1].X - Vertices[0].X;
 w1 = Vertices[NumVertices-1].X - Vertices[0].X;
 v2 = Vertices[1].Y - Vertices[0].Y;
 w2 = Vertices[NumVertices-1].Y - Vertices[0].Y;
 if ((v1*w2 - v2*w1) > 0) {
 /* It is facing the screen, so draw */
 /* Appropriately adjust the extent of the rectangle used to
 erase this object later */
 for (j=0; j<NumVertices; j++) {
 if (Vertices[j].X >
 ObjectToXform->EraseRect[NonDisplayedPage].Right)
 if (Vertices[j].X < SCREEN_WIDTH)
 ObjectToXform->EraseRect[NonDisplayedPage].Right =
 Vertices[j].X;
 else ObjectToXform->EraseRect[NonDisplayedPage].Right =
 SCREEN_WIDTH;
 if (Vertices[j].Y >
 ObjectToXform->EraseRect[NonDisplayedPage].Bottom)
 if (Vertices[j].Y < SCREEN_HEIGHT)
 ObjectToXform->EraseRect[NonDisplayedPage].Bottom =
 Vertices[j].Y;
 else ObjectToXform->EraseRect[NonDisplayedPage].Bottom=
 SCREEN_HEIGHT;
 if (Vertices[j].X <
 ObjectToXform->EraseRect[NonDisplayedPage].Left)
 if (Vertices[j].X > 0)
 ObjectToXform->EraseRect[NonDisplayedPage].Left =
 Vertices[j].X;
 else ObjectToXform->EraseRect[NonDisplayedPage].Left=0;
 if (Vertices[j].Y <
 ObjectToXform->EraseRect[NonDisplayedPage].Top)
 if (Vertices[j].Y > 0)
 ObjectToXform->EraseRect[NonDisplayedPage].Top =
 Vertices[j].Y;
 else ObjectToXform->EraseRect[NonDisplayedPage].Top=0;
 }
 /* Draw the polygon */
 DRAW_POLYGON(Vertices, NumVertices, FacePtr->Color, 0, 0);
 }
 }
}







[LISTING EIGHT]

/* Initializes the cubes and adds them to the object list. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "polygon.h"

#define ROT_6 (M_PI / 30.0) /* rotate 6 degrees at a time */
#define ROT_3 (M_PI / 60.0) /* rotate 3 degrees at a time */
#define ROT_2 (M_PI / 90.0) /* rotate 2 degrees at a time */
#define NUM_CUBES 12 /* # of cubes */
Point3 CubeVerts[NUM_CUBE_VERTS]; /* set elsewhere, from floats */
/* Vertex indices for individual cube faces */
static int Face1[] = {1,3,2,0};
static int Face2[] = {5,7,3,1};
static int Face3[] = {4,5,1,0};
static int Face4[] = {3,7,6,2};
static int Face5[] = {5,4,6,7};
static int Face6[] = {0,2,6,4};
static int *VertNumList[]={Face1, Face2, Face3, Face4, Face5, Face6};
static int VertsInFace[]={ sizeof(Face1)/sizeof(int),
 sizeof(Face2)/sizeof(int), sizeof(Face3)/sizeof(int),
 sizeof(Face4)/sizeof(int), sizeof(Face5)/sizeof(int),
 sizeof(Face6)/sizeof(int) };
/* X, Y, Z rotations for cubes */
static RotateControl InitialRotate[NUM_CUBES] = {
 {0.0,ROT_6,ROT_6},{ROT_3,0.0,ROT_3},{ROT_3,ROT_3,0.0},
 {ROT_3,-ROT_3,0.0},{-ROT_3,ROT_2,0.0},{-ROT_6,-ROT_3,0.0},
 {ROT_3,0.0,-ROT_6},{-ROT_2,0.0,ROT_3},{-ROT_3,0.0,-ROT_3},
 {0.0,ROT_2,-ROT_2},{0.0,-ROT_3,ROT_3},{0.0,-ROT_6,-ROT_6},};
static MoveControl InitialMove[NUM_CUBES] = {
 {0,0,80,0,0,0,0,0,-350},{0,0,80,0,0,0,0,0,-350},
 {0,0,80,0,0,0,0,0,-350},{0,0,80,0,0,0,0,0,-350},
 {0,0,80,0,0,0,0,0,-350},{0,0,80,0,0,0,0,0,-350},
 {0,0,80,0,0,0,0,0,-350},{0,0,80,0,0,0,0,0,-350},
 {0,0,80,0,0,0,0,0,-350},{0,0,80,0,0,0,0,0,-350},
 {0,0,80,0,0,0,0,0,-350},{0,0,80,0,0,0,0,0,-350}, };
/* Face colors for various cubes */
static int Colors[NUM_CUBES][NUM_CUBE_FACES] = {
 {15,14,12,11,10,9},{1,2,3,4,5,6},{35,37,39,41,43,45},
 {47,49,51,53,55,57},{59,61,63,65,67,69},{71,73,75,77,79,81},
 {83,85,87,89,91,93},{95,97,99,101,103,105},
 {107,109,111,113,115,117},{119,121,123,125,127,129},
 {131,133,135,137,139,141},{143,145,147,149,151,153} };
/* Starting coordinates for cubes in world space */
static int CubeStartCoords[NUM_CUBES][3] = {
 {100,0,-6000}, {100,70,-6000}, {100,-70,-6000}, {33,0,-6000},
 {33,70,-6000}, {33,-70,-6000}, {-33,0,-6000}, {-33,70,-6000},
 {-33,-70,-6000},{-100,0,-6000}, {-100,70,-6000}, {-100,-70,-6000}};
/* Delay counts (speed control) for cubes */
static int InitRDelayCounts[NUM_CUBES] = {1,2,1,2,1,1,1,1,1,2,1,1};
static int BaseRDelayCounts[NUM_CUBES] = {1,2,1,2,2,1,1,1,2,2,2,1};
static int InitMDelayCounts[NUM_CUBES] = {1,1,1,1,1,1,1,1,1,1,1,1};
static int BaseMDelayCounts[NUM_CUBES] = {1,1,1,1,1,1,1,1,1,1,1,1};

void InitializeCubes()
{

 int i, j, k;
 PObject *WorkingCube;

 for (i=0; i<NUM_CUBES; i++) {
 if ((WorkingCube = malloc(sizeof(PObject))) == NULL) {
 printf("Couldn't get memory\n"); exit(1); }
 WorkingCube->DrawFunc = DrawPObject;
 WorkingCube->RecalcFunc = XformAndProjectPObject;
 WorkingCube->MoveFunc = RotateAndMovePObject;
 WorkingCube->RecalcXform = 1;
 for (k=0; k<2; k++) {
 WorkingCube->EraseRect[k].Left =
 WorkingCube->EraseRect[k].Top = 0x7FFF;
 WorkingCube->EraseRect[k].Right = 0;
 WorkingCube->EraseRect[k].Bottom = 0;
 }
 WorkingCube->RDelayCount = InitRDelayCounts[i];
 WorkingCube->RDelayCountBase = BaseRDelayCounts[i];
 WorkingCube->MDelayCount = InitMDelayCounts[i];
 WorkingCube->MDelayCountBase = BaseMDelayCounts[i];
 /* Set the object->world xform to none */
 for (j=0; j<3; j++)
 for (k=0; k<4; k++)
 WorkingCube->XformToWorld[j][k] = INT_TO_FIXED(0);
 WorkingCube->XformToWorld[0][0] =
 WorkingCube->XformToWorld[1][1] =
 WorkingCube->XformToWorld[2][2] =
 WorkingCube->XformToWorld[3][3] = INT_TO_FIXED(1);
 /* Set the initial location */
 for (j=0; j<3; j++) WorkingCube->XformToWorld[j][3] =
 INT_TO_FIXED(CubeStartCoords[i][j]);
 WorkingCube->NumVerts = NUM_CUBE_VERTS;
 WorkingCube->VertexList = CubeVerts;
 WorkingCube->NumFaces = NUM_CUBE_FACES;
 WorkingCube->Rotate = InitialRotate[i];
 WorkingCube->Move.MoveX = INT_TO_FIXED(InitialMove[i].MoveX);
 WorkingCube->Move.MoveY = INT_TO_FIXED(InitialMove[i].MoveY);
 WorkingCube->Move.MoveZ = INT_TO_FIXED(InitialMove[i].MoveZ);
 WorkingCube->Move.MinX = INT_TO_FIXED(InitialMove[i].MinX);
 WorkingCube->Move.MinY = INT_TO_FIXED(InitialMove[i].MinY);
 WorkingCube->Move.MinZ = INT_TO_FIXED(InitialMove[i].MinZ);
 WorkingCube->Move.MaxX = INT_TO_FIXED(InitialMove[i].MaxX);
 WorkingCube->Move.MaxY = INT_TO_FIXED(InitialMove[i].MaxY);
 WorkingCube->Move.MaxZ = INT_TO_FIXED(InitialMove[i].MaxZ);
 if ((WorkingCube->XformedVertexList =
 malloc(NUM_CUBE_VERTS*sizeof(Point3))) == NULL) {
 printf("Couldn't get memory\n"); exit(1); }
 if ((WorkingCube->ProjectedVertexList =
 malloc(NUM_CUBE_VERTS*sizeof(Point3))) == NULL) {
 printf("Couldn't get memory\n"); exit(1); }
 if ((WorkingCube->ScreenVertexList =
 malloc(NUM_CUBE_VERTS*sizeof(Point))) == NULL) {
 printf("Couldn't get memory\n"); exit(1); }
 if ((WorkingCube->FaceList =
 malloc(NUM_CUBE_FACES*sizeof(Face))) == NULL) {
 printf("Couldn't get memory\n"); exit(1); }
 /* Initialize the faces */
 for (j=0; j<NUM_CUBE_FACES; j++) {
 WorkingCube->FaceList[j].VertNums = VertNumList[j];
 WorkingCube->FaceList[j].NumVerts = VertsInFace[j];
 WorkingCube->FaceList[j].Color = Colors[i][j];
 }
 ObjectList[NumObjects++] = (Object *)WorkingCube;
 }
}






[LISTING NINE]

; 386-specific fixed point multiply and divide.
; C near-callable as: Fixedpoint FixedMul(Fixedpoint M1, Fixedpoint M2);
; Fixedpoint FixedDiv(Fixedpoint Dividend, Fixedpoint Divisor);
; Tested with TASM 3.0.
 .model small
 .386
 .code
 public _FixedMul,_FixedDiv
; Multiplies two fixed-point values together.
FMparms struc
 dw 2 dup(?) ;return address & pushed BP
M1 dd ?
M2 dd ?
FMparms ends
 align 2
_FixedMul proc near
 push bp
 mov bp,sp
 mov eax,[bp+M1]
 imul dword ptr [bp+M2] ;multiply
 add eax,8000h ;round by adding 2^(-16)
 adc edx,0 ;whole part of result is in DX
 shr eax,16 ;put the fractional part in AX
 pop bp
 ret
_FixedMul endp
; Divides one fixed-point value by another.
FDparms struc
 dw 2 dup(?) ;return address & pushed BP
Dividend dd ?
Divisor dd ?
FDparms ends
 align 2
_FixedDiv proc near
 push bp
 mov bp,sp
 sub cx,cx ;assume positive result
 mov eax,[bp+Dividend]
 and eax,eax ;positive dividend?
 jns FDP1 ;yes
 inc cx ;mark it's a negative dividend
 neg eax ;make the dividend positive
FDP1: sub edx,edx ;make it a 64-bit dividend, then shift
 ; left 16 bits so that result will be in EAX
 rol eax,16 ;put fractional part of dividend in
 ; high word of EAX
 mov dx,ax ;put whole part of dividend in DX
 sub ax,ax ;clear low word of EAX
 mov ebx,dword ptr [bp+Divisor]
 and ebx,ebx ;positive divisor?
 jns FDP2 ;yes
 dec cx ;mark it's a negative divisor
 neg ebx ;make divisor positive
FDP2: div ebx ;divide
 shr ebx,1 ;divisor/2, minus 1 if the divisor is
 adc ebx,0 ; even
 dec ebx
 cmp ebx,edx ;set Carry if remainder is at least
 adc eax,0 ; half as large as the divisor, then
 ; use that to round up if necessary
 and cx,cx ;should the result be made negative?
 jz FDP3 ;no
 neg eax ;yes, negate it
FDP3: mov edx,eax ;return result in DX:AX; fractional
 ; part is already in AX
 shr edx,16 ;whole part of result in DX
 pop bp
 ret
_FixedDiv endp
 end






[LISTING TEN]

/* POLYGON.H: Header file for polygon-filling code, also includes
 a number of useful items for 3D animation. */
#define MAX_OBJECTS 100 /* max simultaneous # objects supported */
#define MAX_POLY_LENGTH 4 /* four vertices is the max per poly */
#define SCREEN_WIDTH 320
#define SCREEN_HEIGHT 240
#define PAGE0_START_OFFSET 0
#define PAGE1_START_OFFSET (((long)SCREEN_HEIGHT*SCREEN_WIDTH)/4)
#define NUM_CUBE_VERTS 8 /* # of vertices per cube */
#define NUM_CUBE_FACES 6 /* # of faces per cube */
/* Ratio: distance from viewpoint to projection plane / width of
 projection plane. Defines the width of the field of view. Lower
 absolute values = wider fields of view; higher values = narrower */
#define PROJECTION_RATIO -2.0 /* negative because visible Z
 coordinates are negative */
/* Draws the polygon described by the point list PointList in color
 Color with all vertices offset by (X,Y) */
#define DRAW_POLYGON(PointList,NumPoints,Color,X,Y) \
 Polygon.Length = NumPoints; Polygon.PointPtr = PointList; \
 FillConvexPolygon(&Polygon, Color, X, Y);
#define INT_TO_FIXED(x) (((long)(int)x) << 16)
#define DOUBLE_TO_FIXED(x) ((long) (x * 65536.0 + 0.5))

typedef long Fixedpoint;
typedef Fixedpoint Xform[3][4];
/* Describes a single 2D point */

typedef struct { int X; int Y; } Point;
/* Describes a single 3D point in homogeneous coordinates; the W
 coordinate isn't present, though; assumed to be 1 and implied */
typedef struct { Fixedpoint X, Y, Z; } Point3;
typedef struct { int X; int Y; int Z; } IntPoint3;
/* Describes a series of points (used to store a list of vertices that
 describe a polygon; each vertex is assumed to connect to the two
 adjacent vertices; last vertex is assumed to connect to first) */
typedef struct { int Length; Point * PointPtr; } PointListHeader;
/* Describes the beginning and ending X coordinates of a single
 horizontal line */
typedef struct { int XStart; int XEnd; } HLine;
/* Describes a Length-long series of horizontal lines, all assumed to
 be on contiguous scan lines starting at YStart and proceeding
 downward (used to describe a scan-converted polygon to the
 low-level hardware-dependent drawing code) */
typedef struct { int Length; int YStart; HLine * HLinePtr;} HLineList;
typedef struct { int Left, Top, Right, Bottom; } Rect;
/* Structure describing one face of an object (one polygon) */
typedef struct { int * VertNums; int NumVerts; int Color; } Face;
typedef struct { double RotateX, RotateY, RotateZ; } RotateControl;
typedef struct { Fixedpoint MoveX, MoveY, MoveZ, MinX, MinY, MinZ,
 MaxX, MaxY, MaxZ; } MoveControl;
/* Fields common to every object */
#define BASE_OBJECT \
 void (*DrawFunc)(); /* draws object */ \
 void (*RecalcFunc)(); /* prepares object for drawing */ \
 void (*MoveFunc)(); /* moves object */ \
 int RecalcXform; /* 1 to indicate need to recalc */ \
 Rect EraseRect[2]; /* rectangle to erase in each page */
/* Basic object */
typedef struct { BASE_OBJECT } Object;
/* Structure describing a polygon-based object */
typedef struct {
 BASE_OBJECT
 int RDelayCount, RDelayCountBase; /* controls rotation speed */
 int MDelayCount, MDelayCountBase; /* controls movement speed */
 Xform XformToWorld; /* transform from object->world space */
 Xform XformToView; /* transform from object->view space */
 RotateControl Rotate; /* controls rotation change over time */
 MoveControl Move; /* controls object movement over time */
 int NumVerts; /* # vertices in VertexList */
 Point3 * VertexList; /* untransformed vertices */
 Point3 * XformedVertexList; /* transformed into view space */
 Point3 * ProjectedVertexList; /* projected into screen space */
 Point * ScreenVertexList; /* converted to screen coordinates */
 int NumFaces; /* # of faces in object */
 Face * FaceList; /* pointer to face info */
} PObject;

extern void XformVec(Xform, Fixedpoint *, Fixedpoint *);
extern void ConcatXforms(Xform, Xform, Xform);
extern int FillConvexPolygon(PointListHeader *, int, int, int);
extern void Set320x240Mode(void);
extern void ShowPage(unsigned int);
extern void FillRectangleX(int, int, int, int, unsigned int, int);
extern void XformAndProjectPObject(PObject *);
extern void DrawPObject(PObject *);
extern void AppendRotationX(Xform, double);

extern void AppendRotationY(Xform, double);
extern void AppendRotationZ(Xform, double);
extern near Fixedpoint FixedMul(Fixedpoint, Fixedpoint);
extern near Fixedpoint FixedDiv(Fixedpoint, Fixedpoint);
extern void InitializeFixedPoint(void);
extern void RotateAndMovePObject(PObject *);
extern void InitializeCubes(void);
extern int DisplayedPage, NonDisplayedPage, RecalcAllXforms;
extern int NumObjects;
extern Xform WorldViewXform;
extern Object *ObjectList[];
extern Point3 CubeVerts[];


March, 1992
PROGRAMMER'S BOOKSHELF


It Takes a Great Deal of History to Produce a Little Literature*
(*Henry James, Life of Nathaniel Hawthorne)




Ray Duncan


The history of computing in general, and of personal computing in particular,
is one of my little hobbies, and I'm fairly compulsive about buying books on
this topic whenever they appear. The trade literature is surprisingly sparse
in this area, and the scholarly literature not much better--not too
surprising, I guess, because the intact, working relics of the pioneering era
are even less common. It's pathetic how many of the artifacts of early digital
computing have already been consigned to the junkpiles and paper shredders, in
spite of people such as Gwen Bell at the Boston Computer Museum who have been
making an earnest effort in recent years to recover and safeguard some of the
most precious machines, program listings, and documentation. I'm as guilty of
this heedless destruction of our heritage as anyone. When I owned my first
Imsai 8080 computer, I had a copy of the source code for Gates and Allen's
original 4K ROM Basic, and I threw it away when I sold the Imsai because I
couldn't imagine ever needing it!
The two trade books of choice, if you are interested in this subject at all,
are Steven Levy's Hackers and Paul Freiberger and Michael Swaine's Fire in the
Valley. Both are vivid, fluently written, predominantly accurate accounts, but
have a somewhat different emphasis: Hackers begins with MIT's Tech Model
Railroad Club, the artificial intelligence research lab of John McCarthy,
SpaceWar, and semilegendary, compulsive, social-misfit programmers such as
Bill Gosper
and Richard Greenblatt. About halfway through the book, Hackers arrives at the
Silicon Valley Homebrew Computer Club and personalities such as Lee
Felsenstein, Jim Warren, Bob Albrecht, Dennis Allison, Ted Nelson, Bill Gates,
Stephen Wozniak, and Steven Jobs--which is pretty much where Fire in the
Valley picks up the story. Fire in the Valley is a relatively practical book
with detailed coverage of early microcomputing retailing (such as it was),
business relationships, operating systems, and "productivity software."
Hackers spends much more time talking about the Apple and Atari game empires
of companies such as Sierra On-line, and hearkens back fondly to a golden age
of "The Hacker Ethic" which, I suspect, existed mainly in a few idealistic
people's imaginations--much like the Camelot of John F. Kennedy.
The sheer talent of some of the characters in Hackers and Fire in the Valley
can strike awe into your heart, and the sheer melodrama of Hackers may bring a
smile to your lips, but both books are rather sad as well. So few of the
original programmers, engineers, and entrepreneurs portrayed by Levy,
Freiberger, and Swaine have been sufficiently well-rounded, broadly educated,
and financially savvy to maintain control over their own inventions, their own
companies, or even their personal destinies. The fate of Richard Stallman, one
of McCarthy's hacker proteges, is a depressing example. Stallman, widely
admired as a virtuoso coder for his EMACS editor, is apparently content to
remain an angry iconoclast and spend the rest of his life tilting at
windmills. At present, he is furiously rewriting a 20-year-old operating
system so that he can give it away to spite AT&T--while the rest of the world
moves on to new operating system architectures, new programming paradigms, and
new user interfaces.
More Details.
Robert Slater's Portraits in Silicon contains 31 thumbnail sketches of
computer hardware and software pioneers. The book is heavily weighted toward
engineers, mathematicians, and physicists, beginning with Charles Babbage, but
it does include a healthy cross-section of latter-day microcomputer
programmers as well. Portraits in Silicon is worth its purchase price for the
chapter on Konrad Zuse alone. Zuse is a brilliant German engineer who
evidently deserves the title of inventor of the general-purpose, programmable
digital computer at least as much as Atanasoff, Mauchly, or Eckert. Zuse built
his first working machine in 1936-1938, and his fourth machine--constructed in
Berlin on a shoestring budget during 1943-1945--eventually found a home in the
Technische Hochschule in Zurich, Switzerland, and remained in use well into the
1950s. Unfortunately for Zuse, but fortunately for the Allies, the German
government was oblivious to the importance of Zuse's work, and residuals of
his designs do not survive in any of the mainstream computer architectures of
today.
Slater's book, while valuable, must be read with caution for several reasons.
First, Slater, who is a Time magazine correspondent, seems to have virtually
no understanding of the technical implications of his material. Most of the
time, due to the excellence of his technical reviewers, he gets away with it,
but occasional paragraphs on hardware specifics are conspicuously confused.
Second, when the stories provided by his interviewees are in obvious conflict,
Slater doesn't want to hurt anyone's feelings or step on anyone's toes, so the
reader ends up with no clear idea of where the truth actually lies. Third, the
historical accuracy of the entire book is thrown into question by blatant
deficiencies in the microcomputer-related chapters. For example, in the
biography of Bill Gates, Slater writes:
IBM asked Gates to design the operating system for the new machine--what would
become the incredibly popular PC...Gates went to work on what would become
MS-DOS--Microsoft Disk Operating System. He filed his report personally in
Boca Raton in September, and in November he got the contract... Gates chose a
small room in the middle of Microsoft's offices in the old National Bank
building in Bellevue, and got to work.
Incredibly, Slater fails to describe the role of Tim Paterson, Seattle
Computer Products, or 86-DOS in the genesis of MS-DOS 1.0--even though this
role is common knowledge throughout the industry. For that matter, he doesn't
even mention any of the Microsoft employees that were involved with the first
few releases of MS-DOS: Bob O'Rear, Chris Peters, Chris Larson, Aaron
Reynolds, and Mark Zbikowski, among others.
Susan Lammers's book, Programmers at Work, must surely be one of the most
readable, sympathetic books about software developers ever published. Lammers
was not interested in muckraking or confrontation, but rather made it her goal
to explore the personalities, attitudes, and work habits behind some of the
great personal computing products of the mid-eighties. There is no pretense at
objectivity or fact-finding in this book--the interviewees were basically
allowed to take the discussions in any direction they wished and edit the
transcripts to present themselves in the most favorable light--but the book is
quite revealing nonetheless. Several of the chapters literally reek of hubris
(I'll let you discover these for yourself), while others--such as the
interviews with Gates, Carr, and Hertzfeld--are unexpectedly disarming.
One of the more intriguing features of Lammers's book is the inclusion of
design jottings or source code by the various programmers she talked to. We
see pages from the manuscript of the original Visicalc user manual by Dan
Bricklin, notes to himself by Robert Carr on the data structures underlying
Framework, excerpts from the source code for 8080 Basic and a couple of
amateurish articles from the MITS user newsletter by Bill Gates, a
full-fledged animation demonstration program for the Macintosh by Andy
Hertzfeld, an elegant page concerning character grey-scaling from one of John
Warnock's notebooks, and a PDP-10 assembly-language program for compiling
wirelists by Charles Simonyi--using Hungarian notation, of course! A lot of
water has passed over the dam since Susan Lammers put this book together in
1986, but it's still a good investment of your time and money.
Accidental Empires, by Robert X. Cringely, is the very antithesis of
Programmers at Work. Accidental Empires is classic yellow
journalism--unsubstantiated anecdotes, mangled facts, inflammatory
speculation, and half-baked freshman psychology all masquerading as history.
This book could just as well have been named Fear and Loathing in Cupertino!
The chief targets of Cringely's vitriol are Bill Gates and Steve Jobs,
although a good many other industry luminaries, ranging from Donald Knuth to
John Warnock, receive glancing swipes of the claws as well. This is not to say
that the book isn't entertaining--entertainment and making money are, after
all, the primary goals of yellow journalism--or even that it doesn't contain
some valuable insights. But Accidental Empires is ultimately doomed to
irrelevance because its author, or authors, didn't have the integrity to offer
their interpretations of our industry under their true names. A pity.


Five Views of the Same Event




From Fire in the Valley


Then the Popular Electronics article came out. Gates' friend Paul Allen ran
through Harvard Square with the article to wave it in front of Gates' face and
say, "Look, it's going to happen! I told you this was going to happen! And
we're going to miss it!" Gates had to admit that his friend was right; it sure
looked as though the "something" they had been looking for had found them. He
immediately phoned MITS, claiming that he and his partner had a BASIC language
usable on the Altair. When Ed Roberts, who had heard a lot of such promises,
asked Gates when he could come to Albuquerque to demonstrate it, Gates looked
at his childhood friend, took a deep breath, and said, "Oh, in two or three
weeks." Gates put down the receiver, turned to Allen and said: "I guess we
should go buy a manual." They went straight to an electronics shop and
purchased Adam Osborne's manual on the 8080.


From Hackers


Not long before MITS began shipping Altairs to computer-starved Popular
Electronics readers, Ed Roberts had gotten a phone call from two college
students named Paul Allen and Bill Gates. The two teenagers hailed from
Seattle. Since high school the two of them had been hacking computers... The
Altair article, while not impressing them technically, was exciting to them:
it was clear microcomputers were the next big thing, and they could get
involved in all the action by writing BASIC for this thing. They had a manual
explaining the instruction set for the 8080 chip, and they had the Popular
Electronics article with the Altair schematics, so they got to work writing
something that would fit in 4K of memory. Actually, they had to write the
interpreter in less than that amount of code, since the memory would not only
be holding their program to interpret BASIC into machine language, but would
need space for that program that the user would be writing. It was not easy,
but Gates in particular was a master at bumming code, and with a lot of
squeezing and some innovative use of the 8080 instruction set, they thought
they'd done it. When they called Roberts, they did not mention they were
placing the call from Bill Gates' college dorm room. Roberts was cordial, but
warned them that others were thinking of an Altair BASIC; they were welcome to
try, though. "We'll buy from the first guy who shows up with one," Roberts
told them.


From Accidental Empires


Like the Buddha, Gates' enlightenment came in a flash. Walking across Harvard
Yard while Paul Allen waved in his face the January 1975 issue of Popular
Electronics announcing the Altair 8800 microcomputer from MITS, they both saw
instantly that there would really be a personal computer industry and that the
industry would need programming languages. Although there were no
microcomputer software companies yet, 19-year-old Bill's first concern was
that they were already too late. "We realized that the revolution might happen
without us," Gates said. "After we saw that article, there was no question of
where our life would focus."
"Our life?" What the heck does Gates mean here--that he and Paul Allen were
joined at the frontal lobe, sharing a single life, a single set of
experiences? In those days, the answer was "yes." Drawn together by the idea
of starting a pioneering software company and each convinced that he couldn't
succeed alone, they committed to sharing a single life--a life unlike that of
most other PC pioneers because it was devoted as much to doing business as to
doing technology. Gates was a businessman from the start; otherwise, why would
he have been worried about being passed by? There was plenty of room for
high-level computer languages to be developed for the fledgling platforms, but
there was only room for one first high-level language. Anyone could
participate in a movement, but only those with the right timing could control
it.


From Portraits in Silicon


After Gates entered Harvard in the fall of 1973, Allen challenged him to
develop a BASIC interpreter for the Intel 8008, but Gates soon decided that
the 8008 instruction set was not powerful enough for BASIC. Allen next urged
that they start a microcomputer firm. The two had already spent $360 to
purchase one of the very first microcomputer chips. The turning point in their
young careers came when they read the January 1975 issue of Popular
Electronics. The Altair microcomputer, based on the 8080 chip, made by an
Albuquerque, New Mexico firm called MITS, and selling for $350, appeared on
the cover. Allen was the first to see the article. He noticed a copy of the
magazine at the newsstand and hastily tracked down Gates. Here was the first
truly cheap computer! Allen ran through Harvard Square waving the article in
front of Gates, issuing a friendly warning that the train was leaving, and if
the two of them didn't get to work, they would not be aboard. Gates' problem
was whether to stick to his present studies in pursuit of the legal career his
parents wished for him, or give full attention to computers. The latter won
out; the two young men wanted to make sure they wouldn't miss what was
happening. "We realized," Gates recalls, "that the revolution might happen
without us. After we saw that article, there was no question of where our life
would focus." Allen proposed to Gates that the two try to write a BASIC--the
simple, high-level computer programming language--for the Altair. At least one
minicomputer firm had insisted that it was impossible to write a high-level
language that would run on a personal computer. But the two young men wanted
to give it a try. They informed MITS of their plan.


From Programmers at Work (Gates speaking to Lammers)



The really great programs I've written have all been ones that I have thought
about for a huge amount of time before I ever wrote them. I wrote a BASIC
interpreter for a minicomputer in high school. I made massive mistakes in that
program, and then I got to look at some other BASIC interpreters. So by the
time I sat down to do Microsoft BASIC in 1975, it wasn't a question of whether
I could write the program, but rather a question of whether I could squeeze it
into 4K and make it super fast... Paul Allen had brought me the magazine with
the Altair, and we thought, "Geez, we'd better get going, because we know
these machines are going to be popular." And that's when I stopped going to
classes and we just worked around the clock. The initial program was written
in about three and a half weeks. We ended up spending about eight weeks before
I had it fully polished the way that I really liked it. And then I later went
back and rewrote it. No great programmer is sitting there saying, "I'm going
to make a bunch of money," or "I'm going to sell a hundred thousand copies."
Because that kind of thought gives you no guidance about the problems. A great
programmer is thinking, Should I rewrite this whole subroutine so that four
people, instead of three, could call it? Should I make this program ten
percent faster? Should I really think through what the common case in here is
so I know how to order this check?



March, 1992
PRINTING FROM WINDOWS 3


Writing an abort procedure




Michael J. Young


Michael is an author of computer programming books and a Microsoft Windows
developer. He can be reached at 20 Sunnyside Avenue, Suite A, Mill Valley, CA
94941, or through CompuServe, ID 75156,2572.


Although the final results can be gratifying, writing code to print under
Microsoft Windows is not a simple task. One of the most confusing elements,
and yet one of the most important, is the "abort procedure." This article
explains why a Windows print routine needs to have an abort procedure, and
describes exactly what goes on during a print job when an abort procedure has
been installed. It also provides the code for a sample abort procedure.


Windows Printing


Under Windows, printing is similar to sending output to a window. You first
obtain a handle to a device context for the printer; you can then display text
or graphics on the printer using the standard Windows GDI (Graphics Device
Interface) functions, such as TextOut or Ellipse. When printing, however, you
also need to manage the print job by making calls to the Windows Escape
function, which provides an entire family of printer services; you specify the
particular service you want by passing a code as the second parameter.
The basic steps for printing under Windows are as follows:
1. Call CreateDC to obtain a handle to a device context for the printer.
2. Initiate the print job by calling Escape, passing the STARTDOC function
code.
3. Display the desired text or graphics on the page by calling standard
Windows GDI functions, passing the handle obtained in step 1.
4. When the page is complete, call the NEWFRAME Escape function to signal
Windows to begin printing the page.
5. Repeat steps 3 and 4 for any additional pages in the document.
6. Terminate the print job by calling the ENDDOC Escape function.
7. Call DeleteDC to release the printer device context.
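The ordering of these seven steps can be sketched as a control-flow skeleton. In the sketch below the Windows calls are replaced by logging stubs (the names are recorded rather than invoked, and print_document is a hypothetical wrapper), so it illustrates only the required call order, not real GDI output.

```c
#include <assert.h>
#include <string.h>

/* Logging stub: records the name of each "API call" instead of invoking it. */
static char log_buf[256];
static void call(const char *name)
{
    strcat(log_buf, name);
    strcat(log_buf, ";");
}

/* The seven steps, with per-page output repeated for each page. */
static void print_document(int pages)
{
    int p;
    call("CreateDC");            /* 1. obtain a printer device context    */
    call("Escape:STARTDOC");     /* 2. initiate the print job             */
    for (p = 0; p < pages; p++) {
        call("TextOut");         /* 3. GDI output for this page           */
        call("Escape:NEWFRAME"); /* 4. signal Windows to print the page   */
    }                            /* 5. steps 3-4 repeat for each page     */
    call("Escape:ENDDOC");       /* 6. terminate the print job            */
    call("DeleteDC");            /* 7. release the printer device context */
}
```

For a two-page document, the recorded sequence runs STARTDOC once, then an output/NEWFRAME pair per page, then ENDDOC once.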
When you make the NEWFRAME Escape call, the Windows GDI creates one or more
temporary print files on disk, based upon the output commands you have issued,
and activates the Windows Print Manager. The Print Manager copies each of the
print files to the printer, and deletes each file after it has been copied.
There is a serious problem, however, with the printing procedure just
outlined. A typical Windows print job can be lengthy, and during this time
Windows message processing is blocked. This has two consequences. First, the
program performing the printing cannot process mouse or keyboard input
messages, so there is no way for the user to signal the program to stop the
print job once it has begun. Second, other Windows programs are prevented from
running, so the user cannot accomplish useful work in other applications
while printing is taking place. Furthermore, the Print Manager
itself is unable to run and cannot begin copying print files to the printer
until the program completes the entire printing routine and returns to
retrieve another message. As a result, the generation of the printed copy is
delayed. Also, the GDI may run out of disk space for storing print files
(which can be quite large). Because the Print Manager is not allowed to run,
it cannot free disk space by deleting the print files that it has already
copied to the printer.
Figure 1 illustrates the basic flow of program control during printing. As you
can see in this figure, messages cannot be dispatched and other programs
cannot run until the entire print routine is finished.
The ideal solution to these problems would be to have the Windows GDI
periodically call the program during processing of the NEWFRAME Escape, so
that the program could process messages and yield control to other
applications on a regular basis. Fortunately, there is a way to do exactly
that: Install an abort procedure.


The Abort Procedure


An abort procedure is a function you define within your program. If you
install an abort procedure before initiating the print job, the GDI will call
it periodically while processing any subsequent NEWFRAME Escape calls.
The abort procedure has the form BOOL FAR PASCAL AbortFunc (HDC
HPrnDC, short Code). AbortFunc stands for the function name; you
can choose any name you want. The HPrnDC parameter is the handle to the
printer device context. The Code parameter indicates whether or not the GDI
has run out of disk space for creating temporary print files. It will have one
of two values: 0 if there is sufficient disk space, or SP_OUTOFDISK if the GDI
has run out of disk space but more disk space could be made available by
allowing the Print Manager to run. If the abort procedure returns a nonzero
value, the print job will continue. If it returns 0, the print job will be
cancelled.


A Sample Abort Procedure


The abort procedure given here is designed to allow other programs to run
during the print job, and also to permit the user to stop the print job at any
time by clicking a button displayed in the program window, or by pressing the
Esc key. The definition for the abort procedure, AbortProc, is contained in
Example 1.
Example 1: The definition for the abort procedure

 BOOL FAR PASCAL AbortProc (HDC HPrnDC,short Code)
 {
 MSG Msg;

 while (!UserAbort && PeekMessage (&Msg,NULL,0,0,PM_REMOVE))
 {
 TranslateMessage (&Msg);
 DispatchMessage (&Msg);
 }

 return (!UserAbort);

 } // end AbortProc

The heart of the abort procedure example is a message loop, which extracts and
dispatches messages until the program's message queue is empty. To avoid
blocking the print job, the procedure calls PeekMessage--which returns
immediately even if no message is available--rather than GetMessage. When
there are no more program messages, PeekMessage returns FALSE, which causes
the abort procedure to return control to the GDI so that printing can
continue.
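The drain-then-return behavior of this loop can be exercised outside Windows by standing a small in-memory queue in for the message queue. Everything below with a Stub prefix is a hypothetical replacement for the real API; only the control flow of Example 1 is preserved.

```c
#include <assert.h>

typedef int BOOL;
#define TRUE  1
#define FALSE 0

BOOL UserAbort = FALSE;  /* global flag set by the window procedure */
static int pending;      /* messages waiting in the stub queue      */
static int dispatched;   /* messages handed to StubDispatchMessage  */

/* Stands in for PeekMessage(...,PM_REMOVE): returns at once when the
   queue is empty instead of blocking, which is exactly the property
   the abort procedure relies on. */
static BOOL StubPeekMessage(void)
{
    if (pending == 0)
        return FALSE;
    pending--;
    return TRUE;
}

static void StubDispatchMessage(void) { dispatched++; }

/* Same control flow as Example 1 (TranslateMessage omitted in the stub). */
BOOL AbortProc(void)
{
    while (!UserAbort && StubPeekMessage())
        StubDispatchMessage();
    return !UserAbort;   /* nonzero: continue printing; 0: cancel job */
}
```

With UserAbort clear, the loop dispatches every pending message and then returns TRUE so printing continues; with UserAbort set, it returns FALSE immediately without touching the queue.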
PeekMessage is one of the Windows functions that yields control to other
programs in the system. Therefore, each time the GDI calls the abort procedure
during the print job, other applications, including the Print Manager, are
allowed to run. The user can therefore work within other programs while
printing continues in the background. Also, the Print Manager can start
copying print files to the printer--and then deleting these files--while the
GDI is still processing print commands, accelerating the generation of printed
output and alleviating any lack of disk space. (Consequently, the abort
procedure need not perform any special action if the Code parameter indicates
an out-of-disk-space condition.)
Also, because the abort procedure dispatches all program messages (by calling
DispatchMessage), the program can continue to receive input, allowing the user
to terminate the print job by clicking the "Cancel Printing" button, or by
pressing Esc.
Figure 2 illustrates the main flow of program control during printing with the
abort procedure installed. Compare this illustration with Figure 1. Notice
that the abort procedure returns a value based upon the current value of
UserAbort, which is a global program variable. It is set to FALSE immediately
before the print job starts; if the user clicks the "Cancel Printing" button
or presses Esc, the appropriate message-handling routine within the program's
window procedure must set it to TRUE. The abort procedure returns TRUE if
UserAbort is FALSE, so that the print job keeps running; it returns FALSE if
UserAbort is TRUE, causing the GDI to terminate printing of the current page
and to return immediately from the NEWFRAME Escape call. Also, if UserAbort is
TRUE, the abort procedure immediately terminates the message loop to provide a
speedy response to the user's input.
The abort procedure is called from within the Windows environment, so you must
export it by including its name in the list of functions following the EXPORTS
statement in the linker definition file.
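The export can be sketched in a module-definition file. Everything except the EXPORTS entry for AbortProc is illustrative boilerplate here, and the module name PRNDEMO is hypothetical:

```
; Hypothetical module-definition (.DEF) file for a program named PRNDEMO.
; The essential line for this article is the AbortProc entry under EXPORTS.
NAME        PRNDEMO
EXETYPE     WINDOWS
CODE        PRELOAD MOVEABLE DISCARDABLE
DATA        PRELOAD MOVEABLE MULTIPLE
HEAPSIZE    1024
STACKSIZE   8192
EXPORTS     AbortProc
```

A typical program would list its window procedure under EXPORTS as well.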


Supporting Code for the Abort Procedure


Example 2 contains the general code for installing the sample abort procedure
and for managing the print job. This code should be contained within the
branch of the main window procedure that processes the program's print
command.
Example 2: The code for installing the sample abort procedure and for managing
the print job

 HWND HAbortButton;
 HANDLE HInst;
 HDC HPrnDC;
 BOOL InPrintJob = FALSE;
 FARPROC PtrAbortProc;
 BOOL UserAbort;
 . . .
 if (InPrintJob)
 return (NULL);
 HPrnDC = GetPrnDC ();
 ShowWindow (HAbortButton,SW_SHOW);
 UserAbort = FALSE;
 InPrintJob = TRUE;
 PtrAbortProc = MakeProcInstance (AbortProc,HInst);
 Escape (HPrnDC, SETABORTPROC, 0, (LPSTR)PtrAbortProc,NULL);
 Escape (HPrnDC, STARTDOC, 10, (LPSTR) "Print Demo", 0L);

 // Calls to GDI output functions, such as TextOut and Ellipse go here.
 Escape (HPrnDC, NEWFRAME,0,0L,0L);
 Escape (HPrnDC, ENDDOC,0,0L,0L);
 FreeProcInstance (PtrAbortProc);
 InPrintJob = FALSE;
 ShowWindow (HAbortButton,SW_HIDE);
 DeleteDC (HPrnDC);

Before starting the print job, the global flag InPrintJob is set to TRUE, and
after the print job it is set back to FALSE. (It is initialized to FALSE.) This
flag can be used throughout the program to determine whether a print job is in
progress. Remember that the abort procedure dispatches program messages, and
therefore the window procedure may be invoked during printing. Whether or not
a print job is in progress affects the way certain messages must be processed.
For example, the print routine in Example 2 returns immediately if InPrintJob
is TRUE to prevent starting another print job while one is already in
progress. As another example, the window procedure should process the WM_CLOSE
message, destroying the window only if InPrintJob is FALSE.
Before starting the print job, the routine calls ShowWindow to display the
"Cancel Printing" push-button control that allows the user to abort the print
job. This button should have been created--without making it visible--by
calling CreateWindow within the program instance initialization routine. The
routine calls ShowWindow again after printing to hide the button, as it is no
longer needed.
Before starting the print job, the routine sets UserAbort to FALSE. This flag
should be set to TRUE by the window procedure if the user clicks the "Cancel
Printing" button or presses Esc during printing.
The Windows MakeProcInstance function is called to obtain the procedure
instance address of the abort procedure, AbortProc. After the print job, this
address (stored in the pointer PtrAbortProc) is released by calling
FreeProcInstance. Note that the call to FreeProcInstance must come after the
ENDDOC Escape call.
The SETABORTPROC Escape function is called to install the abort procedure.
Note that this function must be passed the procedure instance address of the
abort procedure obtained in the previous step. The SETABORTPROC Escape call
must be made before initiating the print job with the STARTDOC Escape.
For simplicity, the example listing does not check error return codes. In
general, if the Escape function fails, it returns a negative error code, and
the print routine should exit immediately, without calling the ENDDOC Escape.
(The GDI automatically aborts the print job.) Note that--contrary to the SDK
documentation--if the abort procedure returns FALSE to abort the print
routine, the NEWFRAME Escape function does not return an error code (unless an
actual error occurred or the Print Manager is not installed). If the print
routine processes more than one page, it should therefore perform the
following actions immediately after each call to the NEWFRAME Escape: if the
Escape function returns an error code, terminate immediately without calling
the ENDDOC Escape; otherwise, check the UserAbort flag. If it is TRUE, call
the ENDDOC Escape and terminate immediately, without attempting to print
additional pages.
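That per-page protocol can be sketched in C with a scripted stub for the NEWFRAME Escape (the Stub functions and print_pages are hypothetical, not part of the article's listings): stop at once on a negative return code without calling ENDDOC, and end the job cleanly through ENDDOC when UserAbort has been set.

```c
#include <assert.h>

typedef int BOOL;
#define TRUE  1
#define FALSE 0

BOOL UserAbort = FALSE;  /* set by the window procedure on cancel      */
static int escape_rc;    /* scripted NEWFRAME return code for the stub */
static int enddoc_called;

static int  StubNewFrame(void) { return escape_rc; }
static void StubEndDoc(void)   { enddoc_called = 1; }

/* Per-page protocol from the text; returns the number of pages printed. */
int print_pages(int total)
{
    int p, done = 0;
    for (p = 0; p < total; p++) {
        /* ...GDI output calls for page p would go here... */
        if (StubNewFrame() < 0)
            return done;      /* Escape error: GDI aborts the job,
                                 so ENDDOC must NOT be called        */
        done++;
        if (UserAbort) {
            StubEndDoc();     /* user cancelled: end the job cleanly */
            return done;
        }
    }
    StubEndDoc();             /* all pages printed normally          */
    return done;
}
```

Scripting the stub exercises all three paths: normal completion prints every page and calls ENDDOC, an error return stops before the first page completes with no ENDDOC, and a user cancellation stops after the current page with a clean ENDDOC.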


Conclusion


The examples in this article assumed that printing takes place directly from
the program window; you could use a similar approach to print from a modal
dialog box, displaying the "Cancel Printing" button within this box and
processing input messages received during printing within the dialog
procedure.
To enhance the routines presented here, you could display an hourglass cursor
while the mouse pointer is within the program window. To do this, simply
process the WM_MOUSEMOVE message, setting the cursor to an hourglass (by
calling SetCursor) only if InPrintJob is TRUE.



















March, 1992
OF INTEREST





The Quantasm Floating Point Library (QFPL) is an assembly language library
from Quantasm for performing general floating-point operations. Though
reentrant and ROMable for 80x86 embedded systems, the library is also suitable
for adding floating-point capabilities to MS-DOS applications requiring speed
and compactness. QFPL allows you to add floating-point capabilities to
assembly language programs or replace C library emulator routines. It is
faster and smaller than most C floating-point libraries.
Additional QFPL features include: single-and double-precision IEEE floating
point format; the ability to convert to and from ASCII; the ability to convert
to and from integers and long integers; customizable error handling; and the
ability to add, subtract, multiply, divide, negate, and find absolute values
and powers of ten.
Zoran Milenovic of Zoran Computing in Chicago told DDJ, "QFPL is unbelievably
fast--much faster than C libraries--and also small. And that's all that's
important."
QFPL's price is $99.95; with source code, $199.95. Reader service no. 20.
Quantasm Corp. 19855 Stevens Creek Blvd. Cupertino, CA 95014 408-244-6826
NNET-EVAL is a new NeuralTool technology platform evaluation package from
D.I.P. The NeuralTool platform includes software and hardware that uses neural
nets targeted for industrial control applications. Applications which require
user expertise to set up and tune conventional controls can take advantage of
a neural net's ability to learn from the measured system dynamics. In some
applications, neural nets have been applied to systems which cannot be
successfully defined or controlled using traditional approaches.
NNET-EVAL is written in C++ and includes a set of neural network object
classes that will perform closed loop control over user-defined analog control
points. The user may control setpoints, limits, and learning coefficients and
load or store network profiles. A graphical trending package is included to
provide a visual indication of performance. NNET-EVAL can be linked with
user-supplied I/O interfaces, thus allowing the neural network to monitor and
control factory floor devices. A hardware evaluation kit is also available
from D.I.P. for testing sensors and actuators in a neural net environment.
The price of NNET-EVAL is $895. Reader service no. 21.
D.I.P. P.O. Box 9550 Moreno Valley, CA 92552 714-924-1730
Soft-ICE/W, a systems and applications debugger for Windows' enhanced mode, is
now shipping from Nu-Mega Technologies. Soft-ICE/W can debug applications,
device drivers, and virtual device drivers for Windows and applications, TSRs,
and loadable device drivers for DOS, all at the source level, separately or
concurrently. Soft-ICE/W provides hardware-style break points on memory
locations, memory ranges, I/O port accesses, and interrupts.
The debugger features back trace ranges for tracking down UAEs and system
crashes. The back trace range logs every program instruction executed in a
programmer-specified range, allowing you to single-step backward to find the
exact location of the problem.
Soft-ICE/W retails for $386. Reader service no. 22.
Nu-Mega Technologies Inc. P.O. Box 7780 Nashua, NH 03060-7780 603-889-2386
Interactive Software Engineering has announced that MS-DOS, OS/2, and Atari
TOS versions of the Eiffel language are now available. The new release,
Eiffel/S, was developed by SIG Computer (Braunfels, Germany) and is a compiler
for Eiffel that translates to C source code. The current version implements
Eiffel 3 and features a set of libraries, basic classes, I/O classes,
persistence classes, and data structures.
Eiffel/S runs on 286, 386, and 486 machines with at least 2 Mbytes of RAM. The
DOS version sells for $595, OS/2 for $1200, and Atari TOS for $185. Reader
service no. 23.
Interactive Software Engineering 270 Storke Road, Suite #7 Goleta, CA 93117
805-685-1006
The InControl Toolbox for Windows, a set of DLLs that allow you to include
professional input validation for Windows 3 applications, has been released by
MantaSoft Partners. InControl Toolbox defines 13 new classes of controls for
Windows. There are two controls to display the time and date, and 11 input
controls for controlling the information entered by the user. The input
controls include formatted and numeric controls, which allow entry of zip
codes, phone numbers, social security numbers, dates, integers, floating-point
numbers, and dollar amounts. A third group, the free format controls, accept
any input and try to determine whether that input corresponds to a legal
value.
InControl Toolbox retails for $179; with source code it costs $249. Reader
service no. 24.
MantaSoft Partners P.O. Box 203551 Austin, TX 78720 512-335-3497
Now shipping from TaraVisual is apE III, a UNIX-based visualization and
animation technology for converting computer data into 2- and 3-D
representations. apE III facilitates transformation of scientific, business,
or medical data into photo-realistic images and animations. Applications
include computational fluid dynamics, MRI and CAT scan imaging, plasma
physics, theoretical modeling, atmospheric analysis, terrain mapping, creative
design, and real-time animations.
The environment is icon based, with an enhanced user interface. Interactive
data manipulation and coloring features allow patterns to stand out for
analysis; new modules simplify the description of data types, formats, and
grids for importing data; and new axis and labeling features with fonts are
included.
Marty Lobdell of the Ohio Supercomputer Center in Columbus, Ohio, told DDJ
that she chose apE III "because of its superior visualization capabilities and
simple method of transferring data to the still store for taping, or to the
Solitaire for slides and photos."
apE III costs $1495, or $3795 with source code. Academic discounts are
available. Reader service no. 25.
TaraVisual Corp. 929 Harrison Ave. Columbus, OH 43215 800-458-8731
Quintus has announced a Prolog runtime generator that allows you to run
applications developed on UNIX and VAX workstations on 386 and 486 machines.
The generator includes a basic development system with a 7-port debugger and
an incremental compiler; a Prolog compiler; and a link editor, which links the
compiled files into a standard 32-bit extended DOS executable. Quintus's
Prolog runtime generator is integrated with Watcom C and Microsoft Windows
(through the C interface). It has a 32-bit address space that supports virtual
memory up to 32 Mbytes.
Each generator costs $4000. Reader service no. 26.
Quintus Corp. 2100 Geng Road, Suite 101 Palo Alto, CA 94303 415-813-3800
Version 2.0 of rtNET is now available from Project:artie. rtNET is a toolkit
that allows Microsoft Basic programmers to access all Novell Netware API
functions. Applications can be created that include the following
capabilities: reading and updating bindery; attaching and detaching from
servers, queues, and printers; finding and listing users on a network; sending
messages; determining redirection lists; examining and controlling printer
queues; getting server statistics; determining the status of NetBIOS, SHARE,
and IPX; changing passwords; and more.
Complete source code for all routines and several Netware-related utilities
are included. rtNET costs $195 and is compatible with QuickBasic 4.5, PDS 7.0,
and PDQ 3.0. Reader service no. 27.
Project:artie 3232 McKinney, Suite 1100 Dallas, TX 75204-2429 214-871-3102
WindEXE, an optimization tool for Windows 3.0 development, has been released
by IC Technologies. WindEXE uses the operational signature of code to modify
code segmentation in such a way that the number of intersegment calls is
reduced and the segments themselves are of uniform size. These changes improve
an application's disk swapping capabilities, and thus, its performance.
WindEXE does not require source code modification on the part of the
programmer and supports Borland C++ and Turbo Pascal for Windows, Microsoft C
and Fortran, and TopSpeed C, C++, Modula-2, and Pascal. WindEXE is priced at
$350. Reader service no. 28.
IC Technologies Corp. 6400 Riverside Drive, Building E Dublin, OH 43017
614-798-1091
Software Development Systems has released FreeForm (5.1), a C symbolic
debugging environment for ROMable 68000 series applications. FreeForm allows
you to incorporate the file system and tools of your host PC or UNIX system
directly with your debugging sessions. The debugging process continues while
you perform other tasks, such as editing source files, recompiling, or sending
e-mail.
Five key commands allow you to choose the program to debug; set a breakpoint;
start a program or continue it from a breakpoint; read the value of a
variable; and write the value of a variable.
FreeForm is available for UNIX and DOS platforms. Prices begin at $1795.
Reader service no. 29.
Software Development Systems Inc. 4248 Belle Lane Downers Grove, IL 60515
800-448-7733
Library Technologies has added EMS/XMS/virtual memory capabilities to C-Heap,
its library for Microsoft C and Borland C/C++ compilers. Over 50 new functions
allow you to use EEMS 3.2, LIM 4.0, XMS 2.0, and virtual memory without a 64K
limit on memory block size. There is an automatic swapping mechanism to
relieve the programmer of memory management tasks. The functions use EMS, XMS,
or disk space, according to availability, but you can specify certain
memory-allocation requirements if you wish.
Virtual memory I/O goes through up to four disk caches, the allocation of which
you may specify in order to increase speed. With high-level functions, "dirty
bits" can be cleared to avoid writes to EMS, XMS, or virtual memory when the
data hasn't been changed, thus reducing execution time.
All functions are compatible with malloc( ), and C-Heap supports all but the
tiny memory model. The price is $199, or $399 with source code. Reader service
no. 30.
Library Technologies P.O. Box 56031 Madison, WI 53705-9331 800-767-4214 or
608-274-4224
New from Software Ingenuities is Style, a general-purpose C++ class library.
Style manages all associations and links between C++ objects. The
implementation is transparent to the programmer and allows you to develop
applications quickly. You specify all classes and the associations between
them with a few declarations. Bidirectional links between C++ objects are
built using a small set of functions. Style manages the objects in a
RAM-resident database.
With a single call, you can access distantly linked objects; collect objects
that meet certain criteria; destroy selected objects; save and load objects
from disk files; print selected objects or links between them; and search for
a specific object.
Traversal functions navigate through the object database and perform
pre-written or programmer-defined operations on the objects. Because traversal
functions separate locating objects from performing operations on them, they
are easily portable between applications.
Style supports Borland and Zortech C++; prices start at $250. Reader service
no. 31.
Software Ingenuities Inc. P.O. Box 1586 Ballwin, MO 63022 314-391-7772
TurboPower Software has released Object Professional for C++ (OPC), a library
of text-mode user interface objects. A straight port of the objects from
Object Professional for Turbo Pascal, OPC includes text editors, dialog boxes,
scrolling data-entry screens, and help systems. OPC features ready-made user
interfaces and built-in calls that allow you to customize its behavior. OPC's
utilities support interactive design and testing, and they automatically
generate source code.
Object Professional for C++ costs $249. Reader service no. 32.
TurboPower Software P.O. Box 49009 Colorado Springs, CO 80949-9009
719-260-6641
Now available from Quinn-Curtis is the Huge Virtual Array and Numerical
Analysis Toolbox, which allows you to make use of all system memory available
when working with large numeric arrays. The toolbox uses low memory first,
then extended memory, and finally hard disk memory. The virtual array size is
limited only by the size of the hard disk.
Also included are numerical analysis tools with extensive matrix math support,
solutions of large systems of linear equations, curve fitting, multiple and
stepwise regression, Fourier analysis, eigenvalues and eigenvectors, and
general statistics.

The toolbox is available for Turbo Pascal, Turbo C, Borland C++, and Microsoft
C and Quick C. The price is $300. Reader service no. 33.
Quinn-Curtis 35 Highland Circle Needham, MA 02194 617-449-6155




























































March, 1992
SWAINE'S FLAMES


Lies and Videotape




Michael Swaine


Monday evening, 8:30 P.M., January 13, 1992. I'm on Third Street just outside
the Moscone Center in San Francisco. The gaudy tower of the Marriott hotel
squats like some Brobdingnagian jukebox over the city tonight. It fits. It's
showtime, and San Francisco has been invaded by the Hollywood virus of glitter
and glitz.
I'm in town for MacWorld Expo, one of two such expos that showcase Mac
products and technology each year. (Until last week there were three, in
Frisco, Beantown, and the Big Apple, but this year the Big Apple expo took the
Big Sleep.) The night before last I disguised myself as a guy who would wear a
tuxedo and helped hand out Eddy awards at a ceremony that looks more like the
Academy Awards show every year. Hollywood.
I could lie and tell you I'm writing this on a portable computer as I walk
down Third, my camcorder over my shoulder, ready to grab the gritty facts. I'd
only be lying a little; I've got the portable and the camcorder, but they're
both in the hotel just now. The governor of California can invent composite
poor people for his State of the State speech and the president of Apple
Computer can hire a stooge to put on a charade to show that it's easier to
make a Mac multimedia-ready than a PC. But those of us who report on the mean
streets of the new Hollywood are held to a higher standard than governors and
company presidents. Sort of like the movie producers are held to a higher
standard of verisimilitude than commissions appointed to investigate
assassinations.
The expo is lousy with multimedia-supporting software and hardware. The quick
path to multimedia on the Mac is QuickTime, Apple's system software
architecture for integrating and compressing sound, video, and animation.
QuickTime was released this week, and every multimedia-supporting app or board
I saw was taking that path.
There are over a hundred QuickTime-supporting products here at the show, from
existing apps that now recognize the QuickTime movie format and display movies
to video cards that save captured video in the QuickTime movie format. More
new multimedia tools have been coming out, mostly from Macromind-Paracomp, the
major multimedia company in the Mac universe (and with a growing presence in
the Windows universe). I find the Infini-D animation package particularly
dazzling, with its morphing capability. (Morphing is the smooth visual
transformation of one 3-D object into another, as in Terminator 2.) The
Premiere video editing software from Adobe and the VideoSpigot board from
SuperMac also look hot.
The multimedia theme of the show takes two unofficial tracks. Most of the
experts showing how to do multimedia are delivering pre-QuickTime advice. I
sat in on some of their sessions today and I'll sit in on more tomorrow and
Wednesday, hoping for some advice on what to do with the footage I'm
collecting. But most of the products claiming multimedia support are
QuickTime-compatible.
Yes, I have used the camcorder. The standard gag to throw at anyone carrying a
camcorder here at the show, I discover, is "Send me a copy of the QuickTime
movie." With no prompting, a journalist friend says she's been wowing 'em in
the office with her QuickTime movies. It's so easy, she says. It's the future
of multimedia on the Mac, I think.
But QuickTime will also be implemented on other platforms, according to Apple.
Earlier today, Sculley demonstrated this with a prototype QuickTime
implementation for Windows, taking a disk out of a Mac, inserting it into a
486 Windows machine, and running a Mac-created movie on it without a hitch.
Apple stands a better chance of doing cutting-edge work for Windows than
Microsoft does of doing ditto for the Mac. Apple, not trusting the wall
between app and system groups at Microsoft, has stripped away Microsoft's
most-favored developer status. These days, Apple is about as likely to reveal
its plans to the company Apple employees often call the "Evil Empire" as Nina
Totenberg is to reveal her sources for the Clarence Thomas story.
I can reveal the source for the best Clarence Thomas stories: Jean-Louis
Gassee. He's collecting the best; maybe when he's got enough he'll take them
to his publisher. Or to Hollywood.
I'm going to wait for the QuickTime movie.






































April, 1992
EDITORIAL


Born in the USA




Jonathan Erickson


The clamor to "Buy American" is understandable, especially when heard from
out-of-work auto workers who've weathered a long winter and face a bleak
spring. I have less sympathy, however, when I hear it from CEOs who, with one
hand, are filling their pockets with megabuck bonuses (I mean, who do these
guys think they are--baseball players?), while passing out pink-slips with the
other.
But "Buy American" is a simple solution to a knotty problem--one I won't try
to analyze here even if I fully understood it. However, I do know
that "Buy American" isn't the remedy.
To illustrate, what's more all-American than driving to a movie, then stopping
for a hamburger afterwards? Well, if you drive a Mercury Tracer to a Cineplex
Odeon theater to watch a Columbia Pictures film before eating at Burger King,
you'll be driving a Mexican-assembled car to a Canadian-owned theater to watch
a Japanese-owned film and eating at a British-owned fast-food emporium. From
the pens that we write with (Bic's parent company is in France) to the ice
cream we eat on hot summer afternoons (Haagen-Dazs is British), it's getting
nigh impossible to find 100 percent "American-made." I'm not suggesting that
this is bad--I'm simply saying that somewhere up or down every food chain
there hangs a non-U.S. skeleton. In short, "Buy American" is a mythical
solution to an economic conundrum.
Because auto manufacturing is an overused example of typical American
businesses, let's look at a different sort of enterprise, this magazine for
instance.
For starters, DDJ is owned by M&T Publishing, the parent company of which is
Markt & Technik, one of the largest technical publishers in Germany. Monica
Berg, our Managing Editor, is also German--and she has the accent to prove it.
Technical Editor Ray Valdes was born in Bolivia and lived on six continents
before settling down in the U.S. Associate Editor Tami Zemel was born in
England, grew up in Kansas, and went to college in Israel, where she majored
in (naturally) French. My fling with internationalism involved a few years in
Canada that resulted in a college diploma, a son, and at least one case of
frostbite. Then there's Editor-at-large Michael Swaine, widely believed to
exist in Cyberspace. But to my mind the most "foreign" of all is Senior
Technical Editor Mike Floyd--a native Californian.
The magazine you're now reading was printed on paper made from Canadian trees,
the phototypesetting machines used to put the words on the paper are from
Germany, and the cameras used to shoot the photos on the cover are made in
Sweden and Holland. Nevertheless, the creativity behind DDJ and the principles
the magazine represents were born in the U.S. of A.
Slightly more than 20 percent of DDJ's readers reside outside the U.S. (see,
for instance, the "Letter" from reader Yuri Elik of St. Petersburg, Russia on
page 8), we often publish articles written by non-U.S. programmers, our
articles are commonly reprinted around the world, and there's even a Russian
edition of Dr. Dobb's Journal. When I recently logged on to M&T Online,
readers from both New Zealand and Hong Kong were online.
The international flavor of DDJ isn't unique within the software industry.
Walk into just about any software house in the U.S. and you'll be able to
speak with someone in a language other than English. Furthermore, development
tools for "internationalizing" application software are a coming thing.
My point is that protectionist principles don't work (if they ever did)
because the U.S. isn't (and never was) an economic island. The health of our
economic system depends on others, and they depend on us. If you doubt this,
ask General Electric, which last year recorded $2.6 billion in imports and
$8.16 billion in exports.
Much more so than processes such as automobile manufacturing or home building,
software development defies boundaries--technical, creative, or political. The
best way programmers can meet these challenges is by coming up with better
ideas, working smarter, and using the most innovative tools and techniques to
write the best software.


The GATT Gotcha


Continuing on the international front, negotiations edge forward with the
General Agreement on Tariffs and Trade (GATT), the phenomenally broad
international trade treaty that, if accepted, will among other things dictate
the rules governing software copyrights and patents.
The draft treaty covers all aspects of international trade, and will be
presented to the U.S. Senate as a package deal. Because major parts of the
treaty will be quite beneficial to the U.S., there's real pressure to accept
the whole enchilada. However, the draft requires all countries that accept the
agreement to have patents that cover software techniques. It would also rule
out any proposal in the U.S. to protect software from patents or revamp the
patent system.
Thus, sweeping changes in U.S. intellectual property law will be forced upon
us and cast in stone, without any consideration by the House of
Representatives, and without an opportunity for the Senate to consider them
individually on their merits.
Chew on that the next time you bite into a British burger.





























April, 1992
LETTERS







C the Light


Dear DDJ,
I read your interview with Tom Pittman ("Programming Paradigms," January 1992)
and I'm amazed by the people who don't like C. Tom says, "I saw an interview
with one of the designers of C that said C was not intended for large
projects. Absolutely right." I think he was referring to the quote by Dennis
Ritchie in the September, 1990 issue of BYTE where Ritchie said, "One of the
basic criticisms that can be made of the [C] language is that it doesn't help
much in doing large projects (i.e., it was not designed with big monolithic
programs in mind). It was designed in an environment of multiple small
programs that interact only by fairly restrictive means."
I'm not sure he was responding to what the designers meant. Mr. Ritchie told
me the following in personal communication:
One might also say that the real problem with large monolithic projects is
that they are so monolithic, and we need to find better ways of cutting them
into pieces, and that this problem is not closely related to any particular
programming language.
Nevertheless, I have to admit that C was not designed with an eye towards
writing very large programs; for example, it doesn't encourage careful control
over name visibility and modularization. It's fine for writing limited-size
tools that interact in an environment like UNIX, using extra-language
mechanisms like file I/O. But very large single programs suffer from the
problem of "change one #include file, recompile everything" -- which is a real
problem if recompilation takes a day.
One of the contributions I think we made was to demonstrate how far the small
program/tool approach could go. But it doesn't necessarily cover everything;
there really seem to be gigantic programs that people have to write.
I really think we have work to do when we consider what "large" means. Large
is relative to the individual's scope and talents. Studies have shown that the
quantity of lines of code can differ by a factor of 5 among 12 experienced
programmers. [Editor's Note: see Sackman, H., W.J. Erikson, and E.E. Grant,
"Exploratory Experimental Studies Comparing Online and Offline Programming
Performance," CACM, 11, 1 (January, 1968).] This implies the programmer is
enormously important when figuring out size.
Mr. Pittman's attitude is, "Why should I use a brain-dead language like C?"
I've found C very easy to understand, modify, write, and maintain quality code
with. (It is very difficult to do anything with bad C code.) Programmers with
outstanding intellects program in this "brain-dead language." C does not
impose restraints on the programmer; it lets the programmer do things the
computer is capable of.
Marty Leisner
Rochester, New York


Time 2 Review CTime2


Dear DDJ,
I read Kenneth Roach's "Using the Real-Time Clock" (DDJ, June 1991) with
interest because I like to peek inside the PC and understand how the hardware
works. I often work on time-related applications, and I didn't have all those
details about RTC status registers at hand, so this topic was especially
important to me. Thank you, Kenneth.
But alas! There is nothing perfect on earth. While reading the listings, I
found, to my surprise, that the CTime2 function reads RTC ports, even though
it must "emulate C language ctime( ) function." The point is that ctime( )
must only convert supplied values, irrespective of current time. On rereading
the code, I found that values taken from ports are not used, except for the
Bias variable, which is used for leap-year processing. (I didn't try this
code, but it seems to me that it wouldn't always work right, even when used to
convert current date/time.)
So I've written my own ctime2( ) and tested it in Microsoft's QuickC 2.01 and
C 6.0 as well as Borland Turbo C++ and Borland C++. I calculate time and day
of week the same way Mr. Roach did. Then I determine the number of leap years
between 1970 and a given date and delete all February 29s. This makes it
possible to pretend leap years don't exist (considering February 29 a special
case), which dramatically simplifies calculations. Day and month can also be
found much more easily than in CTime2 using arrays containing numbers of days
from the beginning of the year to the beginning of each month.
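Yuri's month-table approach can be sketched as follows. This is my
reconstruction, not his actual code; the names from_epoch, days_before, and
is_leap are my own assumptions.

```c
#include <assert.h>

/* Days from January 1 to the first of each month in a non-leap year. */
static const int days_before[12] =
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };

static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Convert seconds since January 1, 1970 (UTC) into calendar fields. */
static void from_epoch(long t, int *yr, int *mo, int *day,
                       int *hh, int *mm, int *ss)
{
    long days = t / 86400L, secs = t % 86400L;
    int year = 1970, m;

    *hh = (int)(secs / 3600);
    *mm = (int)(secs % 3600 / 60);
    *ss = (int)(secs % 60);

    /* Peel off whole years (leap years contribute one extra day)... */
    while (days >= 365L + is_leap(year)) {
        days -= 365L + is_leap(year);
        year++;
    }
    /* ...then a single table scan locates the month and day. */
    for (m = 11; m >= 0; m--) {
        long first = days_before[m] + (m >= 2 && is_leap(year));
        if (days >= first) {
            *day = (int)(days - first) + 1;
            break;
        }
    }
    *yr = year;
    *mo = m + 1;                     /* 1-based month */
}
```

The table lookup replaces the per-month loop of the original CTime2, which is
where the speedup Yuri reports would come from.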
I was thinking about writing Pascal equivalents, but I was too lazy--and
besides, I think C code is much more interesting because it shows usage of C
time-zone and daylight variables. To my surprise, I found that my function
turned out to be much faster than its library prototype, as you can see in
Table 1.
Table 1: Yuri's ctime2() versus ctime() test results

 QuickC
 ctime2() called 20586 times
 ctime() called 6543 times
 ctime2() 315% faster than ctime()

 Borland C++
 ctime2() called 7624 times
 ctime() called 2707 times
 ctime2() 282% faster than ctime()

 Turbo C
 ctime2() called 7660 times
 ctime() called 6673 times
 ctime2() 345% faster than ctime()

 MSC 6.0
 ctime2() called 22987 times
 ctime() called 2706 times
 ctime2() 284% faster than ctime()

I found it curious that in these experiments, Microsoft compilers (even
QuickC) generated code that ran faster than Borland's. To try to understand
why, I used Borland's Turbo Profiler to determine that lines performing long
arithmetic take more than half of ctime2( ) execution time. (I was
profiling the code generated by Borland C++.) When I changed test programs,
removing as much long arithmetic as possible, Borland compilers showed
significant improvement. Moreover, when debugging my ctime2( ), I found a
minor error in the Turbo C++ library ctime( ): It didn't correctly determine
the beginning and end of the daylight-savings period. This error was corrected
in Borland C++.
Yuri Elik
St. Petersburg, Russia
Editor's note: The source code to Yuri's implementation of CTime2 is available
electronically.



Are the Emperor's New Threads Really New?


Dear DDJ,
Judging from your magazine, the object-oriented paradigm is now firmly
entrenched in the minds, if not the hearts, of "professional" programmers.
Back in the days of yore, when reusable code was called a library, I accepted
the concept and the value of code reusability as being intuitively obvious.
Now that the concept has been elevated to the level of an abstract art, I'm
not sure anymore. From the humble perspective of a "nonprofessional"
programmer struggling to keep pace with the ultramodern programming
vernacular, much of the object-oriented paradigm appears to be a polymorphic
artifice inherited from the marketing departments of large software
corporations that are repackaging their old methods in an attempt to
encapsulate the resources of a large class of the computing community. OOPS,
did I say something bad?
Ernie Deel
Atlanta, Georgia


Generic Swap Macro


Dear DDJ,
I am a student at the Colorado School of Mines, and just got into C
programming about a year ago. I was thumbing through my old issues and came
across C language question #36: "How can I write a generic macro to swap two
values?"
Throughout my short exposure to C, this has been one of the most frequently
asked questions. Unfortunately, the answer has always been something like,
"There is no good way to write a generic swap macro." Usually one is told to
write a swap macro for a specific data type such as integers and floats, or to
forget a macro altogether. I have always felt that with all the tools that C
provides, there should be a way to write such a macro. The problem is that
swapping requires a temporary variable, and different data types occupy
different amounts of memory. After some work, I realized what is needed is an
inline variation of the memcpy function. See the macro in Example 1 for my
approach.
Example 1: Greg Renzelman's generic swap macro

 #define SWAP(a,b) \
 { \
 unsigned int i; \
 char tmp, *aptr = (char *)&(a), *bptr = (char *)&(b); \
 for (i = 0; i < sizeof(a); ++i, ++aptr, ++bptr) \
 {tmp = *aptr; *aptr = *bptr; *bptr = tmp;} \
 }

With this single macro, I was able to swap integers, floats, pointers, arrays,
and structures. The only restriction I see this macro has is that the elements
to be swapped occupy the same amount of memory. The two elements to be swapped
don't even have to be of the same data type.
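A minimal, self-contained demonstration of the byte-wise swap idea follows.
The underscore-suffixed locals are my own precaution against name capture, not
part of Greg's original text.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise swap of any two equally sized objects. */
#define SWAP(a, b) \
    { \
        unsigned int i_; \
        char tmp_, *ap_ = (char *)&(a), *bp_ = (char *)&(b); \
        for (i_ = 0; i_ < sizeof(a); ++i_, ++ap_, ++bp_) { \
            tmp_ = *ap_; *ap_ = *bp_; *bp_ = tmp_; \
        } \
    }
```

Because the body is a bare brace block, writing `SWAP(x, y);` leaves a
harmless empty statement after the braces; wrapping the body in
`do { ... } while (0)` is the usual refinement if the macro must sit safely
inside an if/else.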
I would appreciate hearing any ideas and comments about my idea and
implementation of a generic swap macro.
Greg Renzelman
Golden, Colorado


When One Pixel is Better than Two


Dear DDJ,
I needed a good circle-drawing algorithm, and I remembered there had been one
in DDJ. Looking in my back issues, I found the article by Tim Paterson,
("Circles and the DDA," July 1990), as well as an improvement by V.
Venkataraman ("Letters," July 1991). However, after implementing both versions
I had problems. It seems that no effort was made to eliminate the drawing of
the same pixel twice. This becomes quite a problem when you are XORing a
circle onto the screen. In addition, the pixel plotting can be sped up
significantly.
Looking at the previous implementations of the plot8 function, you can see
that it calls the plot function eight times with the x-y coordinates in all
eight combinations of +,-, and reversed. Unfortunately, the plot function
performs the aspect ratio scaling. As a result, all four combinations of the
coordinates (x,y, x,-y, -x,y, and -x,-y) are scaled. This is unnecessary, as
only the sign has changed in each different case, not the magnitude. Notice
that this is also true for the four combinations of y,x.
This is how I handled aspect scaling in my version of the plot8 function:
First, two new variables are needed. This allows us to swap the x and y values
efficiently. I chose the obvious: x1, y1, x2, and y2. Since the SetAspect
function uses a scaling value of 65,536 to avoid floating-point
multiplication, I was able to gain a small speed improvement by casting
XAspect and YAspect as (int) when I check to see if scaling is applied.
I eliminated the gaps in round ellipses (circles) by plotting only one pixel
in cases where xn or yn is 0, as well as those where x is equal to y. I chose
to compare x2 and y2, as the addressing is more efficient in my C compiler
(Borland C++).
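The duplicate-elimination logic can be sketched as below. This is a
reconstruction under my own naming, with aspect scaling omitted and a toy
set_pixel (backed by a hit-count array) standing in for the real plotting
routine, so that double plots can be detected.

```c
#include <assert.h>

#define W 64
#define H 64
static int hits[W][H];          /* toy framebuffer: counts plots per pixel */

static void set_pixel(int x, int y)
{
    if (x >= 0 && x < W && y >= 0 && y < H)
        hits[x][y]++;
}

/* Plot the eight symmetric points of (x, y) about (xc, yc), skipping the
   duplicates that occur when x == 0, y == 0, or x == y. */
static void plot8(int xc, int yc, int x, int y)
{
    set_pixel(xc + x, yc + y);
    if (x != 0)
        set_pixel(xc - x, yc + y);
    if (y != 0) {
        set_pixel(xc + x, yc - y);
        if (x != 0)
            set_pixel(xc - x, yc - y);
    }
    if (x != y) {                /* swapped octants */
        set_pixel(xc + y, yc + x);
        if (y != 0)
            set_pixel(xc - y, yc + x);
        if (x != 0) {
            set_pixel(xc + y, yc - x);
            if (y != 0)
                set_pixel(xc - y, yc - x);
        }
    }
}
```

A point in the interior of an octant yields eight distinct pixels; a point on
an axis or on the diagonal collapses to four; the center point to one. When
XORing to the screen, this skipping is what keeps pixels from cancelling
themselves out.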
Now true circles are running quicker, and the gaps have been eliminated, but
there are still ellipses to contend with. When drawing an ellipse, some pixels
near the "skinny" ends of the ellipse will be drawn twice. Fortunately, these
are easy to catch as they result from the scaling, and will occur in
sequential fashion. It is only necessary to keep track of the previous
"scaled" pixels to ensure there are no gaps. I then added static variables
x_last and y_last. It might be better to make these global variables, or place
them in the circle function.
At this point, my changes to the circle function are basically complete.
However, one more issue needs to be addressed. When drawing real circles to a
screen on an 80x86 machine, rather than just testing the speed of algorithms,
you have the video adapter to contend with. Since the PC only has a 64K
footprint to draw in, the video adapters use various registers to control
video "modes." The OUT instruction (used to set the register value) takes up
quite a few CPU cycles, especially if the mode registers are SET and RESET for
each pixel. Additionally, most video adapters are still "on the bus." This
eats up an enormous number of cycles for each OUT to the controller register,
and for each pixel drawn. We can't get around this, but we can design our code
to use the least number of OUT instructions.
I also changed the circle function by adding a call to setGCregs immediately
before the main loop and a call to resetGCregs immediately after. These
functions set and reset the Graphics Controller registers on my VGA board.
This caused some trickle down into the rest of my code. You can see in plot8
where the XOR command is now missing from my SetPixel function.
Table 2 shows the execution times of my five circle drawing programs. I used a
16-MHz 386SX PC with a Cardinal VGA732a running in 800x600x16 mode for my
tests. In reality, only the timings for circle1, circle4, and circle5 should
be compared, as the other two are intermediate steps. Notice that in going
from circle1 (the benchmark) to circle4, where all gaps are removed and plot8
is improved, we only managed a 1.34 times improvement in speed. It was only
when we improved the video adapter register management that we realized the
2.49 times total speed improvement in circle5.
Table 2: Comparison of Robert Stewart's circle-drawing routines

 circle1 57.62 sec benchmark
 circle2 44.05 sec 1.31 x benchmark
 circle3 44.10 sec 1.31 x benchmark
 circle4 43.00 sec 1.34 x benchmark
 circle5 23.13 sec 2.49 x benchmark

One last thing. The first time I ran circle5, it took over 24 seconds. I was
able to reduce it to 23 seconds just by reducing bus memory wait states and
bus I/O waits. The timings for the other circle programs were similarly
improved, as shown in Table 2.
Robert W. Stewart
El Segundo, California
Editor's note: Robert's plot8 source code and executables are available
electronically.



Forth Fans


Dear DDJ,
In Jack Woehr's "Forth: A Status Report" (October, 1991), I read that the 0=
vs. NOT issue is still a hot one.
I have always thought it to be counter-intuitive to state "0=IF" instead of
"NOT IF" when I want to either do something on a FALSE condition or handle the
FALSE condition in an IF...ELSE... construct first. I circumvented the issue
by providing the words -IF, -UNTIL, and -WHILE, which can be implemented as
: -IF ( ? -- ) COMPILE 0= [COMPILE] IF ; IMMEDIATE or, more efficiently, when
one can change the nucleus, as : -IF ( ? -- ) COMPILE -BRANCH >MARK ; IMMEDIATE
where -BRANCH is the "negative" counterpart of ?BRANCH.
In this area there is a second pitfall. When you want to do something when two
values are nonzero, "AND IF" will not do the trick (e.g., "1 2 AND" will give
FALSE). One of the logical TRUEs has to be converted to "all bits set" with
"0<> AND," where 0<> is implemented as : 0<> ( ? -- ? ) 0= 0= ; (or more
efficiently with an in-nucleus solution).
Dick Burggraaff
Kwintsheul, The Netherlands
Jack responds: It is indeed as Mr. Burggraaff states: The semantic ambiguity of
NOT has in the past caused many Forth programmers to hand-roll unambiguous system
extensions. It is this apparently irreconcilable ambiguity in the minds of the
community which caused X3J14 to omit NOT from the draft proposed Standard and
to leave in its place(s) the words 0= (boolean) and INVERT (bitwise).
As for the concern expressed by William Higinbotham ("Letters," February
1992) that the required word set for an ANS Standard Forth system be of
minimal size: it is most enthusiastically shared by the ANS X3/X3J14 Technical
Committee. It is for this reason that the definition of an "ANS Forth
Programming System" states that:
An ANS Forth System implements the entire Core word set as defined in this
document. An ANS Forth system may also provide words from any of the optional
word sets and extension word sets, provided these words behave as described.
[X3J14 dp-ANS2 4.1 (emphasis added)]
Mr. Higinbotham says, "I felt the standard should be on the Core Word Set."
This is precisely the effect of the proposed Standard. I apologize if my
article left this matter unclear, and thank Mr. Higinbotham for his comment.


Sound Off


Dear DDJ,
This is in response to Theron Wierenga's January, 1992 letter ("Looking for
Free Speech"). I have tested a couple of speech-via-PC-speaker programs. They
are SW-TALK, by Orlando Dare (CSS Inc., 3005 Glenmore Ave., Baltimore, MD
21214) and the December, 1991 Windows/DOS Developer's Journal article,
"Writing for the PC Speaker," by Robert Bybee (5011 Broughham Ct., Stone
Mountain, GA 30087).
Both perform pretty much the same on my system, although implementation may
differ somewhat. (Dare is a bit tight-lipped about how he does it, but Bybee
reveals all.) On my system, voice reproduction using either of them contains
about the same amount of noise and distortion (readily noticeable), but speech
is generally intelligible.
You really need a digital-to-analog converter in front of an amplifier/speaker
combination. The Speech Thing (COVOX Inc., 675-D Conger St., Eugene,
OR 97402) produces clear speech with no noise or distortion. It even reads
your text files to you in a pleasant, inflected voice. I highly recommend it.
Incidentally, Theron's use of the term "AM" is inappropriate; AM (amplitude
modulation) refers to a method of impressing a radio-frequency carrier with
audio (hence "AM radio"). I assume when he says "AM signal," he really means
"analog audio" -- a bit redundant, as "digital audio" is really pulse-width
modulation by audio.
He's right when he says that pulse-width modulation can (in principle) do a
pretty good job of producing speech even when fed directly to a loudspeaker;
but without a D-A converter, speech quality is highly dependent on the
electro-mechanical characteristics of the speaker. Let's hope that the next
generation of personal computers has a D-A converter ahead of a decent
speaker!
Homer B. Tilton
Tucson, Arizona
Dear DDJ,
A few months ago I ran into a problem similar to Theron Wierenga's. I was wondering
whether it was possible to port a Sun SPARC speech recording coded in 8-bit
u-law format to the PC and play it through the better-than-nothing PC speaker.
In the
beginning, I intended to use the same approach Theron did, but I could not
find a solution.
The solution I finally found was this: port 0x42 of the PC is a frequency
counter connected to the speaker. Instead of filtering in software and
generating 0s and 1s, you may as well let the hardware do this job. All you
need to do is load the 8-bit AM value into this port, and you are hands-free.
I know this is not the exact answer Theron wants, but I think it may be the
ultimate solution given a bare PC without any add-on hardware.
Patrick Shu-Pui Ko
Hong Kong
Editor's note: Additional sources of PC sound that were brought to our
attention are Real Sound from Access Software (4910 W. Amelia Erhart, Salt
Lake City, UT 84116), TurboSound from Silicon Shack (5120 Campbell Ave., Suite
112, San Jose, CA 95130; 408-446-4521), and The Audio Solution (P.O. Box 11688
Clayton, MO 63105; 314-567-0267).


Hamming Had it First


Dear DDJ,
In the February 1992 "Programmer's Bookshelf" column, Andrew Schulman implies
that Claude E. Shannon was the inventor of error detecting and correcting
codes. This is a common misunderstanding. Although Shannon and Marcel J.E. Golay
were both early researchers in the field, credit is clearly due to Richard
Hamming of Bell Laboratories, whose first memorandum on the subject was dated
July 27, 1947. Those who doubt may verify this in the first chapter of Thomas
M. Thompson's From Error-correcting Codes Through Sphere Packings to Simple
Groups (The Carus Mathematical Monographs, Volume 21).
Darren Allen
Atlanta, Georgia


UART Musings


Dear DDJ,
Part of Jeff Duntemann's column is nowadays devoted to UART programming. What
a splendid subject! Many of us have at one time or another spent a lot of time
finding out what all the various registers are supposed to do. The endless
books one had to go through...
One time, while testing a program, I noticed that the transmission of
characters could stop due to some unknown error. After contacting the
distributor, I received a leaflet pointing out some of the problems I had run
into: Application Note 493 from National Semiconductor, entitled "A
Comparison of the INS8250, NS16450, and NS16550A Series of UARTs." It
describes a number of problems and suggests solutions for them. With it,
some, but certainly not all, of my problems were solved.
I'd be grateful if Jeff were to cover these issues in a forthcoming column.
Alan Vlieger
Hillegom, The Netherlands


/* Elik Listing 1: datetime.h

Header file for ctime2.c


*/

#ifndef DATE_TIME_H
#define DATE_TIME_H

#define SECS_PER_MIN 60
#define SECS_PER_HOUR (SECS_PER_MIN*60)
#define SECS_PER_DAY (SECS_PER_HOUR*24L) /* long constant! */
#define HOURS_PER_DAY 24
#define DAYS_PER_WEEK 7
#define DAYS_PER_YEAR 365
#define LEAP_PERIOD (365*4+1)

#define BASE_YEAR 1970
#define UNTIL_1980 (365*10+2) /* 01-01-1970 to 01-01-1980 */
#define UNTIL_FEB29 (365*2+31+29-1) /* 01-01-1970 to 29-02-1972 */
#define UNTIL_2000 (365*30+7+31+28-1) /* 01-01-1970 to 28-02-2000 */
#define THURSDAY 4
#define APRIL (90-1)
#define OCTOBER (273-1)

/*
A similar structure (struct tm) is defined in <time.h>, but I prefer to use
my own -- I think it makes the code clearer. (Also, my structure is slightly
shorter.)
*/
struct date_time {
short yr, /* year */
 mo, /* month (0-11) */
 dy, /* day */
 dw, /* day of week (0-6) */
 hr, /* hours */
 mn, /* minutes */
 sc; /* seconds */
};

typedef int boolean;
typedef unsigned short word;
typedef unsigned long dword;

char *ctime2 (const time_t *time);
time_t dt2time (struct date_time *dtp);

#endif


/* Elik Listing 2: ctime2.c

Corrected version of ctime() by U.ELIK (DDJ #17, June 1991,
"Using the Real-Time Clock" by Kenneth Roach, p. 26; listing on
p. 88)

*/

#include <stdio.h>
#include <time.h>
#include "datetime.h"

static word nearest_sunday (word days_yr, word year);

/*
Replacement for the C library ctime() function. Runs more than 300% faster
than ctime() (QuickC 2.01).
*/
char *ctime2 (const time_t *tptr)
{
    static char time_str[26] = "www mmm dd HH:MM:SS yyyy\n";
    static char *months[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    }, *wdays[7] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static unsigned short month_days[] = {
        0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365
    };

    struct date_time dt;        /* we must fill this structure and
                                   then convert it to a string */
    dword time;
    word days;
    register char *chptr;
    register word i;

    time = *tptr - timezone;    /* because the C time() function returns GMT */
    days = (word)(time / SECS_PER_DAY);     /* days since 1970 */

    if (days < UNTIL_1980)
        return NULL;    /* on dates before 1980, ctime() returns NULL */

    /* first, calculate the time - it's simple */
    time %= SECS_PER_DAY;                   /* seconds this day */
    dt.hr = (word)(time / SECS_PER_HOUR);
    i = (word)(time % SECS_PER_HOUR);       /* seconds this hour */
    dt.mn = i / SECS_PER_MIN;
    dt.sc = i % SECS_PER_MIN;

    /* day of week - it's simple too */
    dt.dw = (days + THURSDAY) % DAYS_PER_WEEK;

    /* then calculate the date - it's a little bit more complex */
    i = days - UNTIL_FEB29;         /* days since Feb 29 1972 */
    days -= (i / LEAP_PERIOD + 1);  /* delete all Feb 29s */
    /* now we can calculate everything as if leap years didn't exist */

    dt.yr = days / DAYS_PER_YEAR + BASE_YEAR;
    if ((i % LEAP_PERIOD) == 0) {   /* today is Feb 29th - special case */
        dt.mo = 1;                  /* Feb */
        dt.dy = 29;
    } else {
        days %= DAYS_PER_YEAR;      /* days this year */
        if (daylight    /* if a daylight-saving-time zone is specified */
            && days >= nearest_sunday (dt.yr > 1986 ? APRIL+1 : APRIL+24, dt.yr)
            && days < nearest_sunday (OCTOBER+25, dt.yr)
            /* and this day belongs to the summer period */
            /* (1980-1986: Apr 24 - Oct 25; after 1986: Apr 1 - Oct 25) */
           ) {
            /* then it's one hour later */
            if (++dt.hr == HOURS_PER_DAY) {
                days++;
                dt.dw = (dt.dw + 1) % DAYS_PER_WEEK;
                dt.hr = 0;
            }
        }
        for (i = 1; month_days[i] <= days; i++)
            ;
        dt.mo = --i;                /* months must be 0 - 11 */
        dt.dy = days - month_days[i] + 1;
    }

    /* now fill the string */

    /* Using sprintf() is the simplest way (for the programmer, of course),
    but it executes VERY slowly, especially in QuickC and MSC. The code
    below may look cumbersome, but it's MUCH quicker.

    sprintf (time_str, "%3s %3s %02d %02d:%02d:%02d %04d\n",
        wdays[dt.dw], months[dt.mo], dt.dy, dt.hr, dt.mn, dt.sc, dt.yr); */

    chptr = wdays[dt.dw];
    time_str[0] = chptr[0];
    time_str[1] = chptr[1];
    time_str[2] = chptr[2];

    chptr = months[dt.mo];
    time_str[4] = chptr[0];
    time_str[5] = chptr[1];
    time_str[6] = chptr[2];

    time_str[8] = (char)(dt.dy / 10 + '0');
    time_str[9] = (char)(dt.dy % 10 + '0');

    time_str[11] = (char)(dt.hr / 10 + '0');
    time_str[12] = (char)(dt.hr % 10 + '0');

    time_str[14] = (char)(dt.mn / 10 + '0');
    time_str[15] = (char)(dt.mn % 10 + '0');

    time_str[17] = (char)(dt.sc / 10 + '0');
    time_str[18] = (char)(dt.sc % 10 + '0');

    time_str[20] = (char)(dt.yr / 1000 + '0');
    i = dt.yr % 1000;
    time_str[21] = (char)(i / 100 + '0');
    i %= 100;
    time_str[22] = (char)(i / 10 + '0');
    time_str[23] = (char)(i % 10 + '0');

    return time_str;
}

/*
Returns the nearest Sunday after (and including) the given date. <days_yr>
must be AFTER February!
*/
static word nearest_sunday (word days_yr, word year)
{
    word dwk,                   /* day of week (0 - 6) */
         yr = year - 1970;      /* years since 1970 */

    dwk = ((days_yr + yr * DAYS_PER_YEAR + (yr+2) / 4) /* days since 1970 */
           - 1 + THURSDAY) % DAYS_PER_WEEK;
    return days_yr + ((DAYS_PER_WEEK-1) - dwk);
}

/* Elik Listing 3: compiler.h
Macros to determine used compiler.

*/

#ifdef _QC
#define COMPILER "QuickC"
#else
#ifdef _MSC_VER
#define COMPILER "MSC"
#else
#ifdef __BORLANDC__
#define COMPILER "BorlandC++"
#else
#ifdef __TURBOC__
#define COMPILER "TurboC"
#else
#define COMPILER "Unknown compiler"
#endif
#endif
#endif
#endif

/* Elik Listing 4: cmpctim1.c

Test no.1 for ctime2() function by U.ELIK

*/

#include <stdio.h>
#include <time.h>
#include "datetime.h"
#include "compiler.h"

#define TEST_TICKS (5*18+1)

static unsigned long far *tickptr = (unsigned long far *)0x0000046C;

unsigned long test (char * (*fun) (const time_t *))
{
    unsigned long count = 0, start;
    time_t t;

    time (&t);
    start = *tickptr;
    while (*tickptr == start)
        ;
    for (start += (TEST_TICKS+1); *tickptr < start; t++) {
        (*fun) (&t);
        count++;
    }
    return count;
}

unsigned long percent (unsigned long c1, unsigned long c2)
{
    unsigned long ret;

    c1 *= 100;
    ret = c1 / c2;
    if ((c1 % c2) * 2 >= c2)    /* round to the nearest percent */
        ret++;
    return ret;
}

void main (int argc, char *argv[])
{
    unsigned long n1, n2;

    printf ("\nTesting %s code...\n", COMPILER);
    printf ("ctime2() called %ld times\n", n2 = test (ctime2));
    printf ("ctime() called %ld times\n", n1 = test (ctime));
    if (n1 > n2)
        printf ("ctime() %ld%% faster than ctime2()\n", percent (n1, n2));
    else
        printf ("ctime2() %ld%% faster than ctime()\n", percent (n2, n1));
}

/* Elik Listing 5: cmpctim2.c

Test no.2 for ctime2() function by U.ELIK


*/

#include <stdio.h>
#include <time.h>
#include "datetime.h"
#include "compiler.h"

#define TEST_TICKS (5*18+1)

static unsigned long far *tickptr = (unsigned long far *)0x0000046C;

unsigned test (char * (*fun) (const time_t *))
{
    unsigned count = 0;
    unsigned long start;
    time_t t;

    time (&t);
    start = *tickptr;
    while (*tickptr == start)
        ;
    for (start += (TEST_TICKS+1); *tickptr < start; ) {
        (*fun) (&t);
        count++;
    }
    return count;
}

unsigned percent (unsigned long c1, unsigned long c2)
{
    unsigned ret;

    c1 *= 100;
    ret = (unsigned)(c1 / c2);
    if ((c1 % c2) * 2 >= c2)    /* round to the nearest percent */
        ret++;
    return ret;
}

void main (int argc, char *argv[])
{
    unsigned n1, n2;

    printf ("\nTesting %s code...\n", COMPILER);
    printf ("ctime2() called %u times\n", n2 = test (ctime2));
    printf ("ctime() called %u times\n", n1 = test (ctime));
    if (n1 > n2)
        printf ("ctime() %u%% faster than ctime2()\n", percent (n1, n2));
    else
        printf ("ctime2() %u%% faster than ctime()\n", percent (n2, n1));
}





































April, 1992
THE FAST WAVELET TRANSFORM


Beyond Fourier transforms


 This article contains the following executables: WAVELET.ARC


Mac A. Cody


Mac is a digital systems manager for Sunair Electronics Inc. in Ft.
Lauderdale, Fla. He designs modems for shortwave radios using DSP and embedded
C programming. He can be contacted on GENIE MAIL as M.CODY1.


The Fourier transform has been used as a reliable tool in signal analysis for
many years. Invented in the early 1800s by its namesake, Jean-Baptiste-Joseph
Fourier, the Fourier transform has become a cornerstone of modern signal
analysis.
The Fourier transform has proven incredibly versatile in applications ranging
from pattern recognition to image processing. Nevertheless, it suffers from
certain limitations. Recently, a new kind of transform, the wavelet transform,
has been shown to be as powerful and versatile as the Fourier transform, yet
without some of its limitations.
The wavelet transform is the result of work by a number of researchers.
Initially, a French geophysicist, Jean Morlet, came up with an ad hoc method
to model the process of sound waves traveling through the earth's crust.
Unlike Fourier analysis, he did not use sine and cosine curves, but simpler
ones which he called "wavelets." Yves Meyer, a mathematician, recognized this
work to be part of the field of harmonic analysis, and came up with a family
of wavelets that he proved were most efficient for modeling complex phenomena.
This work was improved upon by two American researchers, Stephane Mallat of
New York University and Ingrid Daubechies of Bell Labs. Since 1988, there has
been a small explosion of activity in this area, as engineers and researchers
apply the wavelet transform to applications ranging from image compression to
fingerprint analysis. The wavelet transform has even been implemented in
silicon, in the form of a chip from Aware Inc.
But before discussing how the wavelet transform works, let's first examine its
predecessor.


How the Fourier Transform Works


A function in the time domain is translated by the Fourier transform into a
function in the frequency domain, where it can be analyzed for its frequency
content. This translation occurs because the Fourier transform expands the
original function in terms of orthonormal basis functions of sine and cosine
waves of infinite duration. The Fourier coefficients of the transformed
function represent the contribution of each sine and cosine wave at each
frequency.
The Fourier transform works under the assumption that the original time-domain
function is periodic in nature. As a result, the Fourier transform has
difficulty with functions having transient components, that is, components
localized in time. This is especially apparent when a signal has sharp
transitions. Another problem is that the Fourier transform of a signal does
not convey any information pertaining to translation of the signal in time.
Applications that use the Fourier transform often work around the first
problem by windowing the input data so that the sampled values converge to 0
at the endpoints. Attempts to solve the second problem, such as the
development of the short-time Fourier transform, have met with marginal
success.
In recent years, new families of orthonormal basis functions have been
discovered that lead to transforms which overcome the problems of the Fourier
transform. These basis functions are called "wavelets," and unlike the sine
and cosine waves of the Fourier transform, they need not have infinite
duration; they can be nonzero over only a small range.
This "compact support" allows the wavelet transform to translate a time-domain
function into a representation that is localized not only in frequency (like
the Fourier transform) but in time as well. This ability has brought forth new
developments in the fields of signal analysis, image processing, and data
compression.
This article provides a basic understanding of wavelets and their use in
representing signals through the wavelet transform. A description and
implementation of a computationally efficient fast wavelet transform is also
presented. It does for the wavelet transform what the Fast Fourier Transform
(FFT) does for the Discrete Fourier Transform (DFT).


Meeting the Wave Head On


Before implementing a fast wavelet transform, a brief explanation of wavelets
and the wavelet transform is in order. There are two fundamental equations
upon which wavelet calculations are based: the scaling function (also called
the "basic dilation equation" or "fundamental recursion"), see Example 1(a);
and the basic (or "primary wavelet") function, see Example 1(b), where Z is
the set of integers and the a[k] are the wavelet coefficients. Both functions
are two-scale difference equations. They are the prototypes of a class of
orthonormal basis functions of the form shown in Example 1(c), where the
parameter j controls the dilation or compression of the function in time scale
and amplitude. The parameter k controls the translation of the function in
time. The set of basis functions formed by phi(t) and psi(t) is a system of
scaled and translated wavelets.
Now, not just any set of a[k] can be used as wavelet coefficients. Certain
wavelet conditions must be satisfied for a set of a[k] to qualify as wavelet
coefficients. These conditions are shown in Example 1(d), where the second
a[k] denotes the complex conjugate. (The a[k] can be complex.)
Wavelet systems can, as a result, be either real or complex valued. Most
references have used real-valued wavelet systems. Wavelet systems may or may
not have compact support. (As mentioned earlier, wavelets have compact support
if, and only if, they have a finite number of nonzero coefficients.) It is
compact support that allows wavelets to localize in both time and frequency,
so we will deal with wavelets of this type only.
Several techniques have been used to create wavelet systems. These include
cubic splines, complex exponentials, and parameter-space constructions. The
wavelet coefficient generation routine in Listing Two uses formulas to
generate a parameter space for the general, real-valued wavelet system with
two to six coefficients. These parameter-space formulas were developed by
David Pollen of Aware Inc. and are listed in Table 1. The parameter space
modeled by the formulas includes the wavelet systems introduced by Ingrid
Daubechies of Bell Laboratories and the original wavelet basis, the Haar
basis.
Once a wavelet system is created, it can be used to expand a function g(t) in
terms of the basis functions shown in Example 2(a), with the coefficients
calculated by inner products, as seen in Example 2(b). If the wavelet system
has compact support and an upper limit is placed upon the degree of dilation
j, then the expansion equation becomes that shown in Example 2(c).
The expansion coefficients c(l) represent the approximation of the original
signal g(t) with a resolution of one point per every 2{J} points of the
original signal. The expansion coefficients d(j,k) represent details of the
original signal at different levels of resolution. These coefficients
completely and uniquely describe the original signal and can be used in a way
similar to the Fourier transform. The wavelet transform, then, is the process
of determining the values of c(l) and d(j,k) for a given g(t) and wavelet
system.


Digital Filters, Trees, and Fast Transforms


The Fourier transform was made practical to implement on computers by the
development of the Fast Fourier Transform by J.W. Cooley and J.W. Tukey in the
mid-sixties. (Actually, it appears that Cooley and Tukey rediscovered the FFT.
An unpublished treatise, written by Carl Friedrich Gauss in the early 1800s,
describes an algorithm similar to the one developed by Cooley and Tukey.
Gauss's algorithm seems to predate Fourier's 1807 work on harmonic analysis!
You can draw your own conclusions.) The FFT eliminated redundancies that exist
in the Discrete Fourier Transform. While a DFT of length N requires N{2}
multiplications and additions, the FFT requires only N log[2] N such
operations.
The expansion equation naturally leads to a recursive algorithm for the
wavelet transform, given certain assumptions. First, the function g(t) is
taken as a sequence of discrete points, Y[J,p], sampled at 2{J} points per
unit interval. These points can be viewed as the inner products of phi[J,p]
and g(t). That is, the sample points are an approximation, or c(l)
coefficients, of the continuous function g(t). This allows c(l) and d(j,k)
terms to be calculated by direct convolution of g(t) samples with the
coefficients a[k].
Daubechies has discovered that the wavelet transform can be implemented with a
specially designed pair of Finite Impulse Response (FIR) filters called a
"Quadrature Mirror Filter" (QMF) pair. FIR filters and QMFs come from the
field of digital signal processing. A FIR filter performs the dot product (or
sum of products) between the filter coefficients and the discrete samples in
the tapped delay line of the filter. The act of passing a set of discrete
samples, representing a signal, through a FIR filter is a discrete convolution
of the signal with the filter's coefficients; see Figure 1.
QMF filters are used in digital speech analysis and spectral decomposition.
Briefly, QMFs are distinctive because the frequency responses of the two FIR
filters separate the high-frequency and low-frequency components of the input
signal. The dividing point is usually halfway between 0 Hertz (DC) and half
the data sampling rate (the Nyquist frequency).
The outputs of the QMF filter pair are decimated (or de-sampled) by a factor
of two; that is, every other output sample of the filter is kept, and the
others are discarded. The low-frequency (low-pass) filter output is fed into
another identical QMF filter pair. This operation can be repeated recursively
as a tree or pyramid algorithm (Figure 2), yielding a group of signals that
divides the spectrum of the original signal into octave bands with
successively coarser measurements in time as the width of each spectral band
narrows and decreases in frequency; see Figure 3.
Mallat has shown that the tree or pyramid algorithm can be applied to the
wavelet transform by using the wavelet coefficients as the filter coefficients
of the QMF filter pairs. The same wavelet coefficients are used in both
low-pass and high-pass (actually, band-pass) filters. The low-pass filter
coefficients are associated with the a[k] of the scaling function phi. The
output of each low-pass filter is the c(l), or approximation components, of
the original signal for that level of the tree. The high-pass filter is
associated with the a[k] of the wavelet function psi. (Note the alternating
sign change.) The output of each high-pass filter is the d(j,k), or detail
components, of the original signal at resolution 2{j}. The c(l) of the
previous level are used to generate the new c(l) and d(j,k) for the next level
of the tree. Decimation by two corresponds to the multiresolutional nature
(the j parameter) of the scaling and wavelet functions; see Figure 4.
The reverse fast wavelet transform essentially performs the operations
associated with the forward fast wavelet transform in the opposite direction.
The expansion coefficients are combined to reconstruct the original signal.
The same a[k] coefficients are used as in the forward transform, but in
reverse order. The process works down the branches of the tree combining the
approximation and detail signals into approximation signals with higher levels
of detail; see Figure 5. Instead of decimation, the signals are interpolated:
0s are placed between each approximation and detail sample, and the signals
are then passed through the low-pass and high-pass filters. The intermediate 0
values are replaced by "estimates" derived from the convolutions. The filters'
outputs are then summed to form the approximation coefficients for the
next-higher level of resolution; see Figure 6. The final set of approximation
coefficients at the tree's top level in the reverse transform is a
reconstruction of the original signal data points.
The fast wavelet transform is actually more computationally efficient than the
Fast Fourier Transform. As mentioned previously, an FFT of length N (where N
is an integral power of 2) takes on the order of N log[2] N operations. A fast
wavelet transform of length N requires approximately N operations--the best
possible. The FIR filters are linear processing elements. In the tree
algorithm, each consecutive level performs only half as many operations as its
predecessor.


Catch a Wave...



Wavelet filter coefficients are stored in arrays of type double, six elements
in length. Unused array elements are set to 0. Storage for the sampled data
signal is also an array of type double; see Figure 7(a). The only constraint
on the length of the array is that the number of elements must be an integer
multiple of the number of sample points per unit interval, 2{J}. (J, once again,
is the number of levels in the wavelet transform to be calculated.) An
additional five storage locations are allocated to allow for convolution
overrun. Transform coefficient storage consists of a dynamically created array
of arrays of type double; see Figure 7(b). The pairs of arrays store the
approximation and detail coefficients for each given level. The length of each
pair of arrays is set according to the length of the array containing the
sampled data signal and the number of levels in the transform. If the length
of the input data array is N+5, then the topmost pair of arrays has a length
of N/2+5 data elements, the next pair has a length of N/4+5 data elements, and
so on.
Listings One and Two, page 100, provide a set of C functions for implementing
the forward and inverse fast wavelet transforms, the generation of wavelet
filter coefficients, and data-structure management. Headers for the functions
are provided in Listing Three; see page 101. The functions are written in
Turbo C 2.0 and conform to standard C conventions. The code should be portable
to most compilers and host systems.
The functions in Listing One provide utilities for managing the storage
structure for wavelet transform coefficients. BuildTreeStorage uses calloc to
create a dynamic storage structure (array of arrays) for wavelet coefficients
and intermediate results. DestroyTreeStorage uses free to delete an existing
wavelet-coefficient storage structure. TreeCopy copies wavelet coefficients
from one storage structure to another. Only the c(l) (approximation)
coefficients on the last level are copied along with all the d(j,k) (detail)
coefficients. TreeZero sets all elements of the transform coefficient storage
structure to 0. ZeroTreeDetail sets a given number of detail-coefficient
levels to 0. The routine starts with the top level of detail coefficients and
works down the required number of levels.
WaveletCoeffs, see Listing Two, implements the formulas to generate a
parameter space for the general, real-valued wavelet system with two to six
coefficients. The formulas are listed in Table 1. Varying the values of alpha
and beta yields an infinite variety of wavelet coefficient sets (a[k]).
MakeWaveletFilters takes the coefficients created by WaveletCoeffs and creates
the low-pass and high-pass filters for either the forward or the inverse fast
wavelet transform.
To make the implementation of the forward and inverse fast wavelet transforms
easier to comprehend, the algorithms have been coded into several simple
routines. These routines represent different levels of the algorithms'
functionality.
DotP performs the dot product for the forward fast wavelet transform. Each
coefficient of the wavelet filter is used in calculating a dot product.
ConvolveDec2 implements convolution and decimation for the transform. The dot
products of the wavelet filter coefficients and the sample data in the input
array are calculated via calls to DotP. As shown in Figure 4, there is a shift
of two samples between each dot product, thus accomplishing the decimation.
The dot products become the transform coefficients of the branch of the next
level. Special test conditions are taken for data samples at the beginning and
end of the input array.
DecomposeBranches decomposes a level of the forward fast wavelet transform.
The tree's current approximation level is transformed into two branches
containing the detail and approximation decimations for the next level. This
is accomplished with two calls to ConvolveDec2. One call passes the low-pass
wavelet filter coefficients to generate the approximation coefficients for the
next-lower transform level. The second call passes the high-pass wavelet
filter coefficients to generate the detail coefficients for the next-lower
transform level.
WaveletDecomposition is the driver routine for the forward fast wavelet
transform algorithm. The code performs the necessary number of decompositions
for the desired number of levels via calls to DecomposeBranches. For the first
pass of the loop, InData is the sampled input signal. For subsequent loops,
the approximation coefficients of the previously calculated level are used.
DotpEven and DotpOdd use the even- and odd-numbered filter coefficients to
calculate dot products for the inverse fast wavelet transform. ConvolveInt2
implements the convolution, interpolation, and summation operations for the
inverse fast wavelet transform. The dot products of the wavelet filter
coefficients and groups of samples in the input array are calculated via calls
to DotpEven and DotpOdd. Alternating application of DotpEven and DotpOdd on
the same subset of data points emulates interpolation of the data with 0s
inserted between each data point, as shown in Figure 7. Special test
conditions are taken for data samples at the beginning and end of the array.
ReconstructBranches performs reconstruction of an inverse wavelet transform
level. ConvolveInt2 is called twice to convolve and sum the approximation and
detail branches to form the next, more-detailed level of approximation
coefficients. The first call of ConvolveInt2 convolves the low-pass wavelet
filter with the level's approximation branch. The second call convolves the
high-pass wavelet filter with the level's detail branch and sums the results
of both convolutions. WaveletReconstruction is the driver routine for inverse
fast wavelet transform. It performs the necessary number of reconstructions
for the number of wavelet coefficient levels using ReconstructBranches.


Making Waves...


To experiment with the wavelet transform, use the WAVEDEMO program, which
employs the functions in Listings One and Two. WAVEDEMO is written in Turbo C
2.0 and uses Borland's BGI graphics to provide support for CGA, EGA, VGA, and
Hercules graphics adapters. The display shows the original signal and the
expansion coefficients at decreasing resolutions. An alternate display shows
the expansion coefficients and the reconstructed signal. The program can
perform up to six levels of coefficient expansion. The screen acts as a window
that can be scrolled over the displayed coefficients. Another display shows
the signal data and expansion coefficients in numerical format. A menu
interface allows the user to generate wavelet coefficients; decompose and
reconstruct signals; view and modify the expansion coefficients; and store and
retrieve signal data and expansion coefficients from disk files. Because of
its size, the program is available electronically; see page 3. The electronic
version includes the WAVEDEMO executable and source, sample input data and
wavelet coefficient files, and documentation.


Wave Goodbye!


The wavelet transform allows for simultaneous analysis of time and frequency
in signals. An infinite variety of wavelets enables the user to tailor that
analysis to extract the information of interest. An efficient implementation
for computers has been made possible through the fast wavelet transform
algorithm. The software in this article now makes this tool available to the
masses.
Special thanks to Mr. Karl Jagler of Aware Inc. (Cambridge, Mass.) for giving
me the insight and understanding needed to implement the fast wavelet
transform. His time and patience were at least as important in helping me
understand wavelets as the reference articles and algorithm examples he sent
me.


Bibliography


Elliot, Douglas F. ed. Handbook of Digital Signal Processing: Engineering
Applications. San Diego, Calif.: Academic Press, 1987.
Mallat, Stephane G. "A Theory for Multiresolution Signal Decomposition: The
Wavelet Representation." IEEE Transactions on Pattern Analysis and Machine
Intelligence (July, 1989).
Resnikoff, H.L. "Foundations of Arithmeticum Analysis: Compactly Supported
Wavelets and the Wavelet Group." Aware Report No. AD890507.1.
Resnikoff, H.L. and C.S. Burrus. "Relationships Between the Fourier Transform
and the Wavelet Transform." Aware Report No. AD900609.
Strang, Gilbert. "Wavelets and Dilation Equations: A Brief Introduction." SIAM
Review (December, 1989).
The Ultra Wave Explorer User's Manual. Cambridge, Mass.: Aware Inc., 1989.


_THE FAST WAVELET TRANSFORM_
by Mac A. Cody


[LISTING ONE]

#define WAVE_MGT
#include <alloc.h>
#include "wave_mgt.h"
double **BuildTreeStorage(int inlength, int levels)
{
 double **tree;
 int i, j;
 /* create decomposition tree */
 tree = (double **) calloc(2 * levels, sizeof(double *));
 j = inlength;
 for (i = 0; i < levels; i++)
 {
 j /= 2;
 if (j == 0)
 {
 levels = i;
/* printf("\nToo many levels requested for available data\n");
 printf("Number of transform levels now set to %d\n", levels); */
 continue;
 }
 tree[2 * i] = (double *) calloc((j + 5), sizeof(double));
 tree[2 * i + 1] = (double *) calloc((j + 5), sizeof(double));
 }
 return tree;
}
void DestroyTreeStorage(double **tree, int levels)
{
 char i;
 for (i = (2 * levels - 1); i >= 0; i--)
 free(tree[i]);
 free(tree);
}
void TreeCopy(double **TreeDest, double **TreeSrc, int siglen, int levels)
{
 int i, j;
 for (i = 0; i < levels; i++)
 {
 siglen /= 2;
 for (j = 0; j < siglen + 5; j++)
 {
 if ((i + 1) == levels)
 TreeDest[2 * i][j] = TreeSrc[2 * i][j];
 else
 TreeDest[2 * i][j] = 0.0;
 TreeDest[(2 * i) + 1][j] = TreeSrc[(2 * i) + 1][j];
 }
 }
}
void TreeZero(double **Tree, int siglen, int levels)
{
 int i, j;
 for (i = 0; i < levels; i++)
 {
 siglen /= 2;
 for (j = 0; j < siglen + 5; j++)
 {
 Tree[2 * i][j] = 0.0;
 Tree[(2 * i) + 1][j] = 0.0;
 }
 }
}
void ZeroTreeDetail(double **Tree, int siglen, int levels)
{
 int i, j;
 for (i = 0; i < levels; i++)
 {
 siglen /= 2;
 for (j = 0; j < siglen + 5; j++)
 Tree[(2 * i) + 1][j] = 0.0;
 }
}







[LISTING TWO]

/* WAVELET.C */
#include <math.h>
typedef enum { DECOMP, RECON } wavetype;
#include "wavelet.h"
void WaveletCoeffs(double alpha, double beta, double *wavecoeffs)
{
 double tcosa, tcosb, tsina, tsinb;
 char i;
 /* precalculate sines and cosines of alpha and beta to reduce */
 /* processing time */
 tcosa = cos(alpha);
 tcosb = cos(beta);
 tsina = sin(alpha);
 tsinb = sin(beta);
 /* calculate first two wavelet coefficients, a = a(-2) and b = a(-1) */
 wavecoeffs[0] = ((1.0 + tcosa + tsina) * (1.0 - tcosb - tsinb)
 + 2.0 * tsinb * tcosa) / 4.0;
 wavecoeffs[1] = ((1.0 - tcosa + tsina) * (1.0 + tcosb - tsinb)
 - 2.0 * tsinb * tcosa) / 4.0;
 /* precalculate cosine and sine of alpha minus beta to reduce */
 /* processing time */
 tcosa = cos(alpha - beta);
 tsina = sin(alpha - beta);
 /* calculate last four wavelet coefficients c = a(0), d = a(1), */
 /* e = a(2), and f = a(3) */
 wavecoeffs[2] = (1.0 + tcosa + tsina) / 2.0;
 wavecoeffs[3] = (1.0 + tcosa - tsina) / 2.0;
 wavecoeffs[4] = 1 - wavecoeffs[0] - wavecoeffs[2];
 wavecoeffs[5] = 1 - wavecoeffs[1] - wavecoeffs[3];
 /* zero out very small coefficient values caused by truncation error */
 for (i = 0; i < 6; i++)
 {
 if (fabs(wavecoeffs[i]) < 1.0e-15)
 wavecoeffs[i] = 0.0;
 }
}
char MakeWaveletFilters(double *wavecoeffs, double *Lfilter,
 double *Hfilter, wavetype transform)
{
 char i, j, k, filterlength;
 /* find the first non-zero wavelet coefficient */
 i = 0;
 while(wavecoeffs[i] == 0.0)
 i++;
 /* find the last non-zero wavelet coefficient */
 j = 5;
 while(wavecoeffs[j] == 0.0)
 j--;
 /* form decomposition filters h~ and g~ or reconstruction filters h and g.
 Division by 2 in construction of decomposition filters is for normalization */
 filterlength = j - i + 1;
 for(k = 0; k < filterlength; k++)
 {
 if (transform == DECOMP)
 {
 Lfilter[k] = wavecoeffs[j--] / 2.0;
 /* split from one expression so i isn't both read and modified */
 Hfilter[k] = (double) (((i & 0x01) * 2) - 1) * wavecoeffs[i + 1] / 2.0;
 i++;
 }
 else
 {
 Lfilter[k] = wavecoeffs[i++];
 /* split from one expression so j isn't both read and modified */
 Hfilter[k] = (double) (((j & 0x01) * 2) - 1) * wavecoeffs[j - 1];
 j--;
 }
 /* clear out the additional array locations, if any */
 while (k < 6)
 {
 Lfilter[k] = 0.0;
 Hfilter[k++] = 0.0;
 }
 return filterlength;
}
double DotP(double *data, double *filter, char filtlen)
{
 char i;
 double sum;
 sum = 0.0;
 for (i = 0; i < filtlen; i++)
 sum += *data-- * *filter++; /* decreasing data pointer is */
 /* moving backward in time */
 return sum;
}
void ConvolveDec2(double *input_sequence, int inp_length,
 double *filter, char filtlen, double *output_sequence)
/* convolve the input sequence with the filter and decimate by two */
{
 int i;
 char shortlen, offset;
 for(i = 0; (i <= inp_length + 8) && ((i - filtlen) <= inp_length + 8); i += 2)
 {
 if (i < filtlen)
 *output_sequence++ = DotP(input_sequence + i, filter, i + 1);
 else if (i > (inp_length + 5))
 {
 shortlen = filtlen - (char) (i - inp_length - 4);
 offset = (char) (i - inp_length - 4);
 *output_sequence++ = DotP(input_sequence + inp_length + 4,
 filter + offset, shortlen);
 }
 else
 *output_sequence++ = DotP(input_sequence + i, filter, filtlen);
 }
}
int DecomposeBranches(double *In, int Inlen, double *Lfilter,
 double *Hfilter, char filtlen, double *OutL, double *OutH)
 /* Take input data and filters and form two branches of half the
 original length. Length of branches is returned */
{
 ConvolveDec2(In, Inlen, Lfilter, filtlen, OutL);
 ConvolveDec2(In, Inlen, Hfilter, filtlen, OutH);
 return (Inlen / 2);
}
void WaveletDecomposition(double *InData, int Inlength, double *Lfilter,
 double *Hfilter, char filtlen, char levels, double **OutData)
/* Assumes input data has 2 ^ (levels) data points/unit interval. First InData
 is input signal; all others are intermediate approximation coefficients */
{
 char i;
 for (i = 0; i < levels; i++)
 {
 Inlength = DecomposeBranches(InData, Inlength, Lfilter, Hfilter,
 filtlen, OutData[2 * i], OutData[(2 * i) + 1]);
 InData = OutData[2 * i];
 }
}
double DotpEven(double *data, double *filter, char filtlen)
{
 char i;
 double sum;
 sum = 0.0;
 for (i = 0; i < filtlen; i += 2)
 sum += *data-- * filter[i]; /* decreasing data pointer is moving */
 /* backward in time */
 return sum;
}
double DotpOdd(double *data, double *filter, char filtlen)
{
 char i;
 double sum;
 sum = 0.0;
 for (i = 1; i < filtlen; i += 2)
 sum += *data-- * filter[i]; /* decreasing data pointer is moving */
 /* backward in time */
 return sum;
}
void ConvolveInt2(double *input_sequence, int inp_length, double *filter,
 char filtlen, char sum_output, double *output_sequence)
 /* insert zeros between each element of the input sequence and
 convolve with the filter to interpolate the data */
{
 int i;
 if (sum_output) /* summation with previous convolution if true */
 {
 /* every other dot product interpolates the data */
 for(i = (filtlen / 2) - 1; i < inp_length + filtlen - 2; i++)
 {
 *output_sequence++ += DotpOdd(input_sequence + i, filter, filtlen);
 *output_sequence++ += DotpEven(input_sequence + i + 1, filter, filtlen);
 }
 *output_sequence++ += DotpOdd(input_sequence + i, filter, filtlen);
 }
 else /* first convolution of pair if false */
 {
 /* every other dot product interpolates the data */
 for(i = (filtlen / 2) - 1; i < inp_length + filtlen - 2; i++)
 {
 *output_sequence++ = DotpOdd(input_sequence + i, filter, filtlen);
 *output_sequence++ = DotpEven(input_sequence + i + 1, filter, filtlen);
 }
 *output_sequence++ = DotpOdd(input_sequence + i, filter, filtlen);
 }
}
int ReconstructBranches(double *InL, double *InH, int Inlen,
 double *Lfilter, double *Hfilter, char filtlen, double *Out)

 /* Take the two filtered branches and reconstruct a sequence of twice
 the branch length. Length of the output is returned */
{
 ConvolveInt2(InL, Inlen, Lfilter, filtlen, 0, Out);
 ConvolveInt2(InH, Inlen, Hfilter, filtlen, 1, Out);
 return Inlen * 2;
}
void WaveletReconstruction(double **InData, int Inlength, double *Lfilter,
 double *Hfilter, char filtlen, char levels, double *OutData)
 /* assumes that input data has 2 ^ (levels) data points per unit interval */
{
 double *Output;
 char i;
 Inlength = Inlength >> levels;
 /* Destination of all but last branch reconstruction is the next
 higher intermediate approximation */
 for (i = levels - 1; i > 0; i--)
 {
 Output = InData[2 * (i - 1)];
 Inlength = ReconstructBranches(InData[2 * i], InData[(2 * i) + 1],
 Inlength, Lfilter, Hfilter, filtlen, Output);
 }
 /* Destination of the last branch reconstruction is the output data */
 ReconstructBranches(InData[0], InData[1], Inlength, Lfilter, Hfilter,
 filtlen, OutData);
}
double CalculateMSE(double *DataSet1, double *DataSet2, int length)
{
 /* calculate mean squared error of output of reconstruction as
 compared to the original input data */
 int i;
 double pointdiff, topsum, botsum;
 topsum = botsum = 0.0;
 for (i = 0; i < length; i++)
 {
 pointdiff = DataSet1[i] - DataSet2[i];
 topsum += pointdiff * pointdiff;
 botsum += DataSet1[i] * DataSet1[i];
 }
 return topsum / botsum;
}






[LISTING THREE]

/* WAVE_MGT.H */
double **BuildTreeStorage(int inlength, int levels);
void DestroyTreeStorage(double **tree, int levels);
void TreeCopy(double **TreeDest, double **TreeSrc, int siglen, int levels);
void TreeZero(double **Tree, int siglen, int levels);
void ZeroTreeDetail(double **Tree, int siglen, int levels);
/* WAVELET.H */
void WaveletCoeffs(double alpha, double beta, double *wavecoeffs);
char MakeWaveletFilters(double *wavecoeffs, double *Lfilter,
 double *Hfilter, wavetype transform);

double DotP(double *data, double *filter, char filtlength);
void ConvolveDec2(double *input_sequence, int inp_length,
 double *filter, char filtlen, double *output_sequence);
int DecomposeBranches(double *In, int Inlen, double *Lfilter,
 double *Hfilter, char filtlen, double *OutL, double *OutH);
void WaveletDecomposition(double *InData, int Inlength, double *Lfilter,
 double *Hfilter, char filtlen, char levels, double **OutData);
double DotpEven(double *data, double *filter, char filtlength);
double DotpOdd(double *data, double *filter, char filtlength);
void ConvolveInt2(double *input_sequence, int inp_length, double *filter,
 char filtlen, char sum_output, double *output_sequence);
int ReconstructBranches(double *InL, double *InH, int Inlen,
 double *Lfilter, double *Hfilter, char filtlen, double *Out);
void WaveletReconstruction(double **InData, int Inlength, double *Lfilter,
 double *Hfilter, char filtlen, char levels, double *OutData);
double CalculateMSE(double *DataSet1, double *DataSet2, int length);














































April, 1992
YOUR OWN HANDPRINTING RECOGNITION ENGINE


Algorithms for putting pen to pad


 This article contains the following executables: HANDPRINT.URC


Ron Avitzur


Ron completed undergraduate physics at Stanford in 1990. While there, he
developed Milo and the math engine used by FrameMaker, the only page-layout
program which can symbolically evaluate derivatives. He can be reached at P.O.
Box 6692, Stanford, CA 94309.


This article discusses the design and implementation of a writer-dependent
recognizer for handprinted text. This recognition engine forms the basis of a
pen-based interface to a symbolic math program. My recognizer is distinguished
by its small size and straightforward implementation, which is nevertheless
able to achieve high character accuracy. The program currently runs on the
Macintosh, but the core code is highly portable and platform independent.


History of a Communications Bottleneck


When I was an undergraduate doing physics problem sets, I had to generate
pages and pages of simple, but tedious, algebra--by hand. I found that the
symbolic mathematics programs--such as Macsyma, Reduce, and SMP--were largely
useless in handling students' simple problems. Although symbolic math programs
can do powerful computations instantaneously, their command languages are
designed around the limitations of 30-year-old teletype terminals. The whole
process of typing formulas, interpreting results, and producing finished
documents took longer by computer than by hand.
While in school, I explored these limitations by developing a microcomputer
math program with a mouse-based user interface tailor-made for my needs (a
Macintosh program called "Milo"). But when I returned to my studies, I
discovered that I could still work faster by hand, because communicating with
the computer through a narrow input pipe is much too slow--even with a mouse
assisting the keyboard input. You can write formulas faster by hand because
mathematical notation is intrinsically two-dimensional, and because many
different symbols are required (more than there are keys on the keyboard or
convenient choices on a menu bar).
I determined that the problem could be solved using a pen-based interface, and
so I wrote a gesture recognizer and built it into an experimental version of
Milo. While this itself is not a product, I do license the core engine (which
handles mathematical entry, editing, type-setting, and computation) for use as
an embedded component inside other applications.


Design Constraints


A primary characteristic of my recognizer is that it is designed for
interactive programs. This means it has to work fast--at the same time that
the user is moving the stylus across the digitizer. It also means that I can
rely on the dynamics (stroke order) as well as the statics (point positions)
of handprinted characters. (In this context, "handprinted" refers to distinct,
separate characters, as opposed to connected, "handwritten" characters in
cursive.)
Another issue is that a recognition system suitable for math has different
constraints than one for English words. The symbol set for mathematics
consists of hundreds of characters, rather than just a few dozen. There are
many special symbols, including Greek characters and operator symbols.
Recognizers designed for standard alphabetic text increase word accuracy by
relying on context and dictionary lookup. Math formulas do not have words, per
se, and require a high character accuracy. Fortunately, math notation is
printed, not written in cursive, and is therefore easier to recognize than
handwritten text.
To better handle large character sets, the recognizer described here is writer
dependent--that is, the writer must both train the system and adapt to it. The
training process stores a number of features for each character in the set.
During recognition, these stored features are matched against the features of
the character at hand. One feature of the recognizer is that it not only maps
user-defined shapes to characters in a text or math font, but that it can also
map shapes to arbitrary function codes. This allows the system to support
gestures for operations such as simplify, delete, and move.
Another important design goal was to keep the implementation short and simple.
Finding bugs in a small amount of code is hard enough--imagine how it is with
a large recognizer. Because recognizers are evaluated on a statistical basis
(such as x percent accuracy), it is possible for subtle bugs to remain hidden
for a long time, reducing accuracy without crashing the program.
A related and important design goal is that of inspectability--of code, data,
and algorithms. I can't stress this enough. Many of the bugs I discovered
during development were subtle. They did not cause crashes or seriously
degrade performance, but they made it impossible to improve performance
because I did not understand which data values were actually being computed
for each stroke. I then settled on a scheme which maintains a human-readable
representation (ASCII) of the 32-bit hash values that characterize each
stroke. Although slightly less efficient in time and space, this makes it
possible to inspect the character data while the program is running, and thus
understand and verify the recognition process at each step in its execution.
Figure 1 shows some sample data describing the character O. The meaning of
these strings will be explained in the next section.
Figure 1: A text dump of feature data for the letter O

 b: WSENW NWSENENW
 c: DCBA ADCBA ADCBAD ADBA
 d: 7654321 07654321 1076543210 076543210 1764321
 e: 101232 21012321 210123210 2101231 2101232 210232
 f: 0123210 01320



The Recognition Process


Recognition begins immediately after a pen-up event. To allow multistroke
letters, segmentation (dividing the input stream into distinct strokes) is
based on timing. If a stroke (that is, the path between pen-down and pen-up;
an array of x,y points) is recognized unambiguously, the system immediately
displays the chosen character. If a stroke can correspond both to a
single-stroke character and to the beginning of a multistroke character, the
system waits for one-sixth of a second before returning a guess. If a new
stroke starts before the time-out, the recognizer considers both strokes to be
part of a single letter. For example, a hyphen (-) will be recognized unless
the writer continues to mark, say, an equal sign (=) or a plus sign (+) during
the time-out interval. An eight (8), on the other hand, will be recognized
immediately after pen-up.
Ink is collected as fast as the tablet allows. Each stroke may contain from 10
to 100 points. The first step is a preprocessor that throws away most of these
points in order to speed up subsequent calculations. I experimented with many
preprocessing methods. In the end, I chose the fastest and simplest, because
profiling showed that 50 percent of the time was spent in preprocessing. Also,
I was spending too much debugging time on that portion of the code. The method
I used for developing this routine was to write a letter in a window and see
all the ink, then filter those points out and have the program immediately
display the letter using only the points remaining. If I could still clearly
recognize the letter by eye, I wasn't throwing away too much information. The
routine Process3( ) copies the array of points from Ink[] into the array P[],
ignoring any points less than Height/8 (in the y direction) and Width/8 (in
the x direction) away from the previous point. This filtering step is shown in
the top part of Figure 2. The lower part of the figure shows the result of the
feature-extraction process on the sample letter O.
Every character is described by five different features: three direction lists
and two position lists. The features are calculated by what I call "hash
functions," shown numbered 1 to 5 in Figure 2.
The first feature, a direction list, consists of a string that describes the
path taken by the strokes comprising a character. For example, the letter O
can be written as a counterclockwise circle, which would be described by the
string WSENW. This means that the pentip generally begins moving to the west
(+/-45 degrees), then south, and so on.

The second direction list is a string that partitions the input space into
four regions, rotated 45 degrees from compass directions. The third direction
list is a string partitioning the input space into eight compass directions.
These three direction lists are packed into a 32-bit integer so that searching
and matching can be done quickly. Four possible values (such as N, E, S, and
W) can be encoded in 2 bits. A 32-bit integer can therefore contain up to 16
encoded values, in the case of the first two direction lists. The third
direction list requires 3 bits per character to encode the eight possible
compass directions. This means that up to ten possible values can be stored in
a 32-bit integer. This encoding limits the complexity of strokes that can be
stored and distinguished.
The recognizer will look at each of these hash values and generate a list of
possible matches. Each function gets a weighted vote and can add multiple
candidates to the list of possible matches. Rather than compare each stroke
with each trained pattern, the training data is sorted by hash value so that a
binary search quickly finds matches.
Direction information is almost sufficient, but not quite. Direction alone
cannot distinguish between h and n, for example, or b and p, or 6 and 0 in my
handwriting. During development, I initially looked only at the first hash
function and achieved about 70 percent accuracy. The other direction functions
fixed some problems but added mostly redundant information. Looking at the
positions brought the error rate down to under 10 percent. So after a list of
possible candidates is chosen and voted upon by the direction functions, the
two position functions vote.



The Code


The code for the recognition engine, including a simple shell that allows for
testing and training, consists of six files. Due to space limitations, only
the most important excerpts are shown in the accompanying listing. The
complete code is available electronically from DDJ in the form of a Lightspeed
C 5.0 project for the Macintosh. Most of the files are system independent,
using ANSI-C libraries for memory management and file I/O.
While the code will work with a mouse, it works better with a digitizing
tablet. I developed the code using a Wacom 510C digitizer (Wacom Technology,
Vancouver, Wash.), which has a cordless pen, 500 lines-per-inch resolution,
and a transfer rate of 120 points per second. The digitizer is connected
through a 9600 baud serial port. The system also works well with Wacom's
HD-648A LCD integrated tablet that also has a cordless pen and comparable
resolution.
Here's a summary of the routines available electronically. The file
TestShell.c contains main( ), which calls the usual Macintosh initialization
routines, opens a window, and goes into the main event loop. The module
FileIO.c consists of two routines, ReadPatterns( ) and SavePatterns( ), which
read and write a user's handwriting data to disk. The file RecognizerUtil.c
contains miscellaneous routines for initializing the tablet, defining an ATAN2
table (to cut down on floating-point usage), and manipulating the List data
structures. The file Trainer.c contains DoFastTrainerDialog( ), which brings
up the modal dialog (shown in Figure 3) and adds each letter trained to the
"patterns" global variable. To bring up the training dialog in the runtime
shell, you must type command-L.
If you want to compile the recognizer code to run on another platform, such as
Microsoft Windows, you'll need to change the routines which initialize the
tablet (in RecognizerUtil.c), track the mouse (in Analyzer.c), set up the
windows (in TestShell.c), and handle the training dialog (in Trainer.c).
The heart of the computation is in two files, Analyzer.c and Recognizer.c. The
module Analyzer.c is responsible for collecting ink and reducing the data
stream to hash values. The routines in Recognizer.c compare the calculated
hash values to a stored list of patterns. Listing One (page 103) contains the
principal routines from analyzer.c, as well as the most important data
structures.
The StrokeData structure is where all the action takes place. During pen
movement, points are accumulated into the member field Ink[]. On pen-up, the
routine Simplify( ) filters out most of the points and collects about 10 or 20
into member field P[]. Then Analyze( ) fills in the StrokeData fields. Member
field T[] is a list of directions (-180..180) between the points in P[]. Xmax,
Xmin, Ymax, and Ymin together define the bounding box for this stroke.
Similarly, XmaxT, XminT, YmaxT, and YminT define the cumulative bounding
box--the union of bounding boxes for this stroke and all previous strokes in
this letter. IsDot is a Boolean special case that is set true if the bounding
box is smaller than a threshold. The array S[5] contains the feature set--the
five direction and position lists described earlier.
After these fields are computed, the matches list is filled with guesses. For
the first stroke of a letter, the features S1, S2, and S3 are compared to all
the letters in patterns; weighted guesses are then kept in the matches field
of that stroke. For further strokes in a letter, we look only at letters
proposed in the matches field of the previous stroke.


Future Improvements


Because my interest is in math systems, I developed this only until I was
convinced the problem is solvable. To proceed further, I would suggest writing
an ink collector to store many samples of each letter, allowing the user to
view the ink directly and delete mistakes. Then one could experiment with
different kinds of functions by training on half the strokes in the ink set
and testing on the other half.
To improve accuracy, various strategies come to mind. You could define other
character features and provide the hash functions to calculate them. The
current implementation relies on position and its first derivative (angle). A
natural extension is to look at the integral of the curve (area) or its second
derivative (curvature). The aspect ratio of the bounding box would make a
useful single number to include in the weighting scheme. The direction
functions are sensitive to rotation, so it would work better to subtract out
the initial angle when computing the string, and use the initial angle as a
separate number. I would suggest trying functions which compare points
separated along the curve, because these functions examine only local
information.


Try Your Hand at Recognition


The handprint recognizer presented in Ron Avitzur's article does a lot with a
small amount of code. The implementation is fast and straightforward, and
works surprisingly well in practice. But here's an invitation to readers to
see if you can do better.
To this end, we're proud to announce the first ever DDJ Recognition Contest.
We're just now in the process of building a general-purpose test harness to
exercise your code. This test harness will be implemented on both the
Macintosh and Windows 3 platforms. The API that your recognizer uses to run on
this harness will likely consist of one function call--with "ink" (stylus data
points) passed as input, and a simple text string as output.
The test harness will run each recognition engine over the same set of stored
inputs. For this purpose, the folks at GO Corp. have generously offered their
extensive database of handwriting samples. We're now in the process of
establishing a standard data format for ink samples, in case you want to
implement your own recognition engine from scratch.
If you're interested in this project, please contact us via electronic or
regular mail for full details. We'll also publish more on this contest in our
next issue, and establish an end-of-summer deadline for your contest entries.
Winners will be chosen at the end of this year, and we'll publish excerpts
from the winning entries. The complete code will, of course, be available
online.
Programmers, start your engines!
--editors



_YOUR OWN HANDPRINTING RECOGNITION ENGINE_
by Ron Avitzur



[LISTING ONE]

/*****************************************************************
A Writer-Dependent Hand-printing Recognizer -- by Ron Avitzur, 1991.
This is not the complete code. See the accompanying article for
a description of other necessary modules.
*****************************************************************/

typedef struct { short num_items; void *items[]; } *List;
typedef struct { short code; List strokes; } GesturePattern;
typedef struct { char is_dot; List s[5]; } StrokePattern;
typedef struct {
 Point Ink[MAX_POINTS],
 P[MAX_N];
 short Ink_Num,
 N,
 T[MAX_N],
 IsDot;
 unsigned long S[5];
 double Aspect_Ratio;
 long Xmax,Xmin,
 Ymax,Ymin,
 Height,Width,
 XmaxT,XminT,
 YmaxT,YminT,
 HeightT,WidthT;
 long start_time,end_time;
 List matches;
 } StrokeData,
 *StrokePtr;

ProcPtr HashFunctions[] = { Fxn1,Fxn2,Fxn3,Fxn4,Fxn5 };

char Bits[] = {2,2,3,2,2};

#define DOT_THRESHHOLD (Wacom?80:4)
#define PT_SEP_POST (Wacom?40:4)

/****************************************************************/
void Analyze(register StrokePtr theStroke) {
 char i,s[100];
 Simplify(theStroke,theStroke->Ink,theStroke->Ink_Num);
 for (i = 0; i < 5; i++) {
 HashFunctions[i](s,theStroke);
 ConvertStringToLong(s,&theStroke->S[i],Bits[i]);
 }
 theStroke->IsDot =
 ((theStroke->Height < DOT_THRESHHOLD
 &&
 theStroke->Width < DOT_THRESHHOLD)
 || theStroke->N == 1);
 }
/****************************************************************/
void Simplify(StrokePtr theStroke,Point *Ink,short N) {
 Point Q[MAX_POINTS];
 short min_dx = theStroke->Width / 8,
 min_dy = theStroke->Height / 8;
 if (theStroke->Aspect_Ratio < 0.2) min_dy = theStroke->Height;
 if (theStroke->Aspect_Ratio > 5.0) min_dx = theStroke->Width;
 theStroke->N = Process3(theStroke->P,Ink,N,min_dx,min_dy);
 ComputeT(theStroke);
 }
/****************************************************************/
void ComputeT(StrokePtr theStroke) {
 register short *T = theStroke->T;
 register Point *P = theStroke->P;
 register short i,N = theStroke->N;
 for (i = 0; i < N - 1; i++)
 T[i] = ATAN2(P[i+1].v-P[i].v,P[i+1].h-P[i].h);
 }
/****************************************************************/
short Process3(register Point *P, register Point *Q,
 short num,short xd, short yd) {
 register short i,n;
 register short dx,dy;
 n = 0;
 P[0] = Q[0];
 for (i = 1; i < num - 1; i++) {
 dx = Q[i].h - P[n].h; dx = ABS(dx);
 dy = Q[i].v - P[n].v; dy = ABS(dy);
 if (dx + dy < PT_SEP_POST) continue;
 if (dx < xd && dy < yd) continue;
 n++;
 P[n] = Q[i];
 }
 dx = Q[num - 1].h - P[n].h;
 dy = Q[num - 1].v - P[n].v;
 if (ABS(dx) + ABS(dy) > PT_SEP_POST)
 n++;
 P[n] = Q[num - 1];
 return n + 1;
 }
/****************************************************************/
/* These five lines determine what the features actually are. */
#define Feature1(t) ('0' + ((t + 10 + 45 + 180) / 90) % 4)
#define Feature2(t) ('0' + ((t + 10 + 00 + 180) / 90) % 4)

#define Feature3(t) ('0' + ((t + 10 + 22 + 180) / 45) % 8)
#define Feature4(p) ('0' + (4*((p).h - theStroke->XminT) /theStroke->WidthT))
#define Feature5(p) ('0' + (4*((p).v - theStroke->YminT) /theStroke->HeightT))

/****************************************************************/
#define FOO(name,fxn,type,array,end) \
 void name(char *s,StrokePtr theStroke) \
 { \
 register short i,d,n = 0; \
 register type *T = theStroke->array; \
 s[0] = fxn(*T++); \
 i = theStroke->N - end; \
 while (i-- > 0) { \
 d = fxn(*T++); \
 if (s[n] != d) \
 s[++n] = d; \
 } \
 s[++n] = 0; \
 }
FOO(Fxn1,Feature1,short,T,2)
FOO(Fxn2,Feature2,short,T,2)
FOO(Fxn3,Feature3,short,T,2)
FOO(Fxn4,Feature4,Point,P,1)
FOO(Fxn5,Feature5,Point,P,1)
/****************************************************************/
void ConvertStringToLong(char *s,unsigned long *np,short bits) {
 unsigned long n = 0;
 short i,len = strlen(s);
 s[len] = s[len-1];
 if (len > 32/bits) len = 32/bits;
 for (i = 0; i <= len; i++)
 n = (n << bits) + s[len - i] - '0';
 *np = n;
 }










April, 1992
RED-BLACK TREES


Being partly balanced can be good enough




Bruce Schneier


Bruce has an MS in Computer Science and has worked in cryptography and data
security for a number of public and private concerns. He can be reached at 730
Fair Oaks Ave., Oak Park, IL 60302.


Red-black trees are a variation of classic binary search trees that use an
efficient mechanism for keeping the tree in balance. This article describes
the basic concepts of red-black trees and presents the pseudocode for the
insert and delete operations.


Binary Trees


Binary search trees have traditionally been a useful way to store dynamic
data. If the tree is balanced, you can do searches, insertions, and deletions
much faster than if the data were in a linked list. If the tree is large and
ugly, however, its performance may be only marginally better than that of a
linked list; see Figure 1.
The operations to balance a binary tree are relatively easy to implement, but
can consume much execution time. If there are a lot of insertions into and
deletions from the binary tree, the performance advantage over a linked list
may not amount to much. Recently, a number of tricks have appeared in academic
literature. These are intended to make sure a binary tree stays balanced
without the grief of standard balancing operations.
A red-black tree is a form of binary tree in which each node is assigned a
color: either red or black. By constraining the particular coloring of the
nodes during inserts and deletes, you can make sure the binary tree stays
mostly balanced. Search operations are completed faster than with linked
lists, and all of those extra tree pointers don't go to waste. This technique
was first invented by R. Bayer, and then studied and embellished by L.J.
Guibas and R. Sedgewick.
In this article, I'm assuming you know what a binary search tree is. I also
assume you know how to insert nodes into and delete nodes from a binary search
tree. For a refresher course on binary search trees, see section 6.2.2 of
Knuth's The Art of Computer Programming, vol. 3: Sorting and Searching.


Rules for Red and Black


The data structure for a red-black tree is just like a normal binary search
tree except that each node has one extra bit of storage, which contains the
node's color.
There are five rules for building a red-black tree:
1. Every node must be either red or black.
2. The root node must be black.
3. Leaf nodes (null pointers) must be black.
4. Every red node must have a black parent.
5. Every direct path from a node to a leaf must contain the same number of
black nodes.
The clever thing about the system is that rules 4 and 5 ensure that the tree
remains somewhat balanced; see Figure 2. No path from the root to a leaf is
more than twice as long as any other such path. This isn't perfect balancing, but it's
close enough for most purposes. And the various insertion and deletion
operations that maintain rules 4 and 5 are much more efficient than those
required for perfect balancing.
The usual algorithms for insertion into and deletion from a binary tree can
break the red-black rules, so new methods are necessary. Examples 1 and 2
present the pseudocode for insertion into and deletion from a red-black tree.
Example 1: Insert operation, in pseudocode

 RedBlackInsert(T,x)
 {
     TreeInsert(T,x)
     color(x) <- Red
     while x != root(T) and color(p(x))==Red
         if p(x)==left(p(p(x)))
             y <- right(p(p(x)))
             if color(y)==Red
                 color(p(x)) <- Black
                 color(y) <- Black
                 color(p(p(x))) <- Red
                 x <- p(p(x))
             else
                 if x==right(p(x))
                     x <- p(x)
                     RotateLeft(T,x)
                 color(p(x)) <- Black
                 color(p(p(x))) <- Red
                 RotateRight(T,p(p(x)))
         else
             /* this is the same as the "then" clause,
              * with "right" and "left" interchanged */
     color(root(T)) <- Black
 }

Example 2: Delete operation, in pseudocode

 RedBlackDelete(T,z)
 {
     if left(z)==nil(T) or right(z)==nil(T)
         y <- z
     else
         y <- TreeSuccessor(z)
     if left(y) != nil(T)
         x <- left(y)
     else
         x <- right(y)
     p(x) <- p(y)
     if p(y)==nil(T)
         root(T) <- x
     else
         if y==left(p(y))
             left(p(y)) <- x
         else
             right(p(y)) <- x
     if y != z
         key(z) <- key(y)
         /* if y has other fields, copy them too */
     if color(y)==Black
         RBDeleteFixup(T,x)
 }
 RBDeleteFixup(T,x)
 {
     while x != root(T) and color(x)==Black
         if x==left(p(x))
             w <- right(p(x))
             if color(w)==Red
                 color(w) <- Black
                 color(p(x)) <- Red
                 RotateLeft(T,p(x))
                 w <- right(p(x))
             if color(left(w))==Black and color(right(w))==Black
                 color(w) <- Red
                 x <- p(x)
             else
                 if color(right(w))==Black
                     color(left(w)) <- Black
                     color(w) <- Red
                     RotateRight(T,w)
                     w <- right(p(x))
                 color(w) <- color(p(x))
                 color(p(x)) <- Black
                 color(right(w)) <- Black
                 RotateLeft(T,p(x))
                 x <- root(T)
         else
             /* this is the same as the "then" clause,
              * except that "right" and "left" are exchanged */
     color(x) <- Black
 }

In the pseudocode, left(x) and right(x) are pointers to the left and right
children of node x, and p(x) is a pointer to the parent of the node.
(Technically, you can save pointers as you traverse the tree and do without a
parent pointer, but having a parent pointer makes the pseudocode a lot
easier.)
The pseudocode assumes you know how to insert a node into a binary tree, find
the successor of a node in a binary tree, and also how to rotate a binary tree
to the left and right at a node. If you don't, reread the appropriate section
in Knuth.


Insert and Delete


The algorithm for red-black insert (Example 1) can be stated concisely: First,
do a normal binary tree insert and force the color of the new node to be red;
then, mess with the ordering and the coloring of the nodes to preserve the
red-black rules of the tree.
When a red node is inserted into a red-black tree, the only rule that might be
violated is rule 4--and only if the new node's parent is also red. The while
loop in the algorithm has the job of moving the rule 4 violation up the tree
without violating rule 5. There are three cases to consider. (Actually there
are six, but three are symmetrical with each other, depending on whether the
new node's parent is a left child or a right child.) The principal cases are:
Case 1: If the new node's parent and uncle (the parent's parent's other child)
are both red, then the colors of the parent and the uncle are changed to
black, and the color of the grandparent is set to red. This moves the problem
up two levels, so the while loop repeats with the grandparent of the node.
Case 2: If the new node's parent is red and its uncle is black, there are two
similar possibilities. If the new node is a left child of its parent, then the
color of the parent is changed to black, the color of the grandparent is
changed to red, and the tree is rotated right about the grandparent. This
solves everything and the algorithm terminates.
Case 3: If the new node is a right child of its parent, then a left rotation
about the parent is performed, and then everything proceeds as in case 2.
The algorithm for the delete operation (shown in Example 2) is more
complicated, at least some of the time. If the deleted node is red, all of the
rules still hold and everything is fine. If the node is black, there is a
mound of code designed to keep the tree somewhat balanced. As with red-black
insert, there are many different cases to consider and different things to do
in each case. It all works out for the best in the end, though.


Conclusion


Red-black trees are not suitable for every tree application. If the data is
mostly static, it might be quicker to let the nodes fall where they may and
not bother with balancing operations. For data that is completely static, such
as a dictionary that comes with a spelling checker, it is probably better to
completely optimize the tree beforehand, or use a heap or a perfect hasher.
For data where certain nodes are accessed far more frequently than most of the
others, a self-adjusting binary tree (see "Self-Adjusting Data Structures" by
Andrew Liao, DDJ, February 1990) is probably a better choice. But for data
that is uniformly accessed and frequently updated, a red-black tree is
probably the way to go.
For a detailed discussion of these algorithms, I enthusiastically recommend
the book Introduction to Algorithms, by Cormen, Leiserson, and Rivest. First
published in 1990, the book is already in its third printing and is probably
the best book on algorithms since Knuth. It belongs on every serious
programmer's shelf.


Bibliography


Bayer, R. "Symmetric Binary B-Trees: Data Structure and Maintenance
Algorithms." Acta Informatica 1, 1972.
Cormen, T.H., C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms.
Cambridge, Mass.: MIT Press/McGraw-Hill, 1991.
Guibas, L.J. and R. Sedgewick. "A Dichromatic Framework for Balanced Trees."
Proceedings of the 19th Annual Symposium on Foundations of Computer Science
(1978).
Sedgewick, R. Algorithms. Reading, Mass.: Addison-Wesley, 1983.


_RED-BLACK TREES_
by Bruce Schneier



April, 1992
BLOCK TRUNCATION COMPRESSION


An efficient algorithm for image compression




Anton Kruger


Anton works for Truda Software, a Fortran and C consulting firm. He can be
reached through the DDJ office.


Data compression algorithms can be divided into two groups. The first group,
error-free (or lossless) compression algorithms such as Huffman and LZW,
incurs no loss of information during the compress/decompress cycle. This is an
important consideration in many applications. For example, it is normally
disastrous to have even 1 bit wrong in an executable file. Error-free
compression is also imperative with the vast majority of text files. You don't
want a compression program changing the source code of a computer program or
introducing spelling mistakes into a word-processing document.
On the other hand, in some applications involving images, the error-free
requirement can be dropped. When dealing with photographs of people's faces,
natural scenes, and so on, some image degradation is often tolerable. This
second group of compression algorithms is known as "lossy," and is well-suited
for such applications. Examples of this group are the Fourier and cosine
transform compression algorithms, as well as Block Truncation Compression
(BTC), the focus of this article.


Block Truncation Compression


The idea behind BTC is to preserve local image statistics, typically over 4x4
blocks of the image, though larger blocks may also be used. One advantage of
the BTC algorithm is that it does not require global image statistics (such as
the image histogram). The implementation described in this article operates on
4x4 blocks, so that only four lines of the image are in memory at any given
time. The algorithm is fairly simple and quite fast. For example, it takes
about 7.25 seconds to compress a 256x256 8-bit (256-level) monochrome image,
and about 2 seconds to decompress the resulting file on a 20-MHz AT clone. The
most important property of Block Truncation Compression, however, is the
compression ratio. The algorithm described in this article compresses
monochrome image files down to 2 bits/pixel, so that the compressed images
require 25 percent of their former disk space.
How does this compare with lossless compression algorithms? Two image files,
"baboon" and "Valerie" were used as test files. Each contains 256x256 8-bit
monochrome images in pure binary format: There are no file headers or any
other information, and the images are stored line-after-line in the files.
Both the popular compression program PKZIP and a DOS version of the UNIX
compress utility failed to reduce the size of the two test images, and simply
stored the files. Another popular file format for images is CompuServe's GIF
format. A GIF-encoded version of "baboon" is about 20 percent larger, and a
GIF-encoded version of "Valerie" is about 1 percent smaller than the original
files. This is not too surprising, because the UNIX compress program and the
GIF encoder both employ an LZW algorithm. (Sample GIF files are available
electronically; see page 3.) The files were also run through a Huffman
compression program. The size of both files stayed practically the same.
In this instance we have two files that cannot easily be compressed with
lossless compression algorithms, but can be compressed to 25 percent of their
original size with a lossy compression algorithm. While the specific example
may be extreme, it is generally true that much higher compression ratios are
attainable with lossy compression algorithms than with lossless algorithms.
(The price is a reduction in image quality.) Furthermore, as will hopefully
become clear, this compression ratio is guaranteed with BTC, and is
independent of the image properties. In fact, compression speed, decompression
speed, and compression ratio are all essentially independent of the image, in
contrast with error-free compression algorithms.
BTC is described in terms of Figure 1, adapted from Rosenfeld and Kak's
Digital Picture Processing. Figure 1(a) shows a 4x4 block of pixels, labeled
P[0], P[1], ..., P[15]. The BTC algorithm operates by replacing the pixels in
the block with only one of two judiciously chosen values, which we shall call
a and b; see Figure 1(b). If a and b are made available to the decompressor,
then the block of pixels can be represented with 1s and 0s, as in Figure 1(c).
This 16-bit bit plane can be coded into 2 bytes. Add to this 1 byte each for a
and b, giving a total of 4 bytes required for the 4x4 block. Without
compression, the block requires 16 bytes of storage, so the compressed block
occupies 4/16, or 25 percent, of the original space. This procedure is called
"Block Truncation Compression" because the digital representation of the
pixels in a block is truncated to 1 bit. The block truncation code (code), as
well as a and b, are needed to recover the block of pixels.
What values a and b are used to represent the pixels in the block? It was
stated earlier that the idea behind the BTC algorithm is to preserve local
image statistics. In particular, the first and second moments of a block of
pixels are to be preserved. The first moment is simply the mean of the pixels
in the block, as in Example 1(a), where n (=16) is the number of pixels in the
block. The second moment is the mean of the squares of the pixel values; see
Example 1(b). Let q be the number of pixels in the block greater than or equal
to the mean. When only a and b are used to represent the pixels in the block,
the
mean becomes that shown in Example 1(c), and the second moment becomes that
shown in Example 1(d).
Because we want to preserve the mean and the second moment, we require these
values to be the same as the original values. Thus, the equation in Example
1(a) is equated to that in 1(c) and the equation in 1(b) is equated to 1(d).
When the two sets of equations are solved for a and b, we find the values in
Example 1(e), where d and sigma are given by Example 1(f).
The BTC algorithm now works as follows. First, divide the image into blocks of
4X4 pixels. For each block, compute the mean and second moments from the
equations in Example 1(a) and Example 1(b). Second, construct a bit plane by
replacing each pixel that has a value less than the mean with a 0, and pixels
with values greater or equal to the mean with a 1. Third, pack the resulting
bit plane into 2 bytes, which becomes the block truncation code. Finally,
write a, b, and the 2-byte code to the output file.
Decompression is straightforward. First, read a, b, and a 2-byte block
truncation code from the compressed file. Second, examine each bit in the
2-byte code. If a bit is clear, then display (or write to an output file) the
value a. If a bit is set, display the value b. Repeat this until no more codes
are in the file. Of course, the decompressed file differs from the original,
which is why BTC is not an error-free compression algorithm. However, some
local image statistics (the first and second moments) over 4X4 blocks of
pixels are intact.
The following is a numerical example, taken from the references; refer to
Figure 1(d). With the pixel values shown, it follows that n=16, m[1]=98.75,
m[2]=18.39x10{3}, and q=7. From this and the equations in Example 1(e) and
Example 1(f), it follows that sigma=92.95, d=0.882, a=17, and b=204, where a
and b were rounded to the closest integer. If all pixels smaller than the mean
are set to a, and the rest are set to b, the result is Figure 1(b). The
bit-plane representation is given in Figure 1(c). One way of packing the bit
plane is shown in Figure 1(d), and the block truncation code for this block is
therefore 256*136+227=35043. Thus, for this block we write a=17 (1 byte),
b=204 (1 byte), and code=35043 (2 bytes) to the output file.
Two special cases must be considered. The first is when q = n = 16, which
occurs when all the pixel values are identical. In this instance one cannot
compute d--see Example 1(f)--because a divide-by-zero error will result. This
is handled by setting a = b = m[1]. The second case is when the computed
values of
a or b fall outside the closed interval [0...255]. This is handled by clamping
values outside the interval so that they fall in the desired range.


Implementation


Listing One (page 104) shows code for compressing a 256x256 8-bit image with
the BTC algorithm, and Listing Two (page 104) contains code for decompressing
the compressed file. In this particular implementation, the image size, number
of gray levels, and block size are all fixed and hard-coded into the
algorithm. This was done for two reasons. The first is simplicity and clarity,
and the second is that it is quite common to compress a large number of
similar and related images on a regular basis. For example, a compression
algorithm can be employed to compress the output of a scanner used to enter
images into a personnel database. In any event, it should not be difficult to
alter the code to suit other image sizes.
Listing One starts with a driver routine that calls btc4x4, the routine that
does the actual compression. This routine first calls GetBlock, which fetches
the next 4X4 block of pixels from the image file, and next calls GetStats,
which computes the statistics a, b, and m[1] (the mean) for each block
according to the equations in Example 1(f). Next, each pixel in the block is
examined and the appropriate bit in variable code is set, after which a, b,
and code are written to the output file with a call to PutCode. The routine
GetStats is quite straightforward. Floating-point calculations are used for
simplicity, and rounding to the nearest integer is performed when required.
The routine GetBlock maintains an internal four-line buffer that normally
contains four lines of the image. A buffer pointer, bp, keeps track of the
current location in the buffer. On each call, 16 pixels are copied from the
four-line buffer and bp is updated. After 64 calls, bp reaches the end of the
four-line buffer, and the buffer is refilled; this repeats until the end of
the file is reached.
Listing Two starts with a driver for the decompression routine, btd4x4. This
routine calls GetCode, which reads an a, b, and a BTC code from the compressed
file. Each bit in the code is then examined, and a or b is stored in the
variable block, depending on whether the bit is clear or not. It stands to
reason that the same byte ordering, as well as the numbering of the bits must
be followed by both the compression and decompression routines; see Figure
1(e). When block is full, PutBlock is called to write the block of pixels to
an output file via a four-line buffer, quite similar to GetBlock. In other
applications, PutBlock will display the block of pixels on a video display
unit, or send the pixels to a printer, and so on.
As a final note on the implementation, the programs should be quite portable,
though they were tested only on a PC under DOS. Two typedefs were used to
create byte (1 byte) and word (2 byte) data-type names. You may need to change
these to correspond to the proper variables on other systems.
Executable versions of both listings and the sample data files are available
electronically.


Results and Discussion


How well does BTC work? This depends to some extent on the image, but the
results are generally quite good for 4X4 block sizes. Figure 2(a) shows
"Valerie," a 256x256 8-bit monochrome image. (The display was a VGA-compatible
system with 6-bit [64-level] gray scale capabilities.) In the BTC-compressed
Figure 2(b), some blockiness is visible, especially around the baby's nose
and chin, as well as the baby's right cheek. While it may not be clear on the
photograph, there is also some false contouring on the baby's left cheek. This
is typical of BTC, where errors show up as ragged edges and false contours in
low-contrast areas. In any event, the image is acceptable, considering that it
requires four times less disk space than the original.
Actually, some of the artifacts arise because a VGA-compatible system can only
display 64 gray levels, and some blockiness and rough edges are present in the
original Valerie image. When an image is going to be
displayed on a VGA-compatible system, some additional savings are possible. A
VGA-compatible system is only capable of displaying 64 gray levels, so we can
save 2 bits each on a and b. When this is done, each block needs 16+6+6 = 28
bits, or 1.75 bits/pixel, which implies about 22 percent of the original disk
space. However, the implementation is a little more complex, because a, b, and
the block truncation code no longer make up an integral number of bytes.
In the current implementation, we save a and b, computed from Example 1(e),
along with the block truncation code to the output file. An alternative is to
save the mean (m[1]) and standard deviation (sigma), and then compute a and b
during decompression. This is a slightly slower approach.
Finally, what happens when the BTC algorithm is applied to an image that has
gone through the compress/decompress cycle once before--is there further
reduction in image quality? The answer is generally no, which can be confirmed
by working an example. Of course, if the image is compressed with another
lossy compression algorithm, more image degradation will normally result.


References


Delp, E.J. and O.R. Mitchell. "Image Compression Using Block Truncation
Coding." IEEE Transactions on Communications (September 1979).
Rosenfeld, A. and A.C. Kak. Digital Picture Processing, Second Edition. San
Diego, Calif.: Academic Press, 1982.


April, 1992
FINDING STRING DISTANCES


A rose by any other name has a particular Levenshtein distance




Ray Valdes


Ray is a technical editor at DDJ. He can be reached at 411 Borel Avenue, San
Mateo, CA 94402.


This article discusses the theory and practice of sequence comparison. This
topic has lurked mostly unnoticed on the sidelines of computer science, but
has proved tremendously important in biotech research and may now have
widespread application in the areas of handwriting and speech recognition.


Background


Comparing two sequences of characters is ostensibly a rather narrow, simple
problem. Standard texts on algorithms virtually ignore this subject, possibly
because it seems relatively inconsequential. More comprehensive texts discuss
the problem of finding an exact match quickly, relying on the now classic
Boyer-Moore algorithm.
Finding multiple or partial matches between two strings is a subject that has
been well studied, for it falls within the traditional domain of formal
languages and parsing theory. The familiar grep utility (and other UNIX
utilities that parse regular expressions, such as awk, sed, and lex)
established finite state machines as the method of choice for efficient
processing of multiple matches that can be described by regular expressions.
Since the time of those classic programs, there's been little reason for the
mainstream to investigate further. Recently, however, more precise notions of
sequence comparison have become important. When two strings are different,
what metric can be used to describe the difference? What are the basic
operations that cause sequences to change? Given two different strings, can we
arrive at the optimal sequence of steps that will transform one string into
the other?
Actually, some well-known computer scientists (Aho, Hopcroft, and Ullman, for
instance) have published papers on what is known as "the string-to-string
correction problem," but these have not been widely cited.
All this may seem academic until you realize that sequence comparison has
served as one of the underpinnings of research in molecular biology over the
last decade, with benefits that are now coming to fruition, as any glance at
the science section of your local newspaper will show.


Comparing Genetic Sequences


As you may recall, all living creatures grow, develop, and reproduce under the
control of a stored program, the biochemical instructions of which are
recorded digitally on a thin polymer filament--DNA. In the case of humans,
this digital tape is about three feet long, broken up into the 26 chromosomes
that constitute the human genome, and wound up into a compact package within
the cell nucleus.
Because researchers have not yet transcribed and disassembled this program
completely (a feat likely to occur during this decade), molecular biologists
must presently conduct research by trying to match snippets of DNA (sometimes
haphazardly collected) against previously transcribed sequences in a database.
These sequence comparisons are done by computer, using symbolic
representations of biochemical instructions (basically, strings of text that
represent the base pairs out of which DNA is constructed). By finding related
sequences and measuring the distance between them, researchers gain a greater
understanding of how life was created and has evolved, and about the different
roles played by the many biochemical components of the human body.


Handwriting and Speech Recognition


Another area where it is important to know the extent to which two strings
differ is in the area of handwriting recognition and signature verification.
As Ron Avitzur shows in his article "Your Own Handprinting Recognition Engine"
(on page 32 of this issue), you can represent the salient features of a
handprinted character by a string that describes the direction and position of
the stylus as it marks strokes on a surface. Avitzur's algorithm is
straightforward, and uses very short strings ("NWESWS," for example) to
describe the trajectory of the stylus.
A more sophisticated algorithm might use longer string descriptions and
require a more precise measure of the difference and similarity between a
candidate string and the stored templates in the database.
Using such an algorithm to recognize regular handprinted text would be
overkill (not to mention slow). But in the case of verifying handwritten
signatures, which have longer strokes and more intricate shapes, the limits of
Avitzur's fast, straightforward approach--which relies on exact
matching--become apparent. You're less likely to get an exact match in the
case of long strokes with detailed representations. Also, a way of "warping"
the data stream becomes necessary, because people may arbitrarily extend or
shorten parts of strokes when writing in cursive script.
Other applications that require this kind of sequence comparison include voice
recognition and speech processing. Also, some researchers have used sequence
comparison to study bird songs. Other unusual applications include gas
chromatography, geological analysis, and fingerprint analysis.


The Levenshtein Distance


For each of these diverse applications, a number of sequence comparison
algorithms are in use. Most of these stem from the basic work of V.I.
Levenshtein, published in Russian in 1965. His advances in coding theory were
independently rediscovered and published in the early '70s by a dozen or so
researchers in a number of fields (mostly speech processing and molecular
biology). Computer scientists joined the fray rather late in the game, with a
JACM paper by Wagner and Fischer in 1974.
Levenshtein introduced two measures of difference between two strings. One is
the smallest number of substitutions, insertions, and deletions required to
transform one string into the other. The other measure is similar, except
that substitutions are not allowed. Both metrics have been called the
"Levenshtein distance"; we reserve the term for the first, shorter distance.
Example 1
shows an example, using the two strings "Axolotl" and "Axl Rose." The
Levenshtein distance between these strings is 5, consisting of 4 substitutions
and 1 insertion.
Example 1: The five steps from Axolotl to Axl Rose

 Axolotl
 Axolote (Substitute E for L)
 Axolose (Substitute S for T)
 AxoRose (Substitute R for L)
 Ax Rose (Substitute space for O)
 Axl Rose (Insert L)


The algorithm uses a problem-solving strategy known as "dynamic programming."
Dynamic programming does not mean programming a computer with a lot of loud,
animated motions. Rather, it is a term of unfortunate coinage that comes from
decision theory and operations research, and refers to one particular approach
to solving resource allocation problems. The approach consists of separating
an optimization problem into a series of smaller, interrelated steps (known as
"stages"), each of which represents a partial optimization that builds on the
stage before it. Basically, you undertake the stages one at a time until you
arrive at the solution, which is the final stage.
In general, the optimal value at each intermediate-stage decision depends on
conditions which are unknown until the final-stage optimization has been
reached. So you have to keep the results of the intermediate-stage
calculations until you reach the end, when the optimal solution can be
determined. These intermediate-stage results are kept in a decision table or
cost matrix.
Some well-known optimization problems solved with dynamic programming are the
shortest-route problem and the knapsack problem. Another example is Donald
Knuth's algorithm for composing streams of text into paragraphs, to arrive at
the optimal set of line breaks and hyphenations.
Finding the Levenshtein distance between two strings is similar to finding the
shortest route between two cities. Using dynamic programming, there are two
basic ways to solve these problems: forwards and backwards. Say you're trying
to find the shortest route between two cities connected by a network of
highways that traverse intermediate cities. You can either start at the first
city and work forwards, finding the minimum distance between that city and its
neighbors, and then moving on to the next stage. Or you can start at the
destination and work backwards, until you reach the starting point. Listing
One (page 107) shows the program LEVDIST.C, which implements the classic,
forward version of the algorithm.
The algorithm is straightforward, as a result of various constraints. One is
that string distances have the property of a distance in metric space, such as
found in high-school geometry. Remember the triangle inequality? This says
that the distance from point A to point C cannot be less than the distance
from A to B added to the distance from B to C. In the case of travel, it
cannot be shorter to go from San Francisco to Los Angeles by way of Memphis
(unless you use Federal Express). In the case of strings, the cheapest way to
change character A to character C is by replacing A with C, as opposed to
replacing it first with B and then with C. These statements may sound obvious,
but realize that these are ways of formally constraining a general problem
that can be quite complex into one that is more tractable. Other related
constraints are: Distance cannot be negative, the distance between two
identical strings is zero, and the distance between two strings is the same in
either direction (known as the "symmetry property").


How the Algorithm Works


The top-level routine is main( ), which asks for two strings: A and B. It then
calls functions to initialize the matrix M, fill the cells of M with partial
distances, and backtrack from the final stage at the lower-right corner of M
to display the optimum sequence of string edits.
The procedure calculate_matrix( ) fills the cells in M with the minimum edit
distances at each stage of the transformation from string A to B. The
characters in string A lie along the vertical axis of M at column 0, while
those in string B lie on the horizontal axis at row 0. We start at cell
M[0,0], the upper-left corner of the matrix, and work southeast over to cell
M[m,n].
Moving one unit diagonally, from northwest to southeast, means substituting
the character at A[i] with character B[j]. Moving one unit east means
inserting character B[j], while moving one unit south means deleting character
A[i]. This
is expressed in Example 2 as a recurrence relation resulting in a weighted
sum. At each stage, represented by cell M[i,j], the minimum distance between
partial strings A[1..i] and B[1..j] is the three-way minimum of the
accumulated distance in predecessor stages (the cells immediately north, west,
and northwest of cell M[i,j]) plus the cost of moving from those neighboring
cells over to the current cell--the cost of deleting, inserting, or
substituting the corresponding characters.
At the north boundary, there are no northern predecessors. We can therefore
initialize each cell M[0][j] in that first row with the cost of inserting
character B[j]. Likewise, at the western boundary, we initialize each cell
M[i][0] with the cost of deleting character A[i]. This happens in the routine
initialize_matrix( ).
After calculate_matrix( ) has done its work, we know that the Levenshtein
distance is in cell M[m][n]. We can backtrack from this southeasternmost cell
to find the optimum sequence of insertions, deletions, and substitutions that
will result in string B from string A. This happens in the routine
backtrack_matrix( ). Note that while there may be multiple paths with the same
minimum number of edit operations, only one of these is traced and displayed.
The output of the program is shown in Example 3. The matrix shows the edit
distance for each cell, followed by the operation (INS, DEL, SUB, or EQU) that
brought us there from its northern and/or westerly neighbors. The program
allows you to change the weights assigned to each kind of edit operation. This
is useful for speech recognition, where you want to be able to "time-warp"
(stretch or compress) one signal with respect to another, and would therefore
want insertions and deletions to be much cheaper than substitutions. For
matching DNA and protein sequences, the reverse holds true; substitutions are
cheaper than insertions and deletions.
Example 3: Output of LEVDIST.C

 \ 0 1 2 3 4 5 6 7 8 
 \COL -- ---- ---- ---- ---- ---- ---- ---- ---- --
 ROW \ A x l R o s e 
 ---------------------------------------------------------------

 0 \ 0 DEL 1 INS 2 INS 3 INS 4 INS 5 INS 6 INS 7 INS 8 INS
 1 A 1 DEL 0 EQU 1 INS 2 INS 3 INS 4 INS 5 INS 6 INS 7 INS
 2 x 2 DEL 1 DEL 0 EQU 1 INS 2 INS 3 INS 4 INS 5 INS 6 INS
 3 o 3 DEL 2 DEL 1 DEL 1 SUB 2 SUB 3 SUB 3 EQU 4 INS 5 INS
 4 l 4 DEL 3 DEL 2 DEL 1 EQU 2 SUB 3 SUB 4 SUB 4 SUB 5 SUB
 5 o 5 DEL 4 DEL 3 DEL 2 DEL 2 SUB 3 SUB 3 EQU 4 INS 5 SUB
 6 t 6 DEL 5 DEL 4 DEL 3 DEL 3 SUB 3 SUB 4 SUB 4 SUB 5 SUB
 7 l 7 DEL 6 DEL 5 DEL 4 DEL 4 SUB 4 SUB 4 SUB 5 SUB 5 SUB

 Levenshtein distance is 5.

 Backtrace: Axolotl
 D(7,8)=5 SUB -> Axolote
 D(6,7)=4 SUB -> Axolose
 D(4,5)=3 SUB -> AxoRose
 D(3,4)=2 SUB -> Ax Rose
 D(2,3)=1 INS -> Axl Rose

The costs are stored in the optable[] array. A more general way of allowing
varied cost would be a two-dimensional weight matrix, containing all letters
of the alphabet and specifying the cost of substituting any one letter for any
other. Even this is rudimentary in the case of genetic sequences, where
researchers use more elaborate algorithms to dynamically vary operation
weights.


Complexity


The algorithm is of complexity O(nm), where n and m are the lengths of the two
strings. If the strings are equal in length, this of course means O(n^2).
This complexity means it is probably unsuitable for recognizing handprinted
characters rapidly, but likely to work well in the case of signature
verification.
One way of speeding it up is known as the "Four Russians" algorithm, a rather
juvenile name from a 1980 paper by American researchers Masek and Paterson
that cites a 1970 publication by Arlazarov, Dinic, Kronrod, and Faradzev. The
algorithm works faster by splitting the computation up into many smaller
computations, which, if small enough and if the alphabet is finite, can be
precomputed and combined to derive the larger computation. In other words, the
matrix is partitioned into submatrices, and all possible computations on
submatrices are precomputed. This version is of complexity O(mn/min(m, log n)),
assuming that m <= n. Unfortunately, the performance gain only shows up in the
case of very long strings. (The example provided by Masek and Paterson does
not pay off until the string is longer than 262,419 characters.)
Molecular biologists have taken these general-case algorithms and modified
them for specific circumstances, such as the FASTA family of algorithms by
Lipman and Pearson. In these special cases, complexity has been lowered to
roughly O(m). But given databases now more than 30 million characters in
length, even better methods must be found. Current research focuses on
developing parallel algorithms that can take advantage of architectures such
as that from Thinking Machines Inc.
Eventually, we will be able to know, in megabytes of sub-microscopic detail,
exactly how one rose differs from a rose by another name.


References


Tyler, Elizabeth, Martha Horton, and Philip Krause. "A Review of Algorithms
for Molecular Sequence Comparison." Computers and Biomedical Research (vol.
24, 1991).
Sankoff, David and Joseph B. Kruskal (eds.). Time Warps, String Edits and
MacroMolecules: The Theory and Practice of Sequence Comparison. Reading,
Mass.: Addison-Wesley, 1983.
Miclet, Laurent. Structural Methods in Pattern Recognition. New York, N.Y.:
Springer-Verlag, 1986.

Levenshtein, V.I. "Binary Codes Capable of Correcting Deletions, Insertions,
and Reversals." Doklady Akademii Nauk SSSR 163(4):845-848, 1965. (Russian
reference from Sankoff and Kruskal.)


_FINDING STRING DISTANCES_
by Ray Valdes


[LISTING ONE]

/***********************************************************************
LEVDIST.C -- Computing the Levenshtein distance (string-to-string edit)
 by Ray Valdes, DDJ April 92
***********************************************************************/

#include <stdio.h>   /* printf, gets */
#include <string.h>  /* strlen, strcpy */

#define TRUE 1
#define FALSE 0
#define private static
#define public /**/
typedef int bool;

private bool verbose_mode = TRUE;

typedef enum { MATCH, INS, DEL, SUB } opcode;

typedef struct
{ int cost;
 char* name;
 int delta_row;
 int delta_col;
} operation;

#define COST(op) (optable[(int)op].cost) // for convenience
#define OPSTR(op) (optable[(int)op].name) // for convenience

private operation optable[] = //costs defined on a per-op basis
{
 /*--cost, name, delta_row, delta_col---------------------------------*/
 { 0, "EQU", -1, -1}, /* a match or no-op backtracks to NorthWest */
 { 1, "INS", 0, -1}, /* insert op backtracks to the West */
 { 1, "DEL", -1, 0}, /* delete op backtracks to the North */
 { 1, "SUB", -1, -1}, /* substitution op backtracks to NorthWest */
};

typedef struct
{ int distance;
 opcode op;
} matrix_cell;

#define NUM_ROWS 64
#define NUM_COLS NUM_ROWS
#define SIZEOF_STRING NUM_ROWS

private char A [SIZEOF_STRING];
private char B [SIZEOF_STRING];
private matrix_cell M [NUM_ROWS] [NUM_COLS]; // this is The Matrix

#define DIST(i,j) (M [(i)][(j)].distance) // for convenience
/****************************************************************/


private void say_hello (void);
private bool get_strings (void);
private void initialize_matrix (void);
private void calculate_cell (int row,int col);
private void print_cell (int row,int col);
private void calculate_matrix (int num_rows,int num_cols);
private void backtrack_matrix (int num_rows,int num_cols);
/****************************************************************/
public int
main(int argc,char **argv)
{ say_hello();
 while(get_strings())
 { initialize_matrix();
 calculate_matrix(strlen(A),strlen(B));
 backtrack_matrix(strlen(A),strlen(B));
 }
 return 0;
}
/****************************************************************/
private void
say_hello(void)
{ if(verbose_mode) printf("\nLevenshtein distance, V1.0");
}
/****************************************************************/
private bool
get_strings(void)
{ char buffer[SIZEOF_STRING*3]; //arbitrarily big buffer

 printf("\nEnter first string > "); gets(buffer);
 if(buffer[0]=='\0') return FALSE;
 strcpy(A,buffer);
 printf("\nEnter second string > "); gets(buffer);
 if(buffer[0]=='\0') return FALSE;
 strcpy(B,buffer);
 return TRUE;
}
/****************************************************************/
private void
initialize_matrix(void)
{ int row,col;

 for(row=0,col=0; col<NUM_COLS; col++) // initialize the first row
 {
 M [row][col].distance = col;
 M [row][col].op = INS;
 }
 for(row=0,col=0; row<NUM_ROWS; row++) // initialize the first column
 {
 M [row][col].distance = row;
 M [row][col].op = DEL;
 }
}
/****************************************************************/
private void
calculate_cell(int row,int col)

{ int dNorthWest = DIST(row-1, col-1);
 int dWest = DIST(row , col-1);
 int dNorth = DIST(row-1, col );


 if(dWest < dNorth)
 { if(dWest < dNorthWest)
 { M [row][col].op = INS;
 M [row][col].distance = dWest + COST(INS);
 }
 else // dNorthWest <= dWest < dNorth
 { opcode op;
 op = ( A[row-1]==B[col-1] ) ? MATCH : SUB;
 M [row][col].op = op;
 M [row][col].distance = dNorthWest + COST(op);
 }
 }
 else // dNorth <= dWest
 { if(dNorth < dNorthWest)
 { M [row][col].op = DEL;
 M [row][col].distance = dNorth + COST(DEL);
 }
 else // dNorthWest <= dNorth <= dWest
 { opcode op;
 op = ( A[row-1]==B[col-1] ) ? MATCH : SUB;
 M [row][col].op = op;
 M [row][col].distance = dNorthWest + COST(op);
 }
 }
}
/****************************************************************/
private void
calculate_matrix(int num_rows,int num_cols)
{ int row,col;

 if(verbose_mode)
 { printf("\n\\\n \\COL ");
 for(col=0; col < num_cols+1; col++)
 printf("____%d____", col);
 printf("\nROW \\ ");
 for(col=1; col < num_cols+1; col++)
 printf(" %c ", B [col-1]);
 printf("\n 0 \\");
 for(row=0,col=0; col < num_cols+1; col++)
 print_cell(row,col);
 }
 for(row=1; row<num_rows+1; row++)
 { if(verbose_mode) printf("\n% 2d %c ",row, A [row-1]);
 print_cell(row,0);
 for(col=1; col<num_cols+1; col++)
 {
 calculate_cell(row,col);
 if(verbose_mode) print_cell(row,col);
 }
 }
}

/****************************************************************/
private void
print_cell(int row,int col)
{ printf(" %d %s ",DIST(row,col),OPSTR( M [row][col].op));
}
/****************************************************************/

private void
backtrack_matrix(int num_rows,int num_cols)
{ int dx,dy;
 int i,j;
 int row = num_rows;
 int col = num_cols;

 printf("\n\nLevenshtein distance is %d.\n",DIST(row,col));
 printf("\nBacktrace: %s",A);

 while(row>0 && col>0)
 { if( ( M [row][col].op != MATCH) && verbose_mode)
 { printf("\nD(%d,%d)=%d ", row,col,DIST(row,col));
 printf("%s --> ",OPSTR( M [row][col].op));
 for(i=1;i<row-(M[row][col].op==DEL?1:0);i++)
 printf("%c", A[i-1]);
 /*printf("_");*/
 for(j=col-(M[row][col].op==INS?1:0);j<num_cols+1;j++)
 printf("%c", B[j-1]);
 }
 dy = optable[(int)( M [row][col].op)].delta_row;
 dx = optable[(int)( M [row][col].op)].delta_col;
 row += dy;
 col += dx;
 }
}




































April, 1992
PORTING UNIX TO THE 386 DEVICE DRIVERS


Getting into and out of interrupt routines




William Frederick Jolitz and Lynne Greer Jolitz


Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect
of National Semiconductor's GENIX project, the first virtual-memory,
microprocessor-based UNIX system. Prior to establishing TeleMuse, a market
research firm, Lynne was vice president of marketing at Symmetric Computer
Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions
or comments to lynne@berkeley.edu. (c) 1992 TeleMuse.


For the last couple of months, we've been examining device drivers and 386BSD.
We pick up where we left off last month, focusing on interrupt routines.
In Listing One (page 108), we use a series of macros to implement the
interrupt entry stubs, using the C preprocessor. Typically, for each system
compiled, a configuration program creates a file of interrupt stub routines
(see Listing Two, page 108). Each stub handles interrupt entry (at Vwd0 in
Listing Two) and processes the interrupt to the point where it can be passed
off to the driver's interrupt-handler function (in this case, wdintr). To keep
the macros succinct, they are built from other macros. To keep the code short,
it is implemented inline with a minimum of branches, although it could have
been written as a series of subroutines.
The macros ORPL( ) and SETPL( ) are nearly identical to the internal macros
__ORPL__( ) and __SETPL__( ), used to implement the spls. They are used for
the same reasons, but in assembler, not C. Macro INTR( ) builds an interrupt
entry stub per invocation. The address of these stubs is obtained from the IDT
table entry that the 386 fetches when processing an interrupt. The first
instructions executed when handling an interrupt are at the beginning of this
macro. We begin by crafting a trap frame suitable for a variety of purposes
that may result from this incoming interrupt, and saving the processor state
so we may restore it when we return from the interrupt later.
After saving the state in a trap frame, we send a command to the ICUs to
dismiss the "in service" bit (turned on when the processor began processing
the interrupt to lockout lower-priority requests). This indicates to both ICUs
that the interrupt is stacked and that we will allow any subsequent interrupts
that are unmasked, regardless of priority. This is done as soon as possible to
forestall interrupts that might otherwise be missed.
Next, the segment registers the kernel will use are saved and loaded with
known values. This is necessary because the interrupt may have been received
while the processor was in user mode; in that case, both the ES and DS segment
selectors would reflect the user process, not the kernel. We need worry only about the
ES and DS registers because the kernel will not use FS and GS. The instruction
selector and stack selector are handled automatically by the interrupt
mechanism of the 386 and are already stacked.
We then OR in our interrupt vector's group mask to disable any
companion-device interrupts (as well as the device). We must save the old mask
value on the stack, so that it is put back the way we found it. Finally, we
stack the unit number of this interrupt vector and reenable interrupts to
allow unmasked devices to interrupt while we are processing this interrupt
(nesting). We can now call a C device interrupt-handler function.
At this point, an interrupt frame is present on the stack, the kernel's
segment registers are set to be consistent with what the kernel's program
expects, all competing interrupts are blocked and unrelated interrupts are
unblocked and active. The interrupt vector stub (see Listing Two) then calls
the C handler (in our earlier example wdintr) to process the device's request
and return. At this point, all device interrupt routines are effectively
called with:
wdintr(frame)
struct intr_frame frame;


Getting Out of Interrupts


You'd think that leaving an interrupt would be as easy or easier than getting
in, but this is not the case. We still have some housekeeping chores to do on
the way out that may be unrelated to the interrupt we were servicing. This
clouds the picture a little, as seen in Listing Three (page 108). This
housekeeping is basically a performance enhancement that moves the processing
of certain kernel services running at interrupt level (and blocking the
processor from other interrupts) to run just ahead of the user process.
In anticipation of these chores, we've given both interrupts and traps similar
stack frames. By popping off two words, we are back into a trap frame and
ready to look for windows to wash. We restore the previous mask into the ICUs
and check whether we are returning to a point where interrupts were masked
before (a nested interrupt or critical section, for example). If so, we branch
to do a simple return by unwinding our trap frame. We can do this because we
are certain that the code we are returning to will eventually return to an
unmasked level, and then the chores will be done.
We first check whether any network protocol software interrupts must be
processed. Incoming packets received by drivers are enqueued at the
device-interrupt level, and a software interrupt is requested. On some
machines (DEC VAX, for instance), the software-interrupt mechanism is
implemented in hardware. (One sets a bit on a special register, which causes
an interrupt when the processor returns to an unmasked level.)
For the 386, we emulate this feature in software. If we find a request, we
lock out other competing interrupt returns by setting the PROCESSING bit and
successively call all protocol input routines that have been requested to be
called, clearing each request as we go.
In a similar fashion, we call the software interrupt associated with the
rescheduling clock. Earlier versions of UNIX did most scheduling calculations
and watchdog routines at high levels. (Even worse, because they didn't use
queues of processes, linear searches of all processes were done, and this was
expensive--cf. Version 6.) This method did not enhance UNIX's reputation in
the real-time department. One significant change in the BSD kernel was to move
as much of the interrupt-level processing as possible out to interruptible
points, thereby minimizing overall system overhead. Clock processing turns out to
account for much overhead in this regard (so much so, that 4.2BSD's 100-Hz
clocking was cut back to 60 Hz), so this change was well justified.
Our final chore is to check whether we are returning to a user process, as
opposed to the kernel. The contents of the code selector are checked to see if
the selector will be executing user or kernel code. We're a bit sloppy here,
because we must check only the 2 low-order bits of the selector to find out if
it belongs to the USER ring (3). This suffices for now, but we may have to
revisit this code when 386BSD is enhanced to support multisegmented
programming.
If we are headed back to a user program, we must quickly check to see if the
kernel has marked this process to be rescheduled. If so, we take the
rescheduling trap we prepared for at the start of the INTR( ) macro. As
mentioned in the 386BSD articles on multiprogramming (see DDJ, September and
October 1991), we have few opportunities to allow rescheduling. Returning to
the user program is one such point: we can reschedule there without risk of
deadlock.


UNIX as Interface--Is it Adequate?


Although every edition of UNIX has contained innovative new areas of design,
many device interfaces have not been altered to take advantage of this new
technology. As such, the mechanism for communicating with the device driver is
virtually identical from micro- to supercomputer, from first to last edition.
Few people are willing to take on the daunting (or insane) challenge of
breaking and fixing every single driver created! Because the best drivers
already exploit the current interface to great advantage, changing the
interface may seem a bigger problem than it is worth.
The rewards coupled with this challenge are less obvious, so a careful look at
emerging technology considerations is in order. A problem this big must be
carefully outlined:
Multithreaded I/O Devices. A near-term nuisance, commonly noticed with the use
of disk arrays in particular, is the difficulty in adapting the
characteristics of multithreaded (that is, more than one concurrent stream of
I/O operations) devices to the flat, strictly synchronous I/O model implied by
the current UNIX device-driver model. This does not suggest an arbitrary jump
to an asynchronous model. History has shown the simplicity and elegance of the
synchronous model to be valuable allies in the battle against operating-system
kernel "bloat." However, here we need coordinated, synchronous scheduling of
I/O that can be used by the higher levels in the operating system to arrange
the "time domain" of multithreaded operation to best advantage.
Streamer Devices. Almost every UNIX system has a magical kludge to support
streaming devices such as cartridge tape, DAT, or 8MM video tape. These
devices require extensive buffering to maintain "streaming operation"--in
which the device can synchronously accept data to match its physical transfer
"commitment". If this commitment is not met, the device must stop, rewind, get
up to speed again, and re-do the transfer when the data is finally present.
This causes the tape drive to "wheeze" back and forth and the data transfer to
slow to a fraction of its top data rate; this turns system backups into
interminable waits. Worse yet, the nature of these devices is that you only
learn of a tape write error long after the driver has reported the block
"written." This is because our UNIX device-driver interface implies a 1:1
relationship between "raw" transfers and their error status, which means that
only one can be outstanding at a given time. Therefore, most of the kludges
break the rules to attempt to keep the device double-buffered, with varying
degrees of success. The amount of RAM and disk storage is going up radically
with time (some say soon to about 1 gigabyte per user), so high-speed backup
is not just desirable; it's necessary for survival! We need a class of device
with so-called "tear-away" I/O, where a portion of a process's address space
is passed off to the device driver and the backup process. This way, a queue
of regions of memory can be operated on by the driver in strict, synchronous
order, with the depth of the queue appropriate to the demands of the device.
Redundancy of Common Code. Even with the previously mentioned efforts to
collect driver common code into what is effectively a support library of
routines, many of the system's drivers still have considerable "common code"
left inside. You begin to wonder if this method is adequate. An alternative
approach would be to express the routines as methods of a driver "object" in
an object-oriented language, such as C++. Thus, the commonality could be dealt
with by having classes of drivers, and the state of an active driver/device
could be expressed as an instance variable created on demand. In other words,
it may be time to apply object-oriented languages to the kernel itself.
Information Caching. The driver interface also suffers in terms of cached
information possibly shared by other processors (say, in a multiprocessor or a
dual-ported disk or via a network). To maintain cache consistency,
invalidation mechanisms are necessary, perhaps even at the driver level. Also,
cache mechanisms above the driver level need additional information about the
status of a drive's queued transfers to determine if the drive is already
saturated with requests. If so, the cache should decline requests on that
drive until it finishes that to which it has already committed. Drivers should
be viewed as presenting an employable set of capabilities that the file-system
cache can exploit.
The device-driver interface should provide a real-time schedule so that I/O
concludes in a deterministic period, and a mechanism for performing writes in a
strictly ordered fashion. (With almost every UNIX system today, one has to
"sync" multiple times to ensure that sandbagged disk I/O makes it out before
shutting down the system.) The file-system cache is also handicapped on writes
because it must keep the disk consistent--so a single write of an additional
block of data in a file causes as many as three or four separate writes to
update related data structures on the disk to keep them consistent. By adding
these missing mechanisms, the system's cache could maintain stability without
losing performance.
Device Conflict. Because autoconfiguration is usually only done at bootload
time, devices that conflict (and are hence "lost") cannot be used, and devices
cannot be added after the system is up and running. This was acceptable when
BSD UNIX ran on VAX mainframes for months at a time between reconfigurations,
but it's irritating to constantly reboot a PC if, say, a device was unplugged
or turned off when the system last booted. Configuration and resource
allocation (interrupts, DMA channels) need to be done uniformly, at will, and
in an automatic fashion.
File System to Device Disparity. The semantics of many device drivers fit the
"file" metaphor, but UNIX drivers don't have the complete semantics of files
at all. Although this is no great loss in many cases (for example, unit record
devices, such as serial ports and printers), for others (notably mass storage
devices), file attributes such as size, write-protection, modes, and even
device-naming conventions visible to the programmer are not visible to the
device driver, causing a loss of potential functionality. We can repair this
by developing a special device (spec) file system, where device drivers are
attached. In this case, the primitive operators of the Virtual Filesystem of
the BSD kernel correspond to all of the user-level process file operations to
the device driver itself. (For example, we export the virtual file abstraction
down to the device driver.) This is currently implemented in BSD. At the last
moment, however, it converts to the ancient UNIX device-driver conventions to
avoid dealing with the problem (for now). (If this sounds like shades of
"Plan-9," who are we to say differently?...)
MACH devotees have decreed that device drivers in user processes are a must,
in order to allow them to be dynamically loaded. But by choosing our file
abstraction appropriately, we can gain the facility for loadable drivers
without all the smoke and mirrors. If access through to the drivers is uniform
through the central concept of a virtual file system, then the interface can
be "remoted," much as NFS provides for remote files. (This is just an analogy,
not a suggested mechanism.) In short, the choice of a better device-driver
interface can offer substantial advantages without necessarily requiring large
amounts of radical code that would contribute to the bloat factor of the
kernel.


_PORTING UNIX TO THE 386_
by William Frederick Jolitz and Lynne Greer Jolitz


[LISTING ONE]


/* [Excerpted from /sys/i386/isa/icu.h] */
 ...
/* Macros for interrupt level priority masks (used in assembly code) */
/* mask additional interrupts */
#define ORPL(m) \
 cli ; /* disable interrupts */ \
 movw m , %dx ; /* get the mask */ \
 inb $ IO_ICU1+1, %al ; /* next, get low order mask */ \
 xchgb %dl, %al ; /* switch the old with the new */ \
 orb %dl, %al ; /* finally, or it in! */ \
 outb %al, $ IO_ICU1+1 ; \
 inb $0x84, %al ; \
 inb $ IO_ICU2+1, %al ; /* next, get high order mask */ \
 xchgb %dh, %al ; /* switch the old with the new */ \
 orb %dh, %al ; /* finally, or it in! */ \
 outb %al, $ IO_ICU2+1 ; \
 inb $0x84, %al ; /* flush write buffer, delay bus cycle */ \
 movzwl %dx, %eax ; /* return old priority */ \
 sti ; /* enable interrupts */
/* force interrupt mask */
#define SETPL(v) \
 cli ; /* disable interrupts */ \
 movw v , %dx ; \
 inb $ IO_ICU1+1, %al ; /* next, get low order mask */ \
 xchgb %dl, %al ; /* switch the old with the new */ \
 outb %al, $ IO_ICU1+1 ; \
 inb $0x84, %al ; \
 inb $ IO_ICU2+1, %al ; /* next, get high order mask */ \
 xchgb %dh, %al ; /* switch the copy with the new */ \
 outb %al, $ IO_ICU2+1 ; \
 inb $0x84, %al ; /* flush write buffer, delay bus cycle */ \
 movzwl %dx, %eax ; /* return old priority */ \
 sti ; /* enable interrupts */
/* Mask a group of interrupts atomically - interrupt entry */
#define INTR(unit,mask,offst) \
 pushl $0 ; /* first, build a trap frame for ... */ \
 pushl $ T_ASTFLT ; /* ... possible rescheduling that may occur */ \
 pushal ; \
 nop ; \
 movb $0x20, %al ; /* next, as soon as possible send EOI ... */ \
 outb %al, $ IO_ICU1 ; /* ...so in service bit may be cleared ...*/ \
 inb $0x84, %al ; /* ... ASAP */ \
 movb $0x20, %al ; /* likewise, the other one as well */ \
 outb %al,$ IO_ICU2 ; \
 inb $0x84,%al ; \
 pushl %ds ; /* save our data and extra segments ... */ \
 pushl %es ; \
 movw $0x10, %ax ; /* ... and reload with kernel's own */ \
 movw %ax, %ds ; \
 movw %ax, %es ; \
 incl _cnt+V_INTR ; /* tally interrupts */ \
 incl _isa_intr + offst * 4 ; \
 movw mask , %dx ; /* assert group mask */ \

 inb $ IO_ICU1+1, %al ; /* next, get low order mask */ \
 xchgb %dl, %al ; /* switch the old with the new */ \
 orb %dl, %al ; /* finally, or it in! */ \
 outb %al, $ IO_ICU1+1 ; \

 inb $0x84,%al ; \
 inb $ IO_ICU2+1, %al ; /* next, get high order mask */ \
 xchgb %dh, %al ; /* switch the old with the new */ \
 orb %dh, %al ; /* finally, or it in! */ \
 outb %al, $ IO_ICU2+1 ; \
 inb $0x84, %al ; \
 pushl %edx ; /* save old mask for when we return */ \
 pushl $ unit ; /* finish off interrupt frame with unit # */ \
 sti ; /* and allow other unmasked interrupts */
...






[LISTING TWO]

/* AT/386 -- Interrupt vector routines -- Generated by config program */

#include "machine/isa/isa.h"
#include "machine/isa/icu.h"

#define VEC(name) .align 4; .globl _V/**/name; _V/**/name:
 .globl _hardclock
VEC(clk)
 INTR(0, ___highmask__, 0)
 call _hardclock
 INTREXIT1

 .globl _wdintr, _wd0mask
 .data
_wd0mask: .long 0
 .text
VEC(wd0)
 INTR(0, ___biomask__, 1)
 call _wdintr
 INTREXIT2

 .globl _fdintr, _fd0mask
 .data
_fd0mask: .long 0
 .text
VEC(fd0)
 INTR(0, ___biomask__, 2)
 call _fdintr
 INTREXIT1

 .globl _cnrint, _cn0mask
 .data
_cn0mask: .long 0
 .text
VEC(cn0)
 INTR(0, ___ttymask__, 3)
 call _cnrint
 INTREXIT1

 .globl _npxintr, _npx0mask
 .data

_npx0mask: .long 0
 .text
VEC(npx0)
 INTR(0, _npx0mask, 4)
 call _npxintr
 INTREXIT2

 .globl _comintr, _com0mask
 .data
_com0mask: .long 0
 .text
VEC(com0)
 INTR(0, ___ttymask__, 5)
 call _comintr
 INTREXIT1

 .globl _weintr, _we0mask
 .data
_we0mask: .long 0
 .text
VEC(we0)
 INTR(0, ___netmask__, 6)
 call _weintr
 INTREXIT1

 .globl _lptintr, _lpt0mask
 .data
_lpt0mask: .long 0
 .text
VEC(lpt0)
 INTR(0, ___ttymask__, 7)
 call _lptintr
 INTREXIT1






[LISTING THREE]

/* [Excerpted from /sys/i386/isa/icu.h] */
 ...
/* First eight interrupts (ICU1) */
#define INTREXIT1 \
 jmp doreti
/* Second eight interrupts (ICU2) */
#define INTREXIT2 \
 jmp doreti
 ...
/* [Excerpted from /sys/i386/isa/icu.s] */
 ...
/* Handle return from interrupt after device handler finishes */
doreti:
 /* move to a trap frame */
 cli /* interrupts off while we work ... */
 popl %ebx /* remove unit number */
 popl %eax /* get previous priority mask */


 /* restore previous mask */
 movw %ax, %cx
 outb %al, $ IO_ICU1+1
 inb $0x84, %al
 movb %ah, %al
 outb %al, $ IO_ICU2+1
 inb $0x84, %al

 /* are we at interrupt level / nested interrupt already ? */
 cmpw ___nonemask__, %cx
 jne 3f

 /* do we need to process an network software interrupt ? */
 cmpl $0, _netisr
 je 2f
 btsl $ NETISR_PROCESS, _netisr
 jb 2f

#include "../net/netisr.h"
#define DOCALL(n, s, c) ; \
 .globl c ; \
 btrl $ s , n ; \
 jnb 1f ; \
 call c ; \
1:
 /* process a network software interrupt */
 sti
 DOCALL(_netisr, NETISR_RAW, _rawintr)
#ifdef INET
 DOCALL(_netisr, NETISR_IP, _ipintr)
#endif
#ifdef IMP
 DOCALL(_netisr, NETISR_IMP, _impintr)
#endif
#ifdef NS
 DOCALL(_netisr, NETISR_NS, _nsintr)
#endif
 btrl $ NETISR_PROCESS, _netisr
2:
 /* do we need to process a software clock "interrupt" */
 cli
 btrl $ SCLK_NEED, ___softclock__
 jnb 1f
 btsl $ SCLK_PROCESS, ___softclock__
 jb 1f

 /* process a software clock "interrupt" */
 sti
 pushl ___nonemask__ /* to an interrupt frame again */
 pushl %ebx
 call _softclock
 popl %eax /* back to trap frame for possible AST */
 popl %eax
 btrl $ SCLK_PROCESS, ___softclock__
1:
 /* see if we need to process an AST (rescheduling) fault */
 cmpw $0x1f, tCS*4(%esp) /* were we executing from a user mode ... */
 jne 3f /* ... code selector? */
 DOCALL(___ast__, AST_NEED, _trap);

3:
 /* restore the state and return */
 popl %es
 popl %ds
 popal
 nop
 addl $8, %esp
 iret
 ...





















































April, 1992
MULTIUSER DOS FOR CONTROL SYSTEMS PART I


A multitasking -- yet compatible -- alternative to DOS




Richard Kryszak


Richard is a senior engineer at Rockwell International Graphic Systems. He has
a BSEE from the University of Illinois in Chicago and a MSCS from Illinois
Institute of Technology. He can be contacted at 9616 South 49th Avenue, Oak
Lawn, IL 60453.


The introduction of several versions of industrialized PC computers has made
it possible to easily integrate desktop technology into the factory
environment. While the PC is an excellent platform for program development,
conventional DOS cannot provide the multitasking features required for
development of control systems. Several alternative multitasking operating
systems are available for PCs, including MTOS, QNX, and many flavors of UNIX,
but none are directly compatible with DOS. Multiuser DOS (DRMDOS) from Digital
Research, however, is a multitasking operating system that attempts to provide
complete compatibility with DOS.
DRMDOS is a good platform for a control system for a number of reasons.
Factory automation systems, for instance, are making increasing use of
graphics to convey information to users, and DOS user-interface libraries
allow graphical front ends to be developed in a fraction of the time such work
once took; DOS compatibility is therefore important. Additionally, the
multitasking features can be used to divide tasks
into easily manageable chunks. Finally, with DRMDOS many programmers who don't
have a great deal of experience with multitasking operating systems can
develop sections of code that either don't require multitasking features or
have the multitasking features disguised as function calls to mask complexity.
A project I recently worked on used DRMDOS as the operating system. I worked
with a couple of other programmers who didn't have an extensive background in
multitasking operating systems but did have DOS experience. During the
project, I developed a library of functions to allow the use of DRMDOS's
multitasking and interprocess communication features. This library can be used
to turn normal DOS programs into efficient DRMDOS programs.
Even though DRMDOS is a preemptive multitasking operating system, the time
slice allotted to each process is quite long. The system I used had a slice of
16.67 ms. A normal DOS-type program will hold on to the CPU for the full time
slice. For running word processors and spreadsheets this is fine, but a
control system can't tolerate this. We therefore needed a method of cutting
the time a process holds on to the CPU to the minimum required to complete its
task. The answer lies in cooperative multitasking, in which a process hangs on
to the processor only long enough to perform one program loop or a specific
set of functions. It then gives up the CPU to allow another process to run.
A programmer who works only on a single-tasking operating system often has
difficulty breaking up a program into processes to be run on a multitasking
operating system. The system I worked on used a shared-memory database to
allow passing of information. In this case, any operation linked to another
operation only by the database can be a separate process. In a control system,
examples of separate processes are the operator interface, the I/O scanner,
and serial communication routines.


Interfacing with DRMDOS


Before starting on this control system, I'd never ventured beyond the
capabilities standard library function calls provide. To make use of DRMDOS's
multitasking and interprocess communications features, it's necessary to use
the int86 and int86x calls. These are the software interrupts that provide a
low-level interface to DRMDOS as well as direct access to the BIOS functions;
see Figure 1.
Figure 1: Interrupts that provide a low-level interface to DR Multiuser DOS

 int int86(int int_number, union REGS *inregs,
 union REGS *outregs);

 int int86x(int int_number, union REGS *inregs,
 union REGS *outregs,
 struct SREGS *seg_regs);

To access DRMDOS, the interrupt number (int_number) must be set to 224. The
final part of the function calls may look a little strange. To use the int86
calls, a pair of unions must be declared. These unions are used to pass
register information to and from the int86 functions. The union allows the
corresponding word and byte representation of the CPU registers to be
overlaid, as in Example 1(a). The inregs and outregs unions consist of the
two structures in the example. The declarations for the union and the two
structures are contained in the file dos.h, which is part of Microsoft QuickC;
see Example 1(b).
Example 1: (a) Declaring paired unions; (b) declarations for the union and the
two structures.

 (a)
 union REGS { struct WORDREGS x;
 struct BYTEREGS h;
 };

 (b)
 struct WORDREGS { unsigned int ax;
 unsigned int bx;
 unsigned int cx;
 unsigned int dx;
 unsigned int si;
 unsigned int di;
 unsigned int cflag;
 };

 struct BYTEREGS { unsigned char al, ah;
 unsigned char bl, bh;
 unsigned char cl, ch;
 unsigned char dl, dh;
 };


Different representations of the registers are necessary because some function
calls use the 8-bit registers (al, ah) and others use the 16-bit versions
(ax). The (l) and (h) represent the low and high bytes of the register. For
example, (al) is the low byte of the (a) register.
The difference between the two int86 function calls is that the int86x call
also passes the segment-register information; some DRMDOS system calls require
the segment registers to be passed. The segment-register structure is also defined
in the dos.h header file and is shown in Example 2.
Example 2: Segment register structure

 struct SREGS { unsigned int es;
 unsigned int cs;
 unsigned int ss;
 unsigned int ds;
 };

Before either int86 function call is made, the proper register values must be
written to the inregs union and, where required, the seg_regs structure. The
results of the function call
will be written to the outregs union. These indicate the success or failure of
the function call and are used to return other information, such as a pointer
to a memory block.


DRMDOS Interface Library


DRMDOS provides a full range of services, from file operations to time and
date functions. This section will show how to use the multitasking, shared
memory, and queue system features of DRMDOS, which I used for an industrial
control system. In the following discussion, the interrupt-number parameter is
set to the defined value CCPM (0xE0, or 224 decimal). The library shown here
does not
include rigorous error checking. This would have to be added for completeness.
In most cases, a returned value of 0 indicates that no errors occurred. The
code for the interface library is contained in Listing One (page 110).


Multitasking Features


DRMDOS supports up to eight virtual consoles that are accessed using Alt-1
through Alt-8. This feature is useful for running up to eight programs written
to run under DOS. However, this can be limiting in the case of control
systems, which often have many more processes. These processes are frequently
much smaller than a typical DOS program because they are designed to implement
a specific part of a control system. If a program (or process) does not
perform screen I/O, it can be run detached from the console, as a background
process, by using the c_detach( ) function call.
The c_detach( ) call first declares the register unions and then assigns the
proper value to the (cl) register. The int86 call is then made. The value in
the (ax) register is returned to indicate the status of the call. As a result
of using this call, the process becomes a background process.
Often, processes do not perform any operation, such as a queue read or write,
that would cause them to give up the CPU. When such processes
finish performing a task or program loop, they should allow the operating
system scheduler to let another process run. This is done by using the
p_dispatch( ) function call and causes the process to give up the CPU. The
process is immediately put at the end of the list of processes ready to run.
The p_dispatch call first declares the register unions and then assigns the
proper value to the (cl) register. The int86 call is then made, as a result of
which the process is preempted. The process is then put back into the ready
list to wait for its next chance to run.
In a control system, there are often functions that must be run at regular
intervals. In the case of an industrial control system, digital I/O routines
must be run at a definite interval so that important transitions are not
missed. The p_priority( ) and p_delay( ) function calls accomplish this. The
p_priority( ) call is used to change the priority of the function. In the
DRMDOS operating system, priorities range from 0 to 255, 0 being the highest.
Normal user programs in DRMDOS run at a priority of 200 decimal, the
recommended range being 200 to 254; a priority of 255 is an idle process.
Using priorities outside this range might affect operation of the operating
system itself by causing necessary system processes to be delayed, and is
therefore not recommended. Priority level 199 is listed as undefined, as are
several other areas. The only problem with the I/O process being at a higher
priority is that DRMDOS is a priority-based operating system. If nothing
blocks it from running, it will hog the CPU. The p_delay( ) function call
solves this problem: It causes the process to give up the CPU and be put onto
a delayed process list. This essentially puts the process to sleep for a
number of clock ticks. Once the process comes out of the delay mode, it will
be the highest-priority user process, and will be run when the current process
gives up the CPU or is forced to give it up due to a clock-tick interrupt.
The p_priority( ) call accepts a priority level passed in the function call.
As stated earlier, any level from 0 to 255 is possible, but values outside the
200 to 254 range are not recommended. The
p_priority call first declares the register unions and then loads the proper
value into the (cl) register. The requested priority level is put into the
(dl) register. Finally the int86 call is made. The process priority is
immediately adjusted to the new value.
The p_delay( ) call accepts a requested delay time passed in the function
call. The delay value can be in the range of 0 to 65,535 clock ticks. In my
DRMDOS system, for example, the clock tick is 16.67 ms long, so to delay a
process for about 50 ms, the number of clock ticks would be set to 3. The
process declares the register unions and then loads the proper value to the
(cl) register. The requested delay count is loaded into the (dx) register.
Finally, the int86 call is made. The process is preempted and placed onto the
waiting list. When the proper number of clock ticks has elapsed, the process
will be returned to the ready list. A process is delayed at least the
requested number of clock ticks.


System Memory


In many cases it is desirable to allow two or more processes to share a block
of memory. I needed to support a memory-resident database shared among several
processes. This database was to be accessed using a pointer to the memory and
an offset to index into the database. To do this, you might simply declare an
array for the database and pass a pointer using the queues, which will be
covered below; unfortunately, this will not work. DRMDOS uses a banked memory
system in which each program is overlaid on a so-called Transient Program
Area (TPA). Thus the physical location of the database is not in the normal
640K range of DOS but somewhere in memory above 1 Mbyte. DRMDOS provides a way
around this problem. The OS can be set to contain up to 9999 bytes of system
memory. Because this is part of the operating system and is locked in the 640K
range of memory, we can use a base pointer to access this memory.
The s_memory function call is provided to request a block of system memory.
The function begins by declaring local variables, including the unions and
structure. The value passed to the function is the number of words being
requested. This value is doubled to get the number of bytes. The (cl) register
is set to the system memory value and the (dx) register to the number of
bytes. The int86x call is then made. If the (ax) register comes back with a
value of 0xFFFF, a NULL pointer is returned, meaning there was not sufficient
memory available to be assigned. The segment of the pointer and the offset are
returned in the (es) and (ax) registers, respectively. These are combined into
a pointer and returned to the caller.
Version 5 of DRMDOS provides methods for requesting larger blocks of memory
but I have not made use of this feature. These blocks reside in the extended
memory area.


Interprocess Communications


Queues are a powerful method of passing data between two processes. The number
of readers or writers that can use DRMDOS queues is not limited. Even so, I
typically use queues in two modes. The first is a
data-collection mode--a queue, for instance, used to funnel messages to be
sent out over a communication link. Many writers are potentially generating
messages to be sent, but only one reader takes the messages and processes
them. The second mode is the broadcast mode, used to distribute information
from a single writer to many readers. One process creates the queue and writes
the original information to it. Other processes read the information and then
rewrite it to make it available to other processes. An example of this will be
shown later when a process uses a queue to pass the pointer to a shared-memory
database.
To be used, a queue must be created or made by a process. Once made, the queue
must be opened. A pair of data structures--the queue descriptor and parameter
blocks--must be used to make and open a queue (see Listing Two, page 111). The
queue descriptor block tells DRMDOS some details of the queue. It contains
several fields that must be filled in by the process and others used by DRMDOS
to manage the queue. The queue parameter block is used to provide access to
the queue.
The q_make( ) call allows the creation of queues. Five items are passed to the
q_make( ) function: a pointer to a queue descriptor block, the length and
number of messages the queue will support, the name of the queue, and a
pointer to an error variable. The error variable is used to return the status
of the queue make operation. The function begins by declaring local variables,
including the required unions and structure. The segment-register values are
read and stored. Several queue descriptor fields are filled in with values
required by DRMDOS. Next, the number and length of the messages are filled in.
The flag's value is set to 0 to indicate that no flags are being used. The
flags indicate the type of queue being created. A general-use queue has flags
set to 0. A queue can also be declared strictly for semaphore or system use.
The name of the queue is then copied into the proper field. Normally, a queue
being created by a user process sets the buffer field to 0, meaning the
operating system will use its own memory for the buffer. This leads to
problems only if the system buffer space is used up. The DRMDOS programmer's
guide gives more information about other methods of providing a buffer but I
have not found it necessary to do this. The (cl) register is set to the q_make
value. The (dx) register is set to the offset part of the pointer to the
descriptor table using the FP_OFF call. The int86x call is then made. The
error-pointer variable is set to the result code from the (cx) register. The
result of the int86x call is also returned to provide additional information.
If the call is successful, the queue is ready to be opened.
To access a queue, a process must first open it. Any process can open an
existing queue using the q_open( ) function call. Three items are passed to
the q_open function: a pointer to a queue parameter block, a queue name, and a
pointer to an error variable. The function begins by declaring local
variables, including the required unions and structure. The segment register
values are read and stored. Two of the parameter block fields are set to 0, as
required by DRMDOS. The queue name is then copied into the parameter block. The
(cl) register is set to the q_open value. The (dx) register is set to the
offset part of the pointer to the parameter block using the FP_OFF( ) call.
The int86x call is then made. The error-pointer variable is set to the result
code from the (cx) register. The result of the int86x call is also returned to
provide additional information. If the call is successful, the queue is ready
to be written or read.
The q_read( ) and q_cread( ) function calls are used to read a message from
the queue. The q_read( ) function is an unconditional read: If no messages are
available from the queue, the process is suspended until one is available. The
q_cread( ) function is a conditional read: If no messages are available, the
function returns a value to indicate this. Any nonzero value indicates an
error condition.
The information passed to the functions includes pointers to the parameter
block, to a buffer used to receive the message, and to the error reporting
location. The functions begin by declaring local variables, including the
required unions and structure. The segment-register values are read and
stored. The offset of the buffer is obtained using FP_OFF and put into the
parameter block. The (cl) register is set to the proper value for the read or
cread operation. The int86x call is then made. In the case of the conditional
read, the error-reporting location is set to the value in the (cx) register. A
value of 0 indicates that a message is available in the buffer. In the case of
the unconditional read, the value will be nonzero if the queue does not exist
or is not open. Upon a successful return, the message will be available for
use. It is important to ensure that the buffer is sized properly to hold the
message.
The q_write( ) and q_cwrite( ) calls are used to write a message to the queue.
q_write( ) is an unconditional write: If there is no room in the queue to
accept a message, the process is suspended until room is available. q_cwrite(
) is a conditional write: If no room is available in the buffer, the function
will return a value to indicate this. Any nonzero value indicates an error
condition.
The information passed to the functions includes pointers to the parameter
block, to a buffer used to deliver the message, and to the error-reporting
location. The functions begin by declaring local variables, including the
required unions and structure. The segment-register values are read and
stored. The offset of the buffer is obtained using the FP_OFF( ) call and put
into the parameter block. The (cl) register is set to the proper value for the
write or cwrite operation. The int86x call is then made. In the case of the
conditional write, the error-reporting location is set to the value in the
(cx) register. A value of 0 indicates that the message was accepted by the
queue. In the case of the unconditional write, the value will be nonzero if
the queue does not exist or is not open. Upon a successful return, the message
will have been accepted by the queue and is available for reading.


Next Month


In next month's installment, I'll describe how the DRMDOS interface library
can be used to develop a system consisting of three independent processes: the
first is the owner of a memory-resident database, the second an I/O process,
and the third a logic function that monitors data in the input portion of the
database for changes. When a change is encountered, the data is operated on to
produce an output that is put back into the database.



Products Mentioned


DR Multiuser DOS Digital Research Inc. Box DRI Monterey, CA 93942 408-649-3896
$695 System requirements: 386SX, 386, or 486 PCs and compatibles


_MULTIUSER DOS FOR CONTROL SYSTEMS: PART I_
by Richard Kryszak


[LISTING ONE]

/* file name: system.c */

#include <dos.h>
#include "queues.h"
#include <stdio.h>

/*===============*/
/* local defines */
/*===============*/
#define CCPM 0xE0 /* cdos call int value */
#define C_DETACH 0x93 /* console detach CL register value */
#define P_DELAY 0x8D /* process delay CL register value */
#define P_DISPATCH 0x8E /* process dispatch CL register value */
#define P_PRIOR 0x91 /* process priority CL register value */
#define Q_CREAD 0x8A /* queue cread CL register value */
#define Q_CWRITE 0x8C /* queue cwrite CL register value */
#define Q_MAKE 0x86 /* queue make CL register value */
#define Q_OPEN 0x87 /* queue open CL register value */
#define Q_READ 0x89 /* queue read CL register value */
#define Q_WRITE 0x8B /* queue write CL register value */
#define S_MEMORY 0x59 /* system memory allocation request */

/*=====================*/
/* function prototypes */
/*=====================*/
unsigned int c_detach(void);
void p_dispatch(void);
void p_priority(unsigned char data);
void p_delay(unsigned int del);
unsigned int far * s_memory(int mem_size);
int q_make(struct q_descriptor *descript_ptr,
 unsigned int msg_length,
 unsigned int num_msg,
 char que_name[8],
 int *err_ptr);
int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr);
int q_read(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
int q_write(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
int q_cread(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);

int q_cwrite(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);

/*======================*/
/* function definitions */
/*======================*/
unsigned int c_detach()
 { union REGS inregs,outregs;

 inregs.h.cl = C_DETACH; /* detach function call */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 return(outregs.x.ax); /* return call status */
 }
void p_dispatch()
 { union REGS inregs,outregs;

 inregs.h.cl = P_DISPATCH; /* dispatch function call */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 }
void p_priority(unsigned char priority)
 { union REGS inregs,outregs;

 inregs.h.cl = P_PRIOR; /* priority change call */
 inregs.h.dl = priority; /* desired priority */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 }
void p_delay(unsigned int del)
 { union REGS inregs,outregs;
 inregs.h.cl = P_DELAY; /* delay function call */
 inregs.x.dx = del; /* number of ticks */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 }
unsigned int far * s_memory(int mem_size)
 { union REGS inregs,outregs;
 struct SREGS seg_regs; /* segment registers */
 unsigned int far *mem_ptr=NULL; /* pointer to memory block */
 mem_size *= 2; /* compute # of bytes */
 inregs.h.cl = S_MEMORY; /* system memory allocation */
 inregs.x.dx = mem_size; /* # of bytes requested */
 int86x(CCPM,&inregs,&outregs,&seg_regs); /* call cdos */
 if(outregs.x.ax == 0xFFFF) /* if not successful */
 { return(NULL); /* return a null pointer */
 }
 mem_ptr = (unsigned int far *)
 ((0x10000 * seg_regs.es)
 + outregs.x.ax); /* convert into a pointer */
 return(mem_ptr); /* return the pointer */
 }
int q_make(struct q_descriptor *descript_ptr,
 unsigned int msg_length,
 unsigned int num_msg,
 char que_name[8],
 int *err_ptr)
 { int int86_error; /* return status */
 int i; /* index variable */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */

 descript_ptr->internal_1 = 0; /* must be 0 */
 descript_ptr->internal_2 = 0; /* must be 0 */
 descript_ptr->internal_3 = 0; /* must be 0 */
 descript_ptr->internal_4 = 0; /* must be 0 */
 descript_ptr->internal_5 = 0; /* must be 0 */
 descript_ptr->internal_6 = 0; /* must be 0 */
 descript_ptr->msglen = msg_length; /* add message length */
 descript_ptr->nmsgs = num_msg; /* add number of messages */
 descript_ptr->flags = 0; /* no flags used */
 for(i = 0; i < 8; ++i) /* copy queue name */
 { descript_ptr->name[i]=que_name[i];
 }
 descript_ptr->buffer = 0; /* buffer in system area */
 inregs.h.cl = Q_MAKE; /* queue make call */
 inregs.x.dx = FP_OFF(descript_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,
 &outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr)
 { int int86_error; /* return status */
 int i; /* index variable */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->internal_1 = 0; /* must be 0 */
 param_blk_ptr->internal_2 = 0; /* must be 0 */
 for(i = 0; i < 8; ++i)
 { param_blk_ptr->name[i] = que_name[i]; /* copy queue name */
 }
 inregs.h.cl = Q_OPEN; /* q_open call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,
 &outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_write(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr)
 { int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_WRITE; /* q_write call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs, &outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_read(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr)
 { unsigned int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */

 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_READ; /* q_read call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,&outregs,&seg_regs); /* int86 call */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_cwrite(struct q_parameter_blk *param_blk_ptr,unsigned char *buff_ptr,
 int *err_ptr)
 { int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_CWRITE; /* q_write call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,&outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_cread(struct q_parameter_blk *param_blk_ptr,unsigned char *buff_ptr,
 int *err_ptr)
 { int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_CREAD; /* q_cread call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,&outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error */
 return(int86_error); /* int86 return status */
 }






[LISTING TWO]

/* file name: queues.h */

struct q_descriptor
{ unsigned int internal_1; /* for internal use ; must be zero */
 unsigned int internal_2; /* for internal use ; must be zero */
 int flags; /* for internal use ; queue flags */
 char name[8]; /* queue name */
 int msglen; /* number of bytes in each logical message */
 int nmsgs; /* maximum number of messages supported */
 unsigned int internal_3; /* for internal use ; must be zero */
 unsigned int internal_4; /* for internal use ; must be zero */
 unsigned int internal_5; /* for internal use ; must be zero */
 unsigned int internal_6; /* for internal use ; must be zero */
 unsigned int buffer; /* address of the queue buffer */
 };


struct q_parameter_blk
{ unsigned int internal_1; /* for internal use ; must be zero */
 int queueid; /* queue number field ; filled by q_open */
 unsigned int internal_2; /* for internal use ; must be zero */
 unsigned int buffer; /* offset of queue message buffer */
 char name[8]; /* queue name */
 };






[LISTING THREE]

/* file name: database.c */

#include <stdio.h>
#include <stdlib.h> /* for exit() */
#include "queues.h"

/*=====================*/
/* function prototypes */
/*=====================*/
void main(void);

/*===============*/
/* local defines */
/*===============*/
#define Q_DEPTH 1 /* queue contains 1 message */
#define DBASE_SIZE 2048 /* size of the database */
#define TRUE 1

/*================================*/
/* external function declarations */
/*================================*/
extern unsigned int c_detach(void);
extern void p_delay(unsigned int del);
extern unsigned int far *s_memory(int mem_size);
extern int q_make(struct q_descriptor *descript_ptr,
 unsigned int msg_length,
 unsigned int num_msg,
 char que_name[8],
 int *err_ptr);
extern int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr);
extern int q_cwrite(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
/*=====================*/
/* function definition */
/*=====================*/
void main()
 { int result; /* result of q_make */
 int error_type; /* cdos return code */
 union base
 { unsigned int far *base_ptr;
 unsigned char base[sizeof(unsigned int far *)];
 }base_union; /* composite pointer */

 struct q_descriptor dbase_descript; /* descriptor block */
 struct q_parameter_blk dbase_parameters; /* parameter block */
 base_union.base_ptr = s_memory(DBASE_SIZE); /* request system memory */
 if(base_union.base_ptr == NULL) /* if NULL pointer */
 { puts("No System Memory Available"); /* print an error message */
 exit(-1); /* exit, memory error */
 }
 result = q_make(&dbase_descript, /* pointer to descriptor */
 sizeof(base_union), /* length of messages */
 Q_DEPTH, /* number of messages */
 "database", /* queue name */
 &error_type); /* error return */
 result = q_open(&dbase_parameters, /* pointer to parameter */
 "database", /* queue name */
 &error_type); /* error return */
 result = q_cwrite(&dbase_parameters, /* write to queue */
 &base_union.base[0], /* pointer to database */
 &error_type); /* error return */
 c_detach(); /* detach from console */
 while(TRUE) /* loop */
 { p_delay(1800); /* delay 30 seconds */
 }
 }
/* NOTE: DATABASE.EXE is made up of database.c and system.c */






[LISTING FOUR]

/* file name: dbsuport.c */

#include <stdio.h>
#include "queues.h"

/*=====================*/
/* function prototypes */
/*=====================*/
void dbopen(void);
unsigned int dbread(int index);
void dbwrit(int index, unsigned int value);
unsigned int far *open_dbase(void);

/*==============================*/
/* external function prototypes */
/*==============================*/
extern int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr);
extern void p_dispatch(void);
extern int q_read(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
extern int q_write(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);


/*================*/
/* global storage */
/*================*/
unsigned int far *dbase_ptr; /* pointer to database */

/*===============*/
/* local defines */
/*===============*/
#define FAILURE 1
#define SUCCESS 0

/*======================*/
/* function definitions */
/*======================*/
void dbopen()
 { dbase_ptr = NULL; /* initialize pointer */
 while(dbase_ptr == NULL) /* loop while still NULL */
 { dbase_ptr = open_dbase(); /* call open database */
 }
 }
void dbwrit(int index, unsigned int value)
 { *(dbase_ptr + index) = value; /* write value to database */
 }
unsigned int dbread(int index)
 { return(*(dbase_ptr + index)); /* return value at index */
 }
unsigned int far *open_dbase()
 { struct q_parameter_blk dbase_parameters; /* parameter block */
 int result; /* result of q_make */
 int error_type; /* cdos return code */
 union base
 { unsigned int far *base_ptr;
 unsigned char base[sizeof(unsigned int far *)];
 }base_union; /* composite pointer */
 result = FAILURE; /* preset the variable */
 while(result != SUCCESS) /* loop til we can open */
 { result = q_open(&dbase_parameters, /* pointer to param block */
 "database", /* queue name */
 &error_type); /* error type return */
 p_dispatch(); /* let someone else run */
 }
 result = q_read(&dbase_parameters, /* read the dbase_queue */
 &base_union.base[0], /* msg read is in union */
 &error_type); /* error return type */
 result = q_write(&dbase_parameters, /* write to dbase_queue */
 &base_union.base[0], /* msg sent is pointer */
 &error_type); /* error return type */
 return(base_union.base_ptr); /* return the pointer */
 }







[LISTING FIVE]

/* file name ioboard.c */


#include <conio.h>

/*====================*/
/* function prototype */
/*====================*/
void main(void);

/*==============================*/
/* external function prototypes */
/*==============================*/
extern void dbopen(void);
extern void dbwrit(int index, unsigned int value);
extern unsigned int dbread(int index);
extern unsigned int c_detach(void);
extern void p_priority(unsigned char data);
extern void p_delay(unsigned int del);

/*===============*/
/* local defines */
/*===============*/
#define INPUT_BASE_ADDR 0x300 /* hardware input address */
#define OUTPUT_BASE_ADDR 0x300 /* hardware output address */
#define DBASE_WRITE_ADDR 0 /* database write location */
#define DBASE_READ_ADDR 1 /* database read location */
#define TRUE 1

/*=====================*/
/* function definition */
/*=====================*/
void main()
 { unsigned int temp_data; /* for reading database */
 p_priority(199); /* set priority */
 dbopen(); /* link to database */
 c_detach(); /* detach from console */
 while(TRUE)
 { temp_data = inp(INPUT_BASE_ADDR); /* read data from port */
 dbwrit(DBASE_WRITE_ADDR, temp_data); /* write to the database */
 temp_data = dbread(DBASE_READ_ADDR); /* read data from database */
 outp(OUTPUT_BASE_ADDR+3, temp_data); /* write to output port */
 p_delay(3); /* delay for 50 ms */
 }
 }

/* NOTE: IOBOARD.EXE is made up of ioboard.c, system.c, and dbsuport.c */






[LISTING SIX]

/* file name logic.c */

/*====================*/
/* function prototype */
/*====================*/
void main(void);


/*==============================*/
/* external function prototypes */
/*==============================*/
extern void dbopen(void);
extern unsigned int dbread(int index);
extern void dbwrit(int index, unsigned int value);
extern unsigned int c_detach(void);
extern void p_dispatch(void);

/*===============*/
/* local defines */
/*===============*/
#define DATA_IN 0 /* data written by I/O process */
#define DATA_OUT 1 /* data read by I/O process */
#define TRUE 1

/*=====================*/
/* function definition */
/*=====================*/
void main()
 { static unsigned int last_data_read; /* old data retainer */
 unsigned int temp_data; /* for reading database */

 dbopen(); /* link to database */
 c_detach(); /* detach from console */
 while(TRUE) /* continuous loop */
 { temp_data = dbread(DATA_IN); /* read input data */
 if(temp_data ^ last_data_read) /* if there was a change */
 { dbwrit(DATA_OUT, ~temp_data); /* write complement to database */
 last_data_read = temp_data; /* save the new value */
 }
 p_dispatch(); /* let another process run */
 }
 }
/* NOTE: LOGIC.EXE is made up of logic.c, system.c, and dbsuport.c */


























April, 1992
WINDOWS PROGRAMMING WITH BASIC


GFA Basic simplifies Windows programming while maintaining DOS compatibility




Raymond J. Schneider


Ray is the director of engineering at ComSonics Inc. in Harrisonburg, Va. He
is a licensed professional engineer in the state of Virginia and a doctoral
student of Information Technology at George Mason University.


Writing programs for Windows 3 can be a daunting task, especially with
Microsoft's SDK (Software Development Kit) and other C-based development
environments. Basic, however, promises to deliver Windows programming with
much less effort. And with Microsoft's Visual Basic, Within Technologies'
Realizer Basic, and GFA Software's GFA Basic for Windows, there's suddenly a
plethora of Windows development tools available for all programmers.
Sure, many programmers think that "real programmers" don't use Basic, but it
can't be beat for writing quick, scientific programs. (In the IEEE 488 world
of instrument control, in fact, Basic is the norm.) Basic is also terrific for
checking algorithms and building those quick-look prototypes.
Consequently, when I received a copy of GFA Basic for Windows 3, I was anxious
to put it through its paces. To test GFA Basic's effectiveness as a Windows
programming tool, I built an application based on a GW Basic program I'd
previously published (see Computer User, November 1990) that uses Lorenz
equations to show how mathematical chaos works. The Windows version of the
application presented here uses standard Windows features such as menuing,
multiple windows, buttons, and so on.


Programming for Chaos


In his efforts to understand the unpredictability of weather, MIT
meteorologist Edward Lorenz discovered chaos when experimenting with a
simplified model of the weather in the early '60s. He found that even tiny
changes in the starting conditions quickly led to major shifts in the
predicted behavior. This came to be called the "Butterfly Effect," because
changes as small as the flick of a butterfly's wing in initial conditions led
to substantial changes later, in the time evolution of the equations. Lorenz
concluded that if weather actually behaves in a fashion similar to his
equations, then long-range weather prediction isn't possible.
The Lorenz equations, shown in Example 1(a), are a simplified model of
atmospheric convective warming -- a flat, fluid layer is warmed from below and
cooled from above. The variable x represents the convective motion, y the
horizontal temperature variation, and z the vertical temperature variation.
Sigma, Rho, and B are proportional to the Prandtl number, the Rayleigh number,
and the size of the region being approximated, respectively. All these
parameters are positive.
Example 1: (a) The Lorenz equations; (b) a typical DO loop to process events
(UNTIL MENU signals that the close box has been moused); (c) using a SWITCH
statement to process menu selections.

 (a)

 dx/dt=SIGMA*(-x+y)
 dy/dt=-x*z+RHO*x-y
 dz/dt=x*y-B*z

 (b)

 DO
 GETEVENT
 [Statements to Process Events]
 UNTIL MENU(1)=4

 (c)

 IF MENU(1)=20
 SWITCH MENU(0)
 CASE 1 ...
 ENDSWITCH
 ENDIF

The Lorenz equations are three dimensional, so you can have three different
normalized plane views. Consequently, I wanted to use a menu bar to select
views of the Lorenz equations to graph, and to provide some pop-up requesters
to set the parameters. To keep it simple, I wanted to have a menu bar that let
me select views (XY, YZ, XZ), a pop-up requester to set the initial parameters
(Sigma, Rho, and B), and finally a Go menu item to start the calculations and
plotting. The equations run in all the selected boxes until a predefined
number of points has been plotted; selecting Go again will plot another slug
of points.
My first step was getting the menu bar to activate. This seemed easy because
all you have to do is initialize a string array with the various menu titles,
submenu titles, and separating blank strings and then invoke the menu with a
MENU array$() command.
All communications between GFA Basic and Windows take place through an array
named MENU(), which is only a little confusing. You have to get used to
Windows doing all sorts of things over which you have little control: At any
moment your program must be prepared to respond to an event caused by some
operator action. (This event-driven programming style does not come naturally
to a Basic programmer used to strictly procedural code.)


Processing Events


GFA Basic hooks into the Windows environment through a variety of functions,
the MENU() array, ON MENU functions, and looping while listening to Windows. A
typical construct is shown in Example 1(b).
There are two ways to process menu items: The first uses a SWITCH statement in
the event loop, and the second uses the ON MENU GOSUB statement. In the
application (see Listing One, page 112), I chose the first method, which nests
a SWITCH statement inside an IF statement.

In Example 1(c), for instance, MENU(1) is set to 20 to signal a menu-bar
event, and the index of the selected string-array entry is contained in
MENU(0). It took me a little time to discover that the menu titles, despite
being highlighted and having a string-array number, were not returned in
MENU(0). So, I changed my
menu bar to have GO as a submenu item rather than a main-menu item. There's
probably a way to detect the mouse click on the main-menu item, but I didn't
find it in the documentation. Once you can invoke a menu-bar item, it's simple
to incorporate the necessary instructions in the CASE elements of the SWITCH
statement.


Dialogs and Views


Next I needed a dialog box to pop up on the screen and let me enter the
critical parameters of the Lorenz equations -- SIGMA, RHO, and B. GFA Basic
supplies a lot of push buttons, radio buttons, check boxes, combo boxes, list
boxes, and so on. But all I wanted was a simple box to enter data into and a
button to push when finished. Digging through the documentation produced
instructions on how to set up a dialog box; the procedure SetDialog() in
Listing One shows the process. Within a DIALOG statement, you list the
properties and locations of the control elements that you want in the dialog
box. When you've defined all these items, you put the dialog box on the
window with a SHOWDIALOG #n statement, where #n is the number of the dialog.
The views were established as child windows -- windows inside a parent window.
To maximize simplicity, I didn't try to incorporate resizing or redrawing
functions, but with enough attention to detail (all those messages in the
MENU() array) you can detect window resizing, requests to repaint the window,
and the amazing amount of other stuff you have to keep track of to maintain a
well-behaved Windows application.
The HandleMessage() and GetText() procedures were modeled on similar
procedures in the GFA Basic manuals.


Computing the Function


The final task was to add the computation and graphics to the application. I
wrote a single procedure, PlotLorenz (X,Y,Z,N%,I%), which plots N% points
starting at X,Y,Z in the plane identified by the integer I% (1 = XY, 2 = YZ, 3
= ZX). On repeated presses of GO, the program continues to plot points in
additional groups of N%, set to 100 in the program. The actual plotting was
accomplished using the LINE x1,y1,x2,y2 statement. I made no attempt to scale
the data for proper aspect ratio or "handedness," although that would be more
accurate for visualization.
Nevertheless, the resulting program is interesting. First you pop open the
windows you want (usually all three), then open the control menu and select
"Set Parameters." Next, you enter the values in the EDITTEXT boxes and mouse
on the OK button to accept the values. Finally, click the mouse on GO and
continue until satisfied. Restart the program for another run. A reasonable
starting value is SIGMA = 60, RHO = 40, B = 20. This gets you a view of a
strange attractor with a lazy-eight orbit; see Figure 1.


About GFA Basic for Windows


While the chaos program gave me an opportunity to sample the joys of Windows
programming, I've only lightly touched the power of GFA Basic. The programming
environment combines a syntax-sensitive editor and a direct-mode environment
which I didn't find much occasion to use. The editor is easy to use and
helpful, because it refuses to let you enter a line with incorrect syntax. It
could, however, be confused by incorrectly terminated compound statements
(NEXT I% or ENDIF, for example).
The language itself is rich in capability. It includes a large array of
graphics functions, linear algebra (matrix) functions, and a host of
specialized functions for handling the Windows environment. Remarkably, except
for about 25 Windows-specific commands, the language is compatible with its
DOS counterpart, so that developers creating applications in GFA Basic can
easily transport them, complete with windowing functions, to a non-Windows
environment.
GFA Basic for Windows comes with several example programs, written in GFA
Basic, which double as utilities and examples of powerful Windows code.
RCS.GFW is an elementary dialog editor which generates ASCII text files of
statements that create the code for radio buttons, combo boxes, and so on for
an application. When you SAVE your window with the various elements that you
have selected and positioned, you get the necessary GFA Basic instructions
written to a text file, which can then be merged with your program under
development.
ICOEDIT.GFW is another example, this time of an icon editor. I played around
with ICOEDIT and created a little icon with a feather pen and an ink bottle.
Then I saved the icon to a file with an .ICO extension.
To test the built-in compiler which generates .EXE files, I first asked it to
make an .EXE file out of ICOEDIT.GFW. This it did in a flash. I attached
the ink-pot icon I'd designed to the file and popped it into my
program-manager window. I double-clicked on the ink pot and there I was,
running the icon editor as an independent .EXE file. The compiler ran quickly
and transparently.
I found several of the example programs somewhat fragile. Some locked up my
machine, forcing a CTRL-ALT-DEL to escape from the situation.


GFA vs. Visual Basic


How does GFA Basic stack up to the more famous Microsoft Visual Basic? The two
are sharply different in philosophy and feel. Visual Basic is a dramatic
departure from traditional procedural programming languages because it
requires the programmer to adopt a particular event-driven, state-machine
model of programming.
Visual Basic programs are not composed in a text-editor environment, but
rather in an environment that resembles a draw program. The left side of the
screen contains a vertical panel of tool icons, while in the center of the
screen is a generic window to receive the programmer's ministrations.
To construct a Visual Basic program, you first create the forms that will
communicate with the user. You then add features such as menus, text boxes,
and buttons to your forms. Most of this is accomplished by "hook and drag"
mousing around. Then you add code to each element of your form or control in a
text editor, so that when the control is invoked, the program does the right
thing. After saving your code as a project, you test it and create an
executable. This is easy for small programs with just a few choices, but it
gets complex fast.
The tools Visual Basic provides for manipulating forms include a variety of
buttons and boxes as well as labels, scrolling controls, and timers. If you
don't find enough controls already built in, Microsoft offers the Control
Development System so you can develop your own.
For small programs, Visual Basic is the hands-down winner in the ease-of-use
category. As programs get larger, however, you'll find that more discipline is
required in using Visual Basic, because the event-driven paradigm fragments
the program into small pieces which may be invoked under a variety of complex
conditions. Debugging such a beast is rather different from debugging
structured procedural code. Moreover, when you print out your Visual Basic
programs, you don't get all the code. Menus generated in the menu generator,
for example, don't produce any printout when you print out your program. The
same is true of your forms, which are built up but not documented in a
readable form by the system.
GFA Basic is harder to use for small programs than Visual Basic, but it has
the advantage of offering source-code compatibility with its DOS counterpart
to create windows and pull-down menus, and to control mouse actions. In
addition, GFA Basic adds advanced graphics functions to create splines,
ellipses, arcs, and Bezier curves; DLLs to access and update dBase files; LAN
support; support for huge arrays up to 20 Mbytes; and more.


Conclusion


GFA Basic is a good way to get into Windows programming and offers a quicker
path to creating relatively simple Windows applications. A good
friend of mine likes to talk about the law of "Conservation of Complexity" --
there's no such thing as a free lunch. While GFA Basic can help make Windows
programming a little less difficult, the fact remains that there's a complex
beast under those windows, and it takes a complex program to keep up.


Products Mentioned


GFA Basic for Windows 3 GFA Software Technologies Inc. 27 Congress St. Salem,
MA 01970 508-744-0201 $295


_WINDOWS PROGRAMMING WITH BASIC_
by Raymond J. Schneider



[LISTING ONE]


/* Lorenz Equations in GFA BASIC
/* Copyright 1991 Raymond J. Schneider
XEND=0,YEND=0,ZEND=0 //Initializes end of plot run variables
/* Configure and Open Parent Window
pstyle%= WS_OVERLAPPEDWINDOW
OR pstyle%, WS_CLIPCHILDREN
OR pstyle%, WS_VISIBLE
TITLEW #1, "Lorenz Equations"
PARENTW #1,0,0,_X,_Y,pstyle% //Open full window

/* Configure three child windows
cstyle%=0
cstyle%= WS_SYSMENU

/* Title the windows
TITLEW #2, "Lorenz XY-View"
TITLEW #3, "Lorenz YZ-View"
TITLEW #4, "Lorenz ZX-View"

/* Initialize Menu Structure
DIM m$(15)
m$(0)="&Views" //First Menu Item
m$(1)="&XY-View" // Sub-Menu Items
m$(2)="&YZ-View"
m$(3)="&ZX-View"
m$(4)="" // Null-string terminates sub-menus
m$(5)="&Control"
m$(6)="&Set Parameters"
m$(7)="&GO"
m$(8)=""
m$(9)=""
MENU m$( )

/* Main Program
cww%=_X/2-30,chh%=_Y/2-30
ON MENU MESSAGE GOSUB HandleMessage( )
DO
 GETEVENT
 IF MENU(1)=20 //Then a Menu Selection Has been Pressed
 /* Process Menu Selection
 SWITCH MENU(0)
 CASE 1 // Open XY-View
 CHILDW #2,1,5,5,cww%,chh%,cstyle%
 CASE 2 // Open YZ-View
 CHILDW #3,1,cww%+8,5,cww%,chh%,cstyle%
 CASE 3 // Open ZX-View
 CHILDW #4,1,5,chh%+8,cww%,chh%,cstyle%
 CASE 6 //Set-Parameters
 EXIT IF DLGRV&<>0
 SetDialog()
 DO
 SLEEP
 UNTIL DLGRV&
 TEXT 0,0,"Sigma="+STR$(SIGMA)
 TEXT 0,12,"Rho= "+STR$(RHO)
 TEXT 0,24,"B= "+STR$(B)
 CASE 7 // Calculate Lorenz and Plot in Windows
 TOPW #2
 IF XEND=0 //Initialization of starting conditions X,Y,Z

 X=1
 ELSE
 X=XEND
 ENDIF
 IF YEND=0
 Y=1
 ELSE
 Y=YEND
 ENDIF
 IF ZEND=0
 Z=1
 ELSE
 Z=ZEND
 ENDIF
 PlotLorenz(X,Y,Z,100,1)
 TOPW #3
 PlotLorenz(X,Y,Z,100,2)
 TOPW #4
 PlotLorenz(X,Y,Z,100,3)
 ENDSWITCH
 ENDIF
 IF MENU(1)=4
 SWITCH MENU(14)
 CASE 2
 CLOSEW #2
 CASE 3
 CLOSEW #3
 CASE 4
 CLOSEW #4
 ENDSWITCH
 ENDIF
 EXIT IF MENU(1)=4 && MENU(14)=1
LOOP
CLOSEW #1
END

PROCEDURE HandleMessage( )
 IF DLG(1)=MENU(15)
 ENDIF /*Discard General Messages
 DLGRV&=0
 hw%=GetParent(MENU(15))
 IF hw%=DLG(1)
 SWITCH MENU(6)
 CASE 100
 IF MENU(1)=30
 GetText(1,101,SIGMA$)
 GetText(1,102,RHO$)
 GetText(1,103,B$)
 SIGMA=VAL(SIGMA$),RHO=VAL(RHO$),B=VAL(B$)
 DLGRV&=MENU(6)
 ENDIF
 ENDSWITCH
 ENDIF
RETURN

PROCEDURE GetText(d,di&,VAR a$)
 LOCAL buffer$
 buffer$=SPACE$(100)
 IF GetWindowText(DLGITEM(d,di&),V:buffer$,LEN(buffer$))>0

 a$=CHAR{V:buffer$}
 ELSE
 a$=""
 ENDIF
RETURN

PROCEDURE SetDialog()
 s%=WS_TABSTOP
 sb%=s%
 sb%=BS_DEFPUSHBUTTON
 seb%=s%
 seb%=WS_BORDER
 seb%=ES_UPPERCASE
 ~SetFocus(DLGITEM(1,100))
 DLGRV&=0
 DIALOG #1,325,175,300,160,"Set Parameters"
 DLGBASE UNIT
 DEFPUSHBUTTON "OK",100,80,65,14,12,sb%
 EDITTEXT "",101,80,10,40,12,seb%
 EDITTEXT "",102,80,30,40,12,seb%
 EDITTEXT "",103,80,50,40,12,seb%
 RTEXT "SIGMA",104,10,10,55,12
 RTEXT "RHO",105,10,30,55,12
 RTEXT "B",106,10,50,55,12
 ENDDIALOG
 SHOWDIALOG #1
RETURN

PROCEDURE PlotLorenz(X,Y,Z,N%,I%)
 /* SIGMA, RHO, B assumed Global
 LOCAL LX,LY,LZ,LX1,LY1,LZ1,DT,j%
 LOCAL DX,DY,DZ
 PX=_X/4+50,PY=_Y/4+10 // Plot Offsets
 LX=X,LY=Y,LZ=Z,DT=.01
 FOR j%=1 TO N%
 DX=SIGMA*(LY-LX)*DT
 DY=(-LX*LZ+RHO*LX-LY)*DT
 DZ=(LX*LY - B*LZ)*DT
 LX1=LX+DX
 LY1=LY+DY
 LZ1=LZ+DZ
 SWITCH I%
 CASE 1 //PLOT XY
 LINE LX+PX,LY+PY,LX1+PX,LY1+PY
 CASE 2 //PLOT YZ
 LINE LY+PX,LZ+PY,LY1+PX,LZ1+PY
 CASE 3 //PLOT ZX
 LINE LZ+PX,LX+PY,LZ1+PX,LX1+PY
 ENDSWITCH
 LX=LX1,LY=LY1,LZ=LZ1 // Update function
 NEXT j%
 XEND=LX,YEND=LY,ZEND=LZ
RETURN





































































April, 1992
THE DESIGN OF THE MATHEMATICA PROGRAMMING LANGUAGE


A single paradigm provides surprising diversity




Roman E. Maeder


Roman is one of the original authors of Mathematica. He is assistant professor
of computer science at the Federal Institute of Technology in Zurich,
Switzerland, and the author of Programming in Mathematica (Addison-Wesley,
1991). He can be reached via Wolfram Research, 100 Trade Center Drive,
Champaign, IL 61820.


Mathematica is a symbolic computation system for doing mathematics. Like
programs such as MacSyma and Maple, it allows the mathematician to define and
to work interactively with a wide range of mathematical functions. In
addition, the Mathematica system provides support for interactive graphics,
sound, and animation.
Unlike some systems, Mathematica also provides a powerful programming language
with unique features. This article describes the rule-based paradigm
underlying the Mathematica language and shows how this model can emulate the
style and structure of mainstream programming languages.


Design Goals


Conventional programming languages are not well suited for expressing
mathematical formulas and algorithms. Therefore, in designing the language in
Mathematica, we had to break new ground. We wanted a system that fits
naturally with how a mathematician works and thinks, yet at the same time is
accessible to programmers accustomed to traditional languages such as C,
Pascal, Lisp, and APL.
The underlying programming model resembles the rule-based system found in
Prolog. This mechanism of pattern-matched rewrite rules underlies all other
programming constructs, including control structures and procedure and
function definitions.
Languages built around a single pervasive programming paradigm usually split
the programmer community into two groups: ardent advocates of that language,
and no less ardent critics. Programming languages in this category include
Lisp, Prolog, Forth, and APL. (If you reacted strongly to the above statement,
you are likely in one of the advocate groups.)
Although it's built on top of a single paradigm, Mathematica combines ideas
from many diverse languages. For example, Mathematica lets you write programs
in a style similar to C or Pascal -- even though, strictly speaking, the
Mathematica language has no procedures or functions. For those used to Lisp,
Mathematica provides list data structures, lambda expressions, and
list-manipulation functions. For people familiar with APL, Mathematica
provides for convenient manipulation of structured data (such as arrays) by
using commands and operators similar to those in APL. The Mathematica language
also allows for object-oriented modularization similar to facilities in
Smalltalk, Modula-2, and C++.


Matching How Math is Done


The Prolog-like pattern matching and database of rewriting rules stems from a
primary design goal: to be able to specify mathematical relations or rules in
an easy and natural way.
In Mathematica there are no function or procedure definitions proper. The only
way to write a program is to specify a set of rewrite rules. A rewrite rule
consists of two parts: the pattern on the left side and the replacement text
on the right side. Computation proceeds by evaluation of expressions. An
expression is evaluated by finding those rewrite rules whose pattern matches
part of the expression. That part is then replaced by the replacement text of
that rule. Evaluation then proceeds by searching for further matching rules
until no more are found.
This process, though perhaps complicated at first glance, is conceptually
rather simple -- and is familiar to scientists and engineers. Handbooks of
mathematical functions or tables of integrals are nothing but large
collections of rewrite rules. They are given as equations, as in Example 1(a).
Say you have a formula you want to simplify, for example Integral t sin{2}t dt.
You can "look it up" in the table of integrals. You notice that your formula
is of the form shown in Example 1(a), with x replaced by t. You can
therefore replace that part of your formula by the right side of the formula
in the integral table, making sure to replace each occurrence of x by t. The
result then becomes as shown in Example 1(b). You will not find a formula that
matches this simple expression in your handbook, so this is the final answer.
There are many transformation rules that we apply automatically, without
looking them up in a handbook. We might want to transform logarithms of
products into sums of logarithms, for example. The rules we use in our head
are shown in Example 2(a).
Example 2: (a) Rules for logarithms, expressed in traditional mathematical
notation; (b) rules for logarithms, as represented in Mathematica; (c)
Mathematica uses rules to simplify a logarithmic expression.

 (a)

 log (x y) = log x + log y
 log (x{n}) = n log x

 (b)

 In[1] := log[x_ y_] := log[x] + log[y]
 In[2] := log[x_^n_] := n log[x]

 (c)

 In[3] := log[a b c^2/d]
 Out[3] = log[a] + log[b] + 2 log[c] - log[d]

With these two rules we have no trouble transforming log(a b c{2}/d) into
log a + log b + 2 log c - log d by repeatedly applying the rules and by noting
that 1/d = d{-1}. We can program these rules directly into Mathematica. The
result is shown in Example 2(b).
Note that we have to explicitly denote the parts of the pattern that are
variables, by appending an underscore to the variable name; in this example,
x, y, and n are the pattern variables. We can now try to simplify the example
given above; see Example 2(c).
In general, it takes some experience to make the task work as smoothly as this
illustration. Humans are much cleverer at matching patterns than computers.
For example, we have no trouble transforming log 1/3 into -log 3. However, the
Mathematica rules presented in Example 2(b) will not accomplish this, because
the rational number 1/3 is not stored as 3{-1}; therefore, the rule will not
match. A different rule is needed for rational numbers. But we can easily add
such a rule: log[r_Rational]:= log[Numerator[r]]-log[Denominator[r]].
(Mathematica users should also note that we chose to use the symbol log
instead of the built-in logarithm function Log for this example.)
This simple example shows that designing rule sets for performing advanced
computations is not trivial. Some familiarity with the many different pattern
objects that Mathematica supports and some experimenting are required.
Fortunately, this is easy to do in an interpreted system. Also, you can learn
by looking at rule sets that others have written, such as the samples that
come with the Mathematica system.



Mathematica Syntax


Mathematica has a lot of predefined operators, which are represented by a
single character, for example @. While convenient for the expert, a program
that uses these operators freely may be hard for the novice reader to
understand. In this respect, Mathematica can resemble APL. Nevertheless, the basic
structure of the language is simple, and allows all programs to be written
entirely without the use of operators. For example, instead of writing a + b,
you can write Plus[a, b]. Likewise, instead of a b c, you can write Times[a,
b, c].
Expressions are built out of atoms (similar to Lisp atoms) and composite
expressions. Atoms can be symbols (names of variables), numbers, or character
strings. All composite expressions have the form: b[e[1], e[2], ..., e[n]],
where b and the e[i] are themselves expressions.
You can read or interpret a Mathematica expression in any of three ways:
As a function call, for example, Sin[x]
As a procedural command, such as Expand[(a + b)^10]
As a collection of objects, as in List[alpha, beta, gamma] (normally written
as {alpha, beta, gamma})
The Mathematica system comes with hundreds of standard, built-in mathematical
functions. These built-ins are compiled functions, implemented in C. Many are
quite powerful -- for example, the NDSolve function, which finds numeric
solutions for differential equations where no closed-form solution exists.
Syntactically, there is no difference between a built-in function and one
defined by the user.
The use of square brackets for function calls is somewhat nonstandard.
Mathematica reserves ordinary parentheses for grouping of expressions, as in
a(b+c). Thus, there is never confusion between a nested expression and a
function call such as a[b + c]. This syntax makes it possible to leave out the
multiplication operator (*), as is the normal custom with mathematical
formulas. The third kind of brackets, curly braces, are used to denote lists
of things.
Even constructs commonly identified as statements are simply expressions. For
example, the "statement" a = 5, in internal form is the expression Set[a, 5].
When evaluated, this expression will assign the value 5 to the variable a. The
semicolon used to separate statements is really an operator, just like + or *.
The sequence stmt[1]; stmt[2]; stmt[3] is therefore read as
CompoundExpression[stmt[1], stmt[2], stmt[3]]. The effect of this compound
expression is, of course, to evaluate each of the three expressions stmt[1],
stmt[2], and stmt[3] in sequence.


Procedural Programming


How can we write procedural programs in a language that offers "only" pattern
matching? The key idea is that a pattern in its simplest form looks just like
a function or procedure declaration (or formal parameter list).
For example, a pattern such as SinSum [x0_, x1_, dx_] can serve as the
declaration of a function with three arguments -- x0, x1, and dx -- inside the
body of the function definition. Such a function would be used in a call such
as 2*SinSum[0, 1, 0.1]. If you recall how the evaluator works, you can see
that the expression SinSum[0, 1, 0.1] will match the pattern SinSum[x0_, x1_,
dx_]. Mathematica will then replace this expression by the right side of the
rule for SinSum, after having replaced x0, x1, and dx by their values 0, 1,
and 0.1. So, the right side of the rule for SinSum [x0_, x1_, dx_] serves as
the body of the function; see Example 3(a).
Example 3: (a) Summing a sine series in a procedural manner; (b) summing a
sine series, as expressed in C.

 (a)

 SinSum[x0_, x1_, dx_] :=
 Module[ {s, x},
 s = 0.0;
 x = x0;
 While[ x <= x1,
 s = s + Sin[x];
 x = x + dx
 ];
 s
 ]

 (b)

 double SinSum (double x0, double x1, double dx)
 {
 double s, x;

 s = 0.0;
 x = x0;
 while (x <= x1) {
 s = s + sin (x);
 x = x + dx;
 }
 return s;
 }

Superficially, there is not much difference between this Mathematica program
and the equivalent C program, shown in Example 3(b). (An expert C programmer
may perhaps write this a bit differently, but my version remains a little more
readable for the wider audience.)
Note that there is no need to declare the type of the formal parameters in
Mathematica. As in other languages that use polymorphic data types, the same
block of code can sum the sine values of real numbers, complex numbers, or
some user-defined types.
Also note that the construct Module [varlist, body] introduces local variables
(s and x, in our example), just like the declaration in the C program.
You can also see that there is no need for a return statement. The last value
evaluated (in this case, the variable s) is returned as the value of the
function.
One final difference is that in Mathematica, unlike C and Pascal, variables
are not restricted to 32-bit integers, but can be arbitrary precision
floating-point (hundreds of digits in length), complex numbers, or
user-defined data types. The number of digits displayed is a configuration
value that can be changed at any time.
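The point about untyped parameters can be illustrated outside Mathematica. Here is a rough Python sketch (my own translation, not from the article): because cmath.sin accepts both real and complex arguments, a single loop body sums sines of either type, loosely analogous to Mathematica's polymorphism.

```python
import cmath

def sin_sum(xs):
    # One loop body handles floats and complex numbers alike --
    # a rough analogue of Mathematica's untyped formal parameters.
    s = 0.0
    for x in xs:
        s = s + cmath.sin(x)
    return s

real_total = sin_sum([i * 0.1 for i in range(11)])  # ~5.01388, as in the article
complex_total = sin_sum([1j, 2j])                   # purely imaginary sines
```

Python does not give arbitrary-precision floats by default, but the duck-typed loop is the same idea.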


Lisp-style Data Manipulation



The Mathematica program in Example 3(a) has a distinctly procedural "look,"
despite the fact that all constructs used in it are simply expressions. The
Mathematica language, however, supports other kinds of programming. Here's how
to define a basic Lisp data system in Mathematica.
As you may recall, data in Lisp is constructed of elements known as cells,
which are created with (cons x y). The two parts can be retrieved with (car l)
and (cdr l), respectively. These functions satisfy the two equations: (car
(cons x y)) = x, and (cdr (cons x y)) = y. These two equations can be entered
directly into Mathematica as rules.
Every Lisp object that is not such a cons cell is an atom -- that is, a number
or a symbol. A nested cell structure of the form: (cons e[1] (cons e[2] ...
(cons e[n] nil) ... )) is called a list and is written as (e[1] e[2] ...
e[n]). Nil is an atom denoting the empty list. The Lisp predicate nullQ
returns true for the empty list, false for nonempty lists. All these
definitions can be concisely expressed in Mathematica, as shown in Example
4(a).
Example 4: (a) Mathematica rules for Lisp-like lists; (b) building
higher-level, Lisp-like forms; (c) applying the Lisp-like functions on a list.

 (a)

 car[cons[e_, l_]] := e
 cdr[cons[e_, l_]] := l
 nullQ[nil] = True
 nullQ[_cons] = False
 list[] = nil
 list[e_, r___] := cons[e, list[r]]

 (b)

 In[1] := append[l_, e_] :=
 If[ nullQ[l], list[e], cons[ car[l], append[cdr[l], e] ] ]

 In[2] := reverse[l_] :=
 If[ nullQ[l], nil, append[ reverse[cdr[l]], car[l] ] ]

 (c)

 In[3] := alist = list[e1, e2, e3, e4, e5]
 Out[3] = (e1 e2 e3 e4 e5)

 In[4] := reverse[alist]
 Out[4] = (e5 e4 e3 e2 e1)

Example 4(b) shows how these basic forms can be used to build higher-level
Lisp functions, such as the familiar append and reverse. append appends a new
element at the end of a list, and reverse, of course, reverses the order of
elements in a list. Example 4(c) shows how these functions would be used, by
defining a sample list and then applying reverse to it. In addition to the
given definitions, Mathematica was told to output lists in true Lisp notation.
(Those definitions are not shown.)
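For comparison, the same miniature Lisp system can be sketched in Python (an illustrative translation of Example 4, with my own function names), representing a cons cell as a two-element tuple:

```python
nil = None                        # the empty list, like the atom nil

def cons(e, l): return (e, l)     # build a cell
def car(c): return c[0]           # first part of a cell
def cdr(c): return c[1]           # rest of a cell
def null_q(l): return l is nil    # the nullQ predicate

def lisp_list(*elems):            # list[e_, r___] := cons[e, list[r]]
    return cons(elems[0], lisp_list(*elems[1:])) if elems else nil

def append_elem(l, e):            # append from Example 4(b)
    return lisp_list(e) if null_q(l) else cons(car(l), append_elem(cdr(l), e))

def reverse_list(l):              # reverse from Example 4(b)
    return nil if null_q(l) else append_elem(reverse_list(cdr(l)), car(l))

def to_python(l):                 # display helper: unwind cells into a Python list
    out = []
    while not null_q(l):
        out.append(car(l))
        l = cdr(l)
    return out
```

With these definitions, to_python(reverse_list(lisp_list(1, 2, 3, 4, 5))) yields [5, 4, 3, 2, 1], mirroring Out[4].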


Functional Programming


Moving further away from procedural programming, and going beyond manipulation
of Lisp-style data, we arrive at the realm of truly functional programming.
Functional programming is also known as applicative programming, in that pure
functions are applied to data, using a minimum of state variables, counters,
and side effects.
For example, a more elegant way of summing the sines of the numbers x[0], x[0]
+dx,... is to first generate a list of these numbers, then map the sine
function over the list, and finally fold the result using the addition
function. The transcript of an interactive Mathematica session that
accomplishes these steps is shown in Example 5(a).
Example 5: (a) A nonprocedural way to sum sines; (b) nesting nonprocedural
functions; (c) a functional rule for summing sines; (d) using the built-in
summation function.

 (a)

 In[1] := Range[ 0, 1, 0.1 ]

 Out[1] = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.}

 In[2] := Map[ Sin, % ]
 Out[2] = {0, 0.0998334, 0.198669, 0.29552, 0.389418,
  0.479426, 0.564642, 0.644218, 0.717356, 0.783327,
  0.841471}

 In[3] := Fold[ Plus, 0, % ]
 Out[3] = 5.01388

 (b)


 In[4] := Fold[ Plus, 0, Map[ Sin, Range[ 0, 1, 0.1 ] ] ]
 Out[4] = 5.01388

 (c)

 SinSum[x0_, x1_, dx_] :=
 Fold[ Plus, 0, Map[ Sin, Range[x0, x1, dx] ] ]

 (d)

 SinSum[x0_, x1_, dx_] := Sum[ Sin[x], {x, x0, x1, dx} ]

In the transcript, input lines entered by the user are labeled by Mathematica
according to the form In[n]:=. The lines of output that correspond to this
input have the form Out[n]=. The percent sign (%) is a shorthand symbol that
represents the output of the previous line.
Having followed these steps, we can nest the functions inside each other to
achieve the one-line solution in Example 5(b). This leads to the functional
definition of SinSum, shown in Example 5(c), which is a vast improvement over
the procedural program presented in Example 3(a).
As an aside, note that the summation function is one of the built-in functions
supplied with Mathematica. We can therefore rewrite our definition by using
the built-in summation function; see Example 5(d).
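The same generate/map/fold pipeline can be mimicked in Python (an illustrative sketch, not from the article; functools.reduce plays the role of Fold):

```python
import math
from functools import reduce

xs = [i * 0.1 for i in range(11)]              # Range[0, 1, 0.1]
sines = list(map(math.sin, xs))                # Map[Sin, %]
total = reduce(lambda a, b: a + b, sines, 0)   # Fold[Plus, 0, %]
# total is approximately 5.01388, matching Out[3]
```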


Pure Functions


One of the remarkable features of most functional programming languages is the
notion of a "nameless," pure function, such as the Lisp expression (lambda (x)
(* x x)). This example uses the lambda form to define an anonymous function that
returns the square of its argument.
A lambda function can be applied to an argument, for example: ((lambda (x)
(* x x)) 5). Here, evaluation of this function will produce a result of 25. The
lambda capability makes it unnecessary to give functions a name; functions can
be defined on-the-fly, as needed. They can be passed as arguments to other
functions or even returned as values from functions. In this sense they are
first-class objects in the language.
In Mathematica, our example would be written as Function[x, x^2] and applied
to the argument 5 like this: Function[x, x^2][5]. Together with Map, the
function that applies functions to the elements of a list, we can write a
short definition to square all elements of a list; see Example 6.
Example 6: Squaring elements of a list using the Map function

 In[1] := SquareList[l_] := Map[ Function[x, x^2], l ]

 In[2] := SquareList[ {1, 5.5, a, Sqrt[2]} ]

 Out[2] = {1, 30.25, a^2, 2}
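Python's lambda plays much the same role as Function (a loose analogue, and a sketch of my own: plain Python cannot keep a symbolic element like a unevaluated, so the sample list here holds only numbers):

```python
square = lambda x: x ** 2          # Function[x, x^2]
five_squared = square(5)           # applying it directly: Function[x, x^2][5] -> 25

def square_list(l):                # SquareList from Example 6
    # Map an anonymous squaring function over the list.
    return list(map(lambda x: x ** 2, l))

result = square_list([1, 5.5, 2 ** 0.5])   # no symbolic 'a' in plain Python
```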



Operations on Structured Data


APL's strength lies in using operators to work on data structured as arrays,
matrices, and so on. Most of the APL operators can be made available in
Mathematica, but not in the form of single characters. Rather, these operators
are simply implemented as functions.
Here's an example of defining an APL operator, one not built into Mathematica.
The reshape operator takes an arbitrary structure and rearranges it such that
it has certain dimensions.
Consider the data shown in the first statement of Example 7(a), which is a
structure nested three levels deep. The dimensions are shown by the second
statement of Example 7(a). The reshape "operator" rearranges the data into a
matrix with the dimensions 4x3, as shown in Example 7(b).
Example 7: (a) A three-level data structure that will be reshaped; (b) the
data structure after reshaping; (c) the definition for the Reshape function;
(d) the steps in reshaping, as shown by the debugging trace facility.

 (a)

 In[1] := data = {{{a1, a2}, {b1, b2}, {c1, c2}},
 {{d1, d2}, {e1, e2}, {f1, f2}}};

 In[2] := Dimensions[ data ]

 Out[2] = {2, 3, 2}

 (b)

 In[3] := Reshape[ data, {4, 3} ]

 Out[3] = {{a1, a2, b1}, {b2, c1, c2}, {d1, d2, e1},
  {e2, f1, f2}}

 (c)


 Reshape[list_, dims_] := reshape[ Flatten[list], dims ]

 reshape[list_, {n_}] := list /; Length[list] == n

 reshape[list_, {head__, n_}] := reshape[ Partition[list, n], {head} ]

 (d)

 In[4] := Trace[Reshape[ data, {4, 3} ], reshape]

 Out[4] = {reshape[Flatten[{{{a1, a2}, {b1, b2}, {c1, c2}},
    {{d1, d2}, {e1, e2}, {f1, f2}}}], {4, 3}],
  reshape[{a1, a2, b1, b2, c1, c2, d1, d2, e1, e2, f1, f2}, {4, 3}],
  reshape[Partition[{a1, a2, b1, b2, c1, c2, d1, d2, e1, e2, f1, f2}, 3], {4}],
  reshape[{{a1, a2, b1}, {b2, c1, c2}, {d1, d2, e1}, {e2, f1, f2}}, {4}],
  {{a1, a2, b1}, {b2, c1, c2}, {d1, d2, e1}, {e2, f1, f2}}}

The function Reshape[data, dims] can easily be written recursively on the list
of dimensions. The last element of this list is the length of the innermost
parts of the resulting structure. The built-in function Partition[list, size]
groups elements of list into sublists of length size. With its recursion on a
list, the resulting definitions have a distinct, Lisp-like flavor; see Example
7(c).
First, any nesting the input may have is removed with Flatten, which turns it
into a linear list. The work is then done in the auxiliary function
reshape. (Note the lowercase r.) The boundary case (when the list of
dimensions has just a single element) merely tests whether the length of the
structure is correct. Nothing else needs to be done.
A trace of the reshape program in Example 7(d) shows the steps
followed as the rules for reshape are applied. The tracing facility in
Mathematica can provide a list of the intermediate steps in an evaluation
sequence. In this case, the arguments to reshape are first simplified (the
Flatten and Partition commands are carried out), and then the rules for
reshape itself take effect.
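For readers who want to experiment outside Mathematica, the same recursion can be sketched in Python (an illustrative translation; the helper names flatten, partition, and helper are my own):

```python
def flatten(x):
    # Recursively remove all nesting, like the built-in Flatten.
    if isinstance(x, list):
        out = []
        for e in x:
            out.extend(flatten(e))
        return out
    return [x]

def partition(lst, n):
    # Group elements into sublists of length n, like Partition[list, n].
    return [lst[i:i + n] for i in range(0, len(lst), n)]

def reshape(data, dims):
    flat = flatten(data)
    def helper(lst, ds):                # the lowercase-r reshape rules
        if len(ds) == 1:
            assert len(lst) == ds[0]    # boundary case: just check the length
            return lst
        return helper(partition(lst, ds[-1]), ds[:-1])
    return helper(flat, list(dims))
```

Applied to the data of Example 7(a) with dimensions (4, 3), this reproduces the 4x3 result of Example 7(b).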


Modularization


To organize large programming projects, modern programming languages use
facilities such as modularization, encapsulation, and information hiding. In
Mathematica, modular program components are known as "packages." Like the
facility found in Modula and Object Pascal, a package consists of two parts:
an interface definition and an implementation part. The interface section
describes the aspects that a client needs to know. The implementation section
provides that functionality, but keeps the details hidden from users of the
packages (the clients of the interface).
Even with such a small example as the function Reshape[] (from the preceding
section), it makes sense to write a package. As written in package form, the
reshape function is shown in Example 8. The part between the initial
BeginPackage[] and Begin["`Private`"] is the interface. It declares all
functions exported from this package (in this case, only one) and documents
them. The part between Begin and End is the implementation. The auxiliary
function reshape is not exported. It will not be available to users of this
package. The command Protect[ Reshape ] prevents any modification of the exported
function Reshape by the user of our package.
Example 8: The reshape function in package form

 BeginPackage["Reshape`"]

 Reshape::usage = "Reshape[list, dims] rearranges list as a nested
 structure with dimension dims."

 Begin["`Private`"]

 reshape[list_, {n_}] := list /; Length[list] == n
 reshape[list_, {head__, n_}] := reshape[ Partition[list, n], {head} ]
 Reshape[list_, dims_] := reshape[ Flatten[list], dims ]

 End[]

 Protect[ Reshape ]

 EndPackage[]

You load a package from disk into a Mathematica session by typing <<Reshape`.
The notation used is machine independent; Mathematica translates the
context name Reshape` into a valid filename on any computer. The commands in
that package are read in, just as if they had been typed in response to input
prompts.



Conclusion


The rule-based paradigm in Mathematica's programming language provides
surprising flexibility in supporting a range of programming styles and
problem-solving strategies. It is especially suited for interactive
experimentation, and allows code to be written in a way that matches closely
the natural formulation of the problem to be solved.


References


Gray, Theodore and Jerry Glynn. Exploring Mathematics with Mathematica.
Reading, Mass.: Addison-Wesley, 1991.
Gradshteyn, Izrail Solomonovich and Iosif Moiseevich Ryzhik. Table of
Integrals, Series and Products. San Diego, Calif.: Academic Press, 1965.
Maeder, Roman E. Programming in Mathematica, Second Edition. Reading, Mass.:
Addison-Wesley, 1991.
Skiena, Steven S. Implementing Discrete Mathematics: Combinatorics and Graph
Theory with Mathematica. Reading, Mass.: Addison-Wesley, 1990.
Wagon, Stan. Mathematica in Action. New York, N.Y.: W.H. Freeman, 1990.
Wolfram, Stephen. Mathematica: A System for Doing Mathematics by Computer,
Second Edition. Reading, Mass.: Addison-Wesley, 1991.













































April, 1992
PROGRAMMING PARADIGMS


The First Programmer




Michael Swaine


My friend Betty Toole just published the book she's been working on for years:
Ada, The Enchantress of Numbers (Strawberry Press, P.O. Box 452, Sausalito, CA
94966). Although it sounds like a biography, it is in fact a selection from
the letters of Ada Lovelace, and from her description of the Analytical
Engine, the first computer ever designed.
This is basic research material rather than an entertaining story, although
the story is there. So on the assumption that many DDJ readers won't get
around to reading it, I'll tell you what I got out of it (and out of the
ancillary research it inspired).


The Analytical Society


In 1834, at the age of 41, Charles Babbage was a bright light in the English
scientific community. His Saturday soirees were attended by scientists such as
Michael Faraday and the eccentric electrical experimenter Andrew Crosse, as
well as the Duke of Wellington and Charles Dickens. The least notable attendee
may have been a medical-school dropout whose family despaired of the long
hours he spent wandering in the fields -- young Charles Darwin.
While still a student at Cambridge, Babbage formed the Analytical Society with
two friends, George Peacock and John Herschel. Their ambitious goal was to
break the 100-year stranglehold of Newton's notation on English algebra. A
proper notation, they knew, would make it easier to separate real abstract
mathematics from mere computation. They were successful: Modern algebra uses
notation more in line with Leibnitz's than with Newton's. Herschel, the son of
the leading astronomer of his time, followed in his father's footsteps and
later became president of the Royal Astronomical Society. Peacock continued in
mathematics and wrote an influential text on abstract algebra. By training,
Babbage was also a mathematician, but he became an inventor.
It was an age of invention.
Technically, it was pre-Victorian England, an era when Romantic poets
confronted the industrial revolution. A generation before, the Luddites had
protested industrialization, and in his maiden speech before Parliament, Lord
Byron had sounded an alarm over the dangers of technology. Charles Dickens,
one of Babbage's friends, wrote satires on factory conditions.
It was a world about to change, and aware of it. Most of the trappings of
modern civilization did not yet exist: People still rode in buggies, read by
gaslight, and communicated by letter. But railroad tracks were being laid,
Faraday had built the first electric generator, and telegraph lines would be
strung from London to outlying districts within the decade. Another Analytical
Society friend of Babbage's, William Whewell, coined the word "scientist" in
this same decade.
In this age of science and invention, Babbage turned to invention. Some of his
inventions were process innovations: He demonstrated mathematically that
postage should not be based on the distance a letter travels, and he created
the first actuarial table. Other inventions were mechanical devices: an
ophthalmoscope and two devices symbolic of the way the world was going -- the
speedometer and the locomotive cowcatcher. He was highly thought of and
thought highly of himself and his ideas. Although impatient with fundraising,
he managed to get funding for a mathematical invention, the Difference Engine.
The Difference Engine was essentially a calculator. It was not the first:
Blaise Pascal had built one 200 years earlier, and Leibnitz built another 31
years after Pascal. Leibnitz had also championed the binary system, and it was
his algebraic notation that the Analytical Society persuaded English
mathematicians to adopt.
But the Difference Engine was a step beyond these early calculators. Pascal's
was an adding machine; Leibnitz's also had the power to multiply and divide.
The Difference Engine performed these basic operations, but could also compute
powers of numbers and roots of quadratic equations. And it was intended to be
a practical device. It was about the size of a high-end workstation today: It
fit comfortably on or under a workbench. Babbage had also given thought to
throughput, and proudly demonstrated that a calculation could be performed
every nine seconds.
Then in 1834, Babbage abandoned the Difference Engine for a more ambitious
project: the Analytical Engine. If the Difference Engine was a calculator, the
Analytical Engine was a computer. It could be controlled by input instructions
on punched cards and fed data on other punched cards; store the results of
calculations to be used in further calculations; and produce output in three
forms: type mold, printed text, or punched cards. The punched cards were an
idea Babbage got from the Jacquard loom, which had been in use in England
since 1804. The Jacquard loom allowed the design of fabric to be entered on
its cards, and was regarded at the time as one of the marvels of the
industrial revolution.
The Analytical Engine was much more marvelous than the loom, but it had a
defect. It didn't exist. It was just a plan, eventually spread across
thousands of pages of diagrams, designs, explanations, and examples. Babbage's
voluminous notes demonstrate to those who can wade through them that he
understood precisely how to construct a digital computer. But he had no
working model or even a set of engineering plans he could turn over to
technicians to execute. He needed someone to transform his notes and ideas
into something others could work from, and he needed funding to build the
machine.
It was a teenage girl who proved to be the key to both.


Society Woman


Augusta Ada Byron, later Lady Lovelace, was born to the titled class of
English society. She was also, not incidentally, the daughter of a poet and a
mathematician.
Although titled and monied, her father, George Gordon, Lord Byron, was not
your typical patrician gentleman. Like his ancestors Foul Weather Jack, the
Wicked Lord, and Mad Jack (Byron's father), he had a wild streak. He was
enormously popular. (The only contemporary analogy is with rock musicians, and
it's not a bad one.) But not with everyone: For reasons uncertain and probably
meaningful only to Victorian minds, he became unpopular with his wife, and
they separated soon after the birth of their only daughter, who always went by
her middle name, Ada. Byron left the country, to spend most of the rest of his
life in Greece and Italy.
Ada was raised by her mother, a brilliant if uptight mathematician. (One frank
friend described her as an "icicle.") Lady Byron saw to it that Ada got an
excellent if eccentric education, rather heavy on mathematics. Ada showed
unmistakable signs of genius early on. Her earliest letters, starting at age
five, mix adult diction and grammar with childish concerns and, occasionally,
spelling errors.
Ada died at age 36, after bearing and raising three children and honoring the
time-consuming social obligations of her station in society. Throughout her
childhood and adult life, she was often incapacitated by illness. All of this
left little time for what she earnestly considered her profession.
Usually she considered that profession mathematics. With 100 years of
hindsight, Ada is seen as the first person to spell out the working and uses
of the digital computer in a language an average educated reader could
understand. That would make her the first documentation writer, and that's
probably technically correct. But she is sometimes referred to as "the first
programmer," and while that's a distortion of the facts, it's a distortion
worth entertaining tentatively for the insight it gives into a recurring
question about the nature of programming.


Ada Meets Babbage


In the week after her 19th birthday, Ada's mother took her to visit Charles
Babbage, the eccentric genius whom Ada had met at a party some months earlier.
It wasn't their first visit; Babbage had entertained Ada in earlier visits
with his Difference Engine, which she much preferred to the mechanical doll he
tried to show her. On this occasion, though, Babbage was afire with a new idea
about a different kind of machine, a device that would "foresee [and] act on
that foresight." The notion was not two months old and perhaps not too well
thought out. In any case, Lady Byron, no mean mathematician, later dismissed
Babbage's talk as "unsound and paradoxical." She approved of Babbage's
mechanical devices, but thought his metaphysical bent a bad influence on her
daughter. Ada listened to Babbage and envisioned a new world.
For a time, Ada was distracted by marriage and three quick pregnancies. But
she was soon back at her studies. In the 1800s women did not go to university,
but some, like Ada and her mother before her, got a good, if idiosyncratic,
education by studying independently. Ada asked Babbage to recommend a
mathematics tutor, and he recommended Augustus De Morgan. De Morgan, who had
studied under Babbage's friend Peacock, helped to establish the foundations of
abstract algebra, both by his own efforts and by encouraging a young
mathematician, George Boole, to go further. His tutoring relationship with Ada
was highly informal: Ada wrote to him when she got stuck, and was otherwise on
her own.
Meanwhile, an Italian engineer named Menabrea heard Babbage lecture on the
Analytical Engine and published a paper on it in French in a Swiss journal.
Babbage mentioned the piece to Ada, who immediately translated it into English
and sent it off to Babbage. He encouraged her to write more, and she set out
to revise and write notes to the translation. These notes turned out to be
longer than the article itself.
The translation and the notes are Ada's contribution to the understanding of
the digital computer. Nothing more. What she accomplished was no mean feat:
Babbage's papers on the Analytical Engine had grown to thousands of pages. But
that feat is all she is remembered for, all she accomplished of historical and
scientific importance.
It was enough.


Ada's Insights


Ada not only described the workings of the machine, she also applied her
considerable imagination to figuring out what could be done with it.
Not the least important of Ada's contributions was her improvement on
Menabrea's notation. In her notes, Ada discussed the importance of a good
abstract notation, which of course was the lesson that the Analytical Society
taught English mathematics. It was also arguably the basis of De Morgan's
enduring work and the basis for understanding what a computer really does.
She pointed out the importance of distinguishing the symbol for the function
from the symbol for the result of the application of the function. This
distinction made, she was able to follow one of its implications: that the
Analytical Engine could produce as output not only numbers but also symbols.
These symbols could then be used to control the operation of the machine. In
other words, she realized without having today's terminology to express it
that a program can produce another program as output.

I write imprecisely but not, I hope, misleadingly when I write of
"programming" the Analytical Engine. Although one did not write programs in a
sense we'd understand today, the Analytical Engine was programmable. Ada wrote
a program, for example, to compute the Bernoulli numbers. Writing such
programs involved specifying the order in which operation and variable cards
were to be fed into the machine.
In this sense, Babbage also wrote programs for the Analytical Engine, so he
could be called the first programmer. But Babbage's programs were selected by
the inventor to demonstrate the operation of his invention. Ada approached the
writing of programs with this goal as well, but she also brought to it some
more-applied thinking, asking what real-world problems could be solved by the
computer and what it was good for. Because she so earnestly wanted a
profession, perhaps we can think of Ada Lovelace as the first professional
programmer.
Toole asked Ada programmer Rick Gross to annotate Ada's notes, connecting them
with current thinking in programming practice and methodologies. As explained
by Gross, Ada foresaw many of today's software concepts:
She drew a distinction much like the contemporary one between specification
vs. implementation.
She made explicit the concept of reuse of software components.
She recognized and accentuated those capabilities of the Analytical Engine
that allowed branching on conditions and variable iterations: the basis of IF
and REPEAT constructs.
She demonstrated an understanding of the complexity, design, and
synchronization issues in parallel operation. The Analytical Engine wasn't a
parallel computer, but, like any computer, it embodied some parallel
processes.
She understood the limits of computers and expressed them in terms reassuring
to the Victorian mind, which had been frightened by that novel by her father's
friend, Mary Shelley.
She probed possible uses of computers: metaphysical, utilitarian, musical.


The Mind of a Programmer


This much of the story has been told elsewhere; in fact I drew on several
sources in sketching this story of the Analytical Engine and the contributions
of Ada Lovelace. But Toole's book, because it is largely Ada's words, offers
an insight into the mind of the first professional programmer.
There are questions that recur because they are so trivial that we keep
forgetting the answers. But some questions recur because they are profound,
and there is more to learn from asking them than from any answer we can make.
Here's one question that keeps coming up in programming magazines and software
development conferences. Is programming a science or an art? A lot of people
today are laboring to see programming become an engineering discipline. Others
argue that, however noble that quest may be, it's futile. Programming, they
say, is a craft or an art.
No doubt both views are right. It even seems that programming, more than any
other human endeavor that springs immediately to mind, is a blend of rigor and
art, science and poetry.
Rigor and art, science and poetry: Human thought was being fiercely tugged at
by these conflicting urges during Ada's short life. And Ada's life is a
self-conscious archetype of this conflict and the synthesis that can grow out
of it.
You couldn't ask for a purer symbol of the wild, untrammeled spirit of art
than the mad, bad, Romantic poet Lord Byron. And you couldn't ask for a better
example of the actuarial mindset than the rigid mathematician Lady Byron.
Although Ada's father is better known to history than her mother, Lady Byron
was immortalized, albeit in grotesque caricature, in Lord Byron's satirical
Don Juan. In this poem she appears as Donna Inez, whose "thoughts were a
theorem, her words a problem."
Not many children have to deal with this kind of public conflict between their
parents. But although Ada accepted the public view that her father was a bad
person, she fully expected and wanted to show both aspects of her ancestry:
She wrote of wanting to create a poetical science or a scientific poetry. And
she did embody both her parents' distinct geniuses. Being highly intelligent,
perceptive, and self-analytical, she was conscious of the fact, almost to the
point of obsession. Her letters are full of the kinds of questions any
reflective programmer entertains sometimes, questions about the relationship
between art and science. But with Ada art and science had names: Father and
Mother.
Science was still coming into existence in Ada's time, and in her letters we
see the relationship of science and art analyzed by a first-rate intellect
who, because of her own unique heritage and circumstances, was incapable of
ignoring the matter. Writing to her mother, she asked, "You will not concede
me philosophical poetry. Invert the order! Will you give me poetical
philosophy, poetical science?" In that wish, she may still be ahead of her
time.








































April, 1992
C PROGRAMMING


The D-Flat Menu System


 This article contains the following executables: DFLT11.ARC


Al Stevens


This month we add menus to D-Flat. Applications use menus to present command
selections to the user. The menu lists commands from which the user may
choose. A command tells the program to do something, and that's how the users
exercise the functions and features of an application.
We are nearing the end of this series. The D-Flat project is almost a year
old, and it will run another four months. If you started at the beginning, you
have absorbed a lot of knowledge. As a side effect, you have learned about
event-driven, message-based programming, a good lesson because that is how
much of tomorrow's programming will be done -- perhaps wrapped in an
object-oriented shroud as well. There is a lot more to know about programming
than there used to be. In fact...


The Myth of the Obsolete Programmer


Programming ain't gettin' any easier. That's not how it was supposed to be,
however. Not long ago, we were hearing that 4GLs were going to make us
programmers obsolete. About ten years ago James Martin, the self-anointed
systems guru, wrote a ridiculous book called Application Development Without
Programmers. I started reading the book with some dismay, I must tell you. I
had respect for Martin's earlier works on database technology, and here he was
predicting that programmers would be unnecessary because users would be
writing their own programs. The anxiety was premature, however. My fears for
the future were allayed when I got to the part where Martin suggested some
contemporary languages with which users would soon obviate programmers. One of
them was APL. I stopped reading and gave the book to Fast Eddie Dwyer, who
used it for a door stop.
Now, a decade later, we are no closer to user-written programs than we were
when James Martin made his silly predictions. Why? Because to be a complete
programmer you still need to know how to design, code, test, and install a
program, and for that you need skill, discipline, and structure, and all that
is getting harder, not easier.
To begin with, programming takes more computer hardware than it used to. I
have several of the latest C++ compilers for the PC. Guess what? They do not
work without a lot of hardware. For example, some of them need 2 or more
Mbytes of extended memory. Some run under Windows, which itself needs plenty
of gear. You cannot install some of the compilers without at least 40 spare
Mbytes of hard disk. Most of the new breed of compilers are much bigger than
their forebears -- bigger even than the hard disk of one of my computers. They
look like Weird Al Yankovic in his video. Why so big? New features, mostly. It
takes new features for a compiler to continue to tap the sap of the follow-on
market. You don't sell upgrades without new features. New platforms, too.
Compilers need to support Windows, extended DOS, and protected mode. And then
there is the new C++ paradigm, which is bigger than C.
All that feature-laden size adds language, functions, types, and so on, which
turn into high-octane complexity. The ways of doing things change so rapidly
that you struggle to find the reusable parts of a program. The user interface
in a typical program takes up most of its code now. I've listened to
programmers talk about porting Windows programs to the Mac and back. They
can't ever find the application buried in all that incompatible user
interface/file manager/database engine/environment-dependent code. They wind
up rewriting the program.
User-written programs, indeed. Just imagine your doctor, accountant, and car
salesperson each installing DOS, Windows, a C++ compiler, function and class
libraries, a resource editor, debuggers, profilers, a database engine, a CASE
tool, and the very proper, compatible, extended and expanded memory managers.
Now imagine them dutifully and patiently plugging away, encapsulating,
instantiating, polymorphizing, and getting their applications designed,
written, tested, and up and running -- during which time the patients croak,
the clients go broke, and the cars all rust on the lot.
No, programming ain't gettin' any easier, and that's good. They'll be needing
us for a spell to come, I'll hazard a guess.


D-Flat Menus


D-Flat implements the Common User Access (CUA) specification, which identifies
standard menus, standard menu commands, and the effect and behavior of each.
The specification further provides for custom menus and commands that are
unique to the application. The CUA menu system starts with a menu bar that
extends across the top of the application window, directly under the title bar
and above the data area. The menu bar is one character line high, and it has
labels that identify the application's menus. When you select a menu label
from the menu bar, a pop-down menu pops down under the label. Then you select
a command from the pop-down menu.
In September of last year, this column showed how a programmer defines the
menu bar and its pop-down menus by coding a file named menu.c. This file
contains macro calls the C compiler's preprocessor translates into structure
declarations and initializations by using the macros defined in menu.h, which
I discussed in September, too. This month, I show you the code from the D-Flat
library that implements the menus.


The Menu Bar


Listing One, page 150, is menubar.c, the code that implements the menu bar.
When you create the application window, you specify the name of the menu you
described in menu.c. The code for the APPLICATION window class creates a
window of the MENUBAR class. The MENUBAR window uses the named menu structures
to define its contents. You will learn about the APPLICATION window class next
month. The menu bar becomes a part of the application window and displays when
the application window displays.
The program in Listing One consists of a window-processing module for the
MENUBAR class and some functions that process each of the window's messages.
The CREATE_WINDOW message allocates memory for the menu bar and sets that
memory to spaces. The SETFOCUS message sets the menu bar's active selection to
the first selection if one is not selected. It paints the window and tells its
parent to clear the status bar if the menu bar is giving up the focus.
The APPLICATION window sends the BUILDMENU message to the menu bar to tell it
to add pop-down menus to itself. The parameter in the BUILDMENU message is the
address of the menu structure defined in menu.c. The message begins by
building a blank menu bar. Then it builds a table of menu selections with an
entry for each of the pop-down menus defined in the structure. The table
records the beginning and ending x-coordinate positions of the menu label in
the menu bar and the character value of the menu selection's shortcut key.
That value is defined in the structure's text with a tilde (~) ahead of the
shortcut-key character. The character displays in a highlight color on the
menu bar. Each menu-bar selection causes the length of the menu-bar's display
string to grow to accommodate the control strings that change and reset the
colors around the shortcut key. The CopyCommand function copies the menu
selection's name into the menu-bar display string, inserting the color
controls where it finds the tilde.
The PAINT message calls the wputs function to display the menu bar. Then, if
the menu bar has the focus or if a pop-down menu is being displayed, the
message paints the label of the current active menu-bar selection in the
reverse colors of the menu bar. If the menu bar has the focus and no pop-down
menu is being displayed, the message displays the menu-bar selection's help
text in the application window's status bar by sending the ADDSTATUS message.
When the user presses a key, the D-Flat message system captures the keyboard
event, turns it into a KEYBOARD message, and sends the message to the window
that has the focus. If the window processes the keystroke, it returns. If not,
the window passes the message up the window class hierarchy. Each class in the
hierarchy to which the window belongs gets a crack at the KEYBOARD message.
Remember that this is all in the name of the in-focus window. At the top of
the hierarchy is the NORMAL window class. If the NORMAL class does not process
the keystroke, this particular window is not interested in the keystroke, and
the NORMAL window-processing module sends the KEYBOARD message to the parent
window of the window that has the focus. The pass through the hierarchy
repeats, this time tracing the class hierarchy of the parent window class. The
message passes this way through the classes of each parent and then through
the generations from parent to parent until either one of the windows
processes the message or the application window receives the message in the
window-processing module of the APPLICATION class. If the application window
does not process the message, the window passes the message to the
application's menu-bar window. That's how the menu bar eventually gets
keystrokes that no other window wants. The message makes a long trek on a busy
screen. Why, the whole thing could take microseconds.
KEYBOARD messages that make it all the way to the menu bar might be shortcut
keys for the menu-bar labels or accelerator keys for the pop-down menus
themselves. If the value of the keystroke matches one of the menu-bar shortcut
keys and the Alt key is down, the program sends the MB_SELECTION message to
the menu-bar window, telling the window to pop down the selected pop-down menu
just as if the user had clicked it with the mouse. If the value of the
keystroke matches one of the accelerator keys on a pop-down window, the
program does what the POPDOWNMENU window does when the user chooses the
corresponding selection from the pop-down menu. The program inverts the toggle
setting of a TOGGLE menu selection, returns the focus to the most recent
document window that had the focus, and sends the COMMAND message to the
application window along with the command code that matches the menu
selection.
The KEYBOARD message for the menu-bar processes some other key values. The F1
key is the help key, and each menu label has an associated help screen. If no
pop-down window is open and the user presses F1, the program calls the
DisplayHelp function with the active menu-bar label as the help parameter. If
the menu bar has the focus but no pop-down menu is open, the Enter key means
that the user wants to pop down the menu of the selected menu-bar label. The
program sends the appropriate MB_SELECTION message to the menu bar. The user
can use the F10 key to toggle the focus between the menu bar and a document
window, and the Esc key to move the focus from the menu bar to the most recent
document window. The right- and left-arrow keys, FWD and BS, change the
selected menu label on the menu bar. If a pop-down menu is open, these keys
send the MB_SELECTION message as well to open a different pop-down menu.
When the user clicks the left button on a menu-bar label, the menu-bar window
translates the LEFT_BUTTON message coordinates into the chosen label and sends
the MB_SELECTION message to the menu bar.
The MB_SELECTION message tells the menu bar that the user has chosen a
pop-down menu. If the pop-down menu has a preparation function defined, the
program calls that function first. If the second parameter of the MB_SELECTION
is true, the chosen menu is a cascaded menu, one that cascades down from the
selection of another menu. For example, the Tabs selection on the Options menu
of the example memopad program calls a cascaded menu of tab settings. The
program computes the screen location of a cascaded menu as a function of the
location of the pop-down menu from which it cascades. The position of menus
that do not cascade is computed from the position of the selection's label on
the menu bar.
If a pop-down menu is already open when the MB_SELECTION message arrives, and
the new menu is not being cascaded, the program closes the current one. Next,
it creates the new pop-down menu window. If the pop-down menu has no selection
text, then the program does not display the menu. The Window menu will have no
text if there are no document windows open, for example. If there are
selections, however, the program sends the BUILD_SELECTIONS and SETFOCUS
messages to the pop-down menu.
When the user chooses a selection from a menu, the POPDOWNMENU window sends
the COMMAND message to the menu bar with the associated command identification
code as a parameter. If the command points to a cascaded menu, the program
sends the menu bar the MB_COMMAND message with the menu code for a parameter
and a true value for the second parameter. If the chosen command does not
point to a cascaded menu, the program closes the pop-down menu window, sends
the SETFOCUS message to the document window that had the focus before the user
selected the pop-down menu, and posts the COMMAND message to the application
window.
When the POPDOWNMENU window closes, it sends the CLOSE_POPDOWN message to its
parent, in this case the menu bar. If the pop-down menu is a cascaded menu,
the menu bar sends its parent a CLOSE_WINDOW message. Otherwise, the menu bar
cleans house by setting the active selection to a null value, returning the
focus to the most recent in-focus document window, and repainting itself to
turn off the highlight on the selection label. When the menu bar closes, it
frees the memory allocated for its display and sets its pointers to null
values.


The Pop-down Menu


Listing Two, page 152, is popdown.c, the code that implements the POPDOWNMENU
window class. A pop-down menu window derives from the LISTBOX class. The
POPDOWNMENU class adds some unique message processing to give the menu list
box the appearance and behavior of a pop-down menu. The CREATE_WINDOW message
captures the mouse and keyboard into the window. When a pop-down window is
opened, no other window can take over until the pop-down window turns things
loose.
A list-box window uses a single click to position the mouse cursor on the
selection and a double click to choose the selection for further action. A
pop-down menu ignores double clicks and uses the single click to make the
choice. The program sends the LB_SELECTION message to position the selection
cursor bar when the pop-down menu window gets the LEFT_BUTTON message. Then,
when the window gets the BUTTON_RELEASED message, the program sends the
LB_CHOOSE message to indicate the user's choice.
Each time a pop-down menu window gets a PAINT message, it rebuilds the
text-display buffer. This is because the display characteristics of the
selections change, depending on whether the selections are active or not. An
inactive selection displays with characters of lesser intensity. The BORDER
message temporarily disables the inFocus variable so that D-Flat will display
a single-line border, the usual border for windows that do not have the focus.
Then it changes the vertical border characters for those lines on the menu
that have separator bars between the selections.

The LB_SELECTION message rejects itself if the user selects a separator bar on
the menu. Otherwise, it allows the base LISTBOX class to process the message.
The LB_CHOOSE message means that the user has chosen a menu selection to
execute. If the command is currently inactive, the program calls the beep
function to sound a warning. Otherwise, the program inverts the command's
check-mark display if the command is a toggle, and then sends the COMMAND
message with the command's identification code to the pop-down menu's parent
window (in most cases, the menu bar).
A pop-down menu selection may have a shortcut key and an accelerator key. The
KEYBOARD message tests the keystroke to see if it is one of these. If so, the
program sends the LB_SELECTION and LB_CHOOSE messages to the pop-down menu
window.
The KEYBOARD message can do one of several other things, depending on the
keystroke. The up- and down-arrow keys work like other list boxes, except that
the pop-down menu assumes that the entire display fits in the window and does
not scroll. Instead, the cursor wraps from bottom to top and top to bottom.
The right- and left-arrow keys mean that the user wants to close the current
pop-down menu and open the one to the right or the left. Because the menu-bar
window handles that, the pop-down window passes those keys to the menu bar.
The F1 help key calls the DisplayHelp function, passing the help
identification of the current menu selection. The Esc key closes the pop-down
window.
The CLOSE_WINDOW message releases the mouse and the keyboard and sends the
CLOSE_POPDOWN message to the pop-down menu window's parent.


The Menu API


Listing Three, page 153, is menu.c, which contains functions an applications
program can use to inquire about and change certain characteristics of the
D-Flat menus and their commands. The FindCmd function is local to the source
file. The other functions use it to get a pointer to the PopDown structure
that supports a specified command identification.
The isActive function returns true if the command on the menu is active. The
GetCommandText function returns the address of a menu command's title text.
The ActivateCommand and DeactivateCommand functions activate and deactivate a
command on a menu.
Some menu commands are toggles rather than process executors. Examples are the
Insert and Word wrap commands on the Options menu. The GetCommandToggle,
SetCommandToggle, ClearCommandToggle, and InvertCommandToggle functions get,
set, clear, and invert the toggle setting for a specified command on a
specified menu.


The System Menu


The CUA specification defines the system menu for all windows. It is a menu
the user calls by selecting the control box in the upper-left corner of the
window. It includes commands to move, size, restore, minimize, and maximize
the window, and its purpose is to allow a user without a mouse to perform
those operations with the keyboard.
Listing Four, page 154, is sysmenu.c, the code that implements the standard
system menus that associate with most other window types. D-Flat calls
BuildSystemMenu when the user presses the Alt+minus or Alt+spacebar keys or
clicks a window's control box. The function computes the system menu's
position from the location of the parent window's control box and creates the
system-menu window. Then it uses the ActivateCommand and DeactivateCommand
functions to activate and deactivate the system-menu commands based on the
condition of the parent window. You can't restore a window that is not
minimized or maximized, for example. Once the commands are prepared, the
program sends the BUILD_SELECTIONS and SHOW_WINDOW messages to the menu window
to get it built, displayed, and in control.
The SystemMenuProc function is the window-processing module for the system
menu. The CREATE_WINDOW message changes the active menu bar to the dummy one
associated with the system menu. The LEFT_BUTTON message gets intercepted when
it hits the control button of the parent window. The LB_CHOOSE message closes
the window. If the DOUBLE_CLICK message hits the control box of the parent,
that is the same as closing the window, so the program posts the DOUBLE_CLICK
message to the parent so it will close itself, and then closes the system-menu
window.


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of Dr.
Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you the latest
version of D-Flat. The software is free, but if you care to, stuff a dollar
bill in the mailer for the Brevard County Food Bank. They help homeless and
hungry families here in my home town. We've collected over $1000 so far from
generous D-Flat "careware" users. If you want to discuss D-Flat with me, use
CompuServe. My CompuServe ID is 71101,1262, and I monitor the DDJ Forum daily.
Next month we'll discuss the D-Flat APPLICATION window class.


_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]


/* ---------------- menubar.c ------------------ */

#include "dflat.h"

static void reset_menubar(WINDOW);

static struct {
 int x1, x2; /* position in menu bar */
 char sc; /* shortcut key value */
} menu[10];
static int mctr;

MBAR *ActiveMenuBar;
static MENU *ActiveMenu;

static WINDOW mwnd;
static BOOL Selecting;

static WINDOW Cascaders[MAXCASCADES];
static int casc;
static WINDOW GetDocFocus(WINDOW);


/* ----------- SETFOCUS Message ----------- */
static void SetFocusMsg(WINDOW wnd, PARAM p1)
{
 if ((int)p1 && ActiveMenuBar->ActiveSelection == -1)
 ActiveMenuBar->ActiveSelection = 0;
 SendMessage(wnd, PAINT, 0, 0);
 if (!(int)p1)
 SendMessage(GetParent(wnd), ADDSTATUS, 0, 0);
}

/* --------- BUILDMENU Message --------- */
static void BuildMenuMsg(WINDOW wnd, PARAM p1)
{
 int offset = 3;
 reset_menubar(wnd);
 mctr = 0;
 ActiveMenuBar = (MBAR *) p1;
 ActiveMenu = ActiveMenuBar->PullDown;
 while (ActiveMenu->Title != NULL &&
 ActiveMenu->Title != (void*)-1) {
 char *cp;
 if (strlen(GetText(wnd)+offset) <
 strlen(ActiveMenu->Title)+3)
 break;
 GetText(wnd) = realloc(GetText(wnd),
 strlen(GetText(wnd))+5);
 memmove(GetText(wnd) + offset+4, GetText(wnd) + offset,
 strlen(GetText(wnd))-offset+1);
 CopyCommand(GetText(wnd)+offset,ActiveMenu->Title,FALSE,
 wnd->WindowColors [STD_COLOR] [BG]);
 menu[mctr].x1 = offset;
 offset += strlen(ActiveMenu->Title) + (3+MSPACE);
 menu[mctr].x2 = offset-MSPACE;
 cp = strchr(ActiveMenu->Title, SHORTCUTCHAR);
 if (cp)
 menu[mctr].sc = tolower(*(cp+1));
 mctr++;
 ActiveMenu++;
 }
 ActiveMenu = ActiveMenuBar->PullDown;
}

/* ---------- PAINT Message ---------- */
static void PaintMsg(WINDOW wnd)
{
 if (wnd == inFocus)
 SendMessage(GetParent(wnd), ADDSTATUS, 0, 0);
 SetStandardColor(wnd);
 wputs(wnd, GetText(wnd), 0, 0);
 if (ActiveMenuBar->ActiveSelection != -1 &&
 (wnd == inFocus || mwnd != NULL)) {
 char *sel;
 char *cp;
 if ((sel = malloc(200)) != NULL) {
 int offset=menu[ActiveMenuBar->ActiveSelection].x1;
 int offset1=menu[ActiveMenuBar->ActiveSelection].x2;
 GetText(wnd)[offset1] = '\0';
 SetReverseColor(wnd);
 memset(sel, '\0', 200);

 strcpy(sel, GetText(wnd)+offset);
 cp = strchr(sel, CHANGECOLOR);
 if (cp != NULL)
 *(cp + 2) = background | 0x80;
 wputs(wnd, sel,
 offset-ActiveMenuBar->ActiveSelection*4, 0);
 GetText(wnd)[offset1] = ' ';
 if (!Selecting && mwnd == NULL && wnd == inFocus) {
 char *st = ActiveMenu
 [ActiveMenuBar->ActiveSelection].StatusText;
 if (st != NULL)
 SendMessage(GetParent(wnd), ADDSTATUS,
 (PARAM)st, 0);
 }
 free(sel);
 }
 }
}

/* ------------ KEYBOARD Message ------------- */
static void KeyboardMsg(WINDOW wnd, PARAM p1)
{
 MENU *mnu;
 if (mwnd == NULL) {
 /* ----- search for menu bar shortcut keys ---- */
 int c = tolower((int)p1);
 int a = AltConvert((int)p1);
 int j;
 for (j = 0; j < mctr; j++) {
 if ((inFocus == wnd && menu[j].sc == c) ||
 (a && menu[j].sc == a)) {
 SendMessage(wnd, MB_SELECTION, j, 0);
 return;
 }
 }
 }
 /* -------- search for accelerator keys -------- */
 mnu = ActiveMenu;
 while (mnu->Title != (void *)-1) {
 struct PopDown *pd = mnu->Selections;
 if (mnu->PrepMenu)
 (*(mnu->PrepMenu))(GetDocFocus(wnd), mnu);
 while (pd->SelectionTitle != NULL) {
 if (pd->Accelerator == (int) p1) {
 if (pd->Attrib & INACTIVE)
 beep();
 else {
 if (pd->Attrib & TOGGLE)
 pd->Attrib ^= CHECKED;
 SendMessage(GetDocFocus(wnd),
 SETFOCUS, TRUE, 0);
 PostMessage(GetParent(wnd),
 COMMAND, pd->ActionId, 0);
 }
 return;
 }
 pd++;
 }
 mnu++;

 }
 switch ((int)p1) {
 case F1:
 if (ActiveMenu != NULL &&
 (mwnd == NULL ||
 (ActiveMenu+ActiveMenuBar->ActiveSelection)->
 Selections[0].SelectionTitle == NULL)) {
 DisplayHelp(wnd,
 (ActiveMenu+ActiveMenuBar->ActiveSelection)->Title+1);
 return;
 }
 break;
 case '\r':
 if (mwnd == NULL &&
 ActiveMenuBar->ActiveSelection != -1)
 SendMessage(wnd, MB_SELECTION,
 ActiveMenuBar->ActiveSelection, 0);
 break;
 case F10:
 if (wnd != inFocus && mwnd == NULL) {
 SendMessage(wnd, SETFOCUS, TRUE, 0);
 break;
 }
 /* ------- fall through ------- */
 case ESC:
 if (inFocus == wnd && mwnd == NULL) {
 ActiveMenuBar->ActiveSelection = -1;
 SendMessage(GetDocFocus(wnd),SETFOCUS,TRUE,0);
 SendMessage(wnd, PAINT, 0, 0);
 }
 break;
 case FWD:
 ActiveMenuBar->ActiveSelection++;
 if (ActiveMenuBar->ActiveSelection == mctr)
 ActiveMenuBar->ActiveSelection = 0;
 if (mwnd != NULL)
 SendMessage(wnd, MB_SELECTION,
 ActiveMenuBar->ActiveSelection, 0);
 else
 SendMessage(wnd, PAINT, 0, 0);
 break;
 case BS:
 if (ActiveMenuBar->ActiveSelection == 0)
 ActiveMenuBar->ActiveSelection = mctr;
 --ActiveMenuBar->ActiveSelection;
 if (mwnd != NULL)
 SendMessage(wnd, MB_SELECTION,
 ActiveMenuBar->ActiveSelection, 0);
 else
 SendMessage(wnd, PAINT, 0, 0);
 break;
 default:
 break;
 }
}

/* --------------- LEFT_BUTTON Message ---------- */
static void LeftButtonMsg(WINDOW wnd, PARAM p1)
{

 int i;
 int mx = (int) p1 - GetLeft(wnd);
 /* --- compute the selection that the left button hit --- */
 for (i = 0; i < mctr; i++)
 if (mx >= menu[i].x1-4*i &&
 mx <= menu[i].x2-4*i-5)
 break;
 if (i < mctr)
 if (i != ActiveMenuBar->ActiveSelection || mwnd == NULL)
 SendMessage(wnd, MB_SELECTION, i, 0);
}

/* -------------- MB_SELECTION Message -------------- */
static void SelectionMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int wd, mx, my;
 MENU *mnu;

 Selecting = TRUE;
 mnu = ActiveMenu+(int)p1;
 if (mnu->PrepMenu != NULL)
 (*(mnu->PrepMenu))(GetDocFocus(wnd), mnu);
 wd = MenuWidth(mnu->Selections);
 if (p2) {
 mx = GetLeft(inFocus) + WindowWidth(inFocus) - 1;
 my = GetTop(inFocus) + inFocus->selection;
 }
 else {
 int offset = menu[(int)p1].x1 - 4 * (int)p1;
 if (mwnd != NULL) {
 SendMessage(wnd, SETFOCUS, TRUE, 0);
 SendMessage(mwnd, CLOSE_WINDOW, 0, 0);
 }
 ActiveMenuBar->ActiveSelection = (int) p1;
 if (offset > WindowWidth(wnd)-wd)
 offset = WindowWidth(wnd)-wd;
 mx = GetLeft(wnd)+offset;
 my = GetTop(wnd)+1;
 }
 mwnd = CreateWindow(POPDOWNMENU, NULL,
 mx, my,
 MenuHeight(mnu->Selections),
 wd,
 NULL,
 wnd,
 NULL,
 0);
 AddAttribute(mwnd, SHADOW);
 if (mnu->Selections[0].SelectionTitle != NULL) {
 SendMessage(mwnd, BUILD_SELECTIONS, (PARAM) mnu, 0);
 SendMessage(mwnd, SETFOCUS, TRUE, 0);
 }
 else
 SendMessage(wnd, PAINT, 0, 0);
 Selecting = FALSE;
}

/* --------- COMMAND Message ---------- */
static void CommandMsg(WINDOW wnd, PARAM p1, PARAM p2)

{
 if (isCascadedCommand(ActiveMenuBar, (int)p1)) {
 /* find the cascaded menu based on command id in p1 */
 MENU *mnu = ActiveMenu+mctr;
 while (mnu->Title != (void *)-1) {
 if (mnu->CascadeId == (int) p1) {
 if (casc < MAXCASCADES) {
 Cascaders[casc++] = mwnd;
 SendMessage(wnd, MB_SELECTION,
 (PARAM)(mnu-ActiveMenu), TRUE);
 }
 break;
 }
 mnu++;
 }
 }
 else {
 if (mwnd != NULL)
 SendMessage(mwnd, CLOSE_WINDOW, 0, 0);
 SendMessage(GetDocFocus(wnd), SETFOCUS, TRUE, 0);
 PostMessage(GetParent(wnd), COMMAND, p1, p2);
 }
}

/* --------------- CLOSE_POPDOWN Message --------------- */
static void ClosePopdownMsg(WINDOW wnd)
{
 if (casc > 0)
 SendMessage(Cascaders[--casc], CLOSE_WINDOW, 0, 0);
 else {
 mwnd = NULL;
 ActiveMenuBar->ActiveSelection = -1;
 if (!Selecting)
 SendMessage(GetDocFocus(wnd), SETFOCUS, TRUE, 0);
 SendMessage(wnd, PAINT, 0, 0);
 }
}

/* ---------------- CLOSE_WINDOW Message --------------- */
static void CloseWindowMsg(WINDOW wnd)
{
 if (GetText(wnd) != NULL) {
 free(GetText(wnd));
 GetText(wnd) = NULL;
 }
 mctr = 0;
 ActiveMenuBar->ActiveSelection = -1;
 ActiveMenu = NULL;
 ActiveMenuBar = NULL;
}

/* --- Window processing module for MENUBAR window class --- */
int MenuBarProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;

 switch (msg) {
 case CREATE_WINDOW:
 reset_menubar(wnd);

 break;
 case SETFOCUS:
 rtn = BaseWndProc(MENUBAR, wnd, msg, p1, p2);
 SetFocusMsg(wnd, p1);
 return rtn;
 case BUILDMENU:
 BuildMenuMsg(wnd, p1);
 break;
 case PAINT:
 if (!isVisible(wnd) || GetText(wnd) == NULL)
 break;
 PaintMsg(wnd);
 return FALSE;
 case BORDER:
 return TRUE;
 case KEYBOARD:
 KeyboardMsg(wnd, p1);
 return TRUE;
 case LEFT_BUTTON:
 LeftButtonMsg(wnd, p1);
 return TRUE;
 case MB_SELECTION:
 SelectionMsg(wnd, p1, p2);
 break;
 case COMMAND:
 CommandMsg(wnd, p1, p2);
 return TRUE;
 case INSIDE_WINDOW:
 return InsideRect(p1, p2, WindowRect(wnd));
 case CLOSE_POPDOWN:
 ClosePopdownMsg(wnd);
 return TRUE;
 case CLOSE_WINDOW:
 rtn = BaseWndProc(MENUBAR, wnd, msg, p1, p2);
 CloseWindowMsg(wnd);
 return rtn;
 default:
 break;
 }
 return BaseWndProc(MENUBAR, wnd, msg, p1, p2);
}

/* ----- return the WINDOW handle of the document window
 that had the focus when the MENUBAR was activated ----- */
static WINDOW GetDocFocus(WINDOW wnd)
{
 WINDOW DocFocus = Focus.LastWindow;
 CLASS cl;
 while ((cl = GetClass(DocFocus)) == MENUBAR ||
 cl == POPDOWNMENU ||
 cl == STATUSBAR ||
 cl == APPLICATION) {
 if ((DocFocus = PrevWindow(DocFocus)) == NULL) {
 DocFocus = GetParent(wnd);
 break;
 }
 }
 return DocFocus;
}


/* ------------- reset the MENUBAR -------------- */
static void reset_menubar(WINDOW wnd)
{
 if ((GetText(wnd) =
 realloc(GetText(wnd), SCREENWIDTH+5)) != NULL) {
 memset(GetText(wnd), ' ', SCREENWIDTH);
 *(GetText(wnd)+WindowWidth(wnd)) = '\0';
 }
}







[LISTING TWO]

/* ------------- popdown.c ----------- */

#include "dflat.h"

static int SelectionWidth(struct PopDown *);
static int py = -1;

/* ------------ CREATE_WINDOW Message ------------- */
static int CreateWindowMsg(WINDOW wnd)
{
 int rtn;
 ClearAttribute(wnd, HASTITLEBAR |
 VSCROLLBAR |
 MOVEABLE |
 SIZEABLE |
 HSCROLLBAR);
 rtn = BaseWndProc(POPDOWNMENU, wnd, CREATE_WINDOW, 0, 0);
 SendMessage(wnd, CAPTURE_MOUSE, 0, 0);
 SendMessage(wnd, CAPTURE_KEYBOARD, 0, 0);
 SendMessage(NULL, SAVE_CURSOR, 0, 0);
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 return rtn;
}

/* --------- LEFT_BUTTON Message --------- */
static void LeftButtonMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 int my = (int) p2 - GetTop(wnd);
 if (InsideRect(p1, p2, ClientRect(wnd))) {
 if (my != py) {
 SendMessage(wnd, LB_SELECTION,
 (PARAM) wnd->wtop+my-1, TRUE);
 py = my;
 }
 }
 else if ((int)p2 == GetTop(GetParent(wnd)))
 if (GetClass(GetParent(wnd)) == MENUBAR)
 PostMessage(GetParent(wnd), LEFT_BUTTON, p1, p2);
}


/* -------- BUTTON_RELEASED Message -------- */
static BOOL ButtonReleasedMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 py = -1;
 if (InsideRect((int)p1, (int)p2, ClientRect(wnd))) {
 int sel = (int)p2 - GetClientTop(wnd);
 if (*TextLine(wnd, sel) != LINE)
 SendMessage(wnd, LB_CHOOSE, wnd->selection, 0);
 }
 else {
 WINDOW pwnd = GetParent(wnd);
 if (GetClass(pwnd) == MENUBAR && (int)p2==GetTop(pwnd))
 return FALSE;
 if ((int)p1 == GetLeft(pwnd)+2)
 return FALSE;
 SendMessage(wnd, CLOSE_WINDOW, 0, 0);
 return TRUE;
 }
 return FALSE;
}

/* --------- PAINT Message -------- */
static void PaintMsg(WINDOW wnd)
{
 int wd;
 unsigned char sep[80], *cp = sep;
 unsigned char sel[80];
 struct PopDown *ActivePopDown;
 struct PopDown *pd1;

 ActivePopDown = pd1 = wnd->mnu->Selections;
 wd = MenuWidth(ActivePopDown)-2;
 while (wd--)
 *cp++ = LINE;
 *cp = '\0';
 SendMessage(wnd, CLEARTEXT, 0, 0);
 wnd->selection = wnd->mnu->Selection;
 while (pd1->SelectionTitle != NULL) {
 if (*pd1->SelectionTitle == LINE)
 SendMessage(wnd, ADDTEXT, (PARAM) sep, 0);
 else {
 int len;
 memset(sel, '\0', sizeof sel);
 if (pd1->Attrib & INACTIVE)
 /* ------ inactive menu selection ----- */
 sprintf(sel, "%c%c%c",
 CHANGECOLOR,
 wnd->WindowColors [HILITE_COLOR] [FG] | 0x80,
 wnd->WindowColors [STD_COLOR] [BG] | 0x80);
 strcat(sel, " ");
 if (pd1->Attrib & CHECKED)
 /* ---- paint the toggle checkmark ---- */
 sel[strlen(sel)-1] = CHECKMARK;
 len=CopyCommand(sel+strlen(sel),pd1->SelectionTitle,
 pd1->Attrib & INACTIVE,
 wnd->WindowColors [STD_COLOR] [BG]);
 if (pd1->Accelerator) {
 /* ---- paint accelerator key ---- */
 int i;

 int wd1 = 2+SelectionWidth(ActivePopDown) -
 strlen(pd1->SelectionTitle);
 for (i = 0; keys[i].keylabel; i++) {
 if (keys[i].keycode == pd1->Accelerator) {
 while (wd1--)
 strcat(sel, " ");
 sprintf(sel+strlen(sel), "[%s]",
 keys[i].keylabel);
 break;
 }
 }
 }
 if (pd1->Attrib & CASCADED) {
 /* ---- paint cascaded menu token ---- */
 if (!pd1->Accelerator) {
 wd = MenuWidth(ActivePopDown)-len+1;
 while (wd--)
 strcat(sel, " ");
 }
 sel[strlen(sel)-1] = CASCADEPOINTER;
 }
 else
 strcat(sel, " ");
 strcat(sel, " ");
 sel[strlen(sel)-1] = RESETCOLOR;
 SendMessage(wnd, ADDTEXT, (PARAM) sel, 0);
 }
 pd1++;
 }
}

/* ---------- BORDER Message ----------- */
static int BorderMsg(WINDOW wnd)
{
 int i, rtn = TRUE;
 WINDOW currFocus;
 if (wnd->mnu != NULL) {
 currFocus = inFocus;
 inFocus = NULL;
 rtn = BaseWndProc(POPDOWNMENU, wnd, BORDER, 0, 0);
 inFocus = currFocus;
 for (i = 0; i < ClientHeight(wnd); i++) {
 if (*TextLine(wnd, i) == LINE) {
 wputch(wnd, LEDGE, 0, i+1);
 wputch(wnd, REDGE, WindowWidth(wnd)-1, i+1);
 }
 }
 }
 return rtn;
}

/* -------------- LB_CHOOSE Message -------------- */
static void LBChooseMsg(WINDOW wnd, PARAM p1)
{
 struct PopDown *ActivePopDown = wnd->mnu->Selections;
 if (ActivePopDown != NULL) {
 int *attr = &(ActivePopDown+(int)p1)->Attrib;
 wnd->mnu->Selection = (int)p1;
 if (!(*attr & INACTIVE)) {

 if (*attr & TOGGLE)
 *attr ^= CHECKED;
 PostMessage(GetParent(wnd), COMMAND,
 (ActivePopDown+(int)p1)->ActionId, p1);
 }
 else
 beep();
 }
}

/* ---------- KEYBOARD Message --------- */
static BOOL KeyboardMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 struct PopDown *ActivePopDown = wnd->mnu->Selections;
 if (wnd->mnu != NULL) {
 if (ActivePopDown != NULL) {
 int c = (int)p1;
 int sel = 0;
 int a;
 struct PopDown *pd = ActivePopDown;

 if ((c & OFFSET) == 0)
 c = tolower(c);
 a = AltConvert(c);

 while (pd->SelectionTitle != NULL) {
 char *cp = strchr(pd->SelectionTitle,
 SHORTCUTCHAR);
 int sc = tolower(*(cp+1));
 if ((cp && sc == c) ||
 (a && sc == a) ||
 pd->Accelerator == c) {
 PostMessage(wnd, LB_SELECTION, sel, 0);
 PostMessage(wnd, LB_CHOOSE, sel, TRUE);
 return TRUE;
 }
 pd++, sel++;
 }
 }
 }
 switch ((int)p1) {
 case F1:
 if (ActivePopDown == NULL)
 SendMessage(GetParent(wnd), KEYBOARD, p1, p2);
 else
 DisplayHelp(wnd,
 (ActivePopDown+wnd->selection)->help);
 return TRUE;
 case ESC:
 SendMessage(wnd, CLOSE_WINDOW, 0, 0);
 return TRUE;
 case FWD:
 case BS:
 if (GetClass(GetParent(wnd)) == MENUBAR)
 PostMessage(GetParent(wnd), KEYBOARD, p1, p2);
 return TRUE;
 case UP:
 if (wnd->selection == 0) {
 if (wnd->wlines == ClientHeight(wnd)) {

 PostMessage(wnd, LB_SELECTION,
 wnd->wlines-1, FALSE);
 return TRUE;
 }
 }
 break;
 case DN:
 if (wnd->selection == wnd->wlines-1) {
 if (wnd->wlines == ClientHeight(wnd)) {
 PostMessage(wnd, LB_SELECTION, 0, FALSE);
 return TRUE;
 }
 }
 break;
 case HOME:
 case END:
 case '\r':
 break;
 default:
 return TRUE;
 }
 return FALSE;
}

/* ----------- CLOSE_WINDOW Message ---------- */
static int CloseWindowMsg(WINDOW wnd)
{
 int rtn;
 SendMessage(wnd, RELEASE_MOUSE, 0, 0);
 SendMessage(wnd, RELEASE_KEYBOARD, 0, 0);
 SendMessage(NULL, RESTORE_CURSOR, 0, 0);
 rtn = BaseWndProc(POPDOWNMENU, wnd, CLOSE_WINDOW, 0, 0);
 SendMessage(GetParent(wnd), CLOSE_POPDOWN, 0, 0);
 return rtn;
}

/* - Window processing module for POPDOWNMENU window class - */
int PopDownProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 return CreateWindowMsg(wnd);
 case LEFT_BUTTON:
 LeftButtonMsg(wnd, p1, p2);
 return FALSE;
 case DOUBLE_CLICK:
 return TRUE;
 case LB_SELECTION:
 if (*TextLine(wnd, (int)p1) == LINE)
 return TRUE;
 wnd->mnu->Selection = (int)p1;
 break;
 case BUTTON_RELEASED:
 if (ButtonReleasedMsg(wnd, p1, p2))
 return TRUE;
 break;
 case BUILD_SELECTIONS:
 wnd->mnu = (void *) p1;
 wnd->selection = wnd->mnu->Selection;

 break;
 case PAINT:
 if (wnd->mnu == NULL)
 return TRUE;
 PaintMsg(wnd);
 break;
 case BORDER:
 return BorderMsg(wnd);
 case LB_CHOOSE:
 LBChooseMsg(wnd, p1);
 return TRUE;
 case KEYBOARD:
 if (KeyboardMsg(wnd, p1, p2))
 return TRUE;
 break;
 case CLOSE_WINDOW:
 return CloseWindowMsg(wnd);
 default:
 break;
 }
 return BaseWndProc(POPDOWNMENU, wnd, msg, p1, p2);
}

/* --------- compute menu height -------- */
int MenuHeight(struct PopDown *pd)
{
 int ht = 0;
 while (pd[ht].SelectionTitle != NULL)
 ht++;
 return ht+2;
}

/* --------- compute menu width -------- */
int MenuWidth(struct PopDown *pd)
{
 int wd = 0, i;
 int len = 0;

 wd = SelectionWidth(pd);
 while (pd->SelectionTitle != NULL) {
 if (pd->Accelerator) {
 for (i = 0; keys[i].keylabel; i++)
 if (keys[i].keycode == pd->Accelerator) {
 len = max(len, 2+strlen(keys[i].keylabel));
 break;
 }
 }
 if (pd->Attrib & CASCADED)
 len = max(len, 2);
 pd++;
 }
 return wd+5+len;
}

/* ---- compute the maximum selection width in a menu ---- */
static int SelectionWidth(struct PopDown *pd)
{
 int wd = 0;
 while (pd->SelectionTitle != NULL) {

 int len = strlen(pd->SelectionTitle)-1;
 wd = max(wd, len);
 pd++;
 }
 return wd;
}

/* ----- copy a menu command to a display buffer ---- */
int CopyCommand(unsigned char *dest, unsigned char *src,
 int skipcolor, int bg)
{
 unsigned char *d = dest;
 while (*src && *src != '\n') {
 if (*src == SHORTCUTCHAR) {
 src++;
 if (!skipcolor) {
 *dest++ = CHANGECOLOR;
 *dest++ = cfg.clr[POPDOWNMENU]
 [HILITE_COLOR] [BG] | 0x80;
 *dest++ = bg | 0x80;
 *dest++ = *src++;
 *dest++ = RESETCOLOR;
 }
 }
 else
 *dest++ = *src++;
 }
 return (int) (dest - d);
}







[LISTING THREE]

/* ------------- menu.c ------------- */

#include "dflat.h"

static struct PopDown *FindCmd(MBAR *mn, int cmd)
{
 MENU *mnu = mn->PullDown;
 while (mnu->Title != (void *)-1) {
 struct PopDown *pd = mnu->Selections;
 while (pd->SelectionTitle != NULL) {
 if (pd->ActionId == cmd)
 return pd;
 pd++;
 }
 mnu++;
 }
 return NULL;
}

char *GetCommandText(MBAR *mn, int cmd)
{

 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 return pd->SelectionTitle;
 return NULL;
}

BOOL isCascadedCommand(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 return pd->Attrib & CASCADED;
 return FALSE;
}

void ActivateCommand(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 pd->Attrib &= ~INACTIVE;
}

void DeactivateCommand(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 pd->Attrib |= INACTIVE;
}

BOOL isActive(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 return !(pd->Attrib & INACTIVE);
 return FALSE;
}

BOOL GetCommandToggle(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 return (pd->Attrib & CHECKED) != 0;
 return FALSE;
}

void SetCommandToggle(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 pd->Attrib |= CHECKED;
}

void ClearCommandToggle(MBAR *mn, int cmd)
{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 pd->Attrib &= ~CHECKED;
}

void InvertCommandToggle(MBAR *mn, int cmd)

{
 struct PopDown *pd = FindCmd(mn, cmd);
 if (pd != NULL)
 pd->Attrib ^= CHECKED;
}







[LISTING FOUR]


/* ------------- sysmenu.c ------------ */

#include "dflat.h"

int SystemMenuProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int mx, my;
 WINDOW wnd1;
 switch (msg) {
 case CREATE_WINDOW:
 wnd->holdmenu = ActiveMenuBar;
 ActiveMenuBar = &SystemMenu;
 SystemMenu.PullDown[0].Selection = 0;
 break;
 case LEFT_BUTTON:
 wnd1 = GetParent(wnd);
 mx = (int) p1 - GetLeft(wnd1);
 my = (int) p2 - GetTop(wnd1);
 if (HitControlBox(wnd1, mx, my))
 return TRUE;
 break;
 case LB_CHOOSE:
 PostMessage(wnd, CLOSE_WINDOW, 0, 0);
 break;
 case DOUBLE_CLICK:
 if (p2 == GetTop(GetParent(wnd))) {
 PostMessage(GetParent(wnd), msg, p1, p2);
 SendMessage(wnd, CLOSE_WINDOW, TRUE, 0);
 }
 return TRUE;
 case SHIFT_CHANGED:
 return TRUE;
 case CLOSE_WINDOW:
 ActiveMenuBar = wnd->holdmenu;
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}

/* ------- Build a system menu -------- */
void BuildSystemMenu(WINDOW wnd)
{

 int lf = GetLeft(wnd)+1;
 int tp = GetTop(wnd)+1;
 int ht = MenuHeight(SystemMenu.PullDown[0].Selections);
 int wd = MenuWidth(SystemMenu.PullDown[0].Selections);
 WINDOW SystemMenuWnd;

 SystemMenu.PullDown[0].Selections[6].Accelerator =
 (GetClass(wnd) == APPLICATION) ? ALT_F4 : CTRL_F4;

 if (lf+wd > SCREENWIDTH-1)
 lf = (SCREENWIDTH-1) - wd;
 if (tp+ht > SCREENHEIGHT-2)
 tp = (SCREENHEIGHT-2) - ht;

 SystemMenuWnd = CreateWindow(POPDOWNMENU, NULL,
 lf,tp,ht,wd,NULL,wnd,SystemMenuProc, 0);

#ifdef INCLUDE_RESTORE
 if (wnd->condition == ISRESTORED)
 DeactivateCommand(&SystemMenu, ID_SYSRESTORE);
 else
 ActivateCommand(&SystemMenu, ID_SYSRESTORE);
#endif

 if (TestAttribute(wnd, MOVEABLE)
#ifdef INCLUDE_MAXIMIZE
 && wnd->condition != ISMAXIMIZED
#endif
 )
 ActivateCommand(&SystemMenu, ID_SYSMOVE);
 else
 DeactivateCommand(&SystemMenu, ID_SYSMOVE);

 if (wnd->condition != ISRESTORED ||
 TestAttribute(wnd, SIZEABLE) == FALSE)
 DeactivateCommand(&SystemMenu, ID_SYSSIZE);
 else
 ActivateCommand(&SystemMenu, ID_SYSSIZE);

#ifdef INCLUDE_MINIMIZE
 if (wnd->condition == ISMINIMIZED ||
 TestAttribute(wnd, MINMAXBOX) == FALSE)
 DeactivateCommand(&SystemMenu, ID_SYSMINIMIZE);
 else
 ActivateCommand(&SystemMenu, ID_SYSMINIMIZE);
#endif

#ifdef INCLUDE_MAXIMIZE
 if (wnd->condition != ISRESTORED ||
 TestAttribute(wnd, MINMAXBOX) == FALSE)
 DeactivateCommand(&SystemMenu, ID_SYSMAXIMIZE);
 else
 ActivateCommand(&SystemMenu, ID_SYSMAXIMIZE);
#endif

 SendMessage(SystemMenuWnd, BUILD_SELECTIONS,
 (PARAM) &SystemMenu.PullDown[0], 0);
 SendMessage(SystemMenuWnd, SHOW_WINDOW, 0, 0);
}





April, 1992
STRUCTURED PROGRAMMING


Start Your Collection Now!




Jeff Duntemann KG7JF


I have been falling a little out of love with objects for some time now. Ok,
that's not entirely true -- but I've definitely come out of the infatuation
that has gripped a lot of us for a couple of years. I've had to admit that
some of my firmest beliefs are just plain wrong, and others are at best skewed
to reality. On the other hand, OOP technology has succeeded spectacularly in a
couple of areas, and there are a few others where OOP's failure is the fault
of our own flawed expectations. If you can't make pasta with a can opener,
it's hardly the can opener's fault.
So if I haven't led off with a funny story this month, well, it's mostly
because the whole affair has left me slightly depressed.


A House Divided


Where I think we blew it big-time is in the area of encapsulation. Scott
Guthery was pretty much right about that, in his seminal challenge to OOP
concepts ("Are the Emperor's New Clothes Object Oriented?") presented in DDJ
in the December 1989 issue. Among other things, he points out that inheritance
breaks encapsulation. On that point at least (although I'm far from agreeing
with him on all of his points) he's right.
The truth is that OOP is a technological house divided against itself. In one
corner is inheritance/polymorphism, and in the other is encapsulation. The
corners are not really related at all, in that you can have inheritance
without encapsulation and vice versa. Beneath the hood,
inheritance/polymorphism is a matter of binding; that is, the mechanism by
which the caller of a routine is given the routine's address. Encapsulation,
on the other hand, is a question of stack frames: who gets to reference whose
data or methods, and how.
For the word "encapsulation" to mean anything at all, the unit of
encapsulation must be considered the entire inheritance interstate between the
root object and the leaf you're instantiating. In many cases that's the same
as saying, "The world is my neighborhood," which is noble and poetic but tells
us nothing when we have to think about urban renewal.
The point of inheritance/polymorphism is to enhance the expressibility of a
language; that is, broaden the range of things you can model in software. The
point of encapsulation, by contrast, is to reduce the degree of coupling
between software components. Maybe one out of two ain't so bad.


Software VLSI


Turbo Vision, our ongoing dance partner for the last several columns (and a
few more to come) is an interesting example. It's a triumph of data
expression, and (by virtue of Turbo Pascal's OOP extensions) does things we
couldn't even dream of four years ago. On the other hand, the unit of
encapsulation for Turbo Vision is Turbo Vision itself. It's not an engineer's
assortment of standard software ICs; it's a massive piece of software VLSI
that dictates virtually the entire design of any application that uses it.
Is this good or bad? As a concept, it's neither; and to be honest with you,
it's nothing new. Most of the user-interface libraries I've worked with have
been heavily coupled internally, such that you don't pull out a scroll bar
without bringing in just about everything else. As I've said before, the
notion of a standard set of software-engineering modules probably depends on a
standard interface and infrastructure to support those modules and the passage
of data among them. Unless this interface and infrastructure is totally
industry standard, the effort may not be worth the trouble. That means it's
going to have to happen at the operating-system level, which for DOS means
that the notion of logic-block scale "software ICs" is pretty much hopeless.
Turbo Vision, as software VLSI, is pretty damned good, and (for DOS text mode,
at least) it may be the best we can do.
So let's bid encapsulation good-bye for the present, and return to the issue
of expressibility.


Turbo Pascal Collections


Something I have always admired about Smalltalk is its expressibility as a
language. For many years, it had absolutely no peer in being able to model
real-life entities with messy relationships. A large part of this power lay in
Smalltalk's collection classes, classes that gather together other objects and
allow them to be treated in certain ways as a group.
Standard Pascal nods to the concept of collections with arrays, files,
records, and (especially) dynamic data structures on the heap, but strong
typing always gets in the way. A Pascal array collects instances of one type
and one type only. Ditto for files. Records collect only at compile time,
which is truthfully not collecting at all. You have a little more flexibility
with linked lists and other dynamic structures, but not much.
Hiding in Turbo Vision's shadow is a remarkable feature that far too few Turbo
Pascal programmers have ever tried to use: a hierarchy of Pascal collection
classes that do most of what the Smalltalk collections can do. Although
nominally a part of Turbo Vision (because Turbo Vision uses them) TP6
collections in fact are not views and stand alone; and they may be used in
applications incorporating any user interface or none at all. Even if you
consider Turbo Vision too difficult, you should take a close look at the
collections hierarchy.


Collection Fundamentals


All Turbo Vision collections work basically the same way. You declare an
instance of a collection class such as TCollection, the fundamental collection
class. Your collection object contains a method called Insert, which is the
way you add an object to the collection: MyCollection.Insert(PSometype);. Note
well that the parameter to Insert is a pointer to an object, and not the
object itself! This is the part of collections that most throws newcomers:
Collections are inextricably built upon pointers, and you'd better get used to
that. You can pass a pointer by using the address-of operator "@" on a
variable (for example, @MyObject), but as I'll explain a little later, this
often leads to some ugly traps.
Play it straight. Unless you know exactly what you're doing, collect only data
items that have been allocated on the heap.
The collection object keeps track of how many objects it contains in a field
called Count. Count changes automatically when you insert or delete an object,
and always reflects the true number of objects in the collection.
Once inserted, data stored in a collection may be accessed in a number of
ways. The easiest way is simply to consider the collection an array masked by
a procedure call, and access items in the collection by index; see Example 1.
Example 1: Accessing data stored in a collection

 FOR I := 0 TO MyCollection.Count - 1 DO
 Writeln(MyCollection.At(I)^.SomeText);

The At function method returns a pointer to the item at the index you pass as
At's single integer parameter; indexes run from 0 through Count-1. You must
dereference the
pointer to "get at" the data. By design, all Turbo Pascal collections are
ordered collections; that is, there is always a defined order in which the
collected objects exist in the collection. This doesn't mean that the items
are always sorted in the collection (although there is a sorted collection
class which we'll deal with a little later), but simply that you can always
reference the contents of a collection in a defined order.
In a similar way, you can write a new object to a collection at a particular
index by using the AtPut method. The new object replaces any object that was
previously at that index in the collection: MyCollection.AtPut(I,PSometype);.
Again, you don't pass the object you want to put into the collection, but a
pointer to that object. (I'm using the now-widespread convention here that a
type identifier starting with the letter "P" indicates a pointer type.)
The Insert method mentioned earlier adds at the end of the
collection. You can also insert a new object somewhere in the middle of the
collection without overwriting what is already at that index. The AtInsert
method does this: MyCollection.AtInsert(I,PSomeType);. Starting at index I,
the objects in the collection are "shoved over" by one, so that PSomeType^ can
be inserted at index I. Nothing is overwritten.
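The pointer-array mechanics behind these access methods can be sketched in C (the language of this issue's other listings). The names Collection, coll_at, and so on are hypothetical, not Turbo Vision's; the fixed backing store stands in for the heap-allocated pointer array described below.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    void *items[64];   /* backing store of pointers, cf. MaxCollectionSize */
    int   count;       /* cf. the Count field */
} Collection;

/* cf. Insert: append an item (a pointer, never the object itself). */
void coll_insert(Collection *c, void *item)
{
    c->items[c->count++] = item;
}

/* cf. At: return the pointer stored at index i (0-based). */
void *coll_at(const Collection *c, int i)
{
    return c->items[i];
}

/* cf. AtPut: replace whatever pointer is at index i. */
void coll_at_put(Collection *c, int i, void *item)
{
    c->items[i] = item;
}

/* cf. AtInsert: shove items i..count-1 over by one, then drop
   the new pointer into slot i. Nothing is overwritten. */
void coll_at_insert(Collection *c, int i, void *item)
{
    memmove(&c->items[i + 1], &c->items[i],
            (size_t)(c->count - i) * sizeof(void *));
    c->items[i] = item;
    c->count++;
}
```

Note that every parameter and return value is a pointer; the collection never copies the objects themselves.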
The other access methods are more complex and depend on the type of collection
you're using. We'll get to that shortly.



Creating and Sizing Collections


Collections are objects, and like all objects, they're created by way of
constructor calls. The constructor that creates a collection from scratch
looks like this: CONSTRUCTOR Init(ALimit, ADelta : Integer);.
You pass in ALimit the number of items you want the collection to be able to
contain. Now, there is an absolute ceiling on the number of individual items a
collection can contain. This ceiling figure is stored in a constant called
MaxCollectionSize, and is currently equal to 16,380 (the maximum number of
4-byte pointers you can store in 64K). ALimit, by contrast, is an arbitrary
limit that you set for yourself, with reasonable expectations that the
collection will not need to contain more items.
It's a question of conserving heap space. Under the covers, a collection is a
dynamically allocated array of pointers located on the heap. The number of
pointers initially allocated for it is the number you pass in ALimit. The
amount of heap space taken up by an empty collection allocated on the heap is
ALimit times four (the size of a pointer) plus another 20 odd bytes for the
collection's other fields. Obviously, if you make ALimit vastly larger than
you'll need, you're wasting space on the heap for pointers you're never going
to use.
Predicting the future is a dicey business, though, and that's what the ADelta
parameter is for. Your collection can grow beyond the number of objects
specified in ALimit. If you fill a collection and try to insert another
object, the collection will automatically grow its capacity by ADelta objects.
ADelta is the "chunk size" by which new capacity is added.
This sounds great, but there's a penalty: Adding capacity forces the
collection to allocate an additional array of pointers on the heap. This
additional array is not contiguous with the original array, and is connected
to the original array by pointers. Indexing through the array thus becomes
more complicated, because once there are multiple pointer arrays, the indexing
code has to test to see if it's at the end of any given array on each index.
This slows down performance for every access, not just accesses into the
additional arrays.
No, you can't really predict the future. But sometimes you have to try.
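The ALimit/ADelta growth mechanics look roughly like this in C. One caveat: Turbo Vision actually chains a second, non-contiguous pointer array (which is what slows down indexing); realloc is the closest single-call C analogue and is used here only to keep the sketch short. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void **items;
    int    count;   /* items currently stored */
    int    limit;   /* cf. ALimit: current capacity in pointers */
    int    delta;   /* cf. ADelta: "chunk size" for new capacity */
} GrowColl;

void gc_init(GrowColl *c, int alimit, int adelta)
{
    /* heap cost of an empty collection: alimit pointers */
    c->items = malloc((size_t)alimit * sizeof(void *));
    c->count = 0;
    c->limit = alimit;
    c->delta = adelta;
}

/* Insert, growing capacity by delta when the collection is full. */
int gc_insert(GrowColl *c, void *item)
{
    if (c->count == c->limit) {
        if (c->delta == 0)
            return 0;                 /* full, and growth disallowed */
        c->limit += c->delta;
        c->items = realloc(c->items,
                           (size_t)c->limit * sizeof(void *));
    }
    c->items[c->count++] = item;
    return 1;
}
```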


Sorted Collections


What I've just described is fundamental behavior common to all TP6
collections. This behavior is implemented in the TCollection class, which is
an ordered but not a sorted collection. You can sub-class TCollection for more
specific behavior. Turbo Pascal provides another type of collection class that
automatically inserts the items you add to the collection in sorted order.
TSortedCollection is a child class of TCollection, with the additional code
and fields to support automatic sorting of inserted objects.
You have to provide one piece of the puzzle: a Compare method to override the
empty one present in TSortedCollection. Compare takes pointers to two items
and returns a code based on the sort-order relationship between the two items:
 -1 if Key1 < Key2
  0 if Key1 = Key2
  1 if Key1 > Key2
The code in the TSortedCollection abstract class calls your Compare function
whenever a new item is added to the sorted collection, so it can determine
where to place it with respect to the items already in the collection.
There's one drawback to TSortedCollection: Duplicate items are not allowed,
and if you attempt to add an item to a collection in which an identical item
already exists, the new item will not be added. In some applications, this can
be a serious drawback; if you're sorting people's names, you'll probably run
into duplicates within 50 added records. So unless you're absolutely sure that
every item you're adding will be unique, you'd better test each item before
you add it to a sorted collection to be sure it isn't already there. If it is,
you'll have to change the item somehow to make it unique. Customer records and
such should be given unique customer ID numbers before trying to collect them
in a TSortedCollection.
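Both behaviors -- sorted insertion driven by a user-supplied Compare, and the silent rejection of duplicates -- can be sketched in C with a function pointer standing in for the virtual Compare method. SortedColl, sc_insert, and cmp_int are hypothetical names for illustration.

```c
#include <assert.h>
#include <string.h>

typedef int (*CompareFn)(void *key1, void *key2);  /* returns -1, 0, or 1 */

typedef struct {
    void     *items[64];
    int       count;
    CompareFn compare;    /* supplied by the "subclass", as in TV */
} SortedColl;

/* Insert in sort order; like TSortedCollection, silently refuse
   an item that compares equal to one already present. */
int sc_insert(SortedColl *c, void *item)
{
    int i = 0;
    while (i < c->count) {
        int r = c->compare(c->items[i], item);
        if (r == 0)
            return 0;             /* duplicate: not added */
        if (r > 0)
            break;                /* existing item sorts later */
        i++;
    }
    memmove(&c->items[i + 1], &c->items[i],
            (size_t)(c->count - i) * sizeof(void *));
    c->items[i] = item;
    c->count++;
    return 1;
}

/* An example Compare for collected int objects. */
static int cmp_int(void *a, void *b)
{
    int x = *(int *)a, y = *(int *)b;
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}
```

The 0 return is exactly where the duplicate gets dropped, which is why testing before inserting (or making keys unique up front) matters.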


An Example: DIRLIST.PAS


Listing One (page 155) presents a simple but very useful collection class
that's a good first example of Turbo Pascal collections. Unit DirList defines
two classes. The first is TDirEntry, which encapsulates a Turbo Pascal
SearchRec record with a path and a nicely formatted string description of the
file specified by the path and the SearchRec. The other class is
TDirEntryCollection, which is a sorted collection of TDirEntry objects.
By using Turbo Pascal's FindFirst and FindNext library procedures,
TDirEntryCollection builds collections of file entry objects that satisfy
wild-card file specs such as *.PAS. Thus if you want to build an application
that acts on groups of files (or one that presents groups of files to the user
for selection), TDirEntryCollection might be just the thing.
There's nothing particularly tricky about the TDirEntry object. It
demonstrates the use of my "when stamp" class as a sort of stand-alone
software IC that provides a specific service; in this case, the formatting of
a 16-bit, DOS-directory file stamp into separate numeric time and date values,
as well as string equivalents of those values. By the way, I've uploaded a new
version of the when stamp unit to the various DDJ listings sources, and you
should obtain the new version if you intend to use it, as I've fixed a few
minor bugs. Look for WHEN2.PAS.


Multiple Constructors


The TDirEntryCollection class is a good example of when multiple constructors
make sense. The mission of a constructor, after all, is to allocate a new
object on the heap and apply some initial values to the object's fields. If
those initial values come from different places, it makes sense to create a
different constructor for each major source of initial data. That way, you
won't have to allocate an empty object and then use a separate statement or
statements to "fill" it.
All three constructors call the method AddDir, which uses FindFirst and
FindNext to search a single directory, specified in Path, for any files
meeting the file spec passed in Mask. Mask may be given a wild-card file spec.
The parameter Attr carries an attribute byte for the search.
The Dos unit defines a list of constants for the various possible attribute
byte values. Only two of them are generally used: AnyFile, which specifies the
file attribute for "normal" files; and Directory, which specifies a
subdirectory. There are also constants for specifying read-only, hidden, and
system files; see Borland's Library Reference for details, under FindFirst.
What differentiates the three constructors is the way they make their calls to
AddDir. InitDir simply calls the parent class's constructor (always do this!)
and then applies AddDir to the directory specified by Path, using Mask to
specify files.
InitCommandLine serves the frequent need to pick up multiple file specs from
the command line. It simply picks up each parameter from the command line
(starting with the parameter number passed in StartParam) and applies AddDir
to each parameter. Passing a number greater than 1 in StartParam allows you to
treat the first one or more parameters as command switches rather than file
specs.
InitTree gets a little more gymnastic and does a search of an entire directory
tree or subtree. This is done through the TreeScan method. TreeScan searches
the tree specified by Path; if Path contains a slash character only, the
entire current drive is searched. If Path were to contain the name of a
subdirectory (such as "\TURBO\SOURCE"), it would search that subdirectory and
any subdirectories found within it.
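For readers without the Pascal listing handy, the recursive shape of TreeScan -- descend into each subdirectory first, then match files in the current directory -- looks like this in C. This is a loose analogue, not a translation: it assumes POSIX opendir/readdir/stat rather than DOS FindFirst/FindNext, and tree_count is a hypothetical name that merely counts matches instead of collecting entries.

```c
#include <assert.h>
#include <dirent.h>
#include <sys/stat.h>
#include <stdio.h>
#include <string.h>

/* Recursively count files under `path` whose names end in `suffix`,
   visiting subdirectories before matching files, as TreeScan does. */
int tree_count(const char *path, const char *suffix)
{
    DIR *dir = opendir(path);
    struct dirent *e;
    int n = 0;
    if (dir == NULL)
        return 0;
    while ((e = readdir(dir)) != NULL) {
        char full[1024];
        struct stat st;
        if (e->d_name[0] == '.')        /* skip "." and ".." */
            continue;
        snprintf(full, sizeof full, "%s/%s", path, e->d_name);
        if (stat(full, &st) != 0)
            continue;
        if (S_ISDIR(st.st_mode)) {
            n += tree_count(full, suffix);   /* subdirectory: recurse */
        } else {
            size_t ln = strlen(e->d_name), ls = strlen(suffix);
            if (ln >= ls &&
                strcmp(e->d_name + ln - ls, suffix) == 0)
                n++;                         /* file matches the mask */
        }
    }
    closedir(dir);
    return n;
}
```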


Iterating Over a Collection Class


Once you've picked up a collection of directory entries from disk, there's the
question of how you work with them. As I mentioned earlier, you can index
through them as if the collection were an array. But the Turbo Pascal
collection classes contain another bit of magic that allows you to iterate a
procedure over all the elements of a collection. What this means is simply
that you can define a procedure and pass a pointer to the procedure to the
collection. The collection will then apply that procedure to all or some of
the items in the collection.
For this to make sense, the best course is to see it in action. Listing Two,
see page 156, is the simplest useful example I could devise for iterating over
a collection class. JFIND.PAS uses the DirList unit to perform a tree search
for one or more file specs passed on the command line. Once it has built up a
collection of all files satisfying the specs entered on the command line, it
displays a brief summary of each file on the screen, identical to the display
format used by DOS DIR.
To do this, JFIND iterates a procedure, DisplayOneFile, over the collection.
The statement that does this looks pretty simple:
FilesFound^.ForEach(@DisplayOneFile);. Here, FilesFound (the collection of
files) calls its ForEach method, with a pointer to the DisplayOneFile
procedure as its parameter. ForEach calls DisplayOneFile once for each
TDirEntry object in the collection, with that object as DisplayOneFile's sole
parameter.
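Stripped of the Pascal scoping rules, the iterator is just a loop over the pointer array, calling back through a procedure pointer. A minimal C sketch (Coll, coll_for_each, and add_item are hypothetical names):

```c
#include <assert.h>

typedef void (*ForEachFn)(void *item);

typedef struct {
    void *items[8];
    int   count;
} Coll;

/* cf. ForEach: apply fn to every item in the collection, in order. */
void coll_for_each(Coll *c, ForEachFn fn)
{
    int i;
    for (i = 0; i < c->count; i++)
        fn(c->items[i]);
}

static int total;                  /* accumulator for the demo callback */

/* The callback receives one collected item per call,
   just as DisplayOneFile receives one TDirEntry. */
static void add_item(void *item)
{
    total += *(int *)item;
}
```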
Doing this is easy. In setting it up, however, there's a trick to it.
DisplayOneFile must be a FAR local procedure. Yes, that sounds a little like
"jumbo shrimp" or "heavy-metal music," but they mean it. The procedure passed
to ForEach must be local to the block that calls it. Being local to the main
program (that is, being global) doesn't count. What it means, in practice, is
that you pass a pointer to the collection to a shell procedure, and within
that shell procedure make the method call to ForEach. JFIND.PAS does this.
Note the FAR keyword after the proc header for DisplayOneFile.
I haven't the foggiest why things have to be done this way. It's got my
curiosity piqued, however, and if anyone has a good explanation, I'd like to
hear it.


Places for Your Stuff


TP6 collections are in fact more powerful than I've demonstrated here in
DIRLIST.PAS. All of the objects stored in DIRLIST are directory entry objects,
simply because that makes sense for what we're trying to do.
But in fact, you can insert anything into a collection -- anything at all that
you can create a pointer to. And that doesn't mean a separate collection for
each type. You can insert any assortment of types into a single collection,
even a sorted collection, as long as there's some generalized way to define a
sort order for all data types present in the sorted collection.
So a collection becomes very much like the trunk of a car: You can toss in
anything you like, and as long as the trunk has room, it'll take it. This is
two full quantum leaps beyond what we're used to in traditional Pascal
programming. Turbo Vision uses the collection classes to implement collections
of controls and other things, which are not the same type, although they all
have a common ancestor class.
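C has no inheritance, but the "common ancestor" idea that lets mixed objects share one collection can be imitated with a leading tag field, roughly the way TObject descendants share vtable machinery. The names here (Obj, count_points, and friends) are invented for illustration.

```c
#include <assert.h>

/* A common header every collectable struct begins with,
   playing the role of the TObject ancestor. */
typedef enum { OBJ_POINT, OBJ_LABEL } ObjKind;

typedef struct { ObjKind kind; } Obj;

typedef struct { Obj base; int x, y;        } Point;
typedef struct { Obj base; const char *text; } Label;

/* A mixed collection of Obj pointers can still be processed
   uniformly: count how many of the items are Points. */
int count_points(Obj **items, int count)
{
    int i, n = 0;
    for (i = 0; i < count; i++)
        if (items[i]->kind == OBJ_POINT)
            n++;
    return n;
}
```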
This is a point that needs to be made: Although it's possible to store
non-objects in a collection, doing so cuts you free of some of the things a
collection can do. There is some standard behavior built into the fundamental
TObject class that allows you to work easily with stream I/O, and if TObject
isn't one of your ancestors, you lose that ability. As a matter of policy, you
should try to store only objects that descend from TObject in a Turbo Pascal
collection. It'll save you a lot of extra work when you need to begin using
streams.



Think Pointers, Kid!


Early on in my experience with TP6 collections, I made a mistake that I've
since heard other people have made too. It's easy enough to do if you don't
envision every programming problem in terms of pointers and the heap. What I
did was this: I declared a static global variable called MyGizmo, and then
tried to insert it into a collection like this: MyCollection.Insert(@MyGizmo);.
This works -- in that it doesn't blow up or anything, and if you traverse
MyCollection you will in fact discover that it contains a pointer to MyGizmo,
as it should.
The catch is this: You have simply added a pointer pointing to the global
variable MyGizmo to a collection of pointers. If you change the value of
variable MyGizmo and add it again to the collection, you will simply be adding
a second pointer pointing to the identical global variable. Both pointers will
be pointing to the same variable, and the value originally pointed to by the
first pointer will be gone. There is only one copy of MyGizmo in the data
segment, and pointing two pointers at it won't make it two copies.
This problem is simple enough to avoid. Remember the following rules:
TP6 collections are collections of pointers. Only the pointers are present in
the collection. There is no copying of pointer referents.
Unless you have good reason to do otherwise, store only heap-based data in
collections. Allocate data on the heap as the referent of a pointer, and then
add that pointer to the collection.
Work it that way, and you'll do fine.
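The MyGizmo trap and its cure both fit in a few lines of C. collect_aliases and collect_copies are hypothetical names; the static int plays the role of the global MyGizmo.

```c
#include <assert.h>
#include <stdlib.h>

static int gizmo;                /* one static variable, like MyGizmo */

/* WRONG way: collect the address of the same static twice.
   Both slots end up pointing at one object; the first value is gone. */
void collect_aliases(int **coll)
{
    gizmo = 1;
    coll[0] = &gizmo;
    gizmo = 2;
    coll[1] = &gizmo;            /* both slots now see 2 */
}

/* RIGHT way: allocate a fresh heap copy for each insertion,
   so each slot refers to its own object. */
void collect_copies(int **coll)
{
    int i;
    for (i = 0; i < 2; i++) {
        int *p = malloc(sizeof *p);
        *p = i + 1;
        coll[i] = p;
    }
}
```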


Opening the Black Box


A couple of columns ago, I complained about the black-box nature of Turbo
Vision, and several people (including a couple at Borland) were nice enough to
point out that the Turbo Vision source code is now shipped as part of the
larger Turbo Pascal Runtime Source package, which anyone can buy from Borland
for $199.95 (or $99 for an upgrade). Having obtained a copy, I'm now not sure
it'll help you unless you're so into Turbo Vision that you probably don't need
it. It's very involved, and I'm beginning to wonder if Turbo Vision's problem
is not that it's a black box, but that it's not nearly black enough.
I think the problem with Turbo Vision is that its complexity is too close to
the surface, and is insufficiently masked. With a little luck, we'll go into
this more deeply in a later column.


Products Mentioned


The Turbo Pascal Runtime Library Source Borland International 1800 Green Hills
Road Scotts Valley, CA 95066 408-438-8400 $199.95 (for upgrade $99)


_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann



[LISTING ONE]

UNIT DirList; { By Jeff Duntemann; for DDJ 4/92 }

INTERFACE

USES Crt,DOS,Objects, { Standard Borland units }
 When2; { From DDJ 1/91 }
TYPE
 String40 = STRING[40];
 PDirEntry = ^TDirEntry;
 TDirEntry =
 OBJECT(TObject)
 Path : PathStr; { Predefined in Dos unit as STRING[79] }
 Entry : SearchRec; { Also predefined in the Dos unit }
 DirLine : String40; { Preformatted directory info string }
 CONSTRUCTOR Init(APath : PathStr; ASearchRec : SearchRec);
 PROCEDURE FormatDirLine;
 END;
 PDirEntryCollection = ^TDirEntryCollection;
 TDirEntryCollection =
 OBJECT(TSortedCollection)
 CONSTRUCTOR InitDir(ALimit,ADelta : Integer;
 Path, Mask : STRING;
 Attr : Word);
 CONSTRUCTOR InitCommandLine(ALimit,ADelta : Integer; StartParam : Integer);
 CONSTRUCTOR InitTree(Alimit,ADelta : Integer;

 Path,Mask : STRING;
 Attr : Word);
 FUNCTION Compare(Key1,Key2 : Pointer) : Integer; VIRTUAL;
 PROCEDURE AddDir(Path,Mask : STRING;
 Attr : Word);
 PROCEDURE TreeScan(Path,Mask : STRING; Attr : Word);
 END;

IMPLEMENTATION
VAR
 Stamp : When; { Global When stamp for time/date string processing }
{----------------------------------------}
{ Methods: TDirEntry }
{----------------------------------------}
CONSTRUCTOR TDirEntry.Init(APath : PathStr; ASearchRec : SearchRec);
BEGIN
 TObject.Init;
 Path := APath;
 Entry := ASearchRec;
 FormatDirLine;
END;
PROCEDURE TDirEntry.FormatDirLine;
VAR
 DotPos : Integer;
 WorkString,Blanker : String40;
BEGIN
 Stamp.PutWhenStamp(Entry.Time);
 DirLine := ' ';
 Blanker := DirLine;
 {If the entry has the directory attribute, format differently: }
 IF (Entry.Attr AND $10) <> 0 THEN { Bit 4 is the directory attribute }
 BEGIN
 Insert(Entry.Name,DirLine,1); { No extensions on subdirectory names }
 Insert('<DIR>',DirLine,14) { Tell the world it's a subdirectory }
 END
 ELSE
 {This compound statement separates the file from its extension }
 { and converts the file size to a string. Note that we did not }
 { insert a file size figure into DirLine for subdirectory entries. }
 BEGIN
 DotPos := Pos('.',Entry.Name);
 IF DotPos > 0 THEN { File name has an extension }
 WorkString := Copy(Entry.Name,1,DotPos-1) +
 Copy(Blanker,1,9-DotPos) + '.' +
 Copy(Entry.Name,DotPos+1,Length(Entry.Name)-DotPos)
 ELSE { File name has no extension }
 WorkString := Entry.Name +
 Copy(Blanker,1,8-Length(Entry.Name)) + '.';
 Insert(WorkString,DirLine,1);
 Str(Entry.Size:7,WorkString);
 Insert(WorkString,DirLine,15);
 END;
 Insert(Stamp.GetDateString,DirLine,24);
 Insert(Stamp.GetTimeString,DirLine,34); { Finally, insert the time }
 Delete(DirLine,42,Length(DirLine)-42);
END;
{----------------------------------------}
{ Methods: TDirEntryCollection }
{----------------------------------------}

CONSTRUCTOR TDirEntryCollection.InitDir(ALimit,ADelta : Integer;
 Path,Mask : STRING;
 Attr : Word);
BEGIN
 TSortedCollection.Init(Alimit,ADelta);
 AddDir(Path,Mask,Attr);
END;

CONSTRUCTOR TDirEntryCollection.InitCommandLine(ALimit,ADelta : Integer;
 StartParam : Integer);
VAR
 I : Integer;
 SR : SearchRec; { Defined in the Dos unit }
 DEP : PDirEntry;
 Path : PathStr; { Defined in the Dos unit as STRING[79]; }
 Dir : DirStr; { Defined in the Dos unit as STRING[67]; }
 Name : NameStr; { Defined in the Dos unit as STRING[8]; }
 Ext : ExtStr; { Defined in the Dos unit as STRING[4]; }
BEGIN
 TSortedCollection.Init(ALimit,ADelta);
 FOR I := StartParam TO ParamCount DO
 BEGIN
 FSplit(ParamStr(I),Dir,Name,Ext);
 AddDir(Dir,Name+Ext,AnyFile);
 END;
END;

CONSTRUCTOR TDirEntryCollection.InitTree(Alimit,ADelta : Integer;
 Path,Mask : STRING;
 Attr : Word);
BEGIN
 TSortedCollection.Init(Alimit,ADelta);
 TreeScan(Path,Mask,Attr);
END;

FUNCTION TDirEntryCollection.Compare(Key1,Key2 : Pointer) : Integer;
BEGIN
 IF (PDirEntry(Key1)^.Path + PDirEntry(Key1)^.Entry.Name) =
 (PDirEntry(Key2)^.Path + PDirEntry(Key2)^.Entry.Name) THEN
 Compare := 0
 ELSE
 IF (PDirEntry(Key1)^.Path + PDirEntry(Key1)^.Entry.Name) <
 (PDirEntry(Key2)^.Path + PDirEntry(Key2)^.Entry.Name) THEN
 Compare := -1
 ELSE Compare := 1;
END;

PROCEDURE TDirEntryCollection.AddDir(Path,Mask : STRING;
 Attr : Word);
VAR
 I : Integer;
 SR : SearchRec;
 DEP : PDirEntry;
BEGIN
 FindFirst(Path+Mask,Attr,SR);
 IF DosError = 0 THEN
 BEGIN
 DEP := New(PDirEntry,Init(Path,SR));
 Insert(DEP);

 REPEAT
 FindNext(SR);
 IF DosError = 0 THEN
 BEGIN
 DEP := New(PDirEntry,Init(Path,SR));
 Insert(DEP);
 END;
 UNTIL DosError <> 0;
 END;
END;

PROCEDURE TDirEntryCollection.TreeScan(Path,Mask : STRING;
 Attr : Word);
VAR
 I : Integer;
 SR : SearchRec;
 DEP : PDirEntry;
 NextDirectory : STRING;
BEGIN
 fillchar(SR,sizeof(SR),0);
 { We look for and search any subdirectories first: }
 IF Path <> '\' THEN Path := Path + '\';
 FindFirst(Path+'*.*',Directory,SR);
 WHILE (DOSError <> 2) AND (DOSError <> 18) DO
 BEGIN
 IF ((SR.Attr AND Directory) = Directory )
 AND (SR.Name[1] <> '.') THEN { We have a subdirectory }
 BEGIN
 NextDirectory := Path + SR.Name;
 TreeScan(NextDirectory,Mask,Attr);
 END;
 FindNext(SR);
 END;
 { At this point, we're in a directory that has no unsearched }
 { subdirectories, so we can search for files matching Mask: }
 AddDir(Path,Mask,Attr);
END;

{ No initialization section }
END.






[LISTING TWO]

PROGRAM JFind; { By Jeff Duntemann; from DDJ 4/92 }

USES DOS,CRT,Printer, { Standard Borland units }
 DirList; { From DDJ for 4/92 }
VAR
 Console : TEXT;
 FileSpecs : STRING;
 I : Integer;
 FilesFound : PDirEntryCollection;

PROCEDURE DisplayFoundFiles(FilesFound : PDirEntryCollection);

{ This is the FAR local routine passed to the iterator method. }
{ It's called once for each item in the collection: }
PROCEDURE DisplayOneFile(Target : PDirEntry); FAR;
BEGIN
 Write(Console,Copy(Target^.DirLine,13,Length(Target^.DirLine)),' ');
 Writeln(Console,Target^.Path,Target^.Entry.Name);
END;

BEGIN
 { This is how you iterate a procedure over a collection: }
 FilesFound^.ForEach(@DisplayOneFile);
END;

BEGIN { JFIND Main }
 Assign(Console,''); { This allows us to use Standard Output }
 Rewrite(Console); { for Write/Writeln statements }
 IF ParamCount = 0 THEN
 BEGIN
 Writeln(Console,'>>>JFIND<<< by Jeff Duntemann');
 Writeln(Console);
 Writeln(Console,'Invocation syntax:');
 Writeln(Console);
 Writeln(Console,' JFIND <filespec>,[<filespec>..] CR');
 Writeln(Console);
 END
 ELSE
 BEGIN
 FilesFound := New(PDirEntryCollection,Init(256,16));
 FOR I := 1 TO ParamCount DO
 FilesFound^.TreeScan('\',ParamStr(I),AnyFile);
 IF FilesFound^.Count > 0 THEN
 BEGIN
 DisplayFoundFiles(FilesFound);
 Writeln(Console);
 END
 ELSE
 Writeln(Console,'No files match that file spec.');
 END;
END.


April, 1992
GRAPHICS PROGRAMMING


Raw Speed and More




Michael Abrash


As usual, there's an awful lot to cover this month, so I'll have to make this
quick. That's a shame, because this absolutely true story is even lovelier in
the long version, as told by -- well, I promised I wouldn't even hint at who
he is, for reasons that will soon be obvious.
Years ago, this friend of mine -- let's call him Bert -- went to Hawaii with
three other fellows to celebrate their graduation from high school. This was
an unchaperoned trip, and they behaved pretty much as responsibly as you'd
expect four teenagers to behave, which is to say, not; there's a story about a
rental car that, to this day, Bert can't bring himself to tell. They had a
good time, though, save for one thing: no girls.
By and by, they met a group of girls by the pool, but the boys couldn't get
past the hi-howya-doin stage, so they retired to their hotel room to plot a
better approach. This being the early '70s, and them being slightly tipsy
teenagers with raging hormones and the effective combined IQ of four
eggplants, it took them no time at all to come up with a brilliant plan:
streaking. The girls had mentioned their room number, so the boys piled into
the elevator, pushed the button for the girls' floor, shucked their clothes as
fast as they could, and sprinted to the girls' door. They knocked on the door
and ran on down the hall. As the girls opened their door, Bert and his crew
raced past, toward the elevator, laughing hysterically.
Bert was by far the fastest of them all. He whisked between the elevator doors
just as they started to close; by the time his friends got there, it was too
late, and the doors slid shut in their faces. As the elevator began to move,
Bert could hear the frantic pounding of six fists thudding on the closed
doors. As Bert stood among the clothes littering the elevator floor, the
thought of his friends stuck in the hall, naked as jaybirds, was just too
much, and he doubled over with helpless laughter, tears streaming down his
face. The universe had blessed him with one of those exceedingly rare moments
of perfect timing and execution.
The universe wasn't done with Bert quite yet, though. He was still contorted
with laughter -- and still quite thoroughly undressed -- when the elevator
doors opened again. In the lobby.
And with that, we come to this month's topics, raw speed and hidden surfaces.


Raw Speed, Part 1: Assembly Language


I would like to state, here and for the record, that I am not an assembly
language fanatic. Frankly, I prefer programming in C; assembly language is
hard work, and I can get a whole lot more done with fewer hassles in C.
However, I am a performance fanatic, performance being defined as having
programs be as nimble as possible in those areas where the user wants fast
response. And, in the course of pursuing performance, there are times when a
little assembly language goes a long way.
We're now in the fourth month of working on the X-Sharp 3-D animation package.
In real-time animation, performance is sine qua non -- Latin for "Make it fast
or find another line of work" -- so some judiciously applied assembly language
is in order. Last month, we got up to a serviceable performance level by
switching to fixed-point math, then implementing the fixed-point
multiplication and division functions in assembler in order to take advantage
of the 386's 32-bit capabilities. There's another area of the program that
fairly cries out for assembly language: matrix math. The function to multiply
a matrix by a vector (XformVec()) and the function to concatenate matrices
(ConcatXforms()) both loop heavily around calls to fixedMul(); a lot of
calling and looping can be eliminated by converting these functions to pure
assembly language.
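For readers who'd rather see the idea than the unrolled assembly, the 16.16 fixed-point multiply and divide that FIXED.ASM implements can be sketched in portable C. (This sketch is mine, not X-Sharp's actual code; the `long long` intermediate stands in for the 386's 32-bit IMUL and IDIV, which is precisely the capability the assembly version exploits.)

```c
#include <assert.h>

typedef long Fixedpoint;    /* 16.16 fixed point: high 16 bits whole, low 16 fractional */

#define INT_TO_FIXED(x) ((Fixedpoint)((long)(x) * 65536L))

/* Multiply two 16.16 values: the full 32.32 product is formed in 64 bits,
   then shifted back down to 16.16 (truncating; more on rounding later). */
Fixedpoint FixedMul(Fixedpoint m1, Fixedpoint m2)
{
    return (Fixedpoint)(((long long)m1 * m2) >> 16);
}

/* Divide one 16.16 value by another: pre-shift the dividend left 16 bits
   so the quotient comes out in 16.16 form. (Divisor must be nonzero.) */
Fixedpoint FixedDiv(Fixedpoint dividend, Fixedpoint divisor)
{
    return (Fixedpoint)((((long long)dividend) << 16) / divisor);
}
```

For example, multiplying 1.5 (`INT_TO_FIXED(3) / 2`) by 2.0 yields exactly 3.0 in 16.16 form.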
Listing One, page 157, is the module FIXED.ASM from the current version of
X-Sharp, with XformVec() and ConcatXforms() implemented in assembly language.
The code is heavily optimized, to the extent of completely unrolling the loops
via macros so that looping is eliminated altogether. FIXED.ASM is highly
effective; the time taken for matrix math is now down to the point where it's
a fairly minor component of execution time, representing less than ten percent
of the total. It's time to turn our optimization sights elsewhere.


Raw Speed, Part 2: Look It Up


It's a funny thing about Turbo Profiler: Time spent in the Borland C++ 80x87
emulator doesn't show up directly anywhere that I can see in the timing
results. The only way to detect it is by way of the line that reports what
percent of total time is represented by all the areas that were profiled; if
you're profiling all areas, whatever's not explicitly accounted for seems to
be the floating-point emulator time. This quirk fooled me for a while, leading
me to think sine and cosine weren't major drags on performance, because the
sin() and cos() functions spend most of their time in the emulator, and that
time doesn't show up in Turbo Profiler's statistics on those functions. Once I
figured out what was going on, it turned out that not only were sin() and
cos() major drags, they were taking up over half the total execution time by
themselves.
The solution is a lookup table. Listing One contains a function called
CosSin() that calculates both the sine and cosine of an angle, via a lookup
table. The function accepts angles in tenths of degrees; I decided to use
tenths of degrees rather than radians because that way it's always possible to
look up the sine and cosine of the exact angle requested, rather than
approximating, as would be required with radians. Tenths of degrees should be
fine enough control for most purposes; if not, it's easy to alter CosSin() for
finer gradations yet. GENCOS.C, the program used to generate the lookup table
(COSTABLE.INC), included in Listing One, can be found in the X-Sharp archive.
GENCOS.C can generate a cosine table with any integral number of steps per
degree.
FIXED.ASM speeds X-Sharp up quite a bit, and it changes the performance
balance a great deal. When we started out with 3-D animation, calculation time
was the dragon we faced; more than 90 percent of the total time was spent
doing matrix and projection math. Additional optimizations in the area of math
could still be made (using 32-bit multiplies in the backface-removal code, for
example), but fixed-point math, the sine and cosine lookup, and selective
assembler optimizations have done a pretty good job already. The bulk of the
time taken by X-Sharp is now spent drawing polygons, drawing rectangles (to
erase objects), and waiting for the page to flip. In other words, we've slain
the dragon of 3-D math, or at least wounded it grievously; now we're back to
the dragon of polygon filling. We'll address faster polygon filling soon, but
for the moment, we have more than enough horsepower to have some fun with.
First, though, we need one more feature: hidden surfaces.


Hidden Surfaces


So far, we've made a number of simplifying assumptions in order to get the
animation to look good; for example, all objects must currently be convex
polyhedrons. We'll deal with that one down the road a little, but first we
have to address a still more fundamental limitation, which is that right now,
objects can never pass behind or in front of each other. In short, it's time
to have a look at hidden surfaces.
There are a passel of ways to do hidden surfaces. Way off at one end (the slow
end) of the spectrum is z-buffering, whereby each pixel of each polygon is
checked as it's drawn to see whether it's the frontmost version of the pixel
at those coordinates. At the other end is the technique of simply drawing the
objects in back-to-front order, so that nearer objects are drawn on top of
farther objects. The latter approach, depth sorting, is the one we'll take
today. (Actually, true depth sorting involves detecting and resolving possible
ambiguities when objects overlap in z; simply sorting on z, which we'll be
doing this month, is more precisely known as the "painter's algorithm.")
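Stripped to its essence, the painter's algorithm is nothing more than a z sort followed by a back-to-front draw. Here's a minimal sketch (mine, with a hypothetical object type; X-Sharp's actual linked-list implementation is in Listing Two):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long CenterZ;               /* center z in view space; more negative = farther */
    /* ...vertices, faces, colors would go here... */
} Object3D;

/* qsort comparator: ascending z puts the farthest (most negative z)
   object first, so later draws overwrite earlier, farther ones. */
static int ByCenterZ(const void *a, const void *b)
{
    long za = ((const Object3D *)a)->CenterZ;
    long zb = ((const Object3D *)b)->CenterZ;
    return (za > zb) - (za < zb);
}

void DrawSceneBackToFront(Object3D *objects, int count)
{
    int i;
    qsort(objects, count, sizeof(Object3D), ByCenterZ);
    for (i = 0; i < count; i++) {
        /* DrawObject(&objects[i]); -- nearer objects painted last, on top */
    }
}
```

X-Sharp keeps the objects in a sorted linked list instead of re-sorting an array each frame, because frame-to-frame the order changes only slightly and insertion-style resorting is nearly free.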
Depth sorting is fast but less than perfect. For one thing, it doesn't address
the issue of nonconvex objects, so we'll have to stick with convex polyhedrons
for now. For another, there's the question of what part of each object to use
as the sorting key; the nearest point, the center, and the farthest point are
all possibilities -- and, whichever point is used, depth sorting doesn't
handle some overlap cases properly. Figure 1 illustrates one case in which
back-to-front sorting doesn't work, regardless of what point is used as the
sorting key.
For photo-realistic rendering, these are serious problems.
For fast PC-based animation, however, they're manageable. Choose objects that
aren't too elongated; arrange their paths of travel so they don't intersect in
problematic ways; and, if they do overlap incorrectly, trust that the glitch
will be lost in the speed of the animation and the complexity of the screen.
Listing Two, page 159, shows the key routines for depth sorting, from the
X-Sharp file OLIST.C. Objects are now stored in a linked list. The initial,
empty list, created by InitializeObjectList(), consists of a sentinel entry at
either end, one at the farthest possible z coordinate, and one at the nearest.
New entries are inserted by AddObject() in z-sorted order. Each time the
objects are moved, before they're drawn at their new locations, SortObjects()
is called to z-sort the object list, so that drawing will proceed from back to
front. The z-sorting is done on the basis of the objects' center points; a
center-point field has been added to the object structure to support this, and
the center point for each object is now transformed along with the vertices.
That's really all there is to depth sorting -- and now we can have objects
that overlap in x and y.
I wish I could reproduce the latest X-Sharp demonstration program here; I
really do. It's a lot of fun, with 11 spinning cubes and a whirling faceted
ball bouncing back and forth between the screen and deep space at varying
speeds. The number of faces has nearly doubled since last month, to 138, and
the number of vertices is up to 150, but thanks to FIXED.ASM, the animation is
snappier than ever, and of course there's a much-enhanced sense of depth,
thanks to the visual cues made possible by depth sorting. We've reached the
point of genuinely dynamic animation on a 386, even with a slowpoke VGA; my
daughter thinks it's a little scary having those things hurtling toward her so
fast. On a 486 with a fast (Tseng ET-4000) VGA, the animation is too fast; the
illusion of reality is broken. We should all have such problems.
I'd like to list the demo program in its entirety, but I can't. As I explained
last month, X-Sharp is way too large, over 2000 lines now; even the changes
since last month would blow my page budget out of the water. Instead, I've
made the full source available in the file XSHARPn.ARC in the DDJ Forum on
CompuServe, on M&T Online, and in the graphic.disp conference on Bix.
Alternatively, you can send me a 360K or 720K formatted diskette and an
addressed, stamped diskette mailer, care of DDJ, 411 Borel Ave., San Mateo, CA
94402, and I'll send you the latest copy of X-Sharp. There's no charge, but
it'd be very much appreciated if you'd slip in a dollar or so to help out the
folks at the Vermont Association for the Blind and Visually Impaired. It's not
every day that you can make a difference so easily -- and, believe me, their
work does make a difference.
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
name mabrash in both cases).


Rounding


FIXED.ASM contains the equate ROUNDING_ON. When this equate is 1, the results
of multiplications and divisions are rounded to the nearest fixed-point
values; when it's 0, the results are truncated. The difference between the
results produced by the two approaches is, at most, 2^(-16); you wouldn't think
that would make much difference, now, would you? But it does. When the
animation is run with rounding disabled, the cubes start to distort visibly
after a few minutes, and after a few minutes more they look like they've been
run over. In contrast, I've never seen any significant distortion with
rounding on, even after a half-hour or so. I think the difference with
rounding is not that it's so much more accurate, but rather that the errors
are evenly distributed; with truncation, the errors are biased, and biased
errors become very visible when they're applied to right-angle objects. Even
with rounding, though, the errors will eventually creep in, and
reorthogonalization will become necessary at some point. We'll have a look at
that soon.
The performance cost of rounding is small, and the benefits are highly
visible. Still, truncation errors become significant only when they accumulate
over time, as, for example, when rotation matrices are repeatedly concatenated
over the course of many transformations. Some time could be saved by rounding
only in such cases. For example, division is performed only in the course of
projection, and the results do not accumulate over time, so it would be
reasonable to disable rounding for division.
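The rounding step itself is tiny. In C terms (my sketch; the added 0x8000 corresponds to the `add eax,8000h` in FIXED.ASM), the two multiply variants differ by one addition before the shift:

```c
#include <assert.h>

typedef long Fixedpoint;    /* 16.16 fixed point */

/* Truncating multiply: the low 16 bits of the 32.32 product are simply
   dropped, so the error is biased -- always in the same direction. */
Fixedpoint FixedMulTrunc(Fixedpoint a, Fixedpoint b)
{
    return (Fixedpoint)(((long long)a * b) >> 16);
}

/* Rounding multiply: adding 0x8000 (2^(-17) in 32.32 form) before the
   shift rounds to the nearest 16.16 value, so errors center on zero. */
Fixedpoint FixedMulRound(Fixedpoint a, Fixedpoint b)
{
    return (Fixedpoint)((((long long)a * b) + 0x8000) >> 16);
}
```

With a = 1.5 and b = 2^(-16), the smallest representable fraction, the exact product is 1.5 units of the last place: truncation yields 1, rounding yields 2.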


Having a Ball



For three months, we've had nothing to look at but triangles and cubes. It's
time for something a little more visually appealing, so the demonstration
program now features a 72-sided ball. What's particularly interesting about
this ball is that it's created by the GENBALL.C program in the BALL
subdirectory of X-Sharp, and both the size of the ball and the number of bands
of faces are programmable. GENBALL.C spits out to a file all the arrays of
vertices and faces needed to create the ball, ready for inclusion in
INITBALL.C. True, if you change the number of bands, you must change the Color
array in INITBALL.C to match, but that's a tiny detail; by and large, the
process of generating a ball-shaped object is now automated. In fact, we're not
limited to ball-shaped objects; substitute a different vertex and face
generation program for GENBALL.C, and you can make whatever convex polyhedron
you want; again, all you have to do is change the Color array correspondingly.
You can easily create multiple versions of the base object, too; INITCUBE.C is
an example of this, creating 11 different cubes.
What we have here is the first glimmer of an object-editing system. GENBALL.C
is the prototype for object definition, and INITBALL.C is the prototype for
general-purpose object instantiation. Certainly, it would be nice to have an
interactive 3-D object editing tool and resource management setup, and perhaps
we'll do that someday. We have our hands full with the drawing end of things
at the moment, though, and for now it's enough to be able to create objects in
a semiautomated way.


Eating Crow and Other Delicacies


Back in November, I said, "Hicolor programming unavoidably requires handling
broken rasters." (Hicolor mode features 32K colors, by means of the Sierra
Hicolor DAC combined with a Hicolor-capable Super-VGA.) Any absolute assertion
on my part seems to be the cue for readers to come swarming out of the
woodwork with examples to the contrary. In this case, M&T Online user
rfrederick was quick to point out that setting the bitmap width -- the offset
from the start of one line to the start of the next -- to 1928 bytes in
640x480 Hicolor mode eliminates broken rasters (displayed lines that span
banks). A bitmap width of 1928 is selected by setting the Row Offset register
(CRTC register 13h) to 241, like so: outpw(0x3d4, 0x13 | (241<<8));. Once the
width is set, all you have to do is use 1928 for the offset from one row to
the next, rather than 1280, and you're all set. Actually, the breaks are still
there, but they can be ignored because they happen off to the right of the
displayed portion of the bitmap. To get the Hicolor drawing code from the
November 1991 column to support 1928-wide Hicolor bitmaps, just change
BitmapWidthInBytes from 640*2 to 1928.
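It's easy to convince yourself that 1928 is a magic number here. The Row Offset register counts in units of 8 bytes (241 x 8 = 1928), and with that pitch none of the 480 displayed 1280-byte line segments happens to straddle a 64K bank boundary; the straddles all land in the unused 648 bytes at the right edge of each row. A quick verification sketch (mine, not code from the column):

```c
#include <assert.h>

/* Return 1 if, with the given byte pitch, no displayed portion of any of
   the 480 lines in 640x480 Hicolor mode (640*2 = 1280 displayed bytes per
   line) crosses a 64K bank boundary. */
int NoBrokenRasters(long pitch)
{
    long row;
    for (row = 0; row < 480; row++) {
        long start = row * pitch;
        if (start / 65536L != (start + 1280L - 1) / 65536L)
            return 0;   /* this line's displayed bytes span two banks */
    }
    return 1;
}
```

A pitch of 1928 passes this test; the natural pitch of 1280 does not (line 51 starts at byte 65,280 and its displayed bytes run past the 64K mark).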
rfrederick credits this information to the folks at Everex. Thanks to all
involved for passing it along.


Coming Up


There are a lot of wonderful things to add to X-Sharp, including shading of
many sorts, antialiasing, support for non-convex objects, faster polygon
drawing and clipping, 3-D clipping, texture mapping, still better performance
-- you get the idea. Personally, I'm excited as all get-out about this stuff,
and I'll certainly cover more of it in the near future. Graphics is a broad
field, though, and you folks have diverse interests; I'd like to hear what
you're interested in seeing in this space. More 3-D animation? More animation
of other types? Programming techniques for PC graphics hardware, such as
24-bpp VGAs, the S3 Super-VGA accelerator, and XGA? JPEG? Graphics operations
such as seed fills and fast (and I do mean fast!) line drawing? Color mapping
to a 256-color palette? Dithering? Something else?
It's a big list, because this is a big -- and tremendously exciting -- field.
So long as there's code to be written for a topic (after all, this is the
Graphics Programming column), I'm game. Let me know what you'd like to see.


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash



[LISTING ONE]

; 386-specific fixed point routines.
; Tested with TASM 3.0.
ROUNDING_ON equ 1 ;1 for rounding, 0 for no rounding
 ;no rounding is faster, rounding is
 ; more accurate
ALIGNMENT equ 2
 .model small
 .386
 .code
;=====================================================================
; Multiplies two fixed-point values together.
; C near-callable as:
; Fixedpoint FixedMul(Fixedpoint M1, Fixedpoint M2);
FMparms struc
 dw 2 dup(?) ;return address & pushed BP
M1 dd ?
M2 dd ?
FMparms ends
 align ALIGNMENT
 public _FixedMul
_FixedMul proc near
 push bp
 mov bp,sp
 mov eax,[bp+M1]
 imul dword ptr [bp+M2] ;multiply
if ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shr eax,16 ;put the fractional part in AX
 pop bp

 ret
_FixedMul endp
;=====================================================================
; Divides one fixed-point value by another.
; C near-callable as:
; Fixedpoint FixedDiv(Fixedpoint Dividend, Fixedpoint Divisor);
FDparms struc
 dw 2 dup(?) ;return address & pushed BP
Dividend dd ?
Divisor dd ?
FDparms ends
 align ALIGNMENT
 public _FixedDiv
_FixedDiv proc near
 push bp
 mov bp,sp

if ROUNDING_ON
 sub cx,cx ;assume positive result
 mov eax,[bp+Dividend]
 and eax,eax ;positive dividend?
 jns FDP1 ;yes
 inc cx ;mark it's a negative dividend
 neg eax ;make the dividend positive
FDP1: sub edx,edx ;make it a 64-bit dividend, then shift
 ; left 16 bits so that result will be in EAX
 rol eax,16 ;put fractional part of dividend in
 ; high word of EAX
 mov dx,ax ;put whole part of dividend in DX
 sub ax,ax ;clear low word of EAX
 mov ebx,dword ptr [bp+Divisor]
 and ebx,ebx ;positive divisor?
 jns FDP2 ;yes
 dec cx ;mark it's a negative divisor
 neg ebx ;make divisor positive
FDP2: div ebx ;divide
 shr ebx,1 ;divisor/2, minus 1 if the divisor is
 adc ebx,0 ; even
 dec ebx
 cmp ebx,edx ;set Carry if remainder is at least half as
 adc eax,0 ; large as the divisor, then use that to
 ; round up if necessary
 and cx,cx ;should the result be made negative?
 jz FDP3 ;no
 neg eax ;yes, negate it
FDP3:
else ;!ROUNDING_ON
 mov edx,[bp+Dividend]
 sub eax,eax
 shrd eax,edx,16 ;position so that result ends up
 sar edx,16 ; in EAX
 idiv dword ptr [bp+Divisor]
endif ;ROUNDING_ON
 shld edx,eax,16 ;whole part of result in DX;
 ; fractional part is already in AX
 pop bp
 ret
_FixedDiv endp
;=====================================================================

; Returns the sine and cosine of an angle.
; C near-callable as:
; void CosSin(TAngle Angle, Fixedpoint *Cos, Fixedpoint *Sin);

 align ALIGNMENT
CosTable label dword
 include costable.inc

SCparms struc
 dw 2 dup(?) ;return address & pushed BP
Angle dw ? ;angle to calculate sine & cosine for
Cos dw ? ;pointer to cos destination
Sin dw ? ;pointer to sin destination
SCparms ends

 align ALIGNMENT
 public _CosSin
_CosSin proc near
 push bp ;preserve stack frame
 mov bp,sp ;set up local stack frame

 mov bx,[bp].Angle
 and bx,bx ;make sure angle's between 0 and 2*pi
 jns CheckInRange
MakePos: ;less than 0, so make it positive
 add bx,360*10
 js MakePos
 jmp short CheckInRange

 align ALIGNMENT
MakeInRange: ;make sure angle is no more than 2*pi
 sub bx,360*10
CheckInRange:
 cmp bx,360*10
 jg MakeInRange

 cmp bx,180*10 ;figure out which quadrant
 ja BottomHalf ;quadrant 2 or 3
 cmp bx,90*10 ;quadrant 0 or 1
 ja Quadrant1
 ;quadrant 0
 shl bx,2
 mov eax,CosTable[bx] ;look up cosine
 neg bx ;sin(Angle) = cos(90-Angle)
 mov edx,CosTable[bx+90*10*4] ;look up sine
 jmp short CSDone

 align ALIGNMENT
Quadrant1:
 neg bx
 add bx,180*10 ;convert to angle between 0 and 90
 shl bx,2
 mov eax,CosTable[bx] ;look up cosine
 neg eax ;negative in this quadrant
 neg bx ;sin(Angle) = cos(90-Angle)
 mov edx,CosTable[bx+90*10*4] ;look up sine
 jmp short CSDone

 align ALIGNMENT

BottomHalf: ;quadrant 2 or 3
 neg bx
 add bx,360*10 ;convert to angle between 0 and 180
 cmp bx,90*10 ;quadrant 2 or 3
 ja Quadrant2
 ;quadrant 3
 shl bx,2
 mov eax,CosTable[bx] ;look up cosine
 neg bx ;sin(Angle) = cos(90-Angle)
 mov edx,CosTable[90*10*4+bx] ;look up sine
 neg edx ;negative in this quadrant
 jmp short CSDone

 align ALIGNMENT
Quadrant2:
 neg bx
 add bx,180*10 ;convert to angle between 0 and 90
 shl bx,2
 mov eax,CosTable[bx] ;look up cosine
 neg eax ;negative in this quadrant
 neg bx ;sin(Angle) = cos(90-Angle)
 mov edx,CosTable[90*10*4+bx] ;look up sine
 neg edx ;negative in this quadrant
CSDone:
 mov bx,[bp].Cos
 mov [bx],eax
 mov bx,[bp].Sin
 mov [bx],edx

 pop bp ;restore stack frame
 ret
_CosSin endp
;=====================================================================
; Matrix multiplies Xform by SourceVec, and stores the result in
; DestVec. Multiplies a 4x4 matrix times a 4x1 matrix; the result
; is a 4x1 matrix. Cheats by assuming the W coord is 1 and the
; bottom row of the matrix is 0 0 0 1, and doesn't bother to set
; the W coordinate of the destination.
; C near-callable as:
; void XformVec(Xform WorkingXform, Fixedpoint *SourceVec,
; Fixedpoint *DestVec);
;
; This assembly code is equivalent to this C code:
; int i;
;
; for (i=0; i<3; i++)
; DestVec[i] = FixedMul(WorkingXform[i][0], SourceVec[0]) +
; FixedMul(WorkingXform[i][1], SourceVec[1]) +
; FixedMul(WorkingXform[i][2], SourceVec[2]) +
; WorkingXform[i][3]; /* no need to multiply by W = 1 */

XVparms struc
 dw 2 dup(?) ;return address & pushed BP
WorkingXform dw ? ;pointer to transform matrix
SourceVec dw ? ;pointer to source vector
DestVec dw ? ;pointer to destination vector
XVparms ends

 align ALIGNMENT

 public _XformVec
_XformVec proc near
 push bp ;preserve stack frame
 mov bp,sp ;set up local stack frame
 push si ;preserve register variables
 push di

 mov si,[bp].WorkingXform ;SI points to xform matrix
 mov bx,[bp].SourceVec ;BX points to source vector
 mov di,[bp].DestVec ;DI points to dest vector

soff=0
doff=0
 REPT 3 ;do once each for dest X, Y, and Z
 mov eax,[si+soff] ;column 0 entry on this row
 imul dword ptr [bx] ;xform entry times source X entry
if ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shrd eax,edx,16 ;shift the result back to 16.16 form
 mov ecx,eax ;set running total

 mov eax,[si+soff+4] ;column 1 entry on this row
 imul dword ptr [bx+4] ;xform entry times source Y entry
if ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shrd eax,edx,16 ;shift the result back to 16.16 form
 add ecx,eax ;running total for this row

 mov eax,[si+soff+8] ;column 2 entry on this row
 imul dword ptr [bx+8] ;xform entry times source Z entry
if ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shrd eax,edx,16 ;shift the result back to 16.16 form
 add ecx,eax ;running total for this row

 add ecx,[si+soff+12] ;add in translation
 mov [di+doff],ecx ;save the result in the dest vector
soff=soff+16
doff=doff+4
 ENDM

 pop di ;restore register variables
 pop si
 pop bp ;restore stack frame
 ret
_XformVec endp
;=====================================================================
; Matrix multiplies SourceXform1 by SourceXform2 and stores the
; result in DestXform. Multiplies a 4x4 matrix times a 4x4 matrix;
; the result is a 4x4 matrix. Cheats by assuming the bottom row of
; each matrix is 0 0 0 1, and doesn't bother to set the bottom row
; of the destination.
; C near-callable as:

; void ConcatXforms(Xform SourceXform1, Xform SourceXform2,
; Xform DestXform)
;
; This assembly code is equivalent to this C code:
; int i, j;
;
; for (i=0; i<3; i++) {
; for (j=0; j<4; j++)
; DestXform[i][j] =
; FixedMul(SourceXform1[i][0], SourceXform2[0][j]) +
; FixedMul(SourceXform1[i][1], SourceXform2[1][j]) +
; FixedMul(SourceXform1[i][2], SourceXform2[2][j]) +
; ((j == 3) ? SourceXform1[i][3] : 0);
; /* the bottom row of SourceXform2 is 0 0 0 1, so the
;  translation term contributes only to the fourth column */
; }

CXparms struc
 dw 2 dup(?) ;return address & pushed BP
SourceXform1 dw ? ;pointer to first source xform matrix
SourceXform2 dw ? ;pointer to second source xform matrix
DestXform dw ? ;pointer to destination xform matrix
CXparms ends

 align ALIGNMENT
 public _ConcatXforms
_ConcatXforms proc near
 push bp ;preserve stack frame
 mov bp,sp ;set up local stack frame
 push si ;preserve register variables
 push di

 mov bx,[bp].SourceXform2 ;BX points to xform2 matrix
 mov si,[bp].SourceXform1 ;SI points to xform1 matrix
 mov di,[bp].DestXform ;DI points to dest xform matrix

roff=0 ;row offset
 REPT 3 ;once for each row
coff=0 ;column offset
 REPT 4 ;once for each column
 mov eax,[si+roff] ;column 0 entry on this row
 imul dword ptr [bx+coff] ;times row 0 entry in column
if ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shrd eax,edx,16 ;shift the result back to 16.16 form
 mov ecx,eax ;set running total

 mov eax,[si+roff+4] ;column 1 entry on this row
 imul dword ptr [bx+coff+16] ;times row 1 entry in col
if ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shrd eax,edx,16 ;shift the result back to 16.16 form
 add ecx,eax ;running total

 mov eax,[si+roff+8] ;column 2 entry on this row
 imul dword ptr [bx+coff+32] ;times row 2 entry in col
if ROUNDING_ON

 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;ROUNDING_ON
 shrd eax,edx,16 ;shift the result back to 16.16 form
 add ecx,eax ;running total

if coff eq 12 ;add in translation, but only in
 add ecx,[si+roff+12] ; the fourth column, since the
endif ; bottom row of xform2 is 0 0 0 1

 mov [di+coff+roff],ecx ;save the result in dest matrix
coff=coff+4 ;point to next col in xform2 & dest
 ENDM

roff=roff+16 ;point to next row in xform1 & dest
 ENDM

 pop di ;restore register variables
 pop si
 pop bp ;restore stack frame
 ret
_ConcatXforms endp
 end






[LISTING TWO]

/* Object list-related functions. */
#include <stdio.h>
#include "polygon.h"

/* Set up the empty object list, with sentinels at both ends to
 terminate searches */
void InitializeObjectList()
{
 ObjectListStart.NextObject = &ObjectListEnd;
 ObjectListStart.PreviousObject = NULL;
   ObjectListStart.CenterInView.Z = INT_TO_FIXED(-32768);
   ObjectListEnd.NextObject = NULL;
   ObjectListEnd.PreviousObject = &ObjectListStart;
   ObjectListEnd.CenterInView.Z = 0x7FFFFFFFL;
   NumObjects = 0;
}

/* Adds an object to the object list, sorted by center Z coord. */
void AddObject(Object *ObjectPtr)
{
   Object *ObjectListPtr = ObjectListStart.NextObject;

   /* Find the insertion point. Guaranteed to terminate because of
      the end sentinel */
   while (ObjectPtr->CenterInView.Z > ObjectListPtr->CenterInView.Z) {
      ObjectListPtr = ObjectListPtr->NextObject;
   }

   /* Link in the new object */
   ObjectListPtr->PreviousObject->NextObject = ObjectPtr;
   ObjectPtr->NextObject = ObjectListPtr;
   ObjectPtr->PreviousObject = ObjectListPtr->PreviousObject;
   ObjectListPtr->PreviousObject = ObjectPtr;
   NumObjects++;
}

/* Resorts the objects in order of ascending center Z coordinate in
   view space, by moving each object in turn to the correct position
   in the object list. */
void SortObjects()
{
   int i;
   Object *ObjectPtr, *ObjectCmpPtr, *NextObjectPtr;

   /* Start checking with the second object */
   ObjectCmpPtr = ObjectListStart.NextObject;
   ObjectPtr = ObjectCmpPtr->NextObject;
   for (i=1; i<NumObjects; i++) {
      /* See if we need to move backward through the list */
      if (ObjectPtr->CenterInView.Z < ObjectCmpPtr->CenterInView.Z) {
         /* Remember where to resume sorting with the next object */
         NextObjectPtr = ObjectPtr->NextObject;
         /* Yes, move backward until we find the proper insertion
            point. Termination guaranteed because of start sentinel */
         do {
            ObjectCmpPtr = ObjectCmpPtr->PreviousObject;
         } while (ObjectPtr->CenterInView.Z <
               ObjectCmpPtr->CenterInView.Z);

         /* Now move the object to its new location */
         /* Unlink the object at the old location */
         ObjectPtr->PreviousObject->NextObject = ObjectPtr->NextObject;
         ObjectPtr->NextObject->PreviousObject = ObjectPtr->PreviousObject;

         /* Link in the object at the new location */
         ObjectCmpPtr->NextObject->PreviousObject = ObjectPtr;
         ObjectPtr->PreviousObject = ObjectCmpPtr;
         ObjectPtr->NextObject = ObjectCmpPtr->NextObject;
         ObjectCmpPtr->NextObject = ObjectPtr;

         /* Advance to the next object to sort */
         ObjectCmpPtr = NextObjectPtr->PreviousObject;
         ObjectPtr = NextObjectPtr;
      } else {
         /* Advance to the next object to sort */
         ObjectCmpPtr = ObjectPtr;
         ObjectPtr = ObjectPtr->NextObject;
      }
   }
}


April, 1992
PROGRAMMER'S BOOKSHELF


Beyond the Official Rules




Andrew Schulman


If you wanted to learn to play a nice game of chess, you could, I suppose, get
a copy of the rules of the game and stare at them good and hard until you
became an expert player. I suspect, however, that you would do a lot better by
only briefly examining the rules and then playing hundreds of games, thinking
each time about why you lost and how you perhaps could have won.
You might also realize that others must have traveled this path before you,
and that they must have written down some of what they learned. In other
words, beyond the actual rules of the game, which can be spelled out precisely
in just a few pages, there must be a much larger body of written material that
spells out some of the complex implications of those simple rules. You know:
Move your pawn in this way, and your king will suffer a general-protection
violation in the next move; that sort of thing.
Visit your local bookstore, go to the sports and games section, and open up a
few books on chess. They are not mere restatements of the rules of the game.
Because the rules of chess occupy less than ten pages, no publisher in the
world could get away with restating the official rules and then reissuing them
in a flashy cover.
Now go to the store's section of computer books. (It is a commentary on the
sad state of the world that this section will probably be much larger than the
sports and games section!) Randomly select a few books on DOS or Windows
programming. The chances are quite good that these will be mere restatements
of Microsoft's DOS and Windows programming manuals, that is, of the official
rules for the games of DOS and Windows programming.
All this is just a roundabout way of saying that computer books ought to
convey some experience the author has had, some strategies he or she has
developed, some tricks or a style, or some traps to watch out for. There is no
point in writing books containing the official rules.
There is no point in buying or reading such books, either; that's what the
manuals are for. Now, I know, the manuals stink. Of course they do! But, if
what you mean by "stink" is "don't tell me what I need to know to write my
program," then the manuals are sort of supposed to stink. The rules of chess
tell you how the pieces move, not how to play the game. Likewise, if you want
something that shows how the moves of programming can be used in combination,
or what move to perhaps use in a given situation, then you need something
other than a manual: either a knowledgeable, patient coworker who's been
through it already, or a book whose author plays this same role.
This month, then, I would like to introduce you to a few new coworkers who may
-- I know, I'm getting my metaphors totally confused here -- give you a few
pointers the next time you sit down to play the Game of DOS or the Game of
Windows.


Writing Reusable Code


I normally don't buy product-specific computer books with titles such as Using
Microsoft Assembler 6.0A or Borland C++ 3.0 for Programmers (programmers, as
opposed to, say, aerobics instructors or district regional sales managers who
might be using a C++ compiler). Such books are, of course, a perfect
invitation to the "let's just reissue the manuals" syndrome I'm talking about.
I'm also no longer much of a Turbo Pascal programmer. (It was once all I used,
until DDJ technical editor Ray Valdes told me about this language called C.)
Just the same, I bought Neil Rubenking's Turbo Pascal 6.0 Techniques and
Utilities the moment it came out. I'd had several e-mail exchanges with Neil
while he was working on the book, and it seemed like he was putting a hell of
a lot of original material into it. Pretty unusual, when anyone with enough
coordination to manage a handheld scanner seems to be able to "write" a book
these days.
Rubenking (who is technical editor of PC Magazine) has written an absolutely
superb book on advanced Turbo Pascal programming. This book is a model of how
product-specific computer books ought to be written. There's nothing in here
that Neil hasn't checked out himself, seemingly nothing that was just lifted
from Borland's manuals. In addition to Turbo Pascal itself, Rubenking provides
detailed coverage of the Turbo Vision application framework, the BASM
assembler, and the third-party Object Professional library from TurboPower.
The main thrust of Rubenking's book is to show how the object-oriented aspects
of Turbo Pascal can solve real-world programming problems. The best examples
of this are the book's two lengthy discussions of "data structures that map
DOS information" and "objects for access to DOS data." Using object-oriented
methodology to tackle something grungy such as undocumented DOS internals?
That's right. About 70 pages are devoted to showing how Turbo Pascal objects
can put a consistent, reusable interface on some of the most inconsistent,
non-reusable data structures around: undocumented DOS internals. Because
undocumented DOS data structures change from one version of the DOS to the
next, and because a program doesn't know what version it's dealing with until
run time, code that works with undocumented DOS is usually sprinkled with
version checks, fields in data structures treated as byte offsets rather than
fields, and so on. Neil's book shows the right way to manage these structures;
his code for Turbo Pascal is applicable to C++.
The emphasis in this book is always on library building, that is, on how to
write reusable code. In addition to my favorite, the DOS-access objects, two
other nice examples are a cellular-automaton package and an
arbitrary-precision math package. There is an entire chapter on performance,
excellent discussions of using TResourceFile, and a good introduction to
internationalization.
Neil's book made me once again feel that maybe I ought to be using Turbo
Pascal.


Revealing Chip Bugs


It must by now be difficult to write a reference manual for the Intel 80x86
microprocessor family without restating what's appeared in countless previous
books on the subject. Robert Hummel's new reference, The Processor and
Co-processor, naturally has no choice but to repeat a lot of what's already
available in other books on the Intel architecture. After all, what can you
say about instructions such as LAR or FSAVE that isn't in Intel's manuals?
Well, it turns out that there's one quite important thing that Intel doesn't
talk about in its manuals or any other public forum: BUGS! Yes, like any other
piece of complex software, the microcode in the Intel microprocessors has
bugs. While Intel does provide "errata" for these chip anomalies under a
nondisclosure agreement, it is naturally reluctant to make such information
widely available.
Hummel's book now does just that. Besides an entire chapter on "Bugs and
Incompatibilities," many of the individual reference entries for the processor
and coprocessor instructions contain notes on bugs and undocumented features.
The way in which these chip bugs are presented does seem a little odd at
times. Intel gives these bugs "errata" numbers, but these numbers do not
appear in Hummel's book. Thus, the two best-known bugs, Erratum 17 and Erratum
21, are of course described, but not under those names. In addition, the
multiply bug found on some B1 steppings of the 80386 (including the one I'm
using right now) does not seem to appear anywhere in the book. Incidentally,
code to test for the multiply bug can be found in the excellent, but now sadly
difficult-to-find book, Advanced 80386 Programming Techniques, by James L.
Turley (Osborne/McGraw-Hill, 1988).
Hummel's book contains a handy program to test for the processor and
coprocessor type. If you also need to check for individual step levels such as
B1, check out Jeff Prosise's "Tutor" column in the February 11, 1992 issue of
PC Magazine.
It's interesting to compare Hummel's book with Rakesh Agarwal's 80x86
Architecture and Programming (Prentice Hall, 1991), which I reviewed in the
March 1991 issue of DDJ. Hummel's treatment of bugs, incompatibilities, and
anomalies is nowhere to be found in Agarwal's book. On the other hand, while
the individual instruction descriptions in Hummel and Agarwal both include an
section on the "algorithm" used by the instruction, where Hummel might say
"Algorithm: None," Agarwal will show three pages of dense pseudocode. There's
actually something to be said for both approaches; Agarwal can be a little
overwhelming at times.
One case where Hummel says "Algorithm: None," and Agarwal shows three pages of
pseudocode happens to be the undocumented LOADALL instruction. While both
books cover the form of this instruction found on the 80286, neither has it
for the 80386. For that, you need to consult an amazing article, "The LOADALL
Instruction," by Robert Collins (Tech Specialist, October 1991).
Besides the material on chip bugs and incompatibilities, other good sections
of Hummel's book include discussions of the two different ways of doing
floating-point emulation, and a chapter on mixing 16-bit and 32-bit code. I
still think Agarwal's is the best book on this subject, but hard-core
Intelophiles will definitely find Hummel's book, with its discussion of chip
bugs, a worthwhile addition to their bookshelf.


Windows Devices


One of the great, less-explored areas of Windows programming is the Microsoft
Windows Device Driver Kit (DDK). The DDK should interest a much larger group
of programmers than those actually responsible for writing Windows device
drivers. When Windows programmers think they've stumbled across something
that's undocumented, it often turns out they've simply come up against
something that's in the DDK. The DDK contains some of the most interesting and
powerful things in Windows. In addition to manuals and tools, the DDK contains
source code for most of the 16-bit and 32-bit device drivers in Windows: an
amazing resource!
But those DDK manuals are really bad! (And at around $350 for the DDK, they're
not cheap either.) The DDK manuals contain such pearls of technical writing as
"Exit: Any allocated stuff associated with the application is freed." The term
"termination" is never used when the words "nuke" and "crash" can be used
instead. In other words, the programmers wrote this stuff. There's great
information here, but it's so badly presented it almost makes you feel respect
and gratitude for the Windows Software Development Kit (SDK) manuals. Well,
almost.
The DDK manuals cry out for someone to present this information in a more
mature (and cheaper!) form. Dan Norton's new book, Writing Windows Device
Drivers, takes a first step toward this goal. I wish he had departed more from
the DDK style (for example, like the DDK, he presents C prototypes for
functions currently callable only from assembler), but it is good just having
some of this information available in a more palatable form.
I did have one big problem with this book: it does not present a single sample
program. A coupon in the back of the book can be used to order a disk of
sample programs, and the DDK itself comes with tons of sample code, but
Norton's book really needed to walk the reader through the process of creating
both Windows device drivers and 32-bit virtual device drivers (VxDs). Norton
seems to have thought too much of the reader who already owns the DDK, and not
enough of the numerous Windows programmers who don't, and who will buy his
book just to find out a little bit about what goes on in the netherworld of
Windows.
Certainly, any programmer interested in learning more about what makes Windows
tick ought to get Norton's book. Even with the problems I've cited, there's
lots of good stuff here. For example, an appendix on "Device Driver Support
Functions" presents a number of useful undocumented Windows API calls (such as
the Get/SetSelectorBase/Limit functions) that can be called from both device
drivers and normal Windows applications. Another section describes the useful
undocumented system-timer calls in SYSTEM.DRV.








April, 1992
OF INTEREST





Version 3.0 of HyBase, Answer Software's DBMS product for the Macintosh, is
now available. HyBase combines relational and object-oriented features with
knowledge-based rules and a client/server architecture. You can build classic
data tables using relational techniques, then extend the tables by creating
object-oriented data types. Applications can be extended modularly, using
virtual objects. Object methods can be written in C, Pascal, or HyBase's
user-extensible programming language.
HyBase runs under both Systems 6 and 7, and a client API, report writer, and
developer's toolkit are available for C or Pascal.
Single-user versions cost $450; network versions, $2000. Reader service no.
20.
Answer Software Corp. 20045 Stevens Creek Blvd. Cupertino, CA 95014
408-253-7515
Tempus Software has announced two development tools for Actor 4.0:
ProjectBrowser and the Enhanced Development Environment (EDEN). The tools allow
you
to organize and manage code developed for multiple projects using a multilevel
directory structure. They separate each project's code and resources,
increasing control and protection of the work. The hierarchical organization
gives all projects access to common code, eliminating the need to directly
modify the original Actor files.
EDEN provides access to source files in the hierarchical directory structure
and the ability to distribute class source code across the different levels.
Time stamping, dependent source file tracking, notification and loading of
changed files, and automatic load file generation are also included.
ProjectBrowser gives you control over the entire development directory
structure. It is a stand-alone Windows application that lets you peruse and
maintain project directories and source, image, and executable files. Thus,
source file editing, image creation, and resource compiling are all possible,
as is the capability to search files in any combination of directories for
text or methods.
Actor 3.1 and 3.2 require EDEN3; 4.0 requires EDEN4. Both cost $95.
ProjectBrowser works with all versions of Actor and is included with EDEN3 and
4 in the developer's package, which costs $295. Reader service no. 27.
Tempus Software P.O. Box 8750 Incline Village, NV 89450 702-831-8204
Version 3 of T-Base, a picture and document imaging library, is shipping from
Videotex Systems. T-Base lets you add pictures and document images to database
management applications written in C, C++, and Xbase. It supports the PCX file
format. New features include graphics commands; automatic image scaling; image
scaling on-the-fly; support for 1024x768, 256-color monitors; and automatic
color correction for multiple VGA images with different palettes.
T-Base is hardware independent: It automatically detects your hardware
configuration and adjusts itself to it. Additional features include support
for Super-VGA, VGA, EGA, CGA, and monochrome display; display of one or
multiple images in any location with or without existing text; support for
Laser-Jet printing; and network compatibility.
T-Base comes bundled with Chroma-Tools, an advanced color manipulation and
image conversion utility for converting images in a variety of file formats
into a single format.
T-Base costs $495. Reader service no. 25.
Videotex Systems Inc. 8499 Greenville Ave., Suite 205 Dallas, TX 75231
800-888-4336 or 214-343-4500
An embeddable version of Graph-in-the-Box Executive, New England Software's
presentation graphics program, is now available. Executive receives data from
host programs and then produces graphs, charts, and text and organizational
diagrams. It can also analyze the data.
There are two methods of embedding Executive: Use either the GBXINT utility or
an interrupt call. In the former, you add GBXINT to an application (with only
one line of code). When a graph function is needed, the utility exits the host
program, retrieves Executive from RAM, performs the function as a DOS batch
file, and returns to the program. Alternatively, prewritten interrupt calls in
C, Basic, and Pascal can be added directly to the host program, but the
interrupt call must be added to the program at each point a Graph-in-the-Box
function is needed (50 extra lines of code).
Executive can output DIF, CGM, HPGL, EPS, and PIC files. With the runtime
version you can change the look and feel to match the host.
The suggested retail price is $299.95. Runtime version prices start at $5000.
Reader service no. 28.
New England Software Greenwich Office Park 3 Greenwich, CT 06831 203-625-0062
PSW/Power Software has released the Loose Data Binder (LDB). The LDB is a C++
persistent container class library. It has a stack-queue-deque-list-array
interface and built-in sort, search, and iterate functions, giving it full
container capabilities in one flat class. An LDB can be saved on a stream for
later reloading while multiple references to its elements are automatically
resolved. This allows for any type of complex network of containers an
application may require. A streamable class is provided for packaging data in
a persistent wrapping.
The Loose Data Binder retails for $60. Source code is included. Reader service
no. 30.
PSW/Power SoftWare P.O. Box 10072 McLean, VA 22102-8072 703-759-3838
Now shipping from Microway is EZ-WIN32, a 32-bit Windows Extender. Microway
has extended its protected-mode 386 compilers for Windows, allowing you to
recompile your 32-bit applications and run them using Windows' 386 enhanced
mode.
EZ-WIN32 implements character-based I/O, making it possible to create and run
32-bit, character-based apps in a window. EZ-WIN32 supports features of DOS and
UNIX not available in Windows, such as STDIO, batch files, and file
redirection. All are built into EZ-WIN32's command interpreter and are part of
the NDP Windows compilers.
NDP Fortran, C, C++, and Pascal compilers for Windows cost $595 each; upgrades
are $395. Reader service no. 29.
MicroWay P.O. Box 79 Kingston, MA 02364 508-746-7341
THE Audio Solution has announced Midpak 1.0, a set of DOS-based MIDI music
drivers that provide MIDI output on all Roland systems and Roland emulation
for the Adlib, SoundBlaster, and ProAudio sound boards.
Midpak incorporates the MIDI drivers developed by Miles Design and used by
most DOS game publishers. Midpak provides a simple API that allows MIDI music
playback from either the DOS command line or from within your own software
using a single procedure call.
Midpak costs $149.95 and includes drivers, a custom instrument file, DOS
utilities, sample MIDI files, and source code examples. Reader service no. 24.
THE Audio Solution P.O. Box 11688 Clayton, MO 63105 314-567-0267
Debug*2000 is Computer Innovations' full-screen, multiwindow, source-level
debugger for C, C++, and assembly language programming for UNIX SVR4 on the
386 and 486. It works with all SVR4 standard-conformant tools.
Debug*2000 has permanent and pop-up windows, menus, lists, and intuitive
keystrokes. It lets you view programs at both high and low levels. In addition
to source-level debugging, Debug*2000 has machine disassembly, register,
stack, and data-dump windows. It provides hardware breakpointing, handles
signals, reads core dumps, and handles normal debugging tasks.
Computer Innovations C++ (version 2.1) for UNIX System SVR4 is also available.
It includes a complete C++ implementation of the language and associated class
libraries for streams and complex arithmetic. An advanced C++ header-file
generator for converting existing C header files to C++ is included.
The single-license price for Debug*2000 or C++ Version 2.1 is $595. Reader
service no. 21.
Computer Innovations Inc. 980 Shrewsbury Ave. Tinton Falls, NJ 07724
800-922-0169 or 908-542-5920
Voyetra Technologies is shipping the Sound Factory PC Sound Developer's Kit
(SDK), which lets you add digital sound and synthesized music to Windows or
DOS applications using Voyetra's device-independent APIs. The APIs require
minimal code changes and provide control over MIDI, FM sound cards, digital
audio, and CD audio. The Sound Factory SDK includes utilities for playing MIDI
and digital audio from the DOS command line, as well as WinDAT and DosDAT,
utilities for creating digital audio files in .WAV or VOC formats in Windows
and DOS, respectively.
Sound Factory includes function call descriptions and program examples for
Voyetra's Multimedia Player (VMP) and for its low-level APIs. The VMP plays
MIDI files that conform to the standard MIDI file format and are created using
a MIDI sequencing program. The low-level APIs provide device independence and
control over the sound hardware. Windows DLLs are also available for use with
those programs that can call them.
The Sound Factory SDK costs $299.95; evaluation kits are $24.95. Reader
service no. 25.
Voyetra Technologies 333 Fifth Avenue Pelham, NY 10803 800-233-9377 or
914-738-4500


Errata


In the February "Of Interest" column, we printed an incorrect phone number for
Rogue Wave Software. The correct number is 503-754-2311, and DDJ apologizes
for the error.









April, 1992
SWAINE'S FLAMES


New Math




Michael Swaine


My Cousin Corbett dropped by yesterday full of ideas about improving computer
math. Primary elections around the country had him thinking about the
limitations of the binary two-party system, he told me. "Binary math only
seems natural in machines based on the movement of electrons," he explained,
"but in a quark-based computer, ternary math would be natural."
Ignoring my objections about silicon design problems, he also brushed off the
purely numeric issues. "We could rewrite all the math software there is so
that it used trits, or twits, or whatever ternary digits are called. The real
problem comes in the logical interpretation of that third twit."
"Doesn't modal logic attempt to solve that problem?" I asked.
"Too presumptive," he replied. "It's not clear yet whether twit three should
be regarded as a positive vote for a third party candidate or as a protest
vote against all the incumbents."
He was more clear about the virtues of randomness in software design. "The
neural net people have something called 'simulated annealing,' based on the
familiar principle that any complicated system needs a whack on the side now
and then. All kinds of algorithms seem to work better if you introduce a
stochastic element, and there's some guy in Georgia who is getting incredible
image-compression ratios from fractal algorithms. And you know fractals are
close kin to chaos theory."
"Annealing is a technique from metallurgy," I recalled, "involving the heating
of metals."
"Heat and randomness are the same thing, of course," he said, adding that he
had been looking into the possibility of introducing a kind of beneficent
global warming into software generally, through some sort of cybernetic ozone
hole. An approach he referred to as Ozoneless Obfuscation of Program Structure
would use black-box software components written by strangers to introduce
subtle unpredictabilities into programs.
"This is all pretty theoretical," I said. "Are you working on anything more
practical?"
"Of course," he answered. "What issue could be more practical than numeric
precision? And yet we still don't have systems that deal appropriately with
precision concerns."
"I disagree. You can typically specify the precision with which you want to
work."
"Ah," he responded, "but no numeric representation or math package lets you
attach a degree of importance to precision. Here's what I mean: Every digit of
a ten-digit number gets treated with equal importance. But are they equally
important? No; for any practical numerical purpose, rounding errors get less
important as you move to the right in the number. You can see that the tools
we have don't address this need if you consider the transmission of numeric
data. Although an error in bit position n of a number is more serious than an
error in bit position n+1, no communication package provides error correction
that treats the data according to this differential importance."
I objected that he was talking semantics. Ultimately, I argued, importance is
a matter of the interpretation the user puts on the data, something you
couldn't possibly build into the math routines.
To my surprise, he agreed. "The problem is real enough, but solving it does
require some user involvement, to provide that semantic component you referred
to."
"It may be a practical problem, but it doesn't sound like you have a practical
solution."
"Oh, but I do. I've designed a new computer that lets the user supply that
semantic component. I call it the Analogical Engine, and it's intended for
engineers, architects, general contractors, and anyone who needs to work with
numbers that represent real measurements."
The machine, he explained, is small, cheap, and energy efficient, and operates
on principles quite different from today's conventional von Neumann
architecture. Even the I/O is distinctive: Measurements are entered by
manually positioning sliding bars, allowing the user to control the precision
of the input. Results are read off the same sliding bars, and the precision of
the output is a direct function of the eyesight of the user. The interface is
familiar to anyone who has read a ruler.
The markings on the sliding bars, though, are not spaced linearly as on a
ruler, but scaled to logarithmic and cosine and other useful functions. This
has the interesting feature of washing out some computational differences. For
example, squaring a number takes no longer than adding two numbers. But you do
get differences where you ought to get them: computing two digits of precision
takes less time and effort than computing four.
Corbett is currently designing a clip-on holder for the machine so that it can
be worn on the belt or in a shirt pocket.



May, 1992
EDITORIAL


NOSE-TO-NOSE, MANO-A-MANO




Jonathan Erickson


Maybe it's time for computer industry movers and shakers to step back and see
what they can learn from other industries: the airline business, for instance.
There are a lot of practices airlines adhere to that no enterprise should
emulate, so let's narrow it down to, say, handling legal disputes.
A case in point is the recent squabble between Southwest Airlines of Dallas
and Stevens Aviation of Greenville, South Carolina, over the use of an
advertising slogan. When I first saw Southwest's motto "Just Plane Smart" a
few months ago, I thought it was pretty clever. Lawyers and copywriters at
Stevens must have agreed, because their company has been using the
catch-phrase "Plane Smart" for quite a while.
Instead of slugging it out in court for a few years, the two companies took a
logical and refreshing step: they decided to arm wrestle for the rights to the
slogan.
The rules were simple: a best-of-three contest between the chairmen of both
companies, winner take all. After two rounds (both companies brought in
ringers, including a former Texas arm-wrestling champion), the match was tied
when Southwest's chairman, Herb Kelleher, 61, grasped the mitt of Stevens
chairman Kurt Herwald, 38, who happens to lift weights as a hobby. Herwald
easily pinned Kelleher (who was jokingly carried away on a stretcher), while
Stevens walked away with the rights to the slogan. In a Texas-size gesture,
Herwald granted Southwest the rights to continue using "Just Plane Smart."
Both companies won. They got tons of good publicity, had some fun, and got
back to the business they know best: running airlines. (The next chance I get,
I'll fly Southwest.)
Let's now suppose that computer industry disputes could be handled in the same
fashion and, for the sake of example, a good place to start might be the beef
between Apple and Microsoft. At the core of this long-running proceeding is
Apple's claim that Microsoft Windows infringes on Apple copyrights. (Apple
recently amended its claim and is now asking for $5.55 billion: $3.12 billion
for supposedly lost profits resulting from reduced sales and lower selling
prices of Apple products, and $2.43 billion because Windows-based products
such as Excel for Windows and Word for Windows have cut into Apple sales.)
I can see it now. John Sculley-nose-to-nose, cheek-to-cheek, mano-a-mano--with
Bill Gates, winner take all. They could meet halfway, perhaps on stage at the
Shakespeare festival in Medford, Oregon, and they could bring their seconds,
maybe Steve Ballmer from Microsoft and Debi Coleman from Apple. (My money's on
Coleman.) The rules would be the same as with Southwest vs. Stevens but
with--you guessed it--handshaking protocol.


The Handwriting on the Wall (Well okay, on page 96)


In conjunction with Ron Avitzur's article on handprinting recognition last
month, we announced the upcoming DDJ Recognition Contest. Details of the
contest have been finalized and are described on page 96 of this issue. We're
looking forward to your participation, and as we said last month, it's time to
start your recognition engines.


Paying for the Free Press with a Rubber Check


The same bunch that made it politically correct to write bad checks has
another rubber ace up its collective sleeve, one that might eventually make it
possible for them to hush up scandals like that currently bouncing around
Washington.
House Resolution 2407, a bill designed to guard federally funded animal
research, is the wolf in sheep's clothing. This measure would prohibit anyone
from giving research materials to unauthorized persons. More frightening, HR
2407 would establish criminal penalties against the press for publishing said
information. If, for example, a whistle-blower leaked information to a
newspaper, the newspaper and reporter could be prosecuted for publishing the
story.
Legitimate security concerns aside, HR 2407 would set unheard-of precedents.
If laws like this currently existed, for instance, we'd never have known of
House members' right to bounce checks. Of course, that would've been fine with
those members who didn't suffer from writer's block.




May, 1992
LETTERS


String Class Proposition


Dear DDJ,
It was with some consternation that I read Mr. Schmalzl's letter in the
February 1992 DDJ. While I understand that hundreds of thousands (if not
millions) of C programmers have used character arrays whose terminal element
is a zero value to represent a string, one must admit the shortcomings of that
method. For example, the strcpy function makes assumptions about its target
representing at least as many characters as its source. Even the most diligent
C programmer occasionally violates that stricture--sometimes with disastrous
results. Many other C string-manipulation functions rely on the same
strictures, and such a situation is obviously intolerable in a strongly typed
language such as C++. In addition, as any communications programmer knows, the
value zero does show up in a string occasionally, requiring the offended
programmer to reimplement some of the standard functions, but with slightly
different semantics and arguments.
A proper standardized C++ string class will go far to alleviate the problem.
It will provide (as does the current set of C library functions) a
standardized means of data handling.
Steve Teale's String class is a good beginning. Steve, however, should
consider the implementation of the copy descriptor, a string descriptor that
describes a portion or all of the storage owned by another
string descriptor. The concept is implemented in hardware on the old Burroughs
Medium Systems, in software via the SUBSTR pseudovariable in the IBM S/370
PL/I Optimizing Compiler, and as a combination of the two in the DIGITAL VAX
VMS system.
Steve, in concert with the ANSI standardization committee, should also
consider the problems associated with implementation of a class structure
which mimics a native computer construct such as an int. For example, the
aforementioned PL/I compiler tracks at compile time the life span of
compiler-generated temporary strings and destroys them when no longer needed.
In the current generation of procedural extensible languages (both C++ and
Ada), this tracking must be implemented (imperfectly) by the programmer at run
time for nonnative constructs. In my personal version of the STRING class, the
code necessary for implementation of the TEMPORARY flag (to recover storage
used by intermediate STRINGS during infix operations) is approximately
one-sixteenth of the code in that class. The alternative is to either prohibit
complex expressions involving non-assignment infix operators, or overload the
NEW operator in some ungainly way. This problem was extant in SIMULA 67, the
classic ancestor of C++, and needs to be solved in an elegant fashion by the
standardizers of C++ in order for the language to be regarded as truly
extensible.
Doug Campbell
Culver City, California


Patent Polemics


Dear DDJ,
In answer to my letter in defense of software patents ("Letters," November
1991) you printed two letters opposing my views (see "Letters," January 1992).
Roger Schlafly correctly pointed out the fact that I had erred when I stated
that the Constitution guarantees inventors the right to patent their
inventions. He is correct that it only authorizes the Congress to set up a
patent system. However, the Supreme Court did rule that inventors could patent
software and they have been doing so for ten years, so I would assert that
there seems to be some legal basis for the patenting of software.
Mr. Schlafly stated that "software patents were formerly disallowed because
algorithms were thought to be in the realm of abstract ideas--not because of a
lack of utility." I believe that the understanding of the word algorithm may
be the problem here. It could be that programmers understand the word in a
broader sense than they should. I do not know. I do believe, however, that it
would be at least as useful for your magazine to publish an article by a
competent legal authority on this subject as it was to publish the argument in
opposition.
Mr. Schlafly asks "does anyone seriously think any patents have promoted
progress?" Though he is referring only to software patents, the question as
stated is appropriate. The countries which have had the best patent systems
have been the ones with the most dynamic economies. The United States has long
been the envy of the world in this regard and has afforded the lone inventor
the greatest protection. The Soviet Union granted no property rights to
inventors. Yes, I do believe that patents have promoted progress. And those
for software will continue to do so.
I stated in my previous letter that I had invented a more efficient method of
displaying text on a computer screen. Mr. Gallagher shows through his derisive
attack upon me that not only could he not see the problem and its solution,
but when a problem was suggested he failed to even imagine what it might be.
The patent he would deny me might help to guarantee compensation for the
efforts I have made in the past to develop a mind which is able to invent and
the effort needed to develop this particular invention. Is my invention of
value? Only time and great effort will tell. The protection of a patent will
surely encourage my efforts.
Howard R. Davis III
Atlanta, Georgia
Dear DDJ,
The issue of software patents is a simmering problem which I fear may
ultimately kill what remains of the U.S. microcomputer industry. For the past
two decades, various bits and pieces of the industry have slowly vanished,
being taken over by Japanese and other Asian companies. Now all that remains
of the industry are microprocessors and software, and microprocessors depend
on the large installed base of software. Oh sure, there are a few
semiconductor manufacturers hanging in there, and there are a couple of hard
disk manufacturers, but they don't represent very large shares of their
respective markets.
What I fear is that five to ten years from now, large U.S. software publishers
(who will probably hold most of the software patents) struggling against
increasing competition will resort to patent infringement lawsuits as a means
of raising revenue. (Just look at Apple!) These lawsuits will virtually kill
software innovation in this country. (Can you imagine what Lotus or Microsoft
products would be like today had there not been competition from upstart
companies like Borland?) This will allow foreign software publishers to become
the source of software innovation. Once that happens, it won't be long before
foreign companies can take control of microprocessors as well. And that will
be the end of the U.S. microcomputer industry.
Dave Eriquat
San Francisco, California
Dear DDJ,
I read with great interest your November 1990 article, "Software Patents," and
the ensuing letters it generated. While I agree with The League for
Programming Freedom that the granting of absurd patents hurts software
innovation, there still is room--nay, a requirement--for patents and
copyrights.
Part of the problem is that patent law can't keep up with the rapid advances
in technology. Jerry Sanders, chairman of Advanced Micro Devices, was quoted
as saying, "If cars evolved at the rate of semiconductors, we would all
be driving Rolls-Royces today that go a million miles an hour and cost 25
cents."
The problem with patents is that they last for 17 years. In "slower"
industries, this allows a company to recoup its costs of research and
development, and to make a profit. Upon expiration, the patented item is still
likely to be useful to competitors. Does a 17-year patent, however, make sense
for an industry that can make itself obsolete in fewer than five years?
Instead of saying no to patents, perhaps for computer hardware and software,
we should propose a reduction in the patent's term.
Another part of the problem is our confusion as to what should be patented and
what should be copyrighted. While it's accepted that underlying source code
can, and should, be copyrighted, lawsuits by Apple and Lotus ask if "look and
feel" can be protected by copyright law. Opponents provide analogies comparing
software to books, or to the layout of an automobile's instrument panel,
arguing that the ubiquity of these interfaces makes them the property of no one.
Perhaps.
I don't think we should copyright "look and feel." I do, however, think we
should patent "look and feel" under a reduced term. Copyright law protects
expression--complete works--something with a beginning, middle, and end; a
story rather than a sentence, a dance number rather than a pirouette, a
spreadsheet rather than an interface. When determining whether to patent
software or to copyright it, we should ask ourselves if the code by itself
does anything. For example, a sorting algorithm by itself does and expresses
nothing in the sense of a complete work and should not be considered for a
copyright. A patent would be more appropriate. Similarly, a user interface by
itself does nothing. In contrast, a user interface endowed with sorting
capabilities can do something, and marks the beginnings of expression which
can be copyrighted.
It was unfortunate that silly patents were approved, and I agree that no more
patents should be granted until competent lawyers and programmers are involved
in the process, and until the patent law is amended to accommodate the rapid
change in technology.
Michael Yam
New York, New York


A Little History


Dear DDJ,
I take exception to Tim Cooper's statement in "Letters," February 1992 that
"MS-DOS ... is the Fortran of operating systems."
I am no fan of DOS, but putting Fortran into the same class as DOS shows a
lack of knowledge of computer history. Whereas when DOS came out it added nothing to the
world of operating systems, when Fortran arrived on the scene it brought
incredible advances to programming. Among these was the development of the
object library. There was no longer a need to recompile every module, which is
what C still does with its insipid include statement. Granted, some Fortran
compilers have this function too, but a good working library does not change
enough to warrant recompilation for every program. Nor would one want to
recompile, considering the size of some libraries. And I know that object
libraries are machine dependent, but hardly any code is fully transportable
from one compiler to the next. Each manufacturer has their own little
extension to the language that makes it a headache to transport code.
DOS exists only because the original PCs lacked the power, RAM, and hardware
necessary to have an advanced operating system. Fortran exists because it made
programming simpler. DOS will eventually die out, but with the new Fortran
standard definition it is just possible we will see a resurgence in Fortran,
as old code is brought up to today's more powerful processors. Fortran is no
longer the spaghetti code you were taught in school.
Which brings me to a request: Could you enlighten your readers to the new
Fortran standard definition? I think most will be quite surprised.
Denys Tull
Cincinnati, Ohio
Dear DDJ,
In the March 1992 "Programmer's Bookshelf" column, Ray Duncan covers a number
of books that detail the history of the personal computer industry. Computer
history being an interest of mine, I have a few more books to add to the list:
The Devouring Fungus, by Karla Jennings (W.W. Norton, 1990), is a humorous
look at computers and the people involved with them. It has many tales and
stories from the early computer days up to the present--a few classic stories
and a few new ones. The book can be found in the Humor section.
Fumbling the Future: How Xerox invented, then ignored, the first personal
computer, by Douglas Smith and Robert Alexander (Quill, 1988). The title
pretty much covers the contents of the book. A good portion of the book
details the business aspects of the Xerox Alto, but there is an interesting
section covering the technical aspects of the development of the Alto. This
title is found in the Business section. As an aside, the Smithsonian has an
Alto on display in the Information Age exhibit.
The Computer Entrepreneurs: Who's making it big and how in America's upstart
industry, by Levering, Katz, and Moskowitz (New American Library, 1984). Short
articles on over 60 people involved in the personal computer industry. Covers
persons from famous companies like Atari, Apple, and Kaypro, and from
not-so-famous companies like Human Edge Software and Wicat Systems. It
includes many of the early companies and personalities.
Timothy Swenson
Alexandria, Virginia


Cobol Lives


Dear DDJ,
I take exception to Michael Swaine's referral to Cobol as a dead language
("Programming Paradigms," October 1991). I had programmed in Fortran, Basic,
and dBase before learning Cobol two years ago. I have been reviewing C++ more
recently. Of these languages, I prefer Cobol for the following reasons:
Well-written Cobol is easily readable and can be reviewed, with assistance,
by nonprogrammers who are expert in what the program is trying to accomplish.
Because Cobol requires predefinition of all variables, unwanted variables
created on the fly through misspellings are not a problem. This is a problem with
Fortran, Basic, and dBase.
Through the use of COPY files, Cobol lends itself well to the use of data
dictionaries common to all files.
Modules written in other languages can be linked to Cobol programs and called
as subroutines. Thus, the best language for a given task can be used with the
Cobol program doing the overall coordination.
Cobol's lack of ability to address system resources directly serves as a
safety feature in complex safety-critical applications; one does not have to
worry about a Cobol program interfering with other software.
I think the first reason is probably the best for using Cobol. Of course, to
accomplish this goal, the programmer must have good facility with the English
language. Interestingly, the decline in the popularity of Cobol has paralleled
the general decline in American literacy.
I do use other software tools when I program in Cobol. But this reflects a
valid notion that one uses the best tool for the task at hand. Because I use
these tools with Cobol does not mean that I can use them in place of Cobol.
As a physician, I can assure you that brain death means lack of brain
activity. As a highly active systems analyst and programmer, I can assure you
that there is plenty of Cobol activity, at least in our institution. Thus,
reports of Cobol's demise are premature.
Stephen J. Levine, M.D.
Oklahoma City, Oklahoma












































May, 1992
UNTANGLING PUBLIC-KEY CRYPTOGRAPHY


The key to secure communications




Bruce Schneier


Bruce has an MS in computer science and has worked in cryptography and data
security for a number of public and private concerns. He can be reached at
Counterpane Systems, 730 Fair Oaks Ave., Oak Park, IL 60302.


Complex systems have been used throughout history to protect secret messages
from prying eyes. From Roman times to today, these systems have been based on
some sort of cryptographic algorithm and a key. People with the key can use
the algorithm to encrypt messages into some unintelligible garble, then to
decrypt that garble back into the message. People without the key can only
read garble. The sophistication of the algorithm has increased over the years,
particularly with the invention of computers, but the basic idea remains
unchanged. The algorithm is like a safe. Someone opens the safe with a key,
puts a message in, and slams the door shut. Only someone else with a key can
open the safe and read the message.
Whit Diffie and Martin Hellman changed all this in 1976 in a paper entitled
"New Directions in Cryptography" where they described Public-Key Cryptography
(PKC). Instead of one key, PKC has two keys, one public and the other private.
Moreover, it is impossible to deduce the private key from the public key. A
person with the public key can encrypt a message but not decrypt it--only
someone with the private key can decrypt the message. It's as if someone
welded a mail slot onto the cryptographic safe. Anyone can slip messages into
the slot, but only someone with the private key can open the safe and read the
messages.


Public-Key Cryptography


PKC algorithms are complicated protocols, not ideally suited to encrypt long
messages. A common implementation is to use PKC to transfer the key for
another cryptographic algorithm and then use that algorithm to encrypt and
decrypt messages. The Data Encryption Standard (DES) algorithm is ideal for
this sort of application. For example, if Alice and Bob want to exchange data
securely, they first agree to set up a DES system, Alice then generates a
random DES key, encrypts it using Bob's public key, and sends it to him. Bob
could send her his public key directly, or if this were a large network his
key might be posted on some central bulletin board. Bob then decrypts Alice's
message using his private key, and then both of them encrypt their
communications using DES with the same key.
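The exchange just described can be sketched in a few lines of Python. Everything here is a stand-in: tiny textbook RSA numbers play Bob's key pair, and a one-byte XOR stream plays DES, whose internals are beyond this sketch.

```python
import random

N, E, D = 3233, 17, 2753                  # Bob's toy RSA key (illustrative only)

def pkc_encrypt(m, exponent):             # RSA-style modular exponentiation
    return pow(m, exponent, N)

def xor_cipher(data, key):                # stand-in for DES: one-byte XOR stream
    return bytes(b ^ key for b in data)

# Alice generates a random session key and sends it under Bob's public key.
session_key = random.randrange(1, 256)
wire = pkc_encrypt(session_key, E)

# Bob recovers it with his private key; eavesdroppers see only `wire`.
bobs_key = pkc_encrypt(wire, D)
assert bobs_key == session_key

# Both sides now encrypt their traffic symmetrically with the shared key.
ciphertext = xor_cipher(b"meet at noon", bobs_key)
plaintext = xor_cipher(ciphertext, session_key)
assert plaintext == b"meet at noon"
```

An adversary on the wire sees only the public key, the encrypted session key, and the symmetric traffic, just as the next section argues.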
How does PKC address the problem of key distribution and key management? Well,
if Alice and Bob want to set up a secure communications channel using DES,
they both need the same key. Alice could choose one at random, but she still
has to get it to Bob. She could hand it to him sometime beforehand, but that
requires foresight. She could send it to him via registered mail, but that
takes time and is no real guarantee of security. With PKC, there is no
problem. Without prior arrangements, they can both have the same DES key, and
no adversary listening in on the communications channel has anything except a
public key, an encrypted DES key, and a day's worth of DES-encrypted traffic.


Protocols and Applications


PKC has implications far beyond simple data encryption. It allows people to do
things securely over computer networks that are impossible any other way. In
this section, I'll discuss applications such as password protection, digital
signatures, and simultaneous contract signing. Other applications might
include fair coin tosses, mental poker, and bit commitment.
Password Protection. Conventional password protection schemes, where the host
computer stores the password in encrypted form, have serious security
problems. For one, when the user types his password into the system, anyone
who has access to his data path can read it. He might be accessing his
computer through a convoluted transmission path that passes through four
industrial competitors, three foreign countries, and two forward-thinking
universities, any one of which can look at his login sequence as it passes
through its machine. Two, anyone with access to the processor memory of the
system can see the password before the system encrypts it, and compare it with
the encrypted password in the password file.
PKC solves the problem by allowing the host computer to keep a file of every
user's public key; each user keeps his own private key. This private key is
both long and nonmnemonic, and will probably be processed automatically by the
user's hardware or communications software. This requires an intelligent and
"trusted" terminal, but neither the host nor the communications path needs to
be secure. When logging in, the host sends the user some random string. The
user encrypts the string with his private key and sends it back to the host.
The host then decrypts the message using the user's public key. If the
decrypted string matches what the host sent the user in the first place, the
computer allows the user access to the system. No one else has access to the
user's private key, so no one else can impersonate the user. And more
importantly, the user never sends his private key over the transmission line
to the host. No one listening in on the interaction can get any information
that would enable him to deduce the private key and impersonate the user.
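The challenge-and-reply login can be sketched with the same kind of toy RSA numbers (3233/17/2753 here, purely illustrative; real keys are hundreds of digits long):

```python
import random

N = 3233                                  # toy modulus; real moduli are enormous
USER_PUBLIC, USER_PRIVATE = 17, 2753      # the user's toy RSA exponents

def host_challenge():
    return random.randrange(2, N)         # random string, sent in the clear

def user_response(challenge):
    return pow(challenge, USER_PRIVATE, N)   # encrypted with the private key

def host_verify(challenge, response):
    # The host decrypts with the stored public key and compares.
    return pow(response, USER_PUBLIC, N) == challenge

c = host_challenge()
assert host_verify(c, user_response(c))   # only the private-key holder passes
assert not host_verify(c, 0)              # a bogus response is rejected
```

Note that the private key itself never crosses the wire; only the challenge and the encrypted reply do.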
Digital Signatures. One of the properties of PKC is that either key can be
used for encryption. Encrypt a document using your private key, and you have a
secure digital signature. Anyone with the public key can decrypt the document,
so anyone can read it. Only you have access to your private key, so no one
else could have signed it. And finally, any modification to the encrypted
document will produce gibberish when decrypted, so no one can modify the
signed document. In reality, the problem with this protocol is that it will
take a lot of time to generate a PKC digital signature on an entire document.
It is easier to hash the document using a one-way hash function (MD5, as
described in the September 1991 DDJ, for example), producing a small
fingerprint, and then sign the fingerprint with a private key.
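A hash-then-sign sketch, again with an illustrative toy RSA pair: MD5 supplies the fingerprint, folded into the toy modulus range (that folding is an artifact of the small numbers, not part of the real scheme).

```python
import hashlib

N, PUBLIC, PRIVATE = 3233, 17, 2753       # toy RSA pair, illustrative only

def fingerprint(document):
    # Fold the 128-bit MD5 digest into the range [1, N-1] of the toy modulus.
    digest = int.from_bytes(hashlib.md5(document).digest(), "big")
    return digest % (N - 1) + 1

def sign(document):
    return pow(fingerprint(document), PRIVATE, N)   # sign only the fingerprint

def verify(document, signature):
    return pow(signature, PUBLIC, N) == fingerprint(document)

doc = b"I agree to the terms."
sig = sign(doc)
assert verify(doc, sig)
```

Signing the short fingerprint rather than the whole document is what makes the scheme fast enough to be practical.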
Improved Key Exchange. Implementing digital signatures during a DES-key
exchange protocol circumvents a potential security problem. What if an
adversary sits in the middle of the communications channel and sends data to
and receives data from both Alice and Bob? He could pretend to be Alice and
send Bob a different DES key. Bob's public key is public, so he would have no
trouble getting it. Bob, who would be fooled, would complete the protocol and
then encrypt all of his data using this different key and then send it back to
"Alice." The adversary would then be able to read all of the data Bob sent.
And then on the other end, the adversary could send Alice a different public
key in which to encrypt the DES key. Alice, who would also be fooled, would
encrypt the DES key such that the adversary could read it. Now the adversary
could read all of Alice's data as well. If the adversary were fast enough, he
could decrypt Bob's data and then reencrypt it for Alice, and then decrypt
Alice's data and then reencrypt it for Bob. The two of them would have no idea
that someone sitting between them was reading all of their supposedly secure
data.
With digital signatures, a central trusted authority can sign both Alice's and
Bob's public keys. The signed keys would include a signed certification of who
they belonged to. Now both know that the public key they received over the
communications link (or downloaded from a central BBS) actually belongs to the
other person. The DES key exchange can then proceed. Finally, to ensure that
Alice and Bob are not impostors, both Alice and Bob initiate the challenge and
reply protocol in the password protection example. If both protocols are
successfully completed, each knows that the person they are communicating with
is actually the other person.
Fair Coin Tosses. Using PKC, Alice and Bob can flip a coin over some
communications media, even if they don't trust each other; see Figure 1. The
protocol, which assumes that the PKC algorithm commutes, is as follows:
1. Alice and Bob both generate a public/private key pair.
2. Alice generates two messages, one indicating heads and the other indicating
tails. These messages should contain some unique random string, so that she
can verify their authenticity later on in the protocol. Alice encrypts both
messages with her public key and sends them to Bob.
3. Bob, who cannot read either message, chooses one at random. He encrypts it
with his public key and sends it back to Alice.
4. Alice, who cannot read the message sent back to her, decrypts it with her
private key and then sends it back to Bob.
5. Bob decrypts the message with his private key to reveal the results of the
coin toss. He sends the decrypted message to Alice.
6. Alice reads the result of the coin toss and verifies that the random string
is correct.
7. Both Alice and Bob reveal their public and private keys so that each can
verify that the other did not cheat.
Figure 1: Fair coin tosses using PKC
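The seven steps of Figure 1 can be sketched with a commuting cipher. Plain RSA under different moduli does not commute, so this sketch uses exponentiation modulo one shared prime (an assumption not spelled out in the text), with heads and tails encoded as even and odd numbers carrying the random tag.

```python
import random

P = 2357                                  # shared public prime (toy-sized)

def make_keypair():
    # A public/private exponent pair: d undoes e because e*d = 1 mod P-1.
    while True:
        e = random.randrange(3, P - 1)
        try:
            return e, pow(e, -1, P - 1)
        except ValueError:                # e shares a factor with P-1; retry
            pass

def crypt(m, key):
    return pow(m, key, P)                 # commutes across different keys

alice_e, alice_d = make_keypair()         # step 1: both make key pairs
bob_e, bob_d = make_keypair()

tag = random.randrange(100, 999)          # step 2: unique random string
sealed = {crypt(2 * tag, alice_e): "heads",      # heads encoded as 2*tag
          crypt(2 * tag + 1, alice_e): "tails"}  # tails as 2*tag + 1

pick = random.choice(list(sealed))        # step 3: Bob picks one, unread
doubly = crypt(pick, bob_e)               # ...and adds his own layer

half_open = crypt(doubly, alice_d)        # step 4: Alice strips her layer

result = crypt(half_open, bob_d)          # step 5: Bob opens the toss
outcome = "heads" if result == 2 * tag else "tails"

assert result in (2 * tag, 2 * tag + 1)   # step 6: the tag checks out
# Step 7: both parties would now reveal their keys to audit the run.
```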
This protocol is self-enforcing. Either party can immediately detect cheating
on the part of the other party, and no trusted third-party is required to
participate in either the actual protocol or any adjudication after the
protocol has been completed. To see how this works, let's try to cheat.
If Alice wanted to cheat and force heads, she has three potential ways of
affecting the outcome. One, she could encrypt two "heads" messages in step #2.
Bob would discover this when Alice revealed her key pair at step #7. Two, she
could incorrectly decrypt the message in step #4. However, she could not
figure out how to decrypt the message to force another message, only
gibberish. Bob would discover this in step #5. Three, she could lie about the
validity of the message in step #6. Bob would discover this also in step #7,
when Alice could not prove that the message was not valid. Of course, Alice
could refuse to participate in the protocol at any step, at which point
Alice's attempted deception would be immediately obvious to Bob.
If Bob wanted to cheat and force tails, his options are just as poor. He could
incorrectly encrypt a message at step #3, but Alice would discover this when
she looked at the final message at step #6. He could improperly perform step
#5, but this would also result in gibberish, which Alice would discover at
step #6. He could claim that he could not properly perform step #5 because of
some cheating on the part of Alice, but this form of cheating would be
discovered at step #7. Finally, he could send a tails message to Alice at step
#5 regardless of the message he decrypted, but Alice would immediately be able
to check the message for authenticity at step #6.
Mental Poker. A similar protocol allows Alice and Bob to play poker with each
other. Instead of Alice making and encrypting two messages, one for heads and
one for tails, she makes 52 messages, one for each card in the deck. Bob
chooses five messages at random, encrypts them with his public key, and then
sends them back to Alice. Alice decrypts the messages and sends them back to
Bob, who decrypts them to determine his hand. He then sends five more messages
to Alice, who decrypts them to determine her hand. During the game, additional
cards can be dealt to either player by repeating the same procedure. At the
end of the game, Alice and Bob both reveal their key pairs so that both can be
assured that the other did not cheat.
Bit Commitment. Let's say Alice wants to commit to a prediction, but does not
want to reveal that prediction to Bob until sometime later. Bob, on the other
hand, wants to make sure that Alice cannot change her mind after she has
committed to her prediction. Magicians like to use sealed envelopes handed to
random members of the audience, but PKC can provide a method immune from any
sleight of hand. First, both Alice and Bob each generate some random bit
strings. Bob hands Alice his string. Alice creates a message consisting of her
random string, the bit (or number of bits) she wishes to commit to, and Bob's
random string. She then encrypts it with her public key and sends the result
back to Bob. Bob cannot decrypt the message, so he does not know what the bit
is. If the message did not contain Alice's random string, he would be able to
encrypt all possible messages with Alice's public key and compare them with
what Alice handed him. Alice's secret random string prevents him from using
this attack to determine her bit. When it comes time for Alice to reveal her
bit, she decrypts it using her private key. Bob then ensures himself that the
bit is valid by checking that his random string is accurate. If the message
did not contain Bob's random string, Alice could secretly decrypt the message
she handed Bob with a variety of keys until she found one that gave her a bit
other than the one she committed to. Bob's random string prevents her from
using this trick to change her mind.
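A sketch of the commitment, packing the two random strings and the committed bit into one number under Alice's toy (3233/17/2753) RSA key; the 4-bit field widths are arbitrary assumptions made to keep the packed number small.

```python
import random

N, E, D = 3233, 17, 2753                  # Alice's toy RSA pair (illustrative)

bob_rand = random.randrange(16)           # Bob's random string (4 bits here)
alice_rand = random.randrange(16)         # Alice's secret random string
bit = 1                                   # the prediction being committed to

# Pack [alice_rand | bit | bob_rand] into one small number (< N).
message = (alice_rand << 5) | (bit << 4) | bob_rand
commitment = pow(message, E, N)           # sent to Bob; he cannot open it

# Later, Alice reveals the message; Bob audits it.
opened = pow(commitment, D, N)
assert pow(opened, E, N) == commitment    # it matches the commitment
assert opened & 0b1111 == bob_rand        # Bob's string survived intact
revealed_bit = (opened >> 4) & 1
assert revealed_bit == bit
```

Alice's hidden string blocks Bob's exhaustive-encryption attack, and Bob's string pins Alice to one decryption, exactly as argued above.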
Oblivious Transfer. Imagine a situation in which Alice sends Bob two messages.
Bob has a 50 percent chance of receiving either one message or the other (but
not both), and Alice has no way of knowing which message he received. This may
not sound very useful at first glance, but bear with me for a moment. First,
the protocol:
1. Alice generates two public-key key pairs, or four keys in all. She sends
both public keys to Bob.
2. Bob chooses a key in a conventional cryptographic algorithm (DES, for
example). He picks one of Alice's public keys at random and encrypts his DES
key with it. He sends the encrypted key to Alice without telling her which of
her public keys he used to encrypt it.
3. Alice decrypts Bob's key with both of her private keys. In one of the
cases, she uses the correct key and successfully decrypts Bob's DES key. In
the other case, she uses the wrong key and only manages to generate a
meaningless pile of bits that nonetheless looks like a random DES key. She has
no idea which is which.
4. Alice encrypts one message with each of the DES keys she generated in the
previous step (one real and one meaningless) and sends them to Bob.
5. Bob attempts to decrypt both of Alice's messages, but successfully decrypts
only one of them. At this point the oblivious transfer is complete. Bob has
received one of the two messages (the one encrypted in his DES key), and Alice
has no way of knowing which.
6. After the protocol is complete and the results of the transfer can be made
public, Alice must give Bob her private keys so that he can verify that she
did not cheat. After all, she could have encrypted the same message with both
keys in step #4.
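The six steps can be sketched as follows; two toy RSA pairs play Alice's key pairs and a one-byte XOR stream plays the conventional (DES) cipher, all of it illustrative.

```python
import random

ALICE_KEYS = [(3233, 17, 2753),           # (modulus, public, private): pair 0
              (2773, 17, 157)]            # pair 1; both toy-sized

def xor_cipher(data, key):                # stand-in for the DES step
    return bytes(b ^ (key % 256) for b in data)

# Step 2: Bob picks a symmetric key and one of Alice's public keys at random.
bob_key = random.randrange(1, 256)
choice = random.randrange(2)              # Alice never learns this value
n, e, _ = ALICE_KEYS[choice]
wire = pow(bob_key, e, n)

# Step 3: Alice decrypts with both private keys. One result is Bob's real
# key, the other a meaningless pile of bits; she cannot tell which is which.
candidates = [pow(wire, d, m) for (m, _, d) in ALICE_KEYS]

# Step 4: she encrypts one message under each candidate key.
plaintexts = [b"message zero", b"message one!"]
sealed = [xor_cipher(p, k) for p, k in zip(plaintexts, candidates)]

# Step 5: Bob can only open the message sealed under his actual key.
received = xor_cipher(sealed[choice], bob_key)
assert received == plaintexts[choice]
```

Step 6, the after-the-fact audit, would consist of Alice handing over both private keys so Bob can check that the two sealed messages really differed.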
The protocol is secure against an attack by Alice because she has no way of
knowing which of the two DES keys is the real one. It is secure against an
attack by Bob because there is no way he can get Alice's private keys to
determine the DES key with which the other message was encrypted. This may
still seem like nothing more than a more complicated way to flip coins over a
modem, but it has some far reaching implications when used in more complicated
protocols.
Simultaneous Contract Signing. Alice and Bob want to enter into a contract.
They've agreed on the wording, but neither wishes to sign without making sure
the other signs as well. This would be no problem face to face, but doing the
same thing over a communications channel requires an intricate protocol:

1. Alice and Bob both randomly select 100 pairs of DES keys. There is nothing
special about the pairs; they are just grouped in sets of two for the
protocol.
2. Alice and Bob both generate a pair of messages. "This is the left half of
my signature" and "This is the right half of my signature," for example. The
messages will probably also include a digital signature of the contract, as
defined previously, and a time stamp. The contract is considered signed if the
other party can produce both halves of this signature pair.
3. Alice and Bob both encrypt their message pairs in each of the DES key
pairs, the left message with the left key in the pair and the right message
with the right key in the pair.
4. Alice and Bob send each other their pile of 200 encrypted messages, making
sure the other knows which messages are which halves of which pairs.
5. Alice and Bob send each other every key pair using the oblivious transfer
protocol. That is, Alice sends Bob either the left key or the right key of
each of the 100 pairs, and Bob does the same. Now both Alice and Bob have the
encrypted half of each signature pair, but neither he nor she knows which
halves the other one has.
6. Alice and Bob both decrypt the halves they can, and make sure that the
decrypted messages are valid.
7. Alice and Bob each send each other the first bits of all 200 DES keys.
8. Alice and Bob repeat step #7 for the second bits of all 200 DES keys, then
for the third bits, and so on until all the bits of all the DES keys have been
transferred.
9. Alice and Bob decrypt the remaining halves of the message pairs and the
contract is signed.
Why does all this work? Let's assume Alice wants to cheat and see what
happens. In steps #4 and #5, Alice could disrupt the protocol by sending Bob
nonsense bit strings. Bob would catch this in step #6, when he tried to
decrypt whatever half he received. Bob could then stop safely, because Alice
could not decrypt the encrypted halves that Bob sent her. If Alice were very
clever, she could disrupt only half the protocol. She could send the left half
of each pair correctly, but send a gibberish string for the right half. Bob
has only a 50 percent chance of receiving the right half, so half the time she
could get away with it. However, this only works if there is one key pair. If
there were only two pairs, she could get away with this sort of deception 25
percent of the time. That is why there are 100 key pairs in this protocol.
Alice has to correctly guess the outcome of 100 oblivious transfer protocols.
She only has a 1 in 2^100 chance of doing this, so Bob can safely assume that
if he didn't catch her deception in step #6, then there was none.
Alice could also send Bob random bits in step #8. Bob won't know that she is
sending him random bits until he receives the whole key and tries to decrypt
the message halves. But again Bob has probability on his side. He has already
received half of the keys, and Alice does not know which half. She is bound
to send a nonsense bit for a key he has already received, and he will
immediately know that Alice is trying to deceive him.
Maybe Alice will just go along with step #8 until she has enough bits of the
keys to break the DES messages, and then stop transmitting bits. DES has a
56-bit-long key. If she receives 40 of the bits, she only has to try 65,536
keys in order to read the message--certainly within the realm of a computer.
But Bob will have exactly the same number of bits of her keys (or one less bit
at the most), so he can do the same thing. Alice has no real choice but to
continue the protocol.
Certified Mail. The same simultaneous oblivious transfer protocol used for
contract signing could also be used for computer certified mail. Alice sends
Bob the decryption key for some document, which she does not want to release
unless Bob sends her some message indicating receipt. Bob, on the other hand,
does not want to give Alice a receipt without getting the document. Oblivious
transfer can solve this problem without having to resort to a trusted third
party to enforce the protocol.


Algorithms


There are a number of approaches to implementing PKC, some of which I'll
describe in this section. I'll play fast and loose with complexity theory,
but only in the interest of comprehensibility. For those of you who
want the whole story, check the references. For everyone else, if the
newspapers ever report that P = NP, ignore most of this section.
Merkle-Hellman Knapsacks. The knapsack problem was one of the first proposed
candidates for a public-key algorithm. The problem is simply stated: Given a
list of different weights and the total weight of a closed knapsack, determine
which particular weights are in the knapsack. For example, the list of
different weights might be (9, 13, 15, 16, 18). If the total weight of the
knapsack is 43, then the weights in the knapsack would be (9, 16, 18). In
general, this problem cannot be solved except by brute force analysis.
However, a certain subclass of the problem can be solved easily. Called
"superincreasing knapsacks," they are knapsack problems where the list of
different weights are such that each weight is greater than the sum of all
previous weights: for example (1, 3, 6, 12, 25). Ralph Merkle and Martin
Hellman designed a public-key algorithm around a method of transforming a
superincreasing knapsack problem, which is easy to solve, into a conventional
knapsack problem, which is hard to solve. The public key uses the conventional
knapsack problem, and the private key uses the transformation method. This
protocol has been broken.
The RSA Algorithm. Of all the public-key algorithms proposed over the years,
RSA is by far the easiest to understand and implement and the most popular.
(See the accompanying text box entitled "Public-Key Cryptography Meets the
Real World.") Named after the three inventors, Ron Rivest, Adi Shamir, and
Leonard Adelman, who first introduced the algorithm in 1978, it has since
withstood years of extensive cryptanalysis. Although the analysis neither
proved nor disproved its security, it does indicate a confidence level in the
theoretical underpinnings of the algorithm.
RSA gets its security from the difficulty of factoring large numbers. The
public and private keys are functions of a pair of very large (100 to 200
digits or even larger) prime numbers. The algorithm calculates both keys from
the prime numbers, and determining one key from the other is conjectured to be
equivalent to factoring the product of the two primes.
To generate the two keys, choose two large prime numbers, p and q. Compute the
product n=p*q. Then randomly choose the public key, e, such that e has no
factors in common with (p-1)*(q-1). The easiest way to do this is to select
another prime number for e, one larger than either (p-1) or (q-1). Finally,
compute the private key, d, such that e*d=1(mod(p-1)*(q-1)). In other words,
d=e[-1](mod(p-1)*(q-1)). An algorithm for this computation, developed by
Euclid, is given in Figure 2. The numbers e and n are the public key; the
numbers d and n are the private key. The two primes, p and q, are no longer
needed, but should not be revealed.
Figure 2. (a) Algorithm to compute d such that e * d (mod n)=1; (b) sample
run.

 (a)

 inverse (a, n)
 {
 g[0] = n;
 g[1] = a;
 v[0] = 0;
 v[1] = 1;
 i = 1;
 do {
 g[i+1] = g[i-1] mod g[i];
 v[i+1] = v[i-1] - (g[i-1] div g[i]) * v[i];
 i++;
 }
 while (g[i] != 0);
 if (v[i-1] >= 0) return v[i-1];
 else return v[i-1] + n;
 }

 (b)

 i g[i] v[i]

 0 3220 0
 1 79 1
 2 60 -40
 3 19 41
 4 3 -163
 5 1 1019
 6 0



To encrypt a message m, first divide it into numerical blocks such that each
block has a unique representation modulo n (with binary data, choose the
largest power of 2 less than n). That is, if both p and q are hundred-digit
primes, then n will have about 200 digits, and each message block, m{i},
should be just under 200 digits long. The encrypted message, c, will be made
up of similarly sized blocks, c{i}, of about the same length. The encryption
formula is simply c{i} = m{i}[e] (mod n).
To decrypt a message, take each encrypted block c{i} and compute m{i} =
c{i}[d] (mod n). Because c{i}[d] = (m{i}[e])[d] = m{i}[e*d] =
m{i}[k(p-1)*(q-1)+1] = m{i}*m{i}[k(p-1)*(q-1)] = m{i}*1 = m{i}, all (mod
n), the formula recovers the message. The message could just as easily have
been encrypted with d and decrypted with e: the choice is arbitrary. I'll
spare you the number theory as to why this works; most any current text on
cryptography will go into it in detail.
A short example will probably go a long way to making this clearer. If p = 47
and q = 71, then n = p*q=3337. The encryption key e must have no factors in
common with (p-1)*(q-1) = 46*70 = 3220. Choose e (at random) to be 79. In that
case, d = 79[-1] (mod 3220) = 1019. Figure 2 shows how this number was
calculated. Publish e and n, and keep d secret. Discard p and q.
To encrypt the message m = "DRDOBBS" = 6882326879666683, first break it into
small blocks. Three-digit blocks work nicely in this case. The message will be
encrypted in six blocks, m{i}, where, m{1} = 688, m{2} = 232, m{3} = 687, m{4}
= 966, m{5} = 668 and m{6} = 3. The first block is encrypted as 688[79] (mod
3337) = 1570 = c{1}. Performing the same operation on the subsequent blocks
generates an encrypted message c = 1570 2756 2714 2276 2423 158.
Decrypting the message requires performing the same exponentiation using the
decryption key of 1019. So, 1570[1019] (mod 3337) = 688 = m{1}. The rest of the
message can be recovered in this manner.
If factoring a 200-digit number takes forever, how much easier can it be to
find 100-digit prime numbers? Not much, if you use factoring methods to find
these primes. However, there are a number of tests that can determine if a
number is prime with a confidence of over 50 percent. If a number n passes two
of these tests, then the confidence rises to 75 percent. The chances of a
composite number passing 10 tests are less than 1 in 1024. Here is
the algorithm with the number of tests set at 100:
1. Choose a random number, n, to test.
2. Make sure that n is not divisible by any small primes. Testing 2, 3, 5, 7,
and 11 will speed up the algorithm significantly.
3. Choose 100 random numbers, a{1}, a{2} ... a{100} from the interval
[1..n-1].
4. Calculate a{i}[(n-1)/2] (mod n) for all a{i} = a{1}..a{100}.
5. If a{i}[(n-1)/2] = 1 (mod n) for all i, then n is probably composite.
If a{i}[(n-1)/2] != 1 or -1 (mod n) for any i, then n is composite.
If a{i}[(n-1)/2] = 1 or -1 (mod n) for all i, but not 1 for all i, then n is
prime.
This test will fail to accurately determine if a number is either prime or
composite 1 in 2[100] tries, or about 1 in 10[30]. If for some reason you need
more confidence that the number is prime, choose a larger number of random
numbers to test against. On the other hand, if you consider that the odds of
the number being composite are less than the odds of you getting killed the
next time you drive your car, you might not worry about it so much.
It is conjectured that the security of RSA depends wholly on the problem of
factoring large numbers. Certainly that is the most obvious means of attack.
Any adversary will have the public key, e, and the modulus, n. In order to find
the decryption key, d, he has to factor n. Right now the best factoring
algorithms take on the order of O(e[sqrt(ln n*ln(ln n))]) steps to solve. If n
is a 200-bit number, factoring will take on the order of 2.7*10[11] steps; for
a 664-bit (200-digit) n, on the order of 1.2*10[23] steps. Assuming a computer
can perform a million steps per second (a generous assumption, considering
some of the steps include long division with these monster numbers), it will
take 3.8*10[9] years to factor a 664-bit number. If someone discovers a
faster factoring algorithm or if someone finds another way to break RSA, then
the whole scheme will fall apart. However, people have been working on
factoring algorithms since the invention of mathematics, and it is unlikely
that any such algorithms are waiting to be discovered. Even if computing power
increased a million-fold, factoring a 664-bit number will still take almost
four thousand years. If you need more security, increase the length of n by a
couple dozen bits.
El Gamal. A variant of the El Gamal public-key algorithm has been proposed as
a digital signature standard. (See "Public-Key Cryptography Meets the Real
World.") To generate a key pair, first choose a prime p and a prime q that
divides p-1. Compute g = h[(p-1)/q] mod p, where h is any integer 0<h<p
such that h[(p-1)/q] mod p>1. The three numbers, p, q, and g, are public, and
can be common to an entire group of users. The private key, x, is a random
integer less than q. The public key, y, is g[x] mod p.
To sign a message m, first generate a random integer, k, less than q. This
integer must be different for each different signature. The digital signature
consists of two numbers: r = (g[k] mod p) mod q, and s = (k[-1](m+xr)) mod q.
In reality, m will more likely be the hash of a much longer message.
To verify a signature, compute v = ((g[(m*s[-1] mod q)]*y[(r*s[-1] mod q)])
mod p) mod q, where s[-1] is the inverse of s mod q. If v = r, then the
signature is verified. Enough math for today; check the references if you
need proof that this works.


The Future


PKC implementations are becoming increasingly important in the electronic
world about us. Software implementations of PKC have been adopted by
Microsoft, Lotus, Apple, Novell, and many other companies, and it can be
implemented efficiently in some of the newer hardware architectures as
well. For example, a Japanese manufacturer of an encrypting fax machine has
demonstrated an RSA protocol for key exchange. A European company has
developed a smart card that performs RSA encryption by itself, allowing RSA
protocols to be implemented at money machines. A system of verifiable but
untraceable messages has been developed that would allow secret balloting over
the telephone. Another company is working on a digital cash system.
Conventional electronic money will never completely replace cash because both
drug dealers and congressmen have the same objection to it: There is always an
audit trail. Using PKC protocols, a system of electronic money can be
implemented that is untraceable until someone tries to cheat. It is currently
in use on a public transportation system in Europe. PKC has the potential of
restoring the individual security and privacy that the electronic age has
taken away.


Bibliography


Denning, D. Cryptography and Data Security. Reading, Mass.: Addison-Wesley,
1982.
Diffie, W., and M. Hellman. "New Directions in Cryptography." IEEE
Transactions on Information Theory (November, 1976).
El Gamal, T. "A Public Key Cryptosystem and a Signature Scheme Based on
Discrete Logarithm." IEEE Transactions on Information Theory (July, 1985).
Federal Information Processing Standards Publication, August 19, 1991.
"Digital Signature Standard." DRAFT, National Institute of Standards and
Technology.
Federal Information Processing Standards Publication, January 22, 1992,
"Secure Hash Standard." DRAFT, National Institute of Standards and Technology.
Patterson, W. Mathematical Cryptology. Totowa, N.J.: Rowman & Littlefield,
1987.
Rivest, R.L., A. Shamir, and L. Adelman. "A Method for Obtaining Digital
Signatures and Public Key Cryptosystems." Communications of the ACM (February,
1978).
Salomaa, A. Public-key Cryptography. Berlin, Germany: Springer-Verlag, 1990.
DDJ
The State of DES
Ever since the Data Encryption Standard (DES) was approved by the National
Bureau of Standards, it's been the subject of intense criticism and debate.
Based on an algorithm developed by IBM, DES was inexplicably modified by the
National Security Agency in ways that seemed to make it weaker. Did the
National Security Agency (NSA) put a "trap door" into the algorithm, allowing
only themselves to break it? Did they deliberately weaken the algorithm so that
only they would have the resources to build a massively parallel machine to
break it? And why was it designed the way it was, anyway? The government
classified IBM's design notes, so no one knows.
DES can be broken by trying all of the 2[56] possible keys. This is a "brute
force" attack; there are only 2[56] keys, so the correct key has to be one of
them. And until last year, that was the best that could be achieved. Using a
technique they developed called "differential cryptanalysis," Eli Biham and
Adi Shamir have now succeeded in breaking certain implementations of DES using
only 2[47] encryption steps. This is a significant blow to the security of
DES, but there are some caveats. This is a chosen plaintext attack. The 2[47]
steps use special predetermined plaintext blocks. If the cryptanalyst cannot
introduce those particular plaintext blocks (that is, he or she has to listen
to both the plaintext messages and the encrypted traffic until those
particular blocks happen to appear), the attack will fail. Also, this is an
attack against the electronic-codebook implementation of DES. Any feedback
schemes will render this attack more complicated than brute force.
So, while Biham and Shamir are making great strides against DES, this does not
mean that all of the DES equipment already fielded is suddenly worthless. I
wouldn't use DES for long-term security (such as diplomatic information that
needs to remain secure for 40 years or more), but for short-term secret data
(like electronic funds transfers), DES is still as secure as it always was.
Whatever that means.

Public-Key Cryptography Meets the Real World
On September 20, 1983, U.S. patent number 4,405,829, titled "Cryptographic
Communication System and Method"--informally known as the RSA algorithm--was
awarded to the Massachusetts Institute of Technology. In 1984, RSA Data
Security Inc. was formed to develop, license, and market the RSA algorithm.
Lotus has since integrated RSA encryption and authentication in Notes, Digital
Equipment Corp. uses RSA as part of their Distributed Systems Security
Architecture, Novell uses RSA authentication as part of Netware, and Apple has
licensed RSA for use in its Open Collaboration Environment (OCE) to provide
both privacy and authentication. And Microsoft, Sun Microsystems, and IBM have
licensed RSA for use in future versions of their operating systems and other
products.
RSA isn't the only public-key patent that's been awarded. Both Merkle-Hellman
knapsacks (patent number 4,218,582) and an exponentiation public-key algorithm
(4,424,414) are patented until around the turn of the century. The first
public-key patent (4,200,770), which some people claim covers all of
public-key cryptography, was issued April 29, 1980. These patents, along with
the RSA patent, are controlled by Public Key Partners, a group that includes
RSA Data Security Inc.
These patents don't mean that excellent, although unauthorized, RSA software
packages are not available. In 1991, Phil Zimmerman released a public-domain
program called Pretty Good Privacy (PGP), which includes an RSA
digital-signature scheme. Written for the IBM PC, it has since been ported to
UNIX, VMS, Atari, and Amiga. After legal threats by RSA Data Security,
Zimmerman agreed not to distribute or update the program, although programmers
from other countries have worked on improvements, and a major update of PGP
was released in April from New Zealand, beyond the reach of the RSA patent. It
is available on computer bulletin boards worldwide.
The Internet community has been working with RSA Data Security on its own version of a
personal public-key security program, called Privacy Enhanced Mail (PEM). This
standard, which will be approved sometime this year, will provide protocols
for RSA encryption and authentication for Internet mail users. RSA will
release a Toolkit for Interoperable Privacy-Enhanced Messaging (TIPEM) to
assist developers in writing applications which implement PEM protocols. TIPEM
is a highly efficient set of routines written in C, and portable to most
platforms. RSA Data Security will also provide a hardware-independent
reference implementation of the PEM protocols, which the company plans to
license free for academic and laboratory use. Distributed products will
require proper licenses.
There are significant differences between PGP and the PEM protocols. PEM, for
example, has centralized key management using a trusted Certification
Authority. This forces all key generation to involve a central point, forces
transference of trust, and is generally tailored for large organizations. PGP
is more of a grass roots program. It has a decentralized, yet secure,
key-management scheme--anyone can generate their own RSA key pair. Each pair
of users communicates regardless of anyone else in the network. PGP is
designed with guerilla-style key management for the masses.
The PEM header files are somewhat worrisome. According to the standard, each
encrypted message has an unencrypted header file which contains quite a lot of
information about the message. The header contains who sent the message, who
the message is for, which encryption algorithm was used, which message-digest
algorithm was used, and when the message was encrypted. This information is
available to anyone listening on the communications channel, allowing for some
pretty lucrative traffic analysis. Even if no one knows what you are saying,
they know who you are talking to, when you are talking to them, and how much
you are saying. PEM actually requires you to sign your messages,
precluding anonymous messages. PGP, on the other hand, keeps as much
information secret as possible, and signatures are optional. The only thing
unencrypted in the header file is the ID of the receiver. If the receiver has
the appropriate private key and decrypts the message, only then does he learn
who sent the message and when or if it was signed. As little information is
sent unencrypted as possible.
Although free, PGP is well-designed, with possibly the most sophisticated key
management features available. PGP supports IDEA file encryptions, the MD5
Message Digest algorithm, and RSA digital signatures. Data compression is
automatically provided before encryption to reduce file lengths and eliminate
redundancy. Complete source code for PGP is available.
Meanwhile, the National Institute of Standards and Technology (NIST), with the
help of the National Security Agency (NSA), has proposed their own public-key
Digital Signature Standard (DSS): an algorithm based on an unpatented variant
of the El Gamal algorithm and a Secure Hash Algorithm (SHA) modeled after the
MD4 Message Digest algorithm. (Various patent-infringement suits have been
threatened by Public-Key Partners, however.) DSS is slower than RSA, slightly
slower generating signatures and significantly slower in verifying signatures.
Still, hardware is getting faster all the time, and precomputation can make
DSS faster than RSA in certain implementations. The standard fixes the key
length at 512 bits, which most cryptographers consider too small for long-term
security. Some cryptographers argued that the use of a common modulus among a
group of users makes for an easier target than the RSA algorithm, where the
modulus is different for each individual user. But it is possible to use an
individual modulus for each user, so this is not a problem. Other
cryptographers, claiming to have found a "trap door," pointed out a tiny
subset of moduli that are easy to break; this can be protected against with
minimal effort. And finally, the standard addresses digital signatures but
makes no mention of encryption. The comment period was supposed to end in
November 1991, but NIST extended it through February. Last December, the NIST
advisory board concluded that DSS as written had grave problems. While
significant, this action was mostly political, as the advisory board consists
primarily of industry representatives (many of whom have already licensed the
RSA algorithm). Anything could happen next.
Information Security Corp. (ISC), of Deerfield, Ill., has already released an
IBM PC program called "Secret Agent" that uses public-key cryptography.
Secret Agent supports DES file encryption (in Cipher Block Chaining mode), DSS
digital signatures (and the SHA), and El Gamal for key management. Secret
Agent's key management isn't as sophisticated as PGP's, but if you trust NIST's
algorithms, Secret Agent has the advantage of being available today and
perfectly legal.
Next Inc. entered the fray in February 1992 with their own public-key
encryption system. Their "Fast Elliptic Encryption" (FEE) algorithm is an
implementation of a well-documented, elliptic-curve public-key algorithm, the
details of which are incredibly complicated. While elliptic-curve encryption
appears to be, key bit for key bit, more secure than RSA, there is still some
concern among researchers that a mathematical breakthrough could render the
scheme useless. In any case, Next scientists invented a series of mathematical
speedups that make the algorithm fast enough to actually use. Pending NSA
approval, Next plans for FEE to provide both security and authentication in
future versions of the Next operating system. They are also attempting to
patent their mathematical speedups, although potential legal complications
with at least two other elliptic-curve patent applications and a similar
mathematical speedup patent application will make this an interesting
exercise. And there's always the specter of a lawsuit from Public-Key
Partners, based on their claim that patent 4,200,770 covers all of public-key
cryptography. Finally, there are indications from Next that they will allow
others to implement FEE without paying royalties.
This all will become moot on September 20, 2000, when the RSA patent expires
and the algorithm enters the public domain. Unless the Next scheme turns out
to be secure and royalty-free, we will be caught between a federal agency that
no one trusts and a company that many do not like.
--B.S.

































































May, 1992
FLETCHER'S CHECKSUM


Error correction at a fraction of the cost




John Kodis


John has developed process-control systems for the steel industry, spacecraft
telemetry systems for NASA, a real-time distributed database for a
long-distance telephone company, and a GPS navigation system for a marine
electronics firm. He recently completed his MS in computer science and can be
reached through the DDJ office.


As computers have become increasingly ubiquitous, data-communication
applications such as electronic mail, FAX, remote file transfers, and computer
teleconferencing have become commonplace. Noise and bandwidth limitations have
always been and will continue to be a major issue in computer communications,
even as new technologies such as ISDN replace the old-fashioned, noisy,
voice-grade telephone line.
The most common way of dealing with noisy transmission channels is to split
the data into records and send a few extra bytes of checksum information with
each data record. Checksum information is a numeric value which is computed
from the data being transmitted and then used by the receiver to verify
correct transmission. Over the years, a number of methods have been developed
to perform this verification.
The earliest methods involved the transmission of parity bits. An additional
bit was appended to each byte, which caused the byte to contain an even number
of 1 bits. If a byte with an odd number of 1 bits was received, a transmission
error must have occurred. While this method, known as "vertical parity,"
provides some verification, errors can still easily go undetected. For
example, any character received with an even number of bit errors will pass
this simple parity test, rather than being reported as an invalid character.
Another method, known as "horizontal parity," uses a similar technique--and
suffers from similar problems. In horizontal parity, several bytes of data are
grouped into a block. An additional parity byte is then appended to the end of
the block. The value of this parity byte is calculated so that an even number
of 1 bits will have been sent in each bit position of the block. This
calculation can be performed by simply exclusive-ORing together all the bytes
in the block. Here again, two single-bit errors can cancel each other if they
occur in the same bit position in a block.
A more powerful error detection method is the Cyclic Redundancy Check, or CRC.
Based on results from the theory of cyclic groups, CRC schemes are among the
most reliable means of error detection in use. In most CRC algorithm
implementations, a 16-bit CRC checkword is formed by treating all the bits of
a transmission block's message portion as one large number, and computing the
remainder of this number after division by a known 17-bit divisor. This 16-bit
CRC checkword is adequate to detect all single-bit errors, all burst errors of
16 or fewer bits in length, and all double-bit errors separated by fewer than
65,536 bits (or 8192 bytes). This is a respectable showing for a technique
requiring so little additional data.
Another desirable property of CRC is that the generation and testing of the
checkword can be readily implemented in hardware by sequentially shifting,
masking, and exclusive-ORing the bits of the bytes being transmitted.
Unfortunately, most computer architectures lack the instructions required to
allow this type of bit-oriented calculation to be performed efficiently.
Because of this, software CRC generation and testing tends to be processor
intensive. A good assembly-language implementation of CRC will typically
require at least six machine instructions for each bit transmitted, or over 50
machine instructions for each byte of data being transmitted.
At high data rates, slow processors can have performance problems.


Enter Fletcher


In the late '70s, an error-detection technique was developed which provides
error-detection properties nearly equal to those of CRC but requires much less
computational effort. This method, known as the "Fletcher checksum," was
devised by John G. Fletcher of Lawrence Livermore Labs. The algorithm and an
analysis of its performance characteristics were published in the IEEE
Transactions on Communications in January 1982. One version of the Fletcher
checksum (known as the one's complement version) has since been adopted for
use in the class-4 transport layer of the ISO network protocol. In spite of
all this, Fletcher's checksum is not as well known as more commonly used
techniques.
In his paper on this algorithm, Fletcher analyzed several common measures of
the error-detection ability of his checksum algorithm and compared the results
to those for a CRC algorithm. The results were generally similar for both
algorithms, but CRC had a slight edge in most categories.
Both the CRC and Fletcher's checksum will detect all single-bit errors. Both
allow a small fraction of errors to get through undetected: 0.001526 percent
using a CRC vs. 0.001538 percent using Fletcher's checksum. Both detect all
double-bit errors as long as the erroneous bits are close enough to each
other. However, the definition of "close enough" varies from 2040 bits for
Fletcher's algorithm to 65,535 bits for the CRC algorithm. In practical terms,
this means that a CRC will detect all double-bit errors so long as a block
size of under 8191 bytes is used, while Fletcher's checksum can only guarantee
this performance on blocks of fewer than 255 bytes.


The Algorithm


The check field in Fletcher's algorithm is made up of two 8-bit check values.
The first is initially set to 0. As each byte in the message is processed, its
value is added to this check value, and the remainder on division by 255 is
saved as the new first check value. The second check value is computed in a
similar manner. Its value is initialized to 0, and updated as each byte in the
message is processed. This time, however, it is updated by adding the current
value of the first check value. Again, the remainder on division by 255 is
saved as the new second check value. Example 1(a) shows pseudocode for this.
Example 1: (a) Pseudocode for the first part of Fletcher's algorithm; (b)
pseudocode for the second part of Fletcher's algorithm.
 (a)

 integer i, sum1, sum2;

 sum1 = 0;
 sum2 = 0;
 for i from 1 to message_length do
 sum1 = ( sum1 + message[i] ) modulo 255;
 sum2 = ( sum2 + sum1 ) modulo 255;
 end for

 (b)

 check1 = 255 - (( sum1 + sum2 ) modulo 255);
 message[message_length+1] = check1;
 check2 = 255 - (( sum1 + check1 ) modulo 255);
 message[message_length+2] = check2;


After these two check values have been computed, they must be incorporated
into the data stream so that they can be checked at the receiving end. One
approach is to simply append them to the end of the message block. This would
provide good error detection, but many of the desirable properties discussed
previously would be lost. To avoid this, the checksum itself is not
transmitted with the message. Rather, a value is computed from the checksum
bytes, which causes the receiver to end up with a checksum of 0 when the
checksum of all data from the start of the message data through the end of the
checksum field is generated. The first of these two values is 255 minus the
modulo-255 remainder of the sum of the 2 message-checksum bytes. The second
value is 255 minus the modulo-255 remainder of the sum of the first message
checksum byte and previously computed first checksum byte. Pseudocode for the
second part of the computation is shown in Example 1(b).



A Real-world Example


As far as the theory goes, that's about all there is to Fletcher's checksum
algorithm. In practice, however, additional details must be taken into
account--for example, how to do it fast in assembly language.
I received my real-world introduction to Fletcher's checksum in 1989 while
working as a contractor at NASA. A team of NASA engineers had developed an
experimental local area network which had a series of custom-built processing
nodes connected through fiber-optic cable. The network was operational, but
was not meeting its performance expectations. I was called in to investigate
and suggest improvements.
After some examination, I determined that when the network was operating at
maximum capacity, over half of the CPU time was spent in a single subroutine:
the subroutine that calculated the first part of Fletcher's checksum for the
2040-byte data packets. This checksum calculation was performed using the
lines of C shown in Example 2(a). The corresponding assembly language (for the
Motorola 680x0) generated by the C statements is shown in Example 2(b).
Example 2: (a) An unoptimized C implementation of the basic checksum; (b) the
corresponding assembly language.
 (a)

 register unsigned char *ptr;
 register short int i, len, sum1, sum2;

 sum1 = sum2 = 0;
 for (i=0; i<len; i++)
 {
 sum1 += *ptr++;
 if (sum1 >= 255) sum1 -= 255;
 sum2 += sum1;
 if (sum2 >= 255) sum2 -= 255;
 }

 (b)

 moveq #0,d5
 move.w d5,d4 ;
 sum1 = sum2 = 0;
 moveq #0,d6
 bra.s L_29 ;
 for(i=0; i<len; i++)
 {
 L_28 moveq #0,d0
 move.b (a3)+,d0
 add.w d0,d4 ;
 sum1 += *ptr++;
 cmp.w #255,d4
 blt.s L_32
 sub.w #255,d4 ;

 if(sum1 >= 255) sum1 -= 255;
 L_32 add.w d4,d5 ;
 sum2 += sum1;
 cmp.w #255,d5
 blt.s L_36
 sub.w #255,d5 ;

 if(sum2 >= 255) sum2 -= 255;
 L_36 addq.w #1,d6
 L_29 cmp.w d7,d6
 blt.s L_28 ;
 }


This is reasonably efficient code generation: All variables are kept in
registers, there are no superfluous loads or stores, and there are no simple
peephole optimizations which would speed things up. Even so, the 13
instructions in this loop require an average of 84 clock cycles to execute,
assuming that each of the two subtract operations is executed on half of the
passes. On the 10-MHz 68010 processor used in this project, this calculation
required 8.4 microseconds per byte, or about 17 milliseconds to process the
2040-byte data buffers used in this network.
The most obvious improvement to the existing algorithm is to eliminate the
last IF statement, replacing it with a remaindering operation performed upon
exit from the loop. This is possible because of the linearity of the modulus
operator, which provides that the modulo-255 sum of a series of numbers is the
same as the modulo-255 remainder of the sum of the series. It is now
necessary, however, to use a long (32-bit) integer for sum2. This will prevent
overflow, so that the remainder can be computed correctly. A long integer
representation of sum2 cannot overflow with a buffer smaller than 2^32/255
bytes. Because this is just over the 2^24-byte address space limit of the
68010 processor, overflow will not be a problem.
The initial value of len need not be preserved inside the subroutine, so the
FOR loop can be changed to a slightly quicker WHILE loop which decrements len
in the loop's exit test. This in turn can be coded in assembly language as a
DBRA machine instruction. With these changes, the C-version algorithm can be
modified, as shown in Example 3(a).
Example 3: (a) An improved C version of the basic checksum; (b) the
corresponding assembly language.

 (a)


 register unsigned char *ptr;
 register short int sum1, len;
 register unsigned long int sum2;

 sum1 = sum2 = 0;
 while (len--)
 {
 sum1 += *ptr++;
 if (sum1 >= 255) sum1 -= 255;
 sum2 += sum1;
 }
 sum2 %= 255;

 (b)

 loop moveq #0,d3
 move.b (a3)+,d3
 add.w d3,d0 ; sum1 += *ptr++;
 cmp.w #255,d0
 blt.s no_sub
 sub.w #255,d0 ; if (sum1 >= 255) sum1 -= 255;
 no_sub add.l d0,d1 ; sum2 += sum1;
 dbra d2,loop ; while (len--);


When the C code is turned into assembly language, the register usage is as
follows:
D0 holds sum1, the first checksum value.
D1 holds sum2, the second checksum value.
D2 holds len, the length in bytes of the buffer.
A0 holds ptr, a pointer into the buffer.
The assembly-language code for the inner loop is as shown in Example 3(b).
(These examples do not show the code to load the registers before entering the
main processing loop, nor the code to calculate the remainder of sum2 and
store the values after exiting the main processing loop.)
These changes drop the inner loop to eight instructions per byte with an
execution time of 56 cycles, or 5.6 microseconds per byte. While this is a
significant improvement, a few tricks remain.
The most obvious problem is that after adding a byte in memory into the
current sum1 value, three instructions are required to maintain the proper
modulo-255 remainder for this value. Because sum1 should remain in the 0-254
range, it would be helpful if byte arithmetic could be used to perform this
addition, thus reducing the first three instructions to a single add.b
(a0)+,d0 instruction. As it turns out, this change will almost work correctly.
A byte addition such as this will be performed using modulo-256 arithmetic, so
if an overflow occurs, the resulting sum will be 1 smaller than the correct
value.
Luckily, the 68010 instruction set includes an extended add instruction
designed for use in performing extended precision arithmetic. This instruction
adds two registers and increments the result if the extended bit was set when
the instruction was entered. The extended bit will be set by the previous add
instruction whenever the sum exceeds 255. By keeping a value of 0 in a
register and performing an extended add of this 0 value to the sum of the
previous addition, the increment required to compensate for the error caused
by using modulo-256 arithmetic is performed whenever an overflow has occurred
in the previous addition.
There is, however, one special case in which this scheme fails: when the
addition to the sum1 value stored in register D0 results in a value of 255.
This isn't in the 0-254 range required by the checksum algorithm, and because
it doesn't exceed 255, it isn't large enough to generate the overflow which
would force it into this range. A close examination of the algorithm reveals
that this can be worked around outside the loop. If this situation arises at
the end of the buffer, sum1 will contain a value of 255 instead of its proper
value of 0. This can be easily tested for on exiting the loop. If this
situation occurs anywhere other than at the end of the loop, the next nonzero
byte to be added to the sum1 value will cause an overflow, which will correct
the condition. The sum1 value is also required to form the sum2 value. Because
the only important part of the sum2 value is the modulo-255 remainder, adding
the erroneous value of 255 to this value is equivalent (that is, it is
modulo-255 congruent) to adding the correct value of 0.
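For readers without a 680x0 handy, the addx trick can be modeled in portable C: add modulo 256, then feed the carry back in (an end-around carry). The helper below is my own illustration, not code from the article, and it exhibits exactly the 255-versus-0 quirk just described.

```c
/* Model of the add.b/addx.b pair: modulo-256 addition with the carry
   fed back in, which is congruent to modulo-255 addition. */
unsigned char add_end_around(unsigned char sum, unsigned char byte)
{
    unsigned t = (unsigned)sum + byte;
    return (unsigned char)((t & 0xFF) + (t >> 8));
}
```

For example, add_end_around(200, 100) yields 45, which is 300 mod 255; add_end_around(254, 1) yields 255 rather than 0 (the special case); and feeding that 255 into a further addition, add_end_around(255, 1) yields 1, showing how the next nonzero byte corrects the condition.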
Using D4 to hold the constant 0 value required for the add extended trick, the
inner loop reduces to the four instructions shown in Example 4. This optimized
version is a considerable improvement over the initial implementation. Only
four instructions are executed for each byte of data being checksummed. The
three adds require eight, four, and six machine cycles, respectively, while
the decrement and branch instruction takes ten machine cycles, for a total of
28 machine cycles per byte. With this loop reduced to the point where nearly a
third of the execution time is used by the decrement and branch instruction,
another significant improvement in execution time can be obtained by unrolling
the loop.
Example 4: The final optimized version of the basic checksum.


 ; Register Usage
 ; D0 the first checksum value (sum1)
 ; D1 the second checksum value (sum2)
 ; D2 the length in bytes of the buffer (len)
 ; D4 contains zero, for the addx.b instruction
 ; A0 pointer to the buffer (ptr)

 loop add.b (a0)+,d0 ; sum1 += *ptr++;
 addx.b d4,d0 ; if (sum1 >= 256) sum1 += 1;
 add.l d0,d1 ; sum2 += sum1;
 dbra d2,loop ; while (len--);


The way I finally implemented this routine was to unroll the inner loop 16
times. Before entering this loop, an address register is loaded with the
address of the appropriate occurrence of the three-instruction addition
sequence to be used the first time through the loop. This is based on the
modulo-16 remainder of the byte count. If this value is 0, the loop will be
entered at its end; if this value is 1, the loop will be entered at the last
set of add instructions, and so on. The byte-count value is then set to
one-sixteenth of its actual value to compensate for the loop being unrolled.
With this final enhancement, three add instructions must be executed per byte,
using 18 machine cycles per byte. In addition, one decrement and branch
instruction must be executed for each 16 bytes, using an average of
ten-sixteenths of a machine cycle per byte. This is over four times faster
than the original version of the algorithm.
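In C, the same computed-entry-point idea can be expressed with a switch that jumps into the middle of the loop (the idiom now known as Duff's device). The sketch below is my reconstruction, unrolled only four ways instead of sixteen to keep it short:

```c
#include <stddef.h>

/* First Fletcher sum with a 4-way unrolled loop, entered at the point
   dictated by len modulo 4, mirroring the 16-way assembly version. */
unsigned fletcher_sum1_unrolled(const unsigned char *p, size_t len)
{
    unsigned sum = 0;
    size_t passes = (len + 3) / 4;   /* loop iterations after entry */

    if (len == 0)
        return 0;
    switch (len % 4) {
    case 0: do { sum += *p++; if (sum >= 255) sum -= 255;
    case 3:      sum += *p++; if (sum >= 255) sum -= 255;
    case 2:      sum += *p++; if (sum >= 255) sum -= 255;
    case 1:      sum += *p++; if (sum >= 255) sum -= 255;
            } while (--passes > 0);
    }
    return sum;
}
```

With a length that is not a multiple of four, the switch enters partway through the body, so exactly len bytes are summed without any cleanup loop.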
The revision of the checksum-generation code resulted in a routine over four
times faster (18 5/8 vs. 84 machine cycles) than the original implementation.
This optimization resulted in a near doubling of the throughput of the network
using this function. This is a classic case of finding the one routine which
consumes the majority of processing time, and obtaining a big overall
performance improvement by tuning only that one routine. While the new
checksum routine is machine dependent due to its being written in Motorola
680x0 assembly language, comparable instructions are available in all
contemporary computers with which I am familiar, making the technique broadly
applicable.


Fletcher vs. CRC


An interesting question is how the performance of Fletcher's checksum
algorithm compares with the performance of a well-tuned, assembly-language
implementation of a CRC-generation algorithm.
To get a rough order-of-magnitude idea of the relative performance of these
two algorithms, I examined several routines which generate the CRC values used
in the XModem protocol. The best CRC implementations have inner loops which
involve two AND instructions, two shift instructions, a conditional jump, and
one or two exclusive-OR instructions for each bit in a message. This averages
to 6.5 instructions per bit, or 52 instructions for each byte in a
message--nearly 20 times more instructions per byte than Fletcher's technique.
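For comparison, here is a conventional bit-at-a-time XModem CRC loop in C (my own rendering of the widely published algorithm: polynomial 0x1021, initial value 0). Counting the mask, test, shift, and exclusive-OR per bit makes the instruction tally quoted above easy to believe:

```c
#include <stddef.h>

/* Bitwise CRC-16 as used by the XModem protocol. */
unsigned crc16_xmodem(const unsigned char *p, size_t len)
{
    unsigned crc = 0;

    while (len--) {
        int bit;
        crc ^= (unsigned)*p++ << 8;
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF;
            else
                crc = (crc << 1) & 0xFFFF;
        }
    }
    return crc;
}
```

The standard check value applies: crc16_xmodem over the nine ASCII digits "123456789" yields 0x31C3.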
In summary, Fletcher's checksum technique provides excellent error detection
using a simple algorithm that can be implemented in only a few lines of code.
Compared to CRC error detection, Fletcher's algorithm provides a comparable
level of data integrity with a fraction of the computational effort. It's an
excellent choice where high link speeds must be achieved while using moderate-
or low-performance computer hardware.
DDJ


_FLETCHER'S CHECKSUM_
by John Kodis


Example 1:

(a)

 integer i, sum1, sum2;

 sum1 = 0;
 sum2 = 0;
 for i from 1 to message_length do
 sum1 = ( sum1 + message[i] ) modulo 255;
 sum2 = ( sum2 + sum1 ) modulo 255;
 end for.


(b)

 check1 = 255 - (( sum1 + sum2 ) modulo 255);
 message[message_length+1] = check1;
 check2 = 255 - (( sum1 + check1 ) modulo 255);
 message[message_length+2] = check2;




Example 2:


(a)

 register unsigned char *ptr;
 register short int i, len, sum1, sum2;

 sum1 = sum2 = 0;
 for (i=0; i<len; i++)
 {
 sum1 += *ptr++;
 if (sum1 >= 255) sum1 -= 255;

 sum2 += sum1;
 if (sum2 >= 255) sum2 -= 255;
 }


(b)

 moveq #0,d5
 move.w d5,d4 ;sum1 = sum2 = 0;
 moveq #0,d6
 bra.s L_29 ;for(i=0; i<len; i++) {
 L_28 moveq #0,d0
 move.b (a3)+,d0
 add.w d0,d4 ; sum1 += *ptr++;
 cmp.w #255,d4
 blt.s L_32
 sub.w #255,d4 ; if(sum1 >= 255) sum1 -= 255;
 L_32 add.w d4,d5 ; sum2 += sum1;
 cmp.w #255,d5
 blt.s L_36
 sub.w #255,d5 ; if(sum2 >= 255) sum2 -= 255;
 L_36 addq.w #1,d6
 L_29 cmp.w d7,d6
 blt.s L_28 ;}



Example 3:

(a)
 register unsigned char *ptr;
 register short int sum1, len;
 register unsigned long int sum2;

 sum1 = sum2 = 0;
 while (len--)
 {
 sum1 += *ptr++;
 if (sum1 >= 255) sum1 -= 255;
 sum2 += sum1;
 }
 sum2 %= 255;

(b) loop moveq #0,d3
 move.b (a3)+,d3
 add.w d3,d0 ; sum1 += *ptr++;

 cmp.w #255,d0
 blt.s no_sub
 sub.w #255,d0 ; if (sum1 >= 255) sum1 -= 255;

 no_sub add.l d0,d1 ; sum2 += sum1;
 dbra d2,loop ; while (len--);



Example 4:

 ; Register Usage

 ; D0 the first checksum value (sum1)
 ; D1 the second checksum value (sum2)
 ; D2 the length in bytes of the buffer (len)
 ; D4 contains zero, for the addx.b instructions
 ; A0 pointer into the buffer (ptr)

 loop add.b (a0)+,d0 ; sum1 += *ptr++;
 addx.b d4,d0 ; if (sum1 >= 256) sum1 += 1;
 add.l d0,d1 ; sum2 += sum1;
 dbra d2,loop ; while (len--);




















































May, 1992
THE WINDOWS COMMUNICATIONS API


Porting your programs from DOS to Windows


 This article contains the following executables: WINCOM.ARC


Mike Sax


Mike is a Windows programming and development consultant. He has been
programming in Windows since the advent of version 1.02 and can be reached on
CompuServe at 75470,1403.


Windows 3 communications programs suffer from excess overhead mainly because
they can't concentrate on just one task at a time. Tasks that must be attended
to include monitoring the communications port; giving control to individual
applications; keeping the user interface running interactively while
communications are in progress; using virtual memory; and accessing the disk
while communications are in progress. As a result, even well-implemented
communications programs run somewhat slower under Windows 3 than under DOS at
speeds of 9600 baud or higher. All this has given Windows 3 communications a
bad name.
Still, Windows communications services provide a portable way to gain full
control over the communications port and perform buffered I/O. Furthermore,
under Windows 3.1, communications support has been significantly enhanced by
fully exploiting the FIFO buffer of the 16550 UART chip built into
most newer PCs. In addition, Windows 3.1 applications can be notified through
messages when certain communications events occur, so the application does not
have to constantly poll the communications port in a continuous loop.
In this article, I'll discuss the Windows communications API and some key
issues involved in porting DOS applications to the Windows environment.


XModem and the Windows Communications API


The Windows communications API is a collection of 16 functions. For starters,
when you open a Windows communications port using the OpenComm function, the
return value is an integer, which is used to identify the port in subsequent
calls to the Windows communications API.
Once you have opened the comm port, you set the communications parameters
(baud rate, handshaking, parity, stop bits, and so on) by filling in a data
structure called the "Device Control Block" (DCB) and passing it to the
Set-CommState function. The DCB data structure provides control over the comm
port, and filling it in can be quite complex. The SetComPortParameters
function in COMM.C (Listing One) does all the hard work for you. (Listing Two
is COMM.H.).
If a communications error occurs, Windows locks the comm port until you call
the GetCommError function. This function lets you retrieve information about
the error, then unlocks the comm port. Because calling GetCommError after
every line in which you access the comm port requires extra code, you might
want to use higher-level functions such as those in Listing One.
After low-level functions have been addressed, you can move on to higher-level
issues. In discussing these issues, I'll implement the XModem file-transfer
protocol for Windows. For a discussion on the internal workings of XModem,
refer to Al Stevens's "C Programming" column in the April 1989 issue of DDJ
from which I "borrowed" the core code.
There are two important issues to consider when converting Al's XModem
implementation to Windows. First, the user interface code is different. In
Windows, you have a dialog box that shows status information, and this enables
the user to abort the transfer by pressing a Cancel button. Secondly, the
current version of Windows implements non-preemptive multitasking for all
Windows applications. This means that your program has to give control to the
system to allow other Windows applications to run. In this discussion, I'll
put the UI issues aside, focusing instead on the core-application multitasking
requirements.
If your program fails to yield, your system and all other Windows programs
will be locked during the entire XModem transfer, and any attempt by the user
to use the keyboard or mouse will fail. We not only have to give control to
other Windows applications, but also
to process any messages that arrive in the application's message queue.
Consequently, we need to write a function, DoEvents, which processes all
messages in our message queue and gives control to Windows for a short period
of time.
Before writing DoEvents, you should be aware that both the GetMessage() and
PeekMessage() Windows functions retrieve a message from the application's
message queue. The main difference between them is that PeekMessage() will
immediately return control to your program, even if no messages are waiting. A
call to GetMessage() will not return until there is a message in your message
queue, so this function cannot be used for background processing. Another
difference is the return value of these functions. PeekMessage() returns TRUE
if it has retrieved a message from the message queue, or FALSE if the message
queue is empty. GetMessage() always returns TRUE, except when the message it
retrieves is a WM_QUIT message.
The Yield() function gives control to other Windows applications if no
messages are waiting in the message queue. But if there are messages in the
queue and you are only using Yield(), how will the application process these
messages? The answer is simple: It won't. The messages will remain
unprocessed, and other Windows applications will not get control until we call
PeekMessage() or GetMessage() to empty the message queue.
DoEvents() in XMODEM.C calls PeekMessage until all the messages in the message
queue have been processed (in which case PeekMessage will return FALSE). When
no more messages are left to process, PeekMessage automatically gives control
to other applications.
When you receive a message using PeekMessage, you can use TranslateMessage and
DispatchMessage, just as you would in a normal message loop. Because we're
using a modeless dialog box, you also have to call the IsDialogMessage()
function to make the status window not only look like a dialog box but also
act like one.
Also, if you receive a WM_QUIT message inside the main message loop, this
means that Windows wants our program to terminate as soon as possible, so we
post the WM_QUIT message using PostQuitMessage and abort the XModem transfer
by setting the gbUserCanceled global variable to TRUE, just as if the user had
pressed the Cancel button in our status dialog.
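Putting the last few paragraphs together, a DoEvents along these lines would do the job (a sketch of the idea, assuming the modeless status dialog's handle is kept in a global hDlgStatus; the shipped XMODEM.C may differ in detail):

```c
/* Drain the message queue, yield to other applications, and honor a
   pending WM_QUIT by reposting it and canceling the transfer. */
void DoEvents(void)
{
    MSG msg;

    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) {
            PostQuitMessage(msg.wParam);   /* repost for the main loop */
            gbUserCanceled = TRUE;         /* abort the XModem transfer */
            return;
        }
        if (!IsDialogMessage(hDlgStatus, &msg)) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
}
```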
The rest of the XModem code isn't much different from the original DOS code in
Al Stevens's article. I just replaced a few calls to the communications
functions with calls to the high-level functions in Listing One and replaced
the console I/O calls with calls to SetDlgItemText and SetDlgItemInt.


Writing a Windows Terminal Emulator


In DOS, it's easy to implement TTY terminal emulation because the DOS text
screen is basically a sophisticated TTY terminal. But Windows uses a graphical
user interface, and there's no such thing as a standard TTY control. To create
a terminal emulator for Windows, I implemented a new window class, TERMINAL,
which is a virtual DOS text screen in a window, much like the screen you get
when running a DOS application in Windows Enhanced Mode.
If you want to use the terminal control in your application, you should call
InitTerminal at the start of the WinMain function. This function initializes
global variables that contain the handle and the dimensions of the OEM text
font.
The cbWndExtra field in the class structure is set to the size of a local
memory handle. When a new terminal window is created, and the terminal window
function receives a WM_CREATE message, a local memory block will hold the
virtual screen buffer along with other information, such as the current cursor
position, the number of rows and columns displayed, and so on. The local
memory handle is then stored in the first extra bytes of the newly created
window. Of course, when the window is destroyed and receives a WM_DESTROY
message, the local memory block is freed.
If the terminal window is too small to display the entire virtual screen, the
user should be able to scroll the window. When the window is resized and it
receives a WM_SIZE message, the function sets the new scroll ranges for the
horizontal and vertical scroll bars. When the user clicks the scroll bar, the
terminal window receives a WM_HSCROLL or WM_VSCROLL message. Because the code
to handle vertical and horizontal scrolling is similar, we have created a
HandleScroll function, which checks the value of wParam and returns the new
value of the scroll-bar position.
The terminal window should also respond to the WM_GETDLGCODE message. When the
terminal window is part of a dialog box, the dialog box sends this message to
the control to determine who should process certain keyboard messages (such
as pressing the tab and arrow keys). The terminal-window function returns
DLGC_WANTALLKEYS to indicate that it wants to process all key codes.
What should happen when the user enters characters in the window? We might
send the characters to the comm port inside the terminal control's window
function, but it's always a good idea to write modular, reusable code. The
standard Windows controls send notification messages to their parent windows
when something interesting happens. A notification message is basically a
WM_COMMAND message with wParam equal to the child control's ID and more
information in lParam. So when the terminal window receives a WM_CHAR message,
it sends a notification message to its parent window, with the low word of
lParam equal to the character code.
Likewise, we could use a function to print characters on a terminal window,
but it's more in the "spirit of Windows" to use a message for this. All the
standard controls have user-defined messages (WM_USER plus a certain value) to
control their behavior, so why shouldn't our terminal control do the same? To
print a character to the terminal window, you can send a TW_SENDCHAR message
to the terminal window.


Putting it all Together


At this point, we have high-level communications functions, a Windows XModem
implementation, and a terminal-window control. It's time to put it all
together and create a terminal program.
The terminal program's main window is a dialog box that has its own class. In
addition to a terminal window and a status bar, it contains a few comboboxes
to let the user change the current communications settings. We use a dialog
box because the user has to be able to navigate through the comboboxes using
the tab keys.
Standard Windows dialog boxes automatically use the default dialog box class.
But if we include a CLASS statement in the dialog definition in our .RC or
.DLG file, the dialog box will have its own class. The advantage of having our
own class is that we don't have to use the default fields in the standard
dialog box class structure.
When you send or post a message to a window, Windows will call the window
function defined in the lpfnWndProc field of the class structure. So normally,
we should have a window function for every class we define. But because we
want our main window to act like a standard dialog box, we let the lpfnWndProc
point to DefDlgProc, the Windows function used in standard dialog boxes.
When the user presses Escape or Enter in one of the comboboxes, the focus
should be set to the terminal window again. To avoid subclassing the
comboboxes, we can create two invisible buttons which have the standard IDs
used for OK and Cancel--IDOK and IDCANCEL, respectively. This way, the Windows
dialog-box code will immediately send a WM_COMMAND message to our dialog
function with wParam equal to either IDOK or IDCANCEL. If we receive such a
message, we set the focus to the terminal window.

When the user selects a new value from the combobox to change any of the
communications parameters, the dialog function will receive a WM_COMMAND
notification message with the combobox child ID in wParam and CBN_SELCHANGE in
the high word of lParam. The dialog function will automatically retrieve the
current communications parameters from the comboboxes and use the high-level
SetComParameters function from Listing One. If the attempt to change the
current communications parameters fails, an error message will be shown in the
status bar.
Because we also have to receive characters from the comm port in the
background, we have created a DoEvents function, just like in the XModem
transfer. But the DoEvents function in our terminal program returns a BOOL
instead of void. This is because DoEvents will return FALSE if the last
message retrieved was a WM_QUIT message. So in the WinMain message loop we can
process characters from the comm port as long as DoEvents returns TRUE.
When the user enters a character in the terminal window, the terminal window
will send a notification message to our main window. When the dialog function
retrieves a WM_COMMAND message with wParam equal to the terminal window's ID,
the dialog function sends the value in the high word of lParam to the
communications port.


Conclusion


We now have a fully functional Windows terminal program in less than 1200
lines. Windows programming can be tough sometimes, but the results are almost
always rewarding. The message-oriented nature of Windows makes it easy to
write reusable code in Windows. The high-level communications functions, the
XModem transfers, and the TTY terminal control can be incorporated into any
program without changes. Windows communications may not be perfect, but it
sure has come a long way. And it's getting better every time a new version of
Windows is released.


_THE WINDOWS COMMUNICATIONS API_
by Mike Sax



[LISTING ONE]

////////////////////////////////////////////////////////////////////////////
// COMM.C - by Mike Sax for Dr. Dobb's Journal
// This file implements a few higher-level communications functions under
// Microsoft Windows. This file contains eight public functions:
// int OpenComPort(int nPort);
// BOOL CloseComPort(int nPortID);
// BOOL SetComPortParameters(int nPortID, int nSpeed, char chParity,
// int nDataBits, int nStopBits, BOOL bXOnXOff, BOOL bHardware);
// int CharsWaitingToBeRead(int nPortID);
// int ComReadChar(int nPortID);
// int ComReadChars(int nPortID, char *pchBuffer, int cbBuffer);
// BOOL ComWriteChar(int nPortID, int nChar);
////////////////////////////////////////////////////////////////////////////

#define USECOMM 1 // for 3.1 windows.h
#include <windows.h>
#include "comm.h"

// Opens comm port (1 = COM1) and returns its ID value.
// If an error occurs, return value is negative
int OpenComPort(int nPort)
 {
 char szPort[10];

 wsprintf(szPort, "COM%d", nPort);
 // Open the port with a 4K input queue and a 2K output queue
 return OpenComm(szPort, 4096, 2048);
 }
// Closes comm port specified by nPortID: returns TRUE if success,
// FALSE if failure
BOOL CloseComPort(int nPortID)
 {
 if (nPortID < 0)
 return FALSE;
 FlushComm(nPortID,0); // Flush transmit queue
 FlushComm(nPortID,1); // Flush receive queue
 return !CloseComm(nPortID);
 }
// Sets communications parameters of port specified by nPortID:
// returns TRUE if success, FALSE if failure
BOOL SetComPortParameters(int nPortID, int nSpeed, char chParity,

 int nDataBits, int nStopBits, BOOL bXOnXOff, BOOL bHardware)
 {
 DCB dcb;

 if (nPortID < 0)
 return FALSE;
 dcb.Id = nPortID;
 dcb.BaudRate = nSpeed;
 dcb.ByteSize = (BYTE)nDataBits;
 // Convert chParity to uppercase:
 AnsiUpperBuff(&chParity, 1);
 dcb.Parity = (chParity == 'N') ? NOPARITY :
 (chParity == 'O') ? ODDPARITY :
 (chParity == 'E') ? EVENPARITY :
 (chParity == 'M') ? MARKPARITY : SPACEPARITY;
 dcb.StopBits = (BYTE)((nStopBits == 1) ? ONESTOPBIT :
 (nStopBits == 2) ? TWOSTOPBITS : ONE5STOPBITS);
 dcb.RlsTimeout= 0;
 dcb.CtsTimeout = bHardware ? 30 : 0;
 dcb.DsrTimeout = 0;
 dcb.fBinary = TRUE;
 dcb.fRtsDisable = FALSE;
 dcb.fParity = FALSE;
 dcb.fOutxCtsFlow = (BYTE)bHardware;
 dcb.fOutxDsrFlow = FALSE;
 dcb.fDummy = 0;
 dcb.fDtrDisable = FALSE;
 dcb.fOutX = (BYTE)bXOnXOff;
 dcb.fInX = (BYTE)bXOnXOff;
 dcb.fPeChar = FALSE;
 dcb.fNull = FALSE;
 dcb.fChEvt = FALSE;
 dcb.fDtrflow = (BYTE)FALSE;
 dcb.fRtsflow = (BYTE)bHardware;
 dcb.fDummy2 = 0;
 dcb.XonChar = 17;
 dcb.XoffChar = 19;
 dcb.XonLim = 4096 / 4; // Receive buffer size / 4
 dcb.XoffLim = dcb.XonLim;
 dcb.EofChar = 26;
 dcb.EvtChar = 0;
 dcb.TxDelay = 0;
 return !SetCommState(&dcb);
 }
// Returns the number of characters waiting in the input queue
int CharsWaitingToBeRead(int nPortID)
 {
 COMSTAT ComStat;

 if (nPortID < 0)
 return 0;
 GetCommError(nPortID, &ComStat);
 return ComStat.cbInQue;
 }
// Read character from port specified by nPortID:
// returns -1 if no character available
int ComReadChar(int nPortID)
 {
 int iResult = 0;


 if (nPortID < 0)
 return -1;
 if (ReadComm(nPortID, (LPSTR)&iResult, 1) != 1)
 {
 iResult = -1;
 GetCommError(nPortID, NULL);
 }
 return iResult;
 }
// Read character from port specified by nPortID:
// returns the number of characters read or -1 if an error occurs.
int ComReadChars(int nPortID, char *pchBuffer, int cbBuffer)
 {
 int iResult = 0;

 if (nPortID < 0)
 return -1;
 iResult = ReadComm(nPortID, pchBuffer, cbBuffer);
 if (iResult < 0)
 {
 iResult = -1;
 GetCommError(nPortID, NULL);
 }
 return iResult;
 }
// Write a character to the port specified by nPortID
// returns TRUE if success, FALSE if failure
BOOL ComWriteChar(int nPortID, int nChar)
 {
 if (nPortID < 0)
 return FALSE;
 if (1 != WriteComm(nPortID, (LPSTR)&nChar, 1))
 {
 GetCommError(nPortID, NULL);
 return FALSE;
 }
 return TRUE;
 }





[LISTING TWO]

BOOL ComWriteChar(int nPortID, int nChar);
int ComReadChar(int nPortID);
int ComReadChars(int nPortID, char *pchBuffer, int cbBuffer);
int CharsWaitingToBeRead(int nPortID);
BOOL SetComPortParameters(int nPortID, int nSpeed, char chParity,
 int nDataBits, int nStopBits, BOOL bXOnXOff, BOOL bHardware);
BOOL CloseComPort(int nPortID);
int OpenComPort(int nPort);




































































May, 1992
IPX: THE GREAT COMMUNICATOR
 This article contains the following executables: IPX.ARC


Speeding up a Novell network with a streams interface




Rahner James


Rahner is an independent consultant living near Sacramento, Calif. He can be
reached by phone at 916-722-1939 or through CompuServe at 71450,757.


When I first began using networks, the most obvious means of communication
between nodes was to open a file and have all the nodes access that file.
Records would be written to that open file, flushed, and read by another node.
With only a couple of nodes on a network communicating sparingly in this
fashion, the method appeared to work.
With a moderate load, however, the network seemed to lose pep. Sometimes the
nodes would fail to flush their buffers properly and data would never pass
into the open file. Often the entire system would bog down in the message
pool, inhibiting normal data-access functions.
I knew there must be a better way, and happily, Novell provided an apparent
answer to my need for speed--Internetwork Packet Exchange, or IPX. IPX is an
example of the third layer of what the International Standards Organization
(ISO) proposed standard calls the "Open System Interconnection" (OSI) model,
as it relates to Novell's Netware. IPX lets programmers perform high-speed,
peer-to-peer communication on Novell's Netware. IPX is the lowest level of
communication that can be performed on a network without resorting to direct
access of the hardware. Novell refers to the functions that enable this type
of communication as "IPX/SPX Communication Services." (SPX stands for
Sequenced Packet Exchange.) Here, I'll refer to IPX/SPX functions as "XPX."
In this article, I'll discuss IPX/SPX and present a library of IPX
functions--implemented as a stream--that significantly improves IPX throughput
without additional server time. I've also written a working program called
TEST1.C that uses the major XPX functions. TEST1.C is run on multiple network
nodes and continually transfers packets to any other nodes from which it
receives broadcasts. The library and test sample program are available
electronically; see page 3 in this issue.


An XPX Backgrounder


Normal Novell operating-system accesses are through the MS-DOS INT 21h window.
XPX, however, uses a slightly different mechanism. Before any function calls
can be made, the application must call the MS-DOS multiplex and get the vector
to the XPX entry point. The assembly code segment for this is shown in Example
1. All XPX routines are accessed by making far calls through IPX_Vector. XPX
uses register BX to hold an XPX command number. An XPX call is not kind to
unused registers and, in general, sensitive registers should be saved
(especially BP). By using a far call (rather than MS-DOS's INT 21h), most IPX
functions can be called from a background process without the programmer
worrying about trashing the system. All XPX functions return the status of
their result in AL.
Example 1: Assembly code calls MS-DOS multiplex and gets vector to the XPX
entry point.

 mov ax, 7a00h ; Function 7Ah, AL = 0
 int 2fh ; MS-DOS Multiplex interrupt
 ; Returns with AL == 0FFh if XPX
 ; exists and ES:DI == XPX vector
 inc al ; Set ZERO if AL == -1
 jnz outta_here ; Quit if XPX isn't around
 mov IPX_Vector, di ; Save the XPX entry vector
 mov IPX_Vector+2, es


In my mind, the XPX functions fall into two major categories:
initialization/information and communication. The initialization/information
functions start up the XPX internals, open communication pathways, and give
the application information about how things have been setup. The
communication functions are responsible for sending/receiving packets of
information to/from peers on the network.
All network interfaces (Ethernet, Arc-net, and so on) require a unique 6-byte
node ID to differentiate between them. The node ID only differentiates a
network interface within a single network. Therefore, two networks can have
interfaces with identical node IDs. Because a single computer can have
multiple network interfaces or a network interface can be placed in another
computer, a node ID does not necessarily specify a single, distinct computer
system.
Every Novell network has a 4-byte ID number that uniquely differentiates it
from any other Novell networks to which it is connected. If no other networks
are connected, any number is unique.
Every node connected to a network can open several different channels of
communication, called "sockets." Sockets allow applications to differentiate
types of communication performed by a node. One socket can be used to
broadcast one message to all nodes while another socket can be used to send
and/or receive messages from specific nodes or groups of nodes. By default,
XPX can support up to 20 sockets on a single node. Through configuration, the
number of sockets that may be open can be increased to 150. A 2-byte number is
used as a socket ID.
Numerically, the network, node, and socket IDs are stored in Motorola
(big-endian) byte order. In practice the byte order is of no consequence,
because the magnitude, order, and content of these IDs are not relevant to the
application or to XPX. There are special cases of each type of ID, but all are
palindromes, so byte order is still not important. The
4-byte network ID can be viewed as a 32-bit long integer with no significance
associated with its magnitude. The 6-byte node ID can be viewed as a non-ASCII
string, such as a filename, with no naming convention. The 2-byte socket ID
can be a short integer that also has no significance associated with its
magnitude.
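Listing One (later in this article) declares this 12-byte triple as ADDRESS_S.
A C mirror of the layout makes the sizes concrete; this is a sketch, with a
struct name and field types of my choosing (raw byte arrays rather than the
word fields of the listing):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The full internetwork address: network, node, and socket IDs laid out
   back to back, mirroring the ADDRESS_S struct in Listing One.  All three
   fields travel on the wire in Motorola (big-endian) byte order. */
#pragma pack(push, 1)
typedef struct {
    uint8_t network[4];   /* 4-byte network ID                  */
    uint8_t node[6];      /* 6-byte node (interface) ID         */
    uint8_t socket[2];    /* 2-byte socket ID                   */
} XpxAddress;
#pragma pack(pop)
```

Because the application only ever copies and compares these IDs, treating them
as opaque byte arrays sidesteps the endianness question entirely.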


XPX Structures


The basic control structure used with both IPX and SPX is the Event Control
Block (ECB); see Table 1. The ECB is passed to XPX to describe its associated
channel and buffers. The first 34 bytes of the ECB contain control and
addressing information. A list of "associated fragments" immediately follows
this 34-byte header.
Table 1: Event Control Block format
Region Description

0-3 Link to next ECB, filled and used by XPX. While XPX is not
 using this ECB, the application can use this field for its own
 ECB management.

4-7 Far pointer to the Event Service Routine (ESR) associated with
 this ECB. This can be NULL, if asynchronous processing is not
 desired.

8 In Use Flag. Used by XPX to show the current state of ECB
 processing. Set to 0 when XPX is done with the ECB.

9 Completion Code. Set by XPX when the ECB In Use Flag is set to
 0. Valid only when XPX has finished with the ECB. A 0
 indicates that the ECB task was completed successfully. Any
 other value indicates an error condition.

10-11 Socket ID. Set by the application to tell XPX with which
 socket the ECB is to communicate.

12-27 Used internally by XPX.

28-33 Immediate Address. This is the local node ID with which this
 ECB is to communicate. If ECB is being sent (a talker) using
 IPX (rather than SPX), this field should be filled in by the
 application. If ECB is a listener or uses SPX, this field is
 filled by XPX.

34-35 Fragment Count. Filled in by application to tell XPX how many
 fragment descriptors are to follow. All ECBs must have at
 least one fragment descriptor to point to an XPX packet
 header. The cumulative size of the fragments associated with
 an ECB cannot exceed 576 bytes.

36-39 Fragment Pointer 1. Far pointer to the first fragment
 associated with the ECB. The application must have at least
 one fragment that contains a complete IPX or SPX header. Any
 additional data buffers can be contiguous extensions of the
 XPX header or segmented into unconnected memory locations.
 Noncontiguous memory fragments require more than one fragment
 descriptor.

40-41 Fragment Size 1. The number of bytes in the first fragment. If
 the size of the first fragment is not at least the size of
 either an IPX or SPX header, XPX returns an error. Fragments
 that follow must contain at least 1 byte.

42-45 Fragment Pointer 2. This fragment descriptor and those that
 follow are optional and need not be used or declared if the
 first fragment descriptor describes the packet buffer in its
 entirety.

46-47 Fragment Size 2. Optional size of the second fragment.

... ...

nn-nn Fragment Pointer n

mm-mm Fragment Size m

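Table 1 translates naturally into a packed C structure. The sketch below is
not Novell's declaration--the field names are illustrative, and uint32_t
stands in for each 4-byte far pointer--but the offsets can be checked
directly against the table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event Control Block layout per Table 1: a 34-byte control/addressing
   header, a 2-byte fragment count, then one or more fragment descriptors. */
#pragma pack(push, 1)
typedef struct {
    uint32_t link;             /* 0-3   link to next ECB                   */
    uint32_t esr;              /* 4-7   far pointer to the ESR, or NULL    */
    uint8_t  in_use;           /* 8     In Use Flag; 0 when XPX is done    */
    uint8_t  completion_code;  /* 9     valid only once in_use == 0        */
    uint16_t socket;           /* 10-11 socket ID for this ECB             */
    uint8_t  workspace[16];    /* 12-27 used internally by XPX             */
    uint8_t  immediate[6];     /* 28-33 immediate (local target) node ID   */
    uint16_t fragment_count;   /* 34-35 descriptors that follow            */
    struct {
        uint32_t address;      /* far pointer to the fragment              */
        uint16_t size;         /* bytes in the fragment                    */
    } fragment[1];             /* 36-41 first (mandatory) descriptor       */
} ECB;
#pragma pack(pop)
```

The first fragment must hold a complete IPX or SPX header, and the sizes of
all fragments together must stay within the 576-byte packet budget.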

The Event Service Routine (ESR) is a function provided by the application and
called by XPX when the ECB has been processed. The ESR is called under the
following conditions:
ES:SI points to the ECB that was processed.
All registers except SS and SP have been saved on the stack.
Interrupts are disabled.
AL is 0FFh if called by IPX, or 0 if called by the Asynchronous Event
Scheduler (AES).
All segment registers are in unknown states.
An ESR must return with a RETF instruction and interrupts disabled, and it
must maintain the stack's integrity. It can call any XPX function except
Close Socket, reschedule itself through an AES call, and enable interrupts
during operation, as long as it blocks against another event calling the same
function. Listing One shows my ESR implementation.
An application could conceivably have a separate ESR for every ECB; however,
that approach would be excessive. The ESR should be fairly general purpose and
should be approached with the same mind-set as an Interrupt Service Routine.
The In Use Flag field can reflect a variety of states as the ECB nears
completion. These state values are defined within my header file, NETWORK.H.
All definitions associated with the In Use Flag are prefixed with IU_. The
application can poll In Use to check on the current status of a particular ECB
or wait until XPX calls the ESR.
The Completion Code field is filled by XPX after the ECB has been processed.
The contents of this field have no meaning until the In Use Flag equals 0, so
the application should have no expectations of Completion Code until that
time. Many of the Novell-documented completion codes are defined within
NETWORK.H. All definitions associated with Completion Code are prefixed with
CC_.
The first fragment descriptor must point to either a header structure for an
IPX or SPX packet before the ECB can be passed to any XPX function. Both
packet types start with the same structure. The format of the IPX header is
shown in Table 2. The application needs to fill the IPX structure only if the
packet is being transmitted. If the packet is being used as a listener, XPX
fills in all the fields.
Table 2: Format for IPX header
Region Description
0-1 Checksum. Set by XPX to -1.
2-3 Length of the entire XPX packet, including all other fragments
 associated with the ECB. Filled by XPX.
4 Transport Control. Set to 0 by XPX.
5 Packet Type. Set by application to 4 for an IPX packet, or 5
 for an SPX packet if the packet is being sent (talker).
6-9 Destination Network ID. Set by application if XPX packet is
 being sent (talker). If set to 0, the current network is used
 regardless of its true ID.
10-15 Destination Node ID. Set by application if packet is being
 sent (talker). If set to all 0FFh's, the IPX packet will be
 sent to all IPX listeners on the network with an equal socket
 ID, including active listeners on transmitting node.
16-17 Destination Socket ID. Set by application if packet is being
 sent (talker). Socket ID must be opened before socket ID can
 be used.
18-21 Source Network ID. Set by XPX to the network ID of packet's
 source.
22-27 Source Node ID. Set by XPX to node ID of packet's source.
28-29 Source Socket ID. Set by XPX to socket ID of packet's source.

The SPX packet header is a superset of the IPX packet header; see Table 3.
None of the SPX-specific fields need to be filled in by the application for
either transmissions or receptions. These fields must be available for SPX
packets.
Table 3: The SPX packet header structure

Region Description

0-29 IPX header previously defined.

30 Connection Control. Used by SPX to control flow of data.

31 Data-stream Type. Information byte that can be used by
 application for any purpose. SPX reserves values 0FEh and 0FFh
 for its own use.

32-33 Source Connection ID. Connection number of source node for this
 SPX packet. Created by SPX for use by application.

34-35 Destination Connection ID. Connection number of destination
 node for this SPX packet. Created by SPX for use by
 application.

36-37 Sequence Number. Used by SPX to keep sequence of received and
 transmitted packets straight.

38-39 Acknowledge Number. Used by SPX to acknowledge receipt of a
 packet.

40-41 Allocation Number. Used by SPX to keep track of packets sent
 but not acknowledged.



The size of an SPX packet header is 12 bytes larger than the header for IPX
packets. Because the headers must be part of every XPX packet, the maximum
size of data that can be sent with an IPX packet is 546 bytes, and the maximum
for SPX is 534 bytes.
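Tables 2 and 3, together with the 576-byte fragment budget, can be captured
in C. The structures below are a sketch with field names of my choosing; the
payload arithmetic follows directly from the header sizes:

```c
#include <assert.h>
#include <stdint.h>

/* The 30-byte IPX header of Table 2, followed by the 12 SPX-specific
   bytes of Table 3. */
#pragma pack(push, 1)
typedef struct {
    uint16_t checksum;           /* 0-1   set to -1 by XPX              */
    uint16_t length;             /* 2-3   whole packet; filled by XPX   */
    uint8_t  transport_control;  /* 4     set to 0 by XPX               */
    uint8_t  packet_type;        /* 5     4 = IPX, 5 = SPX              */
    uint8_t  dest_network[4];    /* 6-9   0 means the current network   */
    uint8_t  dest_node[6];       /* 10-15 all 0FFh = broadcast          */
    uint16_t dest_socket;        /* 16-17 must be an open socket        */
    uint8_t  src_network[4];     /* 18-21 filled by XPX                 */
    uint8_t  src_node[6];        /* 22-27 filled by XPX                 */
    uint16_t src_socket;         /* 28-29 filled by XPX                 */
} IpxHeader;

typedef struct {
    IpxHeader ipx;               /* 0-29  common header                 */
    uint8_t  connection_control; /* 30    flow control, used by SPX     */
    uint8_t  datastream_type;    /* 31    0FEh and 0FFh reserved by SPX */
    uint16_t src_connection;     /* 32-33 created by SPX                */
    uint16_t dest_connection;    /* 34-35 created by SPX                */
    uint16_t sequence;           /* 36-37 used by SPX                   */
    uint16_t acknowledge;        /* 38-39 used by SPX                   */
    uint16_t allocation;         /* 40-41 used by SPX                   */
} SpxHeader;
#pragma pack(pop)

enum { MAX_PACKET = 576 };       /* total budget for all ECB fragments  */
```

With a 576-byte total, the 546-byte (IPX) and 534-byte (SPX) payload ceilings
fall out of subtracting the 30- and 42-byte headers.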


Simplicity Breeds Attempts


When I first began coding access functions, IPX demonstrated itself superior
to file- or pipe-access methods. Eventually, however, IPX began to lose
packets. I discovered that under a moderate transmission load, my application
spent too much time minding its business, while IPX-filled listening packets
weren't being serviced. Polling was not able to keep up with the data-transfer
demand. Novell's documentation confirmed this, stating that IPX yields about a
95 percent delivery rate (although I was never able to achieve this rate).
I first decided that if more than a few packets per second are being
transferred, an IPX communication needs ESR support to function effectively. I
then wrote a library of ESR-powered IPX functions. My experimentation yielded
a 100 percent delivery rate for over 100,000 packet transmissions. To find a
reasonable upper limit, I ran a program on four 80386 PCs on my Ethernet
network. Each computer sent and received 512-byte IPX packets continuously. I
didn't notice any packet loss until they reached about 70 packets/second/node
(280 packets/second, overall). At that point, the 16-MHz 386SX started to go
deaf.
After I implemented ESR support, I decided that the functions and data
structures were cumbersome. At an application level, I didn't want to have to
deal with the asynchronous approach. Because I was already using a file-based
approach, I figured that the entire IPX access would be best implemented as a
streams type of interface. With a streams implementation, analogous structures
became apparent. The full network-address structure (network ID, node ID, and
socket ID) became the "filename." Access flags could be used to define the
type of packet (listener or talker) and communication method (IPX or SPX). I
merged the ECB and a superset structure of the IPX/SPX header.
To keep track of the stream's operation, I created a structure called
XPX_STREAM_T, which is documented in the file NETWORK.H. XPX_STREAM_T allows
the programmer to open a channel to a node (or nodes, if it is a broadcast
channel) and perform reads, writes, and queries on that channel. Multiple XPX
packets are automatically allocated and provided for the stream's I/O.


Descriptions of Major Functions


The major functions written for the stream approach are XPX_INIT(),
XPX_OPEN(), XPX_READ(), XPX_WRITE(), XPX_CLOSE(), and IPX_READ_ORPHAN().
Except for XPX_INIT() and IPX_READ_ORPHAN(), the functions are used in a
fashion analogous to their file counterparts. This function library, a text
file that details the internals of IPX functions, my NETWORK.H, and the
TEST1.C sample program are available electronically; see page 3.
XPX_INIT() (see Listing Two) initializes the SPX internals and gets the entry
vector for accessing IPX. This function queries IPX for the application's node
ID and network address, which are placed in the global structure Our_Address.
After XPX is initialized, a single socket ID is dynamically generated by IPX
and opened. This socket ID is placed in the Our_Address structure, and
it can be used as an application's private channel. The open socket is not
needed for the operation of any functions, and is provided only for the sake
of convenience.
XPX_OPEN() allows the application to open a communication channel to another
node on the network. The channel can be read only, write only, or read/write.
The node ID can be specific or a broadcast channel. A specific open will
receive packets only from the defined target-node ID. A broadcast channel will
accept packets from any node ID at that socket. There is no such thing as a
broadcast socket, so socket IDs are important. Any packet received by a node
from itself will be ignored. Note that an IPX stream cannot share the same
socket as an SPX stream, but multiple IPX streams can share the same socket.
XPX_READ() allows the application to read the next received packet data
partially or completely. The read will only be made up to the end of the first
available packet. Data from multiple packets can only be read by successive
reads. Packets may only be read in the order they were received from the
source node.
XPX_WRITE() allows the application to write data to a stream. The data written
does not have to be of a particular size. XPX_WRITE() will packetize the data
and send the packets to XPX. XPX_WRITE() returns the number of bytes written.
If a done pointer is provided to XPX_WRITE(), XPX_WRITE() will set the flag
to a nonzero value, and the ESR will reset the flag to 0 when a packet has
been sent.
XPX_CLOSE() closes the communication channel and frees all data buffers
allocated by XPX_OPEN().
IPX_READ_ORPHAN() allows the application to read any packets sent to a socket
ID that do not match any open stream addresses. (Broadcast streams with read
ability match all nodes.) This function should be called periodically, so that
all the listening packets do not get used up by spurious receptions. If orphan
packets are not an issue, the application can set the global variable
_Ignore_Nomatch to a nonzero value.
Four macros have been defined within NETWORK.H to provide status information
for the application. XPX_ERROR_STATUS() returns the number of packet errors
that have occurred since the last call to XPX_INIT(). XPX_ORPHAN_STATUS()
returns the current number of unprocessed orphan packets. XPX_READ_STATUS()
returns the number of packets associated with a stream that have not been
read. XPX_WRITE_STATUS() returns the number of packets that are available to a
stream for transmission.
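Pulling the stream functions together, a typical caller might proceed as in
the sketch below. The lowercase names echo the listings, but the signatures
are my guesses rather than the actual NETWORK.H prototypes, and trivial stubs
stand in for the library so the calling sequence can be compiled and traced
away from a Netware network:

```c
#include <string.h>

/* Hypothetical access flag; the real flag values live in NETWORK.H. */
#define XPX_RDWR 3

/* Stub stand-ins for the XPX library (hypothetical signatures). */
static int  xpx_init(unsigned short socket_number) { return 0; }
static int  xpx_open(const unsigned char node_id[6], unsigned short socket,
                     int flags) { return 1; /* pretend stream handle */ }
static int  xpx_write(int handle, const void *buf, int len) { return len; }
static int  xpx_read(int handle, void *buf, int len) { return 0; }
static void xpx_close(int handle) { (void)handle; }

/* Send one message to a specific node, then poll once for a reply.
   Returns bytes read back, or -1 on failure. */
static int send_hello(const unsigned char peer[6], unsigned short socket)
{
    char reply[546];            /* largest possible IPX payload */
    int h, n;

    if (xpx_init(0) != 0)       /* 0 = let IPX pick our private socket */
        return -1;
    h = xpx_open(peer, socket, XPX_RDWR);
    if (h < 0)
        return -1;
    n = xpx_write(h, "hello", 5);
    if (n == 5)
        n = xpx_read(h, reply, sizeof reply);  /* 0 if nothing arrived */
    xpx_close(h);               /* frees buffers allocated by the open */
    return n;
}
```

A real caller would also invoke IPX_READ_ORPHAN() periodically (or set
_Ignore_Nomatch) so that stray receptions do not exhaust the listening
packets.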


Conclusion


IPX is a high-performance protocol available on Novell Netware networks. IPX
adds a level of complexity and uncertainty to peer-to-peer communications, but
it can be fashioned into a useful resource. SPX is slightly slower, but
guarantees data delivery. Given the scope of this article, I have dwelled upon
IPX and only touched upon SPX. I leave these functions as a foundation on
which you can build.
DDJ


_IPX: THE GREAT COMMUNICATOR_
by Rahner James


[LISTING ONE]

; ***************************************************************************
; * Title: ESRS.ASM -- by Rahner James
; * Copyright (c) January 1991, Ryu Consulting, 916/722-1939
; * File contains default Event Service Routine for listening & talking packets
; ****************************************************************************

_ESRS_ASM_ equ 1
ifdef LARGEMODEL
 .model large,c
else
 .model small,c
endif
ADDRESS_S struct
 network dw ?,? ; Network number
 node dw 3 dup(?); Node address on that network
 socket dw ? ; Socket number on that node
ADDRESS_S ends
IPX_PACKET_S struct
 next dd ? ; Used by IPX/SPX when the ECB is active
 function dd ? ; Called after packet sent/recd, called ESR
 in_use db ? ; Set to !0 by IPX/SPX when packet is in use

 completion_code db ? ; Set by XPX after packet task is complete
 socket dw ? ; Socket to use for this ECB
 IPX_work dd ? ; Workspace used internally by IPX
 driver_work dd ?,?,? ; Workspace used internally by IPX driver
 dest_address db 6 dup(?); Destination address for packet
 fragment_count dw ? ; Fragments descriptors that follow
 hdr dd ? ; -> IPX/SPX packet descriptor to use
 size_hdr dw ? ; Size of the IPX(30) or SPX(42) descriptor
 buffer_ptr dd ? ; -> data buffer to use for transmission/reception
 buffer_size dw ? ; Number of bytes in that buffer
 next_allocated dd ? ; -> next allocated packet structure
 next_sibling dd ? ; -> next packet for stream and condition
 parent dd ? ; -> parent stream definition packet
 default_buffer dd ? ; -> default buffer to use for IPX or SPX
 default_size dw ? ; Size of the default buffer
 done_flag dd ? ; Set by the ESR with the completion code

 checksum dw ? ; Dummy checksum of 30-byte packet header
 packet_length dw ? ; Length of complete IPX packet
 control db ? ; Transport control byte for internet bridges
 packet_type db ? ; Packet type: IPX(4)/SPX(5)

 dest_network dd ? ; Destination network address
 dest_node db 6 dup(?); Destination node address
 dest_socket dw ? ; Destination socket

 src_network dd ? ; Source network address
 src_node db 6 dup(?); Source node address
 src_socket dw ? ; Source socket
IPX_PACKET_S ends
XPX_STREAM_S struct
 next dd ? ; -> next stream structure opened
 first_allocated dd ? ; -> first allocated packet for handle
 last_allocated dd ? ; -> last allocated packet for handle
 first_unread dd ? ; -> first unread packet
 last_unread dd ? ; -> last unread packet in the list
 first_free dd ? ; -> first packet available for talking
 first_error dd ? ; -> first packet encountering an error
 last_error dd ? ; -> last packet encountering an error

 dest_network dd ? ; Destination network address
 dest_node db 6 dup(?); Destination node address
 dest_socket dw ? ; Destination socket
 local_target db 6 dup(?); Node address of local target for dest
 connection_ID dw ? ; Connection ID used for SPX
 total_talkers dw ? ; Number of talkers for this stream
 total_listeners dw ? ; Number of listeners for this stream
 unread_count dw ? ; Number of packets unread by app
 free_count dw ? ; Number of packets ready for talking
 maximum_unread dw ? ; Maximum number of unread packets
 error_count dw ? ; Number of unprocessed error packets

 total_transmissions dd ? ; Number of transmissions performed
 total_receptions dd ? ; Number of receptions performed
 total_errors dd ? ; Number of errors encountered
XPX_STREAM_S ends
 .data
 extern _Ignore_Nomatch:byte, _Our_Address:word, IPX_Vector:dword, _First_Stream:dword

 extern _First_Nomatch:dword, _Last_Nomatch:dword, _Total_Nomatchs:word
 .code
Last_Broad_Ptr dw 0,0 ; -> last checked broadcast stream

; ****************************************************************************
; * void far TALK_ESR( void ) -- Event Service Routine (ESR) for IPX
; * functions and their talking packets
; * Given: AL = 0 if AES called this ESR, 0xff if this is a normal event
; * ES:SI -> ECB that just finished talking
; * Returns: Packet either glued onto the free list or the error list
; * Note: Interrupts are enabled at this point and should stay that way
; ****************************************************************************
talk_esr proc far
; * See if we need to set the done flag
 lds bx, es:[si].IPX_PACKET_S.done_flag ; DS:BX -> process done flag
 mov cl, es:[si].IPX_PACKET_S.completion_code
 mov ax, ds
 or ax, bx
 jz @F ; If DS:BX -> NULL, just skip it
 mov [bx], cl ; Set the flag with our completion code
 mov word ptr es:[si].IPX_PACKET_S.done_flag, 0 ; Make it NULL
 mov word ptr es:[si].IPX_PACKET_S.done_flag+2, 0
; * Check whether the packet goes in the error list or the free list
@@: lds bx, es:[si].IPX_PACKET_S.parent ; DS:BX -> parent structure
 or cl, cl ; See if we got a transmission error
 jnz talk20_esr ; Jump if we got one
; * Here's where we process the good transmissions
 add word ptr [bx].XPX_STREAM_S.total_transmissions, 1
 adc word ptr [bx].XPX_STREAM_S.total_transmissions+2, 0
 inc [bx].XPX_STREAM_S.free_count;
 mov cx, word ptr [bx].XPX_STREAM_S.first_free ; DX:CX -> first free
 mov dx, word ptr [bx].XPX_STREAM_S.first_free+2
 mov word ptr [bx].XPX_STREAM_S.first_free, si
 mov word ptr [bx].XPX_STREAM_S.first_free+2, es
 mov word ptr es:[si].IPX_PACKET_S.next_sibling, cx
 mov word ptr es:[si].IPX_PACKET_S.next_sibling+2, dx
talk10_esr:
 ret
; * Here's where we take care of our challenged packets
talk20_esr:
 add word ptr [bx].XPX_STREAM_S.total_errors, 1
 adc word ptr [bx].XPX_STREAM_S.total_errors+2, 0
 inc [bx].XPX_STREAM_S.error_count
 mov cx, word ptr [bx].XPX_STREAM_S.last_error ; DX:CX ->last error
 mov dx, word ptr [bx].XPX_STREAM_S.last_error+2
 mov word ptr [bx].XPX_STREAM_S.last_error, si ; Set new last error
 mov word ptr [bx].XPX_STREAM_S.last_error+2, es
 mov word ptr es:[si].IPX_PACKET_S.next_sibling, 0
 mov word ptr es:[si].IPX_PACKET_S.next_sibling+2, 0
 mov ax, cx ; See if we need to do the first as well
 or ax, dx
 jnz @F
 mov word ptr [bx].XPX_STREAM_S.first_error, si ;Set new first error
 mov word ptr [bx].XPX_STREAM_S.first_error+2, es
 ret
@@: mov ds, dx ; DS:BX -> the first born
 mov bx, cx
 mov word ptr [bx].IPX_PACKET_S.next_sibling, si ; Point old end
 mov word ptr [bx].IPX_PACKET_S.next_sibling+2, es

 ret
talk_esr endp

; ****************************************************************************
; * void far LISTEN_ESR( void ) -- Event Service Routine (ESR) for IPX
; * functions and their listening packets
; * Given: AL = 0 if AES called this ESR, 0xff if this is a normal event
; * ES:SI -> ECB that just got something
; * Returns: Packet put at the end of the unread packet list of the stream it
; * was intended for.
; * Note: Interrupts are enabled at this point and should stay that way. This
; * packet may not be put with its parent if there are multiple parents
; * associated with one socket
; ****************************************************************************
listen_esr proc far
 mov ax, @Data ; DS = our data segment
 mov ds, ax
; * First see if we sent it as a broadcast and it got back to us
 mov ax, _Our_Address.ADDRESS_S.node+4
 cmp word ptr es:[si].IPX_PACKET_S.src_node+4, ax
 jne listen10_esr
 mov ax, _Our_Address.ADDRESS_S.node+2
 cmp word ptr es:[si].IPX_PACKET_S.src_node+2, ax
 jne listen10_esr
 mov ax, _Our_Address.ADDRESS_S.node
 cmp word ptr es:[si].IPX_PACKET_S.src_node, ax
 jne listen10_esr
 mov ax, _Our_Address.ADDRESS_S.network+2
 cmp word ptr es:[si].IPX_PACKET_S.src_network+2, ax
 jne listen10_esr
 mov ax, _Our_Address.ADDRESS_S.network
 cmp word ptr es:[si].IPX_PACKET_S.src_network, ax
 jne listen10_esr
listen_again_buckwheat:
 mov bx, 4 ; BX = IPX Listen For Packet command
 call dword ptr IPX_Vector ; Call the IPX function
 ret
; * See if we need to set the done flag
listen10_esr:
 mov ax, es:[si].IPX_PACKET_S.packet_length ; Change format
 xchg ah, al
 sub ax, es:[si].IPX_PACKET_S.size_hdr
 mov es:[si].IPX_PACKET_S.packet_length, ax

 lds bx, es:[si].IPX_PACKET_S.done_flag ; DS:BX -> process done flag
 mov cl, es:[si].IPX_PACKET_S.completion_code
 mov ax, ds
 or ax, bx
 jz listen20_esr
 mov [bx], cl
 mov word ptr es:[si].IPX_PACKET_S.done_flag, 0
 mov word ptr es:[si].IPX_PACKET_S.done_flag+2, 0
; * Check whether the packet goes in the error list or the free list
listen20_esr:
 lds bx, es:[si].IPX_PACKET_S.parent ; DS:BX -> parent structure
 or cl, cl ; See if we got a reception error
 jz listen40_esr ; Jump if we have an unimpaired reception
; * Here's where we take care of our datistically challenged packets
 add word ptr [bx].XPX_STREAM_S.total_errors, 1

 adc word ptr [bx].XPX_STREAM_S.total_errors+2, 0
 inc [bx].XPX_STREAM_S.error_count
 mov cx, word ptr [bx].XPX_STREAM_S.last_error ; DX:CX->last error
 mov dx, word ptr [bx].XPX_STREAM_S.last_error+2
 mov word ptr [bx].XPX_STREAM_S.last_error, si ; Set new last error
 mov word ptr [bx].XPX_STREAM_S.last_error+2, es
 mov word ptr es:[si].IPX_PACKET_S.next_sibling, 0
 mov word ptr es:[si].IPX_PACKET_S.next_sibling+2, 0
 mov ax, dx ; See if we are the only packet here
 or ax, cx
 jnz @F ; Skip out of this ESR if we are not alone
 mov word ptr [bx].XPX_STREAM_S.first_error, si ;Set new first error
 mov word ptr [bx].XPX_STREAM_S.first_error+2, es
 ret
@@: mov ds, dx ; DS:BX -> the first born
 mov bx, cx
 mov word ptr [bx].IPX_PACKET_S.next_sibling, si
 mov word ptr [bx].IPX_PACKET_S.next_sibling+2, es
 ret
; * Here's where we process the good transmissions
listen40_esr:
 push ds ; DX:CX -> the first stream structure
 mov ax, @Data
 mov ds, ax
 mov cx, word ptr _First_Stream
 mov dx, word ptr _First_Stream+2
 pop ds
 mov Last_Broad_Ptr, 0
 mov Last_Broad_Ptr+2, 0
listen50_esr:
 cmp [bx].XPX_STREAM_S.total_listeners, 0 ; See if WRITE ONLY stream
 jz not_parent ; Skip this one if it cannot listen
 mov ax, word ptr [bx].XPX_STREAM_S.dest_node
 and ax, word ptr [bx].XPX_STREAM_S.dest_node+2
 and ax, word ptr [bx].XPX_STREAM_S.dest_node+4
 inc ax
 jnz @F ; Skip if it is not a broadcast type
 mov Last_Broad_Ptr, bx ; Save this for later
 mov Last_Broad_Ptr+2, ds
 jmp short not_parent ; Still not necessarily the right one
@@: mov ax, word ptr [bx].XPX_STREAM_S.local_target+4 ; Match parent
 cmp word ptr es:[si].IPX_PACKET_S.src_node+4, ax
 jne not_parent
 mov ax, word ptr [bx].XPX_STREAM_S.local_target+2
 cmp word ptr es:[si].IPX_PACKET_S.src_node+2, ax
 jne not_parent
 mov ax, word ptr [bx].XPX_STREAM_S.local_target
 cmp word ptr es:[si].IPX_PACKET_S.src_node, ax
 jne not_parent
 mov ax, word ptr es:[si].IPX_PACKET_S.src_network+2
 cmp word ptr es:[si].IPX_PACKET_S.dest_network+2, ax
 jne not_parent
 mov ax, word ptr es:[si].IPX_PACKET_S.src_network
 cmp word ptr es:[si].IPX_PACKET_S.dest_network, ax
 je found_listener
; * At this point, the current structure has been determined not to be suitable
not_parent:
 mov ax, cx ; See if we are at the end of our rope
 or ax, dx

 jz no_listener ; No stream match found

 mov ds, dx ; DS:BX -> next stream definition
 mov bx, cx
 mov cx, word ptr [bx].XPX_STREAM_S.next ; DX:CX -> next stream
 mov dx, word ptr [bx].XPX_STREAM_S.next+2
 jmp listen50_esr ; Loop until we poop
; * Stream ID matches up with destination address, so add packet to stream list
found_listener:
 add word ptr [bx].XPX_STREAM_S.total_receptions, 1
 adc word ptr [bx].XPX_STREAM_S.total_receptions+2, 0
 inc [bx].XPX_STREAM_S.unread_count
 mov ax, [bx].XPX_STREAM_S.unread_count ; Update our statistics
 cmp [bx].XPX_STREAM_S.maximum_unread, ax
 jnc found10_listener ; Skip if no need to update
 mov [bx].XPX_STREAM_S.maximum_unread, ax
found10_listener:
 mov cx, word ptr [bx].XPX_STREAM_S.last_unread
 mov dx, word ptr [bx].XPX_STREAM_S.last_unread+2
 mov word ptr [bx].XPX_STREAM_S.last_unread, si
 mov word ptr [bx].XPX_STREAM_S.last_unread+2, es
 mov word ptr es:[si].IPX_PACKET_S.next_sibling, 0
 mov word ptr es:[si].IPX_PACKET_S.next_sibling+2, 0
 mov ax, cx ; See if we need to do the first as well
 or ax, dx
 jnz @F ; All done if there are others
 mov word ptr [bx].XPX_STREAM_S.first_unread, si
 mov word ptr [bx].XPX_STREAM_S.first_unread+2, es
 ret
@@: mov ds, dx ; DS:BX -> the first born
 mov bx, cx
 mov word ptr [bx].IPX_PACKET_S.next_sibling, si
 mov word ptr [bx].IPX_PACKET_S.next_sibling+2, es
 ret
; * At this point, packet is an orphan and must be sent off to farm or be glue
no_listener:
 lds bx, dword ptr Last_Broad_Ptr ; DS:BX->last broadcast stream
 mov ax, ds
 or ax, bx
 jnz found_listener ; If one was found, use as last resort
 cmp _Ignore_Nomatch, al ; Ignore orphans (AL == 0 here) or adopt
 jnz listen_again_buckwheat ; Put back on the mountaintop
no10_listener:
 mov ax, @Data ; DS = our most lovable data segment
 mov ds, ax
 inc _Total_Nomatchs
 mov cx, word ptr _Last_Nomatch ; DX:CX -> last error packet
 mov dx, word ptr _Last_Nomatch+2
 mov word ptr _Last_Nomatch, si ; Make it point to us
 mov word ptr _Last_Nomatch+2, es
 mov word ptr es:[si].IPX_PACKET_S.next_sibling, 0
 mov word ptr es:[si].IPX_PACKET_S.next_sibling+2, 0
 mov ax, cx ; See if we need to do the first as well
 or ax, dx
 jnz @F ; All done if there are others
 mov word ptr _First_Nomatch, si ; Set us as the new first error
 mov word ptr _First_Nomatch+2, es
 ret
@@: mov ds, dx ; DS:BX -> the first born

 mov bx, cx
 mov word ptr [bx].IPX_PACKET_S.next_sibling, si
 mov word ptr [bx].IPX_PACKET_S.next_sibling+2, es
 ret
listen_esr endp
 end





[LISTING TWO]

; ****************************************************************************
; * Title: XPX_INIT.ASM -- Rahner James
; * Copyright (c) December 1991, Ryu Consulting, 916/722-1939
; * File contains all the functions to support initializing IPX engine
; ****************************************************************************

_XPX_INIT_ASM_ equ 1
 include network.inc
 .data
 public IPX_Vector, _Socket_Life, _SPX_Version, _SPX_Max_Connections, _Our_Address
 public _SPX_Available_Connections, _SPX_Retry_Count

IPX_Vector dw offset dummy_IPX_function,@Code ; -> IPX support function
_Socket_Life db 0 ; 0=socket closed at app termination
 ; 0ffh= socket closed when requested
_SPX_Version dw 0 ; SPX version #: MSByte=major, LSByte=minor
_SPX_Max_Connections dw 0 ; Max number of SPX connections
_SPX_Available_Connections dw 0 ; # of SPX connections available to app
_SPX_Retry_Count db 0 ; Retry count for SPX establish connection
_SPX_Bowser_Flag db 1 ; Watchdog flag, 0=disable, 1=enable
_Our_Address label dword ; Global access for this structure
network dd 0 ; Network address
node db 6 dup(0) ; Node address
socket dw 0 ; Socket number

 .code
; ****************************************************************************
; * int XPX_INIT( us SOCKET_NUMBER ) -- Initializes all IPX/SPX internals
; * Given: SOCKET_NUMBER = socket number to open for listening, 0 opens
; * the next available
; * Returns: 0 if IPX was initialized successfully
; * -1 = socket already open (!)
; * -2 = socket table full
; * -3 = IPX or SPX is not installed
; * Note: Initializes IPX vector, opens a listening socket for IPX driver.
; * Internal vectors, counter, & pointers are brought to initial conditions
; ****************************************************************************
xpx_init proc uses di si, socket_number:word
 mov ax, 7a00h ; Get the IPX vector
 int 2fh ; Query the DOS multiplexer
 inc al ; AL = 0ffh if IPX is there
 jnz derr_xpx_init ; Quit in disgrace if it's not there
 mov IPX_Vector, di ; IPX function vector returned in ES:DI
 mov IPX_Vector+2, es
; * See if we need to close the old stuff down

 mov dx, socket
 or dx, dx ; See if we opened a socket
 jz xpx10_init ; Skip if we didn't
 cmp socket_number, 0
 jz xpx20_init ; Skip if so
 cmp socket_number, dx ; See if it's the same as before
 je xpx20_init ; Skip if it is
 IPX 1 ; IPX Close Socket command
; * Now, open the socket
xpx10_init:
 mov dx, socket_number
 mov al, _Socket_Life
 IPX_CHECK 0 ; IPX Open Socket command
 jnz done_xpx_init ; Quit if an error
xpx20_init:
 mov socket, dx
; * Get our internetwork address
 mov ax, ds ; ES = DS
 mov es, ax
 mov si, offset network
 mov di, si
 IPX 9 ; IPX Get Internetwork Address command
; * Last, we have to initialize the SPX interface
 xor ax, ax
 IPX_CHECK 10h ; SPX Initialize command
 jz derr_xpx_init ; Quit if it's not there
 mov _SPX_Version, bx ; Save the information returned
 mov _SPX_Max_Connections, cx
 mov _SPX_Available_Connections, dx
 xor ax, ax ; Good return
 jmp short done_xpx_init
derr_xpx_init:
 mov al, -3 ; It's gone McCreedy!
done_xpx_init:
 cbw
 ret
xpx_init endp

; ****************************************************************************
; * int DUMMY_IPX_FUNCTION( void ) -- Dummy function that returns error
; * code -10 so that system will not hang if not initialized
; * Given: nothing
; * Returns: -10 always
; ****************************************************************************
dummy_IPX_function proc far
 mov ax, -10
 ret
dummy_IPX_function endp
 end



Example 1:

 mov ax, 7a00h ; Function 7Ah, AL = 0
 int 2fh ; MS-DOS Multiplex interrupt
 ; Returns with AL == 0FFh if xPX
 ; exists and ES:DI == xPX vector


 inc al ; Set ZERO if AL == -1
 jnz outta_here ; Quit if xPX isn't around
 mov IPX_Vector, di ; Save the xPX entry vector
 mov IPX_Vector+2, es



May, 1992
PORTING UNIX TO THE 386: MISSING PIECES, PART 1


Completing the 386BSD kernel




William Frederick Jolitz and Lynne Greer Jolitz


Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect
of National Semiconductor's GENIX project, the first virtual-memory,
microprocessor-based UNIX system. Prior to establishing TeleMuse, a market
research firm, Lynne was vice president of marketing at Symmetric Computer
Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions
or comments to ljolitz@cardio.ucsf.edu. (c) 1992 TeleMuse.


When we began the 386BSD project in 1989, 386BSD was simply intended to be a
port of BSD to the 386. Our purpose in doing 386BSD was so students, faculty,
staff, and researchers could use BSD on a simple and inexpensive platform.
While we did not wish to add to anyone's proprietary license revenues by
folding in new encumbered code (especially pertaining to the 386), removing or
redesigning new code to replace old encumbered code was out of the scope of
this project. And since we were the only ones willing to work gratis on
386BSD, making an unencumbered version was impossible. After we contributed
386BSD to the University of California at Berkeley (UCB) in December 1990, the
UCB staff seriously began to set their sights on releasing only unencumbered
code. As you might expect, it was quite a chore to continually update 386BSD
and revise it, matching the work done by UCB staff. The result was the
University of California Berkeley Networking Software, Release 2 ("NET/2").
In a break with past installments, we've taken the unencumbered but incomplete
NET/2 kernel and finished the missing pieces necessary to make a bootable
running kernel that provides a self-supporting development environment. In
order to make the complete 386BSD kernel, we had to complete some code that
was not available when UCB composed the NET/2 tape. Some of these areas, such
as execve(), were simply not available at the time, while others (clists,
resource maps, buffer cache) were based on obsolete portions of the system due
to be replaced by more modern facilities (such as streams and the page cache).
In fact, we have found that many of these "new" facilities are still quite far
down the road.
We needed to create these "missing pieces," so we used the NET/2 system itself
to evolve the eventual replacements for these same pieces. To replace the
missing clists, we chose to design ring buffers to function in their stead. In
place of the missing resource maps, we invented a more flexible mechanism
called a resource list that exploits the dynamic memory-allocation mechanism
already present in the NET/2 kernel. For the buffer-based I/O mechanism and
program-execution system call, we relied on reference materials to design a
suitable substitute to serve us while we slowly work on our own, newer version
of a page cache which utilizes a completely different approach.
While we still intend to emphasize innovative work, we are constrained within
the confines of the present. Thus, we have chosen the shortest path to
completion of the 386BSD goal, by implementing facilities that will dock to
the rest of the NET/2 kernel source with minimal change. In fact, the source
code presented over the next two installments--plus a small set of bug fixes
and a recent copy of the NET/2 tape (available via M&T Online)--will enable
you to build an operational kernel. Before you go running for the phone,
please realize that besides a kernel, you need bootstraps (DDJ, February
1991), binaries of the utilities (DDJ, March, April, May 1991), a root file
system (DDJ, July, August, September, and October 1991), an installation
mechanism (DDJ, February and May 1991 as well as March and April 1992), and
documentation (DDJ, January through November 1991, and February 1992 onward)
before making 386BSD a real bootable system.
Readers who have followed this series now have enough material and information
to put in the elbow grease and finish it on their own. For those with less
patience, we hope to follow this up shortly with a real 386BSD binary that can
be put on a PC without major travail.


386BSD Kernel Completion Methodology


The methodology we followed to complete the 386BSD NET/2 kernel was critical
to our success. We began with an examination of the documentation that
described how each of these missing facilities managed to function, and then
reviewed the interface structure within the NET/2 kernel. From this
information, we created a model of semantics for each of the missing
facilities. Among the references we found useful were Maurice Bach's The
Design of the UNIX Operating System, Tanenbaum's MINIX series, the BSD book,
the UNIX System V Programmers Reference Manual, Knuth's The Art of Computer
Programming, as well as selected readings from USENIX Technical Proceedings
over the years. Comparing these references is interesting because they all
contain different perspectives which "color" the puzzle piece slightly
differently. Using these reference materials, we wrote the first versions of
these modules from scratch. Then, by examining functional entry points that
called the prototype version, we discovered weaknesses in the original
assumptions--some of which required significant revision. This approach also
allowed us to construct a realistic test program around the intended kernel
code that facilitated developing and debugging code in a user-process
"framework."
Although modern kernel debuggers and other development tools are now
ubiquitous, isolated debugging is still valuable because it allows you to
circumscribe the problem more efficiently. For example, a number of bugs in
the NET/2 kernel which had been masked by the older missing code were
discovered by this process. The independent validation that isolated,
user-mode development provides is a valuable tool for pinpointing problem
areas in the NET/2 code as well as in the new software.
Once a rough version was made to work in a user framework, the complex cases
were tested to localize implementation problems. It's said that 90 percent of
your bugs come from less than 1 percent of your source code. However, you can
usually guess where the trouble will strike, and that's exactly where you
target your test vectors. Unfortunately, many programmers shy away from this
procedure, preferring to visually "inspect" the code instead of doing the
"acid test." For example, all the bugs in the ring-buffer code were located in
the boundary-crossing cases on normal and inverse functions. There were
analogous cases with the contiguous GET/PUT operations, so these too needed to
be examined at the boundaries. In one case, we made the ring buffers absurdly
short to exacerbate the problems on the boundaries, and in doing so, we
exposed other inadequacies as well.
Next, we had to contend with the environmental and interface demands of the
kernel. No user program framework can hope to simulate the interrupt-driven,
context-switched, race-prone environment of a basic friendly kernel (with the
result that the system wedges a lot). Unlike Mach, which hopes to export this
environment to user processes under the guise of kernelizing the system (the
system still wedges, but the problems occur in the user environment, making it
more difficult to localize the problem), we prefer that the kernel environment
not be a part of the isolated test framework. When we step into the kernel,
it's an all-or-nothing proposition, because the mechanisms interact greatly.
This complexity of interaction is always present, whether it stays in the
kernel or gets exported to a user process.
The methods through which we regulate the introduction of new code into the
kernel are the key to ensuring that code's proper operation. We leverage our
understanding of the entry points in order to examine and track the actual
requests passed to our new code and compare them against what our user-mode
test model does.
In a sense, we end up turning our debugging process inside out. Instead of
producing cases to perturb the inner workings of the new code, we debug the
interfaces and the outer procedures that call it. We do this methodically,
testing the boundary cases of each as we go and looking for unexpected
assumptions about the semantics of the new code.
In the case of the buffer cache, this method caught significant "gotchas."
Much of the semantics of the NET/2 buffer cache are implied by the surrounding
code. (For example, the rescaling of buffer size, buffer invalidation, and
forcing to backing store are all intertwined with the virtual file system and
subsidiary file-system layers.) This could, in part, explain why it has
appeared so difficult to replace the obsolete buffer cache. Its semantics are
spread all over the map!
With our methodology in place, we began to finish off the missing pieces,
arriving, ultimately, at 386BSD Unbound.


386BSD Resource Lists


One part missing from the NET/2 386BSD kernel is the facility for dense
storage, or region allocation, known in Berkeley UNIX vernacular as the
resource maps. Resource maps were created in 4BSD as a generalization of the
"core click" physical-memory allocator found in the original Version 6 Bell
Laboratories UNIX for the PDP-11. They were widely used in the older 4BSD
virtual-memory system. However, in the NET/2 kernel's virtual-memory system,
they are only used to allocate contiguous hunks of swap space to contain
swapped-out processes. (Incidentally, the term "map" here is misleading, as
this has nothing to do with the virtual-memory system's use of the term "map"
to describe using the processor's address-translation hardware.)
Resource maps work by describing allocatable segments as a two-tuple (index,
size). These two-tuples are stored in a contiguous array of fixed size. As
allocations are made from the map, the segments fragment and take up
increasing space in the array. When fragments are logically returned to the
map, the free() procedure glues the fragments back together and attempts to
shrink space in the array. If the array is large enough, the worst possible
fragmentation cannot exceed the size of the resource map.
Resource maps, while elegant, compact, and quick for PDP-11 memory allocation,
have some annoying drawbacks. To make a fast implementation, the "0th" index
is not usable, because it is indistinguishable from "nothing" on the list (in
other words, it's used as a sentinel), and the entry that corresponds to it is
used to hold the upper-bounds limit and name of the given resource map
instance. As a result, the caller either needs to relocate the range above 0
before handing it to the resource map routines or discard the first allocation
unit (the "0th" index). In addition, the size and extent of the resource map
is fixed at initialization time and is unalterable, so storage for the map
must be reserved for the "worst-case size." If you guess wrong at worst case,
you're screwed!
In many early UNIX systems, the kernel never bounds-checked these arrays at
all, and would merrily scribble all over the adjacent memory locations
after the map! The system would then run for a period of time afterward, and,
when the inevitable crash occurred, it appeared to come from an unrelated
portion of the system. Of course, by that time, the map had become less
fragmented, and its contents showed no irregularity; in other words, an almost
"self-healing" bug. Those who had discovered this clever trick protected the
map with a "bounds check" which caused a system panic to occur if the map
fragmented outside the array. After a while, it became tedious to recompile
the kernel with greater and greater map sizes, so instead of panicking, the
resource map allocator would just "drop" the fragment. This approach had
humorous side effects on large time-sharing systems, when the fragment dropped
turned out to be something important, such as half of all available swap
space, because large fragments tend to collect on the end of the list.
This static mechanism has been tolerated for so long partly because it's been
used as the bottom-level storage allocator, and allowing it to use dynamic
allocation was considered unwise. (For example, what if it fragmented when low
on memory?)
Many design decisions in the kernel change when dynamic memory allocation
becomes available, so we replaced the fixed allocation resource maps with
resource lists built out of a pay-as-you-go dynamic memory-allocation scheme.


Resource Lists Defined


Resource lists (see Listing One) are arbitrary-length lists, each element of
which describes a segment as an inclusive [start, end] two-tuple. The list
elements are kept in a sorted order, with fragmenting entries causing
spontaneous new entries to be allocated (via malloc). As segments are freed,
allowing holes to be filled and fragments to be reassembled, adjacent entries
are reduced to single ones, and the now superfluous list entries are freed and
returned to the dynamic memory allocator. Thus, no loss of storage need occur.
The price we pay is the added cost of dynamic allocation, which is generally
small compared to the number of times our resource lists are used.
Other advantages of resource lists stem from their dynamic nature. There is no
initialize function, only allocate and free entry calls, because
initialization is just passing free space to an otherwise empty list, in any
order and at any time. This allows us, for example, to add additional swap
space without having to reinitialize the resource map or reserve space ahead
of time. Also, because the full dynamic range is present, segments starting
and ending at any point in a 32-bit number's range can be used.
rlist_alloc(). The resource list allocate function (see Listing Two) traipses
down a linked list looking for a large enough region from which to allocate.
In doing this, it uses a single doubly indirect pointer to check for an
unallocated entry (a null rlist pointer) as well as hold a pointer to the
forward link (in case we need to restructure the list). When we find an entry
of the appropriate size, we optionally pass back its location and reduce
the size of the resource-list entry. (The caller might not want it, but this
ensures it won't be allocated by others.) If we reduce the size to the point
that the element is empty, we free the list-entry space and rewire the
previous list's pointer to the succeeding entry (if one exists).
rlist_free(). The resource list free function (see Listing Two) is beefier, as
it must glue the fragments back together and attempt to simplify them (that
is, represent the fewest list entries). This routine is actually attempting to
reverse the damage from the numerous unordered allocations and frees prior to
being called. Like rlist_alloc(), it walks the list with a doubly indirect
pointer, but it searches for a list element it can merge into or a point in
the sorted list where it can be inserted. If a merge occurs, this function
scans the entire list, trying to reduce adjacent entries that can be merged as
the result of a hole being filled.


Program Execution Function



Another missing piece from the NET/2 kernel involves execution of a program
from a file, a critical part of any system. At some point, we must load a file
of binary 386 instructions and execute them. In many UNIX implementations,
this is one of the most complicated system calls in the entire kernel.


Executable File Format


The executable file formats to choose from on the 386 include COFF, ELF,
ROSE, X.OUT, A.OUT, and variants. All have different advantages and adherents,
and even Intel's iBCS2 (Binary Compatibility Specification 2) does not give a single
conclusive choice for an executable format. Unlike MS-DOS, where an .EXE file
is the same wherever you go, UNIX compatibility is less certain.
To make a basic functional system, we implemented a single executable format.
It needed to be one our GNU linkage editor already generated, so we chose the
type 413 style of A.OUT, the original executable file format of UNIX, named
for the octal value of the magic number leading the header in front of the
executable file. (A.OUT is short for Assembler OUTput file. With no other
arguments, a UNIX assembler drops its object file into "a.out" in the current
directory. On the PDP 11, such files were directly executable by the system's
exec(), and would work if they did not have any undefined external references
that the loader/link editor would need to satisfy from other object files or
libraries.)
This format was originally created in 3BSD for the VAX, specifically to allow
use of paging. Its prevalence is due mainly to its having been around for a
long time and, luckily for us, it is perpetuated in the GNU loader. The format
consists
of a "page-cluster" sized header, followed by pages of instruction space (to
be marked for "read-only" access) and ending with pages of initialized data
(to be marked for "read/write" access).
A page cluster is a group of pages logically considered to be a single page.
On a VAX, with a tiny page size of 512 bytes, clusters of 1 Kbyte reduced
paging traffic at the expense of increased fragmentation loss. With the 386's
4Kbyte pages, we chose a cluster size of 1, because the pages are of adequate
size. Alignment isn't a bad thing, especially at a sector (or block)
granularity level, so a page fault can be satisfied with an integral number of
contiguous disk transfers. However, this results in some wasted space. An
alternative is to put the header either at the rear of the file (where it
becomes a trailer) or include the header as part of the address space mapped
by the file (the first words of instruction space). Type 407, the original
UNIX executable, worked this way because octal 400 was the jump instruction,
and an offset of 7 caused it to jump over the eight-word header to the first
instruction following the header!
As on the VAX (see Listing Three), we assume that instructions are to be
mapped to virtual address 0 on up to the end of text (a_text), where the data
pages start and continue, including so-called "BSS pages" (uninitialized data
or a_bss). When the system runs the program, it assigns (somewhere) and
creates a stack segment, packs it with arguments, and enters the program in
user mode at a location (a_entry) designated in the header. This type 413 is
the predominant executable format on Berkeley UNIX systems, as it allows for
sharing of the read-only pages and protection of user instructions from
accidental or deliberate modification.
For the purposes of basic operation, type 413 executable file format is
sufficient. It's far from ideal, however, because the 4-Kbyte header page on
the front wastes space. Another rub is that location 0 is mapped with a valid
page, and at times we would like to detect access to the uninitialized
pointers that frequently appear as NULL pointers. Such pointers can
conceivably point to large structures, so the actual illegal reference we want
to trap might not occur at 0, but "near" it (perhaps within 64 Kbytes).
Therefore, we might need to avoid putting anything at the bottom of the
virtual address space, which conflicts with the way type 413 defines its
instruction segment to work. On the 386 not only could a user program have
NULL pointers to be caught, but the kernel could as well, because they share
the same virtual address space.
Additionally, our new kernel uses dynamic memory allocation, and a common
problem with this occurs when an unintended reference is made to a (stale)
pointer to a freed (and frequently reassigned) memory region. Such areas are
often cleared (set to 0) before use! The upshot of this is that kernel NULL
pointers are a common problem made more difficult to diagnose because 0 is
mapped, thus masking the problem.
Executable file format also impacts the development of a method for sharing
commonly referenced code among files. "Shared-object libraries" reduce the
amount of disk space taken up by executable files by making one physical copy
available to many programs simultaneously. The copy is stored separately from
all the interlinked executables. This can result in considerable space
savings, especially by the multimegabyte X libraries and toolkits. Shared
libraries (and potentially dynamic linking) also provide mechanisms to manage
the ever-growing complexity of modern software, by exploiting it in an
object-oriented fashion.
Given these demands, it's tempting to create yet another executable file
format, but this must be considered carefully, as it could affect future
editions of 386BSD.


Next Month


In next month's installment, we'll implement a "bare-bones" execve() system
call that allows 386BSD to provide basic operation, a block I/O buffer cache
used to reduce the cost of UNIX file operations, and ring buffers that reduce
the cost of tty-character buffer management.
DDJ


_PORTING UNIX TO THE 386: MISSING PIECES, PART I_
by William Frederick Jolitz and Lynne Greer Jolitz


[LISTING ONE]

/* Copyright (c) 1992 William Jolitz. All rights reserved.
 * Written by William Jolitz 1/92
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 1. Redistributions of source code must retain the above copyright notice,
 * this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 * this list of conditions and the following disclaimer in the documentation
 * and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 * must display the following acknowledgement: This software is a component
 * of "386BSD" developed by William F. Jolitz, TeleMuse.
 * 4. Neither the name of the developer nor the name "386BSD" may be used to
 * endorse or promote products derived from this software without specific
 * prior written permission.
 * THIS SOFTWARE IS A COMPONENT OF 386BSD DEVELOPED BY WILLIAM F. JOLITZ AND
 * IS INTENDED FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. THIS SOFTWARE SHOULD
 * NOT BE CONSIDERED TO BE A COMMERCIAL PRODUCT. THE DEVELOPER URGES THAT USERS
 * WHO REQUIRE A COMMERCIAL PRODUCT NOT MAKE USE OF THIS WORK.
 * THIS SOFTWARE IS PROVIDED BY THE DEVELOPER "AS IS" AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
 * EVENT SHALL THE DEVELOPER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 * Resource lists. Usage:
 * rlist_free(&swapmap, 100, 200); add space to swapmap
 * rlist_alloc(&swapmap, 100, &loc); obtain 100 sectors from swap
 */

/* A resource list element. */
struct rlist {
 unsigned rl_start; /* boundaries of extent - inclusive */
 unsigned rl_end; /* boundaries of extent - inclusive */
 struct rlist *rl_next; /* next list entry, if present */
};

/* Functions to manipulate resource lists. */
extern rlist_free __P((struct rlist **, unsigned, unsigned));
int rlist_alloc __P((struct rlist **, unsigned, unsigned *));
extern rlist_destroy __P((struct rlist **));

/* heads of lists */
struct rlist *swapmap;






[LISTING TWO]

/* Copyright (c) 1992 William Jolitz. All rights reserved.
 * Written by William Jolitz 1/92
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 1. Redistributions of source code must retain the above copyright notice,
 * this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 * this list of conditions and the following disclaimer in the documentation
 * and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 * must display the following acknowledgement: This software is a component
 * of "386BSD" developed by William F. Jolitz, TeleMuse.
 * 4. Neither the name of the developer nor the name "386BSD" may be used to
 * endorse or promote products derived from this software without specific
 * prior written permission.
 * THIS SOFTWARE IS A COMPONENT OF 386BSD DEVELOPED BY WILLIAM F. JOLITZ AND
 * IS INTENDED FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. THIS SOFTWARE SHOULD
 * NOT BE CONSIDERED TO BE A COMMERCIAL PRODUCT. THE DEVELOPER URGES THAT USERS
 * WHO REQUIRE A COMMERCIAL PRODUCT NOT MAKE USE OF THIS WORK.
 * THIS SOFTWARE IS PROVIDED BY THE DEVELOPER "AS IS" AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
 * EVENT SHALL THE DEVELOPER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */

#include "sys/param.h"
#include "sys/cdefs.h"

#include "sys/malloc.h"
#include "rlist.h"

/* Resource lists. */
/* Add space to a resource list. Used to either
 * initialize a list or return free space to it. */
rlist_free (rlp, start, end)
register struct rlist **rlp; unsigned start, end; {
 struct rlist *head;
 head = *rlp;
loop:
 /* if nothing here, insert (tail of list) */
 if (*rlp == 0) {
 *rlp = (struct rlist *)malloc(sizeof(**rlp), M_TEMP, M_NOWAIT);
 (*rlp)->rl_start = start;
 (*rlp)->rl_end = end;
 (*rlp)->rl_next = 0;
 return;
 }
 /* if new region overlaps something currently present, panic */
 if (start >= (*rlp)->rl_start && start <= (*rlp)->rl_end)
 panic("overlapping rlist_free: freed twice?");
 if (end >= (*rlp)->rl_start && end <= (*rlp)->rl_end)
 panic("overlapping rlist_free: freed twice?");
 /* are we adjacent to this element? (in front) */
 if (end+1 == (*rlp)->rl_start) {
 /* coalesce */
 (*rlp)->rl_start = start;
 goto scan;
 }
 /* are we before this element? */
 if (end < (*rlp)->rl_start) {
 register struct rlist *nlp;
 nlp = (struct rlist *)malloc(sizeof(*nlp), M_TEMP, M_NOWAIT);
 nlp->rl_start = start;
 nlp->rl_end = end;
 nlp->rl_next = *rlp;
 *rlp = nlp;
 return;
 }
 /* are we adjacent to this element? (at tail) */
 if ((*rlp)->rl_end + 1 == start) {
 /* coalesce */
 (*rlp)->rl_end = end;
 goto scan;
 }
 /* are we after this element */
 if (start > (*rlp)->rl_end) {
 rlp = &((*rlp)->rl_next);
 goto loop;
 } else
 panic("rlist_free: can't happen");
scan:
 /* can we coalesce list now that we've filled a void? */
 {
 register struct rlist *lp, *lpn;
 for (lp = head; lp->rl_next ;) {
 lpn = lp->rl_next;
 /* coalesce ? */

 if (lp->rl_end + 1 == lpn->rl_start) {
 lp->rl_end = lpn->rl_end;
 lp->rl_next = lpn->rl_next;
 free(lpn, M_TEMP);
 } else
 lp = lp->rl_next;
 }
 }
}
/* Obtain a region of desired size from a resource list. If nothing available
 * of that size, return 0. Otherwise, return a value of 1 and set resource
 * start location with *loc. (Note: loc can be zero if we don't wish value) */
int rlist_alloc (rlp, size, loc)
struct rlist **rlp; unsigned size, *loc; {
 register struct rlist *lp = *rlp, *olp = 0;
 /* walk list, allocating first thing that's big enough (first fit) */
 for (; *rlp; rlp = &((*rlp)->rl_next))
 if(size <= (*rlp)->rl_end - (*rlp)->rl_start + 1) {
 /* hand it to the caller */
 if (loc) *loc = (*rlp)->rl_start;
 (*rlp)->rl_start += size;
 /* did we eat this element entirely? */
 if ((*rlp)->rl_start > (*rlp)->rl_end) {
 lp = (*rlp)->rl_next;
 free (*rlp, M_TEMP);
 *rlp = lp;
 }
 return (1);
 }
 /* nothing in list that's big enough */
 return (0);
}

/* Finished with this resource list, reclaim all space and
 * mark it as being empty. */
rlist_destroy (rlp)
struct rlist **rlp; {
 struct rlist *lp, *nlp;

 lp = *rlp;
 *rlp = 0;
 for (; lp; lp = nlp) {
 nlp = lp->rl_next;
 free (lp, M_TEMP);
 }
}





[LISTING THREE]

/* Excerpted with permission from 4.3BSD include file
 * "/usr/include/sys/exec.h"
 * Redistribution and use in source and binary forms are freely permitted
 * provided that the above copyright notice and attribution and date of work
 * and this paragraph are duplicated in all such forms.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR

 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 * Header prepended to each a.out file.
 */
struct exec {
 long a_magic; /* magic number */
 unsigned long a_text; /* size of text segment */
 unsigned long a_data; /* size of initialized data */
 unsigned long a_bss; /* size of uninitialized data */
 unsigned long a_syms; /* size of symbol table */
 unsigned long a_entry; /* entry point */
 unsigned long a_trsize; /* size of text relocation */
 unsigned long a_drsize; /* size of data relocation */
};

#define OMAGIC 0407 /* old impure format */
#define NMAGIC 0410 /* read-only text */
#define ZMAGIC 0413 /* demand load format */




May, 1992
FILE VERIFICATION USING CRC


32-bit cyclical redundancy check


 This article contains the following executables: CRCMAN.ARC


Mark R. Nelson


Mark is vice president of software development at Greenleaf Software and
author of The Data Compression Book (M&T Publishing). Mark can be contacted at
16479 Dallas Parkway, Suite 570, Dallas, TX 75248.


File verification is the process of determining whether a file on a computer
has been modified unexpectedly. No matter whether this happens through
hardware failure, program error, or malicious tampering, I like to know when a
file has had its contents altered. Likewise, I'd like a convenient way to
check the integrity of a file to verify that it hasn't been changed.
The question of file integrity has been on my mind because of several nearly
simultaneous incidents. First of all, I recently ran dozens of relatively
untested programs through my home system when judging the Dr. Dobb's Data
Compression Contest. At least two programs caused inadvertent damage to the
file systems on my PC, one under UNIX and the other under MS-DOS. In both
cases, I was able to spot much of the damage; but after I restored the data
that looked bad, I was left feeling unsure about the rest of my system. Had
other files been damaged in more subtle ways? I suddenly felt as though I
couldn't trust my system.
An even more alarming incident occurred a couple of weeks later. A programmer
who supplies my company (Greenleaf Software) with a product for resale called
and casually mentioned that his office had been infested with the notorious
"Stoned" virus. Had we by any chance noticed anything funny in our systems? We
see funny things on our systems on an hourly basis, so suddenly we again did
not trust any of the files on our PCs. (Fortunately, this turned out to be a
false alarm.)
Finally, as part of a recent product release at Greenleaf, we decided to
implement a program that would allow our customers to download short patch
files from our BBS to apply to the source code they purchased from us. I wrote
a small program that could read in the patch file and make modifications to an
existing source file, resulting in a corrected output file. To keep the
program simple, however, we had to have a way to be sure that the input file
we were patching was the file we expected it to be, and that it hadn't been
modified in any way. Our patch program would be capable of really fouling up
the file if a programmer had just changed a few lines here and there before
trying to patch it.
The solution to all these problems consists of two parts. The first is to use
the CRC-32 algorithm to provide a "fingerprint" method of file identification.
The second is a general-purpose program called CRCMAN that can develop a
catalog of CRC values for all the files in a directory tree, and can later
check the files in the same directory tree against the catalog.


CRC-32


CRC-32 is an acronym for the "32-bit Cyclical Redundancy Check" algorithm.
CRC-32 generally refers to a specific 32-bit CRC formula sanctioned by the
CCITT, an international standards body primarily concerned with
telecommunications. CRC-32 is used in communications protocols such as HDLC
and ZMODEM to verify the integrity of blocks of data being transferred through
various media.
CRC calculations are done using a technique with the formidable name of
"polynomial division." A block of data, regardless of how long, is treated as
if each bit in the block is the coefficient in a long polynomial. For example,
a single hexadecimal byte, F0H, would be considered to be the polynomial:
1*X^7 + 1*X^6 + 1*X^5 + 1*X^4 + 0*X^3 + 0*X^2 + 0*X^1 + 0*X^0. The
terms with coefficients of 0 drop out, so the polynomial can be expressed as:
X^7 + X^6 + X^5 + X^4. If we are calculating the CRC of an entire file, the
exponents will be very large, but this is not a problem. The actual values
of the exponents never come into play during the calculation of the CRC, so
the message polynomial can grow indefinitely without affecting the algorithm.
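To make the bit-to-polynomial mapping concrete, here is a small illustrative helper of my own (it is not part of CRCMAN, which never needs to materialize the polynomial this way) that renders any byte in polynomial form:

```c
#include <stdio.h>
#include <string.h>

/* Write the polynomial form of a byte into buf, highest bit first.
 * For 0xF0 this produces "X^7 + X^6 + X^5 + X^4", matching the
 * example in the text. Purely illustrative. */
void byte_to_poly( unsigned char byte, char *buf )
{
    int bit;
    int first = 1;

    buf[ 0 ] = '\0';
    for ( bit = 7; bit >= 0; bit-- )
        if ( byte & ( 1 << bit ) ) {
            sprintf( buf + strlen( buf ), "%sX^%d",
                     first ? "" : " + ", bit );
            first = 0;
        }
}
```

Calling byte_to_poly( 0xF0, buf ) fills buf with the four-term polynomial shown above.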
The calculation of the CRC is done by dividing a second polynomial, the
"generator polynomial," into the message polynomial, producing a quotient and
a remainder. The generator polynomial used by the CRC-32 is: X^32 + X^26 +
X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 +
X^2 + X + 1. After dividing this generator polynomial into our message
polynomial, we simply discard the quotient, and use the remainder as our
32-bit CRC.
Polynomial division to create a CRC was originally done using hardware shift
registers and Boolean glue logic. Fortunately, "cookbook" algorithms now exist
to implement the CRC on PCs in a relatively fast and efficient manner. In
CRCMAN, I use a table-lookup version of the algorithm that exchanges a small
increase in storage space for fast calculation.
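For reference, the "slow" calculation that the lookup table replaces can be sketched bit by bit, as below. This is my own minimal rendering, not part of the CRCMAN listing; with the standard preconditioning and final inversion (described later in this article), it yields the well-known check value 0xCBF43926 for the ASCII string "123456789":

```c
#include <stddef.h>

#define CRC32_POLYNOMIAL 0xEDB88320UL

/* Bit-at-a-time CRC-32: processes one bit of the message per inner
 * loop iteration instead of one byte per table lookup. Slower than
 * the table method used in CRCMAN, but shows the shift-and-XOR
 * polynomial division directly. */
unsigned long crc32_slow( const unsigned char *buffer, size_t count )
{
    unsigned long crc = 0xFFFFFFFFUL;   /* precondition with all 1s */
    size_t i;
    int bit;

    for ( i = 0; i < count; i++ ) {
        crc ^= buffer[ i ];
        for ( bit = 0; bit < 8; bit++ ) {
            if ( crc & 1 )
                crc = ( crc >> 1 ) ^ CRC32_POLYNOMIAL;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFUL;          /* postcondition: invert bits */
}
```

The table method produces identical results; it simply precomputes the effect of the eight inner-loop iterations for each of the 256 possible byte values.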
The details of how the CRC calculations work have been discussed in many
places, so I won't go into further details in this article. Some excellent
resources are listed in the References section, if you are interested in
exploring this topic further.


The Qualities of the CRC-32


CRCMAN uses the CRC-32 algorithm to generate a 32-bit number for any given
file. We then treat this 32-bit number as a nearly unique "fingerprint" for
that file. Unlike human fingerprints, these are not guaranteed to be unique:
there are only 4,294,967,296 possible 32-bit values, and far more files than
that exist in the world, so it is a foregone conclusion that some of them
must have identical checksums. However, the CRC-32 does have attributes that
make it attractive for verification of files.
These include the following:
Every bit in the message contributes to the CRC. This means that changing any
bit in the message changes the CRC.
Relatively small changes in the message are guaranteed to change the CRC.
Thus, two files that differ by only a few bits are certain to have different
CRC values.
The histogram of output CRC values for input messages tends to be flat. For a
given input message, the probability of a given CRC being produced is nearly
equal across the entire range of possible CRCs from 0 to FFFFFFFFH.
These are the attributes that the CCITT had in mind when selecting the CRC-32
algorithm, and we assume that they made a good choice. The chances of
inadvertently damaging or modifying a file without changing the CRC are
vanishingly small, so for all practical purposes, a program such as CRCMAN
can be considered infallible.
Another characteristic we would like to see in the CRC-32 is noninvertibility.
Although not really necessary when using the CRC simply to guard against
accidental file corruption, noninvertibility becomes much more important when
we want to detect virus infestations of our files.
A typical virus might operate by modifying the MS-DOS command interpreter,
COMMAND.COM. My version of COMMAND.COM happens to have a CRC-32 of 02f8690cH.
In the event that a virus modifies this file, it will undoubtedly have a new
CRC-32. The challenge to the virus programmer would then be to add new bytes
to the end of the file so that the original CRC was restored.
Unfortunately, it is possible to do this through simple brute force. In
theory, the virus programmer could just add a limited number of bytes to the
end of COMMAND.COM, and then begin systematically trying out new values of
those bytes until random chance produces a combination that yields the correct
checksum. An efficient algorithm that could calculate a new CRC in a
millisecond should then never need more than a few weeks of processing power
to come up with a set of numbers that solves this puzzle. And a programmer
with access to hardware with an extra couple of orders of magnitude of CPU
power could solve the same problem in an afternoon.
This means that while CRCMAN will be an excellent judge of unintentional
damage to files, it is possible that an exceptionally clever virus will be
able to defeat it. Further modifications to the program would improve its
ability to fight viruses. For example, just storing the length of the file
along with its CRC would make a virus programmer's job much more difficult, if
not impossible.


Implementation


CRCMAN was written as a dual-purpose program. It has two operating modes: one
for building a list of CRC values, and another for checking files against that
list. The command line for CRCMAN has one of two forms. The first form is used
to create a CRC listing file that has the CRC and filename of every file in a
directory tree. The syntax for invoking CRCMAN in this mode is: CRCMAN -b
directory crc-filename. The directory parameter passed to CRCMAN is the name
of a root directory. CRCMAN will calculate the CRC-32 for every file in and
under that directory, and store the results in the CRC file named as the
second parameter. The CRC file created is an ordinary ASCII text file that can
be edited and manipulated using any text editor. All it contains is a sequence
of lines that contain a CRC-32 value followed by a filename.
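Reading such a line back takes a single sscanf() call. The helper below is my own sketch, not part of the listing, but it mirrors the fscanf() format string that CRCMAN itself uses in CheckFiles():

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of a CRC log file: a hex CRC-32 value followed by
 * a filename, e.g. "363476a4 .\CRC.TXT". Returns 1 on success,
 * 0 on a malformed line. */
int ParseCRCLine( const char *line, unsigned long *crc, char *name )
{
    return sscanf( line, "%lx %s", crc, name ) == 2;
}
```

Because the format is plain whitespace-separated text, any line you add or edit by hand parses exactly like the lines CRCMAN writes, as long as the filename contains no spaces.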
For example, I ran CRCMAN on the directory that holds my work for this article
on my MS-DOS machine with the following command: CRCMAN -b .
C:\CRCFILES\TEST.CRC. This created a CRC file named TEST.CRC, which contains
the information shown in Figure 1. Later on, I can
check the integrity of these files by running CRCMAN in its second mode, which
takes the command line: CRCMAN C:\CRCFILES\TEST.CRC. In this mode, CRCMAN just
reads in each line of the CRC file, calculates the CRC of the file, and
determines if it matches the stored CRC. When working on this article, I
changed a few lines in the text file, then ran CRCMAN. It produced the data in
Figure 2.
Figure 1: Sample contents of TEST.CRC
363476a4 .\CRC.TXT
b97a5169 .\CRC.BAK
d6a5f5f5 .\CRCMAN.C
02f8690c .\TEST\COMMAND.COM
88f2e4d6 .\CRCMAN.EXE

23123e1c .\TEMP.CRC


Figure 2: Sample output from CRCMAN
 Checking .\CRC.TXT .
 Error: Expected 363476a4, got 86793634
 Checking .\CRC.BAK .
 Error: Expected b97a5169, got 76a4b97a
 Checking .\CRCMAN.C . OK
 Checking .\TEST\COMMAND.COM .. OK
 Checking .\CRCMAN.EXE . OK
 Checking .\TEMP.CRC . OK

CRCMAN correctly detected the changes in the two files. In the current
implementation of CRCMAN, when an error is detected, an error message is
printed out to the screen. In a production version of this program, the action
taken can obviously grow to be as sophisticated as you like.


The Code


Listing One is the complete listing for CRCMAN. This version of the program is
designed to run with most MS-DOS C compilers, K&R implementations under XENIX,
and the hybrid Microsoft C compiler under UNIX. For the most part, the
portability of the program doesn't intrude when you read the code, but there
are a few exceptions. Microsoft C 6.0 generates messages at warning level 4,
so this code has to be compiled with /W3 instead of /W4. Additionally, when
compiling for UNIX targets, you will have to define the UNIX macro in order to
turn on the correct directory-processing code. A few places in the program
are bracketed with #ifdef UNIX statements, which puts the burden on the
programmer compiling under UNIX or XENIX to define the macro UNIX either in
the program or on the command line.
The main() routine of CRCMAN has to first perform initialization of the table
used when calculating CRC-32 values. This routine, found in BuildCRCTable(),
initializes all the values in the array CRCTable[]. These are used later in
the program anytime a CRC value is calculated.
Once main() has built the CRC table, it next checks to see which mode the user
has selected, based solely on the number of arguments passed on the command
line. If argc is equal to 2, main() assumes it has been invoked with a single
filename as an argument, and it calls CheckFiles(). If argc is equal to 4 and
the first argument is -b, main() assumes it has been invoked to build a CRC
file, and it calls BuildCRCFile(). If neither of these turns out to be true,
a usage message is printed out and the program exits.


Building the CRC File


Of the two possible jobs given to this program, building the CRC file is the
more complex. Both tasks have to calculate the CRC-32 values for one or more
files, but building the file has the additional job of navigating through the
directory tree. Complicating this even more, navigating the directory tree has
to be done differently under UNIX and MS-DOS.
BuildCRCFile() sets things up for the task by opening up the output file that
will receive all the filenames and CRC values. It then makes a call to the
routine that does all the work, ProcessAllFiles(). This routine takes two
arguments, a path name and a CRC file FILE pointer.
ProcessAllFiles() has the same flow of control under UNIX, XENIX, and various
MS-DOS compilers. Unfortunately, the function names and structures needed to
implement the loop vary quite a bit between the various environments. The
pseudocode for this routine looks like that in Example 1.
Example 1: Pseudocode for the ProcessAllFiles() routine
 ProcessAllFiles( path )
     dir = OpenDirectory( path )
     while FilesLeftInDirectory( dir )
         filename = GetNextFile( dir )
         if filename is a directory then
             ProcessAllFiles( filename )
         else
             crc = CalculateCRC( filename )
             write filename and crc
         endif
     end of while
 end of ProcessAllFiles


The underlying operating-system primitives used to implement this pseudocode are
different under MS-DOS and UNIX. UNIX uses a pair of functions called
opendir() and readdir() to open the directory and get the next filename. Under
MS-DOS, there are a pair of MS-DOS function calls named findfirst() and
findnext(). To complicate things under MS-DOS, Borland has implemented these
system calls using different function and structure names than those selected
by other vendors.
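On the UNIX side, the pseudocode of Example 1 maps onto opendir(), readdir(), and stat() roughly as follows. This is a simplified sketch of my own: it only counts regular files rather than CRCing them, and the name walk() is mine, not CRCMAN's:

```c
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Recursively visit everything under 'path': subdirectories recurse,
 * regular files are counted. CRCMAN performs the CRC calculation and
 * logging at the point where this sketch merely counts. */
long walk( const char *path )
{
    DIR *dirp;
    struct dirent *entry;
    struct stat buf;
    char fullname[ 1024 ];
    long count = 0;

    dirp = opendir( path );
    if ( dirp == NULL )
        return 0;
    while ( ( entry = readdir( dirp ) ) != NULL ) {
        if ( strcmp( entry->d_name, "." ) == 0 ||
             strcmp( entry->d_name, ".." ) == 0 )
            continue;
        sprintf( fullname, "%s/%s", path, entry->d_name );
        if ( stat( fullname, &buf ) == 0 && S_ISDIR( buf.st_mode ) )
            count += walk( fullname );   /* subdirectory: recurse */
        else
            count++;                     /* file: CRC would go here */
    }
    closedir( dirp );
    return count;
}
```

The MS-DOS variants replace the opendir()/readdir() pair with findfirst()/findnext() but keep exactly the same flow of control.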
The result of all these variations is a routine that has been coded using a
rat's nest of #ifdef statements and macro definitions. However, if you
understand the underlying pseudocode, the C is not too bad. One nice thing
about this particular function is that it provides an excellent example of a
job that is truly easier to do using recursion.
Examining the body of ProcessAllFiles() shows that near the bottom of the
routine, a call is made to CalculateFileCRC(). This routine calculates the
CRC-32 value for the file. The result is then printed out along with the
filename to the CRC log file. As ProcessAllFiles() does its work, a complete
listing of the entire directory tree is built up, for later use by CRCMAN in
its checking mode.


Calculating the File CRC


Calculating the CRC-32 for a given file is relatively easy. The
CalculateFileCRC() routine repeatedly reads in blocks of 512 bytes and passes
them through the CalculateBufferCRC() routine. CalculateBufferCRC takes as an
argument the CRC of the file up to that point, and returns the new CRC for the
file so far. This process repeats until the entire file has been processed.
When calculating the file CRC, this routine initializes the CRC value to all
1s, or 0xFFFFFFFFL. After the file calculation has completed, the bits in the
CRC are inverted by XORing with the same value, 0xFFFFFFFFL. This pre- and
postconditioning is intended to provide additional error immunity and is used
by protocols such as ZMODEM, as well as programs such as PKZIP and ARJ. The
CRC values produced by CRCMAN consequently match up with those you would see
in the listing of a ZIP file containing the same list of files.


Checking the Files



The second mode of operation for this program is the CRC check. Most of the
work here is done in the CheckFiles() routine. It gets to bypass the directory
tree navigation, because all the filenames it needs to check are already
stored in the CRC log file. All this routine does is repeatedly read in a line
from the CRC log file containing a 32-bit CRC value and a filename. It then
calculates the actual CRC for the file and reports on whether the stored and
calculated CRC values match up.
The simple ASCII format of this file makes for easy maintenance of the CRC log
files. For example, if the log file contains the CRC values for the directory
tree containing your Borland C++ compiler, you will probably regularly get
reports from CRCMAN that your configuration file, TURBOC.CFG, has changed.
Because you know this file is supposed to change, it is a simple enough matter
to edit the log file and delete the line that refers to the file.


Using the Program


CRCMAN can be set up to provide a quick way to check the integrity of any or
all of the files on your system. By calling CRCMAN with the -b parameter for
every directory full of executables, you create a set of CRC log files that
can be periodically checked with a single call to CRCMAN. CRCMAN operates
quickly enough that you can even include it as part of your AUTOEXEC.BAT file
under MS-DOS, without letting it slow down your work too much.
CRCMAN could also be easily modified to provide good virus checking. To make
things a little tougher on the virus programmer, you would probably want to
store the length of the file along with CRC-32, and perhaps even store the
file time stamp. While even these measures don't give you an iron-clad
guarantee against a virus attack, they will probably detect the vast majority
of infestations.
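One way to sketch that hardening is to extend each log record to carry the file length alongside the CRC. The record format and function names below are hypothetical extensions of my own, not part of the published CRCMAN listing:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical extended log record: CRC-32, file length, filename.
 * A virus that pads a file to restore its CRC-32 would still be
 * betrayed by the changed length. */
int FormatRecord( char *line, unsigned long crc, long length,
                  const char *name )
{
    return sprintf( line, "%08lx %ld %s\n", crc, length, name );
}

int ParseRecord( const char *line, unsigned long *crc, long *length,
                 char *name )
{
    return sscanf( line, "%lx %ld %s", crc, length, name ) == 3;
}
```

The file's time stamp could be appended as one more field in the same way; the parsing stays a single sscanf() call per record.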
Ultimately, the best way to use CRCMAN would be as a concurrent process that
runs continuously on your machine at low priority. This is fairly easy to
implement under OS/2 or UNIX, but is somewhat problematic under MS-DOS.


References


Campbell, Joe. C Programmer's Guide to Serial Communications. Indianapolis,
Ind.: Howard W. Sams, 1988.
Ramabadran, Tenkasi V. and Sunil S. Gaitonde. "A Tutorial on CRC
Computations." IEEE Micro (August 1988).
DDJ


_FILE VERIFICATION USING CRC_
by Mark R. Nelson



[LISTING ONE]

/************************** Start of CRCMAN.C *************************
 * This program is used to build a list of CRC-32 values for all files in a
 * given directory tree. After building file, program can be run later to
 * verify CRC values, giving assurance of integrity of files. To build CRC
 * file, command line is: CRCMAN -b root-dir crc-file-name
 * To check list of files created, run with: CRCMAN crc-file-name
 * Should work with most 16- and 32-bit compilers under MS-DOS and UNIX. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

unsigned long CRCTable[ 256 ];

/* To build this program under UNIX, define UNIX either on the command line or
 * by editing this file. To define it on the command line, program should be
 * built like this: cc -o crcman -DUNIX crcman.c Code in this program assumes
 * UNIX compiler is K&R variety; does away with real function prototyping. */
#ifdef UNIX

#include <varargs.h>
#ifdef M_XENIX
#include <sys/ndir.h>
#else
#include <sys/dirent.h>
#endif /* M_XENIX */

#define SEPARATOR "/"
#define FILENAME_SIZE 81


void FatalError();
unsigned long CalculateFileCRC();
void ProcessAllFiles();
void BuildCRCFile();
void CheckFiles();
unsigned long CalculateBufferCRC();
void BuildCRCTable();

#else /* not UNIX, must be MSDOS */
/* Most MS-DOS compilers have converged on same names for structures and
 * functions used when searching directories. Borland C implementations use a
 * variant, requiring macro definitions. Functions work in an identical manner,
 * so actual implementation of code is straightforward. Addition of MS-DOS
 * definition helps convince Zortech compiler to use same structure and
 * function names as everyone else. */
#define MSDOS 1
#include <stdarg.h>
#include <dos.h>

#define SEPARATOR "\\"
#define FILENAME_SIZE FILENAME_MAX

#ifdef __TURBOC__

#include <dir.h>
#define FILE_INFO struct ffblk
#define FIND_FIRST( n, i ) findfirst( ( n ), ( i ), FA_DIREC )
#define FIND_NEXT( info ) findnext( ( info ) )
#define FILE_NAME( info ) ( ( info ).ff_name )

#else

#define FILE_INFO struct find_t
#define FIND_FIRST( n, i ) _dos_findfirst( (n), _A_SUBDIR, (i) )
#define FIND_NEXT( info ) _dos_findnext( ( info ) )
#define FILE_NAME( info ) ( ( info ).name )

#endif

void FatalError( char *fmt, ... );
unsigned long CalculateFileCRC( FILE *file );
void ProcessAllFiles( char *path, FILE *crc_file );
void BuildCRCFile( char *input_dir_name, char *crc_file_name );
void CheckFiles( char *crc_file_name );
unsigned long CalculateBufferCRC( unsigned int count, unsigned long crc,
 void *buffer );
void BuildCRCTable( void );

#endif /* UNIX */

/* Main program checks for valid occurrences of two different types of command
 * lines, and executes them if found. Otherwise, it prints out a simple
 * usage statement and exits. */
int main( argc, argv )
int argc;
char *argv[];
{
 setbuf( stdout, NULL );

 BuildCRCTable();
 if ( argc == 2 )
 CheckFiles( argv[ 1 ] );
 else if ( argc == 4 && strcmp( argv[ 1 ], "-b" ) == 0 )
 BuildCRCFile( argv[ 2 ], argv[ 3 ] );
 else {
 printf( "Usage: CRCMAN [-b input_dir] crc-file \n" );
 printf( "\n" );
 printf( "Using the -b option checks all files under the input_dir\n" );
 printf( "and appends their data to the crc-file. Otherwise, the\n" );
 printf( "program checks the CRC data of all of the files in the\n" );
 printf( "crc-file and prints the results\n" );
 return( 1 );
 }
 return( 0 );
}
/* Instead of performing a straightforward calculation of the 32-bit CRC using
 * a series of logical operations, program uses faster table lookup method. */
#define CRC32_POLYNOMIAL 0xEDB88320L

void BuildCRCTable()
{
 int i;
 int j;
 unsigned long crc;

 for ( i = 0; i <= 255 ; i++ ) {
 crc = i;
 for ( j = 8 ; j > 0; j-- ) {
 if ( crc & 1 )
 crc = ( crc >> 1 ) ^ CRC32_POLYNOMIAL;
 else
 crc >>= 1;
 }
 CRCTable[ i ] = crc;
 }
}
/* Routine checks CRC values for a list of files. */
void CheckFiles( crc_file_name )
char *crc_file_name;
{
 FILE *crc_file;
 FILE *test_file;
 unsigned long log_crc;
 unsigned long crc;
 char log_name[ FILENAME_SIZE ];
 int result;

 crc_file = fopen( crc_file_name, "r" );
 if ( crc_file == NULL )
 FatalError( "Couldn't open the log file: %s\n", crc_file_name );
 for ( ; ; ) {
 result = fscanf( crc_file, "%lx %s", &log_crc, log_name );
 if ( result < 2 )
 break;
 test_file = fopen( log_name, "rb" );
 if ( test_file != NULL ) {
 printf( "Checking %s ", log_name );
 crc = CalculateFileCRC( test_file );

 fclose( test_file );
 if ( crc != log_crc )
 printf( "Error: Expected %08lx, got %08lx\n",
 log_crc, crc );
 else
 printf( "OK\n" );
 } else
 printf( "Could not open file %s\n", log_name );
 }
}
/* ProcessAllFiles() scans directory. Routine also makes sure that directory
 * name passed on command line is stripped of trailing '/' or '\' character. */
void BuildCRCFile( input_dir_name, crc_file_name )
char *input_dir_name;
char *crc_file_name;
{
 char path[ FILENAME_SIZE ];
 FILE *crc_file;

 strcpy( path, input_dir_name );
 if ( path[ strlen( path ) - 1 ] == SEPARATOR[ 0 ] )
 path[ strlen( path ) - 1 ] = '\0';
 crc_file = fopen( crc_file_name, "w" );
 if ( crc_file == NULL )
 FatalError( "Can't open crc log file: %s\n", crc_file_name );
 ProcessAllFiles( path, crc_file );
}
/* This routine is responsible for actually performing the calculation of the
 * 32-bit CRC for the entire file. We precondition the CRC value with all 1's,
 * then invert every bit after the entire file has been done. This gives us a
 * CRC value that corresponds with the values calculated by PKZIP and ARJ. */
unsigned long CalculateFileCRC( file )
FILE *file;
{
 unsigned long crc;
 int count;
 unsigned char buffer[ 512 ];
 int i;

 crc = 0xFFFFFFFFL;
 i = 0;
 for ( ; ; ) {
 count = fread( buffer, 1, 512, file );
 if ( ( i++ % 32 ) == 0 )
 putc( '.', stdout );
 if ( count == 0 )
 break;
 crc = CalculateBufferCRC( count, crc, buffer );
 }
 putc( ' ', stdout );
 return( crc ^= 0xFFFFFFFFL );
}
/* This is the routine that is responsible for calculating all of CRC values
 * for files in a given directory. The CRC values and file names are written
 * out to the crc_file. */
void ProcessAllFiles( path, crc_file )
char *path;
FILE *crc_file;
{

#ifdef UNIX
 DIR *dirp;
#ifdef M_XENIX
 struct direct *entry;
#else
 struct dirent *entry;
#endif /* M_XENIX */
#define NAME entry->d_name
#else
 FILE_INFO fileinfo;
 int done;
#define NAME FILE_NAME( fileinfo )
#endif
 char fullname[ FILENAME_SIZE ];
 struct stat buf;
 unsigned long crc;
 FILE *file;

 printf( "Searching %s\n", path );
 strcat( path, SEPARATOR );
#ifdef UNIX
 dirp = opendir( path );
 if ( dirp == NULL )
 FatalError( "Error opening directory %s\n", path );
 entry = readdir( dirp );
 while ( entry != 0 ) {
#else
 strcpy( fullname, path );
 strcat( fullname, "*.*" );
 done = FIND_FIRST( fullname, &fileinfo );
 while ( done == 0 ) {
#endif
 strcpy( fullname, path );
 if ( strcmp( NAME, "." ) && strcmp( NAME, ".." ) ) {
 strcat( fullname, NAME );
 if ( stat( fullname, &buf ) == -1 )
 FatalError( "Error reading stat from file %s!\n", fullname );
 if ( buf.st_mode & S_IFDIR )
 ProcessAllFiles( fullname, crc_file );
 else {
 file = fopen( fullname, "rb" );
 if ( file != NULL ) {
 printf( "Scanning %s ", fullname );
 crc = CalculateFileCRC( file );
 putc( '\n', stdout );
 fprintf( crc_file, "%08lx %s\n", crc, fullname );
 fclose( file );
 } else
 printf( "Could not open %s!\n", fullname );
 }
 }
#ifdef UNIX
 entry = readdir( dirp );
#else
 done = FIND_NEXT( &fileinfo );
#endif
 }
}
/* Routine calculates the CRC for a block of data using table lookup method.
 * It accepts an original value for the crc, and returns the updated value. */
unsigned long CalculateBufferCRC( count, crc, buffer )
unsigned int count;
unsigned long crc;
void *buffer;
{
 unsigned char *p;
 unsigned long temp1;
 unsigned long temp2;

 p = (unsigned char*) buffer;
 while ( count-- != 0 ) {
 temp1 = ( crc >> 8 ) & 0x00FFFFFFL;
 temp2 = CRCTable[ ( (int) crc ^ *p++ ) & 0xff ];
 crc = temp1 ^ temp2;
 }
 return( crc );
}
/* Fatal error handler prints a formatted error message and then exits. */
#ifdef UNIX

void FatalError( va_alist )
va_dcl
{
 char *fmt;
 va_list argptr;

 va_start( argptr );
 fmt = va_arg( argptr, char * );
#else

void FatalError( char *fmt, ... )
{
 va_list argptr;
 va_start( argptr, fmt );
#endif

 printf( "Fatal error: " );
 vprintf( fmt, argptr );
 va_end( argptr );
 exit( -1 );
}



May, 1992
MULTIUSER DOS FOR CONTROL SYSTEMS: PART II


Building an application




Richard Kryszak


Richard is a senior engineer at Rockwell International Graphic Systems. He has
a BSEE from the University of Illinois in Chicago and an MSMC from Illinois
Institute of Technology. He can be contacted at 9616 South 49th Avenue, Oak
Lawn, IL 60453.


In last month's installment of this two-part article, I described Digital
Research's Multiuser DOS (DRMDOS) in general terms, focusing on the interface
library, interprocess communication, multitasking features, and the like. This
month, I present specific details of how I use DRMDOS as the basis for
industrial control systems on the factory floor. Also, I'll describe how to
use the interface library to develop a system that consists of three
independent processes. The first process is the owner of a memory-resident
database, the second an I/O process, and the third a logic function that
monitors data in the input portion of the database for changes.
The first process will use a queue to pass the pointer to the base of the
database to any other process that wishes to access the database. The I/O
process has two responsibilities: It must read the contents of a digital input
port and deposit them into a location in the database. It must also read
another location in the database and write the data to a digital output port.
This process will utilize the delay function and the priority function to
ensure that it runs on a timely basis. The last function, the logic function,
monitors the data in the input portion of the database for changes. When a
change is encountered, the data is operated on to produce an output that is
put back into the database.


Database Process


The code for the database process is shown in Listing Three. (Note: Listing
One and Listing Two were included with last month's installment.) This is a
simple memory-resident database. It begins by declaring local variables. A
call is made using the s_memory() function to obtain a pointer to a block of
system memory. This is saved as a member of a union. If the pointer comes back
with a NULL value, there is not enough system memory available, and the
program exits. The program then creates and opens a queue called "database."
This queue will be used to pass the database pointer to any process that
wishes to access the database. The "result" values should be used for error
checking and logging. Next, the pointer to the base of the database is written
to the queue. The union is used because the 4-byte pointer must be broken into
bytes to be written to the byte-wide queue. The process then makes a
c_detach() call to detach from the system console. Finally, it goes into a
loop that simply delays the process. In a more realistic system, the
memory-resident database would be initialized at startup or loaded with data
that has been saved to disk. This is an approach I have taken. In the while
loop, the data is periodically written to the hard disk along with a checksum.
The next time the system is started, the data is read in and put into the
database. Another issue that must be addressed is contention for data. If
necessary, a set of semaphores could be used to control access to certain
areas of the database.
If a very large database is needed, it might be necessary to resort to a
disk-based system. Another alternative is to use a data server process that is
accessed through a queue. To implement this type of system would require
exclusive read/write access which could be controlled by a database semaphore.
A process would first get the semaphore and then submit a request for data
through a queue and get the resultant data from another queue.


Database Support Code


There are three operations that can be performed with respect to the database.
They are dbopen(), dbread(), and dbwrite(). In order for a process to access
the database, it must open it. Once the database is open, the process can
perform read or write operations on the database. Listing Four shows the code
that is supplied to interact with the database.
The first function is dbopen(). This function is used to read the database
pointer from the queue. The global pointer dbase_ptr is first assigned a NULL
value. A while loop is then entered that calls the database-open function
until it returns a non-NULL value. This function declares some local
variables, including a union to hold the pointer from the queue. A while loop
is then used to wait for the queue to be created. This is a multitasking
system, so this function might be called before the queue is actually created
by the database process. Once the queue is opened, the information is read
into the union. The message buffer is then written back to the queue to allow
other processes to obtain the pointer. Finally, the pointer is returned. Two
other functions allow access to the data in the database. The dbwrite()
function allows a word in the database to be set to a desired value. These are
the two parameters passed to the function. The previous contents are
destroyed. The dbread() function takes a database address as a parameter and
returns the value in that location.
There are several other possible functions that can be constructed to ease
interaction with the database. It is useful to be able to easily set or clear
individual bits in a database word and to read and write a range of words.
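As a sketch of what those bit-level helpers might look like, the following builds on dbread()/dbwrite() style access. The database here is a local stand-in array (the real system obtains its block from DRMDOS via s_memory() and a queue), and all names besides dbread()/dbwrite() are my own invention rather than part of the article's listings:

```c
/* Stand-in for the shared memory-resident database. */
#define DB_SIZE 256

static unsigned int database[ DB_SIZE ];

unsigned int dbread( int addr )
{
    return database[ addr ];
}

void dbwrite( int addr, unsigned int value )
{
    database[ addr ] = value;
}

/* Set or clear a single bit in a database word without disturbing
 * the other bits: read, modify, write back. */
void db_set_bit( int addr, int bit )
{
    dbwrite( addr, dbread( addr ) | ( 1u << bit ) );
}

void db_clear_bit( int addr, int bit )
{
    dbwrite( addr, dbread( addr ) & ~( 1u << bit ) );
}
```

In a real multiprocess system, a read-modify-write like this would need to be protected by a semaphore so that two processes cannot interleave their updates to the same word.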


I/O Scanner Process


The I/O scanner process reads data from a digital input board that has four
channels of both 8-bit input and 8-bit output. The base address of the board
is 0x300 in the I/O map of the CPU. In this example, the data will be read
from the first input port and stored in location 0 of the database. The
information in location 1 of the database will be written to the last output
port. The code for the I/O process is contained in Listing Five. The process
starts out by changing its priority to 199, which puts it above all other
normal user processes. It then uses a database support call to open a link to
the central database. Next, it detaches from the console to become a
background process. Finally, a loop is entered in which the actual I/O is
performed. The inp() function is used to read the byte contained at the
digital input port. This data is then written to the central database using
the dbwrite() call. The data to be written to the output port is read from the
database and written to the output port using the outp() call. The program
then executes a p_delay() call to allow other processes to run. This delay
call delays the process for at least 3*16.67 ms, or about 50 ms. This function
will continue running as long as the computer is running.


Logic Process


This process is simple compared to those normally used in a true control
system: it monitors only one input byte and performs a simple logic operation.
In a real system there would be more extensive calculations to do, which is
why the data is monitored for change rather than the logic being executed on
every pass: if the input data hasn't changed, there is no need to recalculate.
The code for this process is contained in
Listing Six. The process starts out by declaring local variables and then
establishes a link to the memory-resident database. Next, it detaches from the
console to become a background process. Finally, a loop is entered in which
the input data (from the I/O process) is read and compared to the last data.
If a change has occurred, the data is inverted and written back to the
database. This new data will be used by the I/O process and written to the
output port. At the end of each loop the p_dispatch() call is made to allow
other processes to run.


Getting Started


Getting things started is pretty simple. It can be done as a batch file or
from the command line. The commands are the same at any rate. First the
database is started by typing dbase. Next, the I/O process can be started by
typing ioboard. Finally, the logic process can be started by typing logic. If
everything goes okay, a normal DRMDOS prompt will be all that is left on the
screen. Now how do we know what is happening? One simple way is to hook up
signals to the input port on the I/O board and check the outputs. This is fine
if everything is working, but if not, this method does not tell us much. I do
quite a bit of debugging using the QuickC environment: the other processes are
compiled and run as above, but the one being debugged is run under QuickC.
There are also other ways of seeing what is going on. I've written a monitor
program that allows the database to be viewed and updated in binary, decimal,
and hexadecimal. There are also system monitor programs available for watching
over system resources. These can be used to see whether queues are open and
whether messages are being put into and taken out of the queue. One such
monitor is available from Digital Research as part of the developer's package.
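A fragment of such a monitor's display logic might look like this (format_word() is a hypothetical name; it renders one 16-bit database word in all three bases):

```c
#include <stdio.h>

/* Hypothetical helper: render one 16-bit database word in binary,
   decimal, and hexadecimal, as the monitor might display it. */
void format_word(unsigned int word, char *out)
 { char bin[17];
   int i;
   for (i = 0; i < 16; ++i)            /* binary digits, MSB first */
    { bin[i] = (word & (0x8000u >> i)) ? '1' : '0';
    }
   bin[16] = '\0';
   sprintf(out, "%s  %5u  0x%04X", bin, word, word);
 }
```

The real monitor would read each word through dbread() and rewrite the display on every pass.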


Conclusion


I've presented a basic overview of using Digital Research's Multiuser DOS for
industrial control. The basic functions needed to make use of the multitasking
features are contained in the listings. They do not include the error checking
necessary for a robust system. DRMDOS has proven capable of handling the
demands of "lightweight" real-time systems. The information provided here can
be expanded to provide a graphical user interface module that displays the
condition of the database and allows the user to interact with the system.
Another feature I've taken advantage of under DRMDOS is the use of intelligent
multichannel serial boards to allow communications. It is also possible to
network the control computer onto a LAN. DRMDOS is multiuser, so terminals can
be connected to the system to allow remote monitoring of controls used in
harsh environments. Although DRMDOS was not originally designed to be used on
the factory floor, it seems to be adapting to its new environment just fine.
DDJ



_MULTIUSER DOS FOR CONTROL SYSTEMS, PART II_
by Richard Kryszak


[LISTING ONE]

/* file name: system.c */

#include <dos.h>
#include "queues.h"
#include <stdio.h>

/*===============*/
/* local defines */
/*===============*/
#define CCPM 0xE0 /* cdos call int value */
#define C_DETACH 0x93 /* console detach CL register value */
#define P_DELAY 0x8D /* process delay CL register value */
#define P_DISPATCH 0x8E /* process dispatch CL register value */
#define P_PRIOR 0x91 /* process priority CL register value */
#define Q_CREAD 0x8A /* queue cread CL register value */
#define Q_CWRITE 0x8C /* queue cwrite CL register value */
#define Q_MAKE 0x86 /* queue make CL register value */
#define Q_OPEN 0x87 /* queue open CL register value */
#define Q_READ 0x89 /* queue read CL register value */
#define Q_WRITE 0x8B /* queue write CL register value */
#define S_MEMORY 0x59 /* system memory allocation request */

/*=====================*/
/* function prototypes */
/*=====================*/
unsigned int c_detach(void);
void p_dispatch(void);
void p_priority(unsigned char data);
void p_delay(unsigned int del);
unsigned int far * s_memory(int mem_size);
int q_make(struct q_descriptor *descript_ptr,
 unsigned int msg_length,
 unsigned int num_msg,
 char que_name[8],
 int *err_ptr);
int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr);
int q_read(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
int q_write(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
int q_cread(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
int q_cwrite(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);

/*======================*/

/* function definitions */
/*======================*/
unsigned int c_detach()
 { union REGS inregs,outregs;

 inregs.h.cl = C_DETACH; /* detach function call */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 return(outregs.x.ax); /* return call status */
 }
void p_dispatch()
 { union REGS inregs,outregs;

 inregs.h.cl = P_DISPATCH; /* dispatch function call */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 }
void p_priority(unsigned char priority)
 { union REGS inregs,outregs;

 inregs.h.cl = P_PRIOR; /* priority change call */
 inregs.h.dl = priority; /* desired priority */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 }
void p_delay(unsigned int del)
 { union REGS inregs,outregs;
 inregs.h.cl = P_DELAY; /* delay function call */
 inregs.x.dx = del; /* number of ticks */
 int86(CCPM,&inregs,&outregs); /* call cdos */
 }
unsigned int far * s_memory(int mem_size)
 { union REGS inregs,outregs;
 struct SREGS seg_regs; /* segment registers */
 unsigned int far *mem_ptr=NULL; /* pointer to memory block */
 mem_size *= 2; /* compute # of bytes */
 inregs.h.cl = S_MEMORY; /* system memory allocation */
 inregs.x.dx = mem_size; /* # of bytes requested */
 int86x(CCPM,&inregs,&outregs,&seg_regs); /* call cdos */
 if(outregs.x.ax == 0xFFFF) /* if not successful */
 { return(NULL); /* return a null pointer */
 }
 mem_ptr = (unsigned int far *)
 ((0x10000 * seg_regs.es)
 + outregs.x.ax); /* convert into a pointer */
 return(mem_ptr); /* return the pointer */
 }
int q_make(struct q_descriptor *descript_ptr,
 unsigned int msg_length,
 unsigned int num_msg,
 char que_name[8],
 int *err_ptr)
 { int int86_error; /* return status */
 int i; /* index variable */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 descript_ptr->internal_1 = 0; /* must be 0 */
 descript_ptr->internal_2 = 0; /* must be 0 */
 descript_ptr->internal_3 = 0; /* must be 0 */
 descript_ptr->internal_4 = 0; /* must be 0 */
 descript_ptr->internal_5 = 0; /* must be 0 */

 descript_ptr->internal_6 = 0; /* must be 0 */
 descript_ptr->msglen = msg_length; /* add message length */
 descript_ptr->nmsgs = num_msg; /* add number of messages */
 descript_ptr->flags = 0; /* no flags used */
 for(i = 0; i < 8; ++i) /* copy queue name */
 { descript_ptr->name[i]=que_name[i];
 }
 descript_ptr->buffer = 0; /* buffer in system area */
 inregs.h.cl = Q_MAKE; /* queue make call */
 inregs.x.dx = FP_OFF(descript_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,
 &outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr)
 { int int86_error; /* return status */
 int i; /* index variable */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->internal_1 = 0; /* must be 0 */
 param_blk_ptr->internal_2 = 0; /* must be 0 */
 for(i = 0; i < 8; ++i)
 { param_blk_ptr->name[i] = que_name[i]; /* copy queue name */
 }
 inregs.h.cl = Q_OPEN; /* q_open call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,
 &outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_write(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr)
 { int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_WRITE; /* q_write call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs, &outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_read(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr)
 { unsigned int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_READ; /* q_read call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */

 int86_error=int86x(CCPM,&inregs,&outregs,&seg_regs); /* int86 call */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_cwrite(struct q_parameter_blk *param_blk_ptr,unsigned char *buff_ptr,
 int *err_ptr)
 { int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_CWRITE; /* q_cwrite call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,&outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error code */
 return(int86_error); /* int86 return status */
 }
int q_cread(struct q_parameter_blk *param_blk_ptr,unsigned char *buff_ptr,
 int *err_ptr)
 { int int86_error; /* return status */
 union REGS inregs, outregs; /* processor registers */
 struct SREGS seg_regs; /* segment registers */
 segread(&seg_regs); /* read segment registers */
 param_blk_ptr->buffer=FP_OFF(buff_ptr); /* pointer to the buffer */
 inregs.h.cl = Q_CREAD; /* q_cread call */
 inregs.x.dx = FP_OFF(param_blk_ptr); /* put offset into dx */
 int86_error=int86x(CCPM,&inregs,&outregs,&seg_regs); /* call cdos */
 *err_ptr = outregs.x.cx; /* write error */
 return(int86_error); /* int86 return status */
 }





[LISTING TWO]

/* file name: queues.h */

struct q_descriptor
{ unsigned int internal_1; /* for internal use ; must be zero */
 unsigned int internal_2; /* for internal use ; must be zero */
 int flags; /* for internal use ; queue flags */
 char name[8]; /* queue name */
 int msglen; /* number of bytes in each logical message */
 int nmsgs; /* maximum number of messages supported */
 unsigned int internal_3; /* for internal use ; must be zero */
 unsigned int internal_4; /* for internal use ; must be zero */
 unsigned int internal_5; /* for internal use ; must be zero */
 unsigned int internal_6; /* for internal use ; must be zero */
 unsigned int buffer; /* address of the queue buffer */
 };

struct q_parameter_blk
{ unsigned int internal_1; /* for internal use ; must be zero */
 int queueid; /* queue number field ; filled by q_open */
 unsigned int internal_2; /* for internal use ; must be zero */
 unsigned int buffer; /* offset of queue message buffer */
 char name[8]; /* queue name */

 };





[LISTING THREE]

/* file name: database.c */

#include <stdio.h>
#include <stdlib.h> /* for exit() */
#include "queues.h"

/*=====================*/
/* function prototypes */
/*=====================*/
void main(void);

/*===============*/
/* local defines */
/*===============*/
#define Q_DEPTH 1 /* queue contains 1 message */
#define DBASE_SIZE 2048 /* size of the database */
#define TRUE 1

/*================================*/
/* external function declarations */
/*================================*/
extern unsigned int c_detach(void);
extern void p_delay(unsigned int del);
extern unsigned int far *s_memory(int mem_size);
extern int q_make(struct q_descriptor *descript_ptr,
 unsigned int msg_length,
 unsigned int num_msg,
 char que_name[8],
 int *err_ptr);
extern int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr);
extern int q_cwrite(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
/*=====================*/
/* function definition */
/*=====================*/
void main()
 { int result; /* result of q_make */
 int error_type; /* cdos return code */
 union base
 { unsigned int far *base_ptr;
 unsigned char base[sizeof(unsigned int far *)];
 }base_union; /* composite pointer */
 struct q_descriptor dbase_descript; /* descriptor block */
 struct q_parameter_blk dbase_parameters; /* parameter block */
 base_union.base_ptr = s_memory(DBASE_SIZE); /* request system memory */
 if(base_union.base_ptr == NULL) /* if NULL pointer */
 { puts("No System Memory Available"); /* print an error message */
 exit(-1); /* exit, memory error */
 }

 result = q_make(&dbase_descript, /* pointer to descriptor */
 sizeof(base_union), /* length of messages */
 Q_DEPTH, /* number of messages */
 "database", /* queue name */
 &error_type); /* error return */
 result = q_open(&dbase_parameters, /* pointer to parameter */
 "database", /* queue name */
 &error_type); /* error return */
 result = q_cwrite(&dbase_parameters, /* write to queue */
 &base_union.base[0], /* pointer to database */
 &error_type); /* error return */
 c_detach(); /* detach from console */
 while(TRUE) /* loop */
 { p_delay(1800); /* delay 30 seconds */
 }
 }
/* NOTE: DATABASE.EXE is made up of database.c and system.c */





[LISTING FOUR]

/* file name: dbsuport.c */

#include <stdio.h>
#include "queues.h"

/*=====================*/
/* function prototypes */
/*=====================*/
void dbopen(void);
unsigned int dbread(int index);
void dbwrit(int index, unsigned int value);
unsigned int far *open_dbase(void);

/*==============================*/
/* external function prototypes */
/*==============================*/
extern int q_open(struct q_parameter_blk *param_blk_ptr,
 char que_name[8],
 int *err_ptr);
extern void p_dispatch(void);
extern int q_read(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);
extern int q_write(struct q_parameter_blk *param_blk_ptr,
 unsigned char *buff_ptr,
 int *err_ptr);

/*================*/
/* global storage */
/*================*/
unsigned int far *dbase_ptr; /* pointer to database */

/*===============*/
/* local defines */
/*===============*/

#define FAILURE 1
#define SUCCESS 0

/*======================*/
/* function definitions */
/*======================*/
void dbopen()
 { dbase_ptr = NULL; /* initialize pointer */
 while(dbase_ptr == NULL) /* loop while still NULL */
 { dbase_ptr = open_dbase(); /* call open database */
 }
 }
void dbwrit(int index, unsigned int value)
 { *(dbase_ptr + index) = value; /* write value to database */
 }
unsigned int dbread(int index)
 { return(*(dbase_ptr + index)); /* return value at index */
 }
unsigned int far *open_dbase()
 { struct q_parameter_blk dbase_parameters; /* parameter block */
 int result; /* result of q_make */
 int error_type; /* cdos return code */
 union base
 { unsigned int far *base_ptr;
 unsigned char base[sizeof(unsigned int far *)];
 }base_union; /* composite pointer */
 result = FAILURE; /* preset the variable */
 while(result != SUCCESS) /* loop til we can open */
 { result = q_open(&dbase_parameters, /* pointer to param block */
 "database", /* queue name */
 &error_type); /* error type return */
 p_dispatch(); /* let someone else run */
 }
 result = q_read(&dbase_parameters, /* read the dbase_queue */
 &base_union.base[0], /* msg read is in union */
 &error_type); /* error return type */
 result = q_write(&dbase_parameters, /* write to dbase_queue */
 &base_union.base[0], /* msg sent is pointer */
 &error_type); /* error return type */
 return(base_union.base_ptr); /* return the pointer */
 }





[LISTING FIVE]

/* file name ioboard.c */

#include <conio.h>

/*====================*/
/* function prototype */
/*====================*/
void main(void);

/*==============================*/
/* external function prototypes */

/*==============================*/
extern void dbopen(void);
extern void dbwrit(int index, unsigned int value);
extern unsigned int dbread(int index);
extern unsigned int c_detach(void);
extern void p_priority(unsigned char data);
extern void p_delay(unsigned int del);

/*===============*/
/* local defines */
/*===============*/
#define INPUT_BASE_ADDR 0x300 /* hardware input address */
#define OUTPUT_BASE_ADDR 0x300 /* hardware output address */
#define DBASE_WRITE_ADDR 0 /* database write location */
#define DBASE_READ_ADDR 1 /* database read location */
#define TRUE 1

/*=====================*/
/* function definition */
/*=====================*/
void main()
 { unsigned int temp_data; /* for reading database */
 p_priority(199); /* set priority */
 dbopen(); /* link to database */
 c_detach(); /* detach from console */
 while(TRUE)
 { temp_data = inp(INPUT_BASE_ADDR); /* read data from port */
 dbwrit(DBASE_WRITE_ADDR, temp_data); /* write to the database */
 temp_data = dbread(DBASE_READ_ADDR); /* read data from database */
 outp(OUTPUT_BASE_ADDR+3, temp_data); /* write to output port */
 p_delay(3); /* delay for 50 ms */
 }
 }

/* NOTE: IOBOARD.EXE is made up of ioboard.c, system.c, and dbsuport.c */





[LISTING SIX]

/* file name logic.c */

/*====================*/
/* function prototype */
/*====================*/
void main(void);

/*==============================*/
/* external function prototypes */
/*==============================*/
extern void dbopen(void);
extern unsigned int dbread(int index);
extern void dbwrit(int index, unsigned int value);
extern unsigned int c_detach(void);
extern void p_dispatch(void);

/*===============*/

/* local defines */
/*===============*/
#define DATA_IN 0 /* data written by I/O process */
#define DATA_OUT 1 /* data read by I/O process */
#define TRUE 1

/*=====================*/
/* function definition */
/*=====================*/
void main()
 { static unsigned int last_data_read; /* old data retainer */
 unsigned int temp_data; /* for reading database */

 dbopen(); /* link to database */
 c_detach(); /* detach from console */
 while(TRUE) /* continuous loop */
 { temp_data = dbread(DATA_IN); /* read input data */
 if(temp_data ^ last_data_read) /* if there was a change */
 { dbwrit(DATA_OUT, ~temp_data); /* write inverted data back */
 last_data_read = temp_data; /* save the new value */
 }
 p_dispatch(); /* let another process run */
 }
 }
/* NOTE: LOGIC.EXE is made up of logic.c, system.c, and dbsuport.c */





































May, 1992
BRIDGING THE GAP WITH RESIDENT_C


Improving text exchanges between DOS and Windows applications




Charles Albert Mirho


Charles is a software consultant in New Jersey and holds an MS degree in
computer engineering from Rutgers University. He can be reached at 73
Montgomery Street, Piscataway, NJ 08854.


If you're running Windows in 386 Enhanced Mode, cutting from a WinApp and
pasting to a DOS application is easy--simply run the DOS app in a Window and
cut to and paste from the clipboard. But life isn't that simple if you're
stuck in real or standard mode. In this case you must first copy from your
clipboard, switch to the DOS application, select the insertion point where you
want the information from the clipboard to appear, press Alt-Esc to switch
back to Windows without quitting the application, click the application icon
to open the control menu for the destination application, and finally choose
Paste.
The main drawback to this procedure is that it requires you to switch out of
the application before pasting the data in. But that's not all. If you are running
in real or standard mode, copying the entire screen using Print Screen is the
only method for copying information from a non-Windows application. This
doesn't provide much flexibility--copying even a single line of text requires
capturing the entire screen.
This article presents a TSR that bridges the gap between DOS and real- or
standard-mode Windows. This TSR is first loaded under DOS. Upon startup,
Windows preserves the contents of occupied real memory, including the TSR and
DOS. Windows will be configured so that a tiny program, WINDOS, runs
automatically on startup. The user will never interact with WINDOS. Instead,
WINDOS appears as an icon at the bottom of the screen to let us know that
improved data exchange with DOS programs is available.
The improved method doesn't require an exit before pasting into the DOS
application, and will allow the exchange of text blocks of any size. The
method works only in real- or standard-mode Windows. (In 386 enhanced mode,
text-mode DOS programs run in a window, making the Windows cut/paste
relatively efficient.) You can still copy data from graphics-mode DOS programs
using the Print Screen method described above.


/*resident_C*/


The tricky part of any TSR is knowing when it is safe to interrupt DOS. If
TSRs issued only BIOS calls, writing them would be relatively simple. To
become a TSR, a program would simply hook the keyboard interrupt (9h) and
call DOS function 31h to terminate and stay resident. However, if the TSR uses
DOS services,
which is likely, it must not take control of the machine while the processor
is executing DOS system code. This is because DOS is a non-reentrant operating
system.
South Mountain Software's /*resident_C*/ library (which I'll refer to as
simply "resident_C") shields the TSR programmer from pop-up pitfalls. The
resident_C library frees the programmer to concentrate on procedural details
without worrying about when it is safe for the TSR to take control.
TSR.C (Listing Two) hooks several interrupts, among them the keyboard
interrupt, timer interrupt, and software interrupts 0x60 and 0x61. The
keyboard interrupt detects when hot keys are pressed. LShift-Alt-C copies text
from the screen; the user highlights the text to copy (Lotus 1-2-3 style),
then presses the Enter key. LShift-Alt-P pastes text at the location of the
text cursor. LShift-Alt-R, which is generally used once, instructs the TSR to
unhook all interrupts and remove itself from memory.
Besides cutting and pasting text, the TSR must do one other important thing:
transfer text between its internal buffer and the Windows clipboard. Actually,
there is no way the TSR can directly access the Windows clipboard. (An obscure
method exists, but it only works under enhanced-mode Windows and is poorly
documented.) But the TSR can respond to interrupts 0x60 and 0x61 issued from
the WINDOS program, which has access to the clipboard. First the TSR must hook
these interrupts in the initialization procedure tsrpre(); see Listing One and
Listing Two. resident_C has a convenient function initisrN() (where N is an
integer from 1 to 9) for hooking interrupts; see Figure 1(a). The first
parameter is the interrupt number to hook, the second parameter is a pointer
to the handler function, and the third is nonzero if we want to chain to the
previous interrupt handler when done processing the interrupt.
Figure 1: (a) resident_C's function for hooking interrupts; (b) defining a hot
key using resident_C; (c) responding to additional hot keys.
 (a)

 initisr1 (0x60, isr60, 1);
 initisr2 (0x61, isr61, 1);

 (b)

 inittsr2 (LEFT_SHIFT|ALT_KEY, SCAN_P, vResprog, 1000, uiCCSig, SIG,
 (void far *) 0L);

 (c)

 tsrhotkeys (uiCCSig, (struct HKEYS far *)ahkHotKeys, 2);


WINDOS issues interrupts 0x60 and 0x61 to initiate the transfer of text to and
from the TSR, respectively. In particular, when the TSR receives an interrupt
0x60 from WINDOS, it forms a pointer to the transfer buffer from registers
DS:DX. (resident_C fills the variables isrX_ds and isrX_dx with the contents
of these registers before passing control to the interrupt service routines.)
The TSR then copies the text from the transfer buffer into its own internal
buffer. When the TSR receives an interrupt 0x61 from WINDOS, it again forms a
pointer to the transfer buffer from DS:DX. This time, however, it copies data
from its internal buffer to the transfer buffer.
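The address arithmetic behind that pointer construction can be sketched portably: in real mode, the linear address of the transfer buffer is simply segment * 16 + offset (the function name linear_addr() is illustrative):

```c
/* Real-mode linear address from the segment:offset pair that
   resident_C leaves in the isrX_ds and isrX_dx variables. */
unsigned long linear_addr(unsigned int seg, unsigned int off)
{
    return ((unsigned long)seg << 4) + off;  /* segment * 16 + offset */
}
```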
The resident_C library simplifies hot-key and interrupt processing within the
TSR. One of the first resident_C functions we call is inittsr2(); see Figure
1(b). The first two parameters define the primary hot key to invoke the TSR
(LShift-Alt-P in this case). There is nothing special about the primary hot
key; it is simply the first one defined. The third parameter to inittsr2()
points to a function that gets control when a hot key is pressed. The fourth
parameter is the number of paragraphs of extra memory to allocate when the TSR
goes resident. The amount used is excessive--in this case 16K. This particular
TSR does not do dynamic allocations (malloc or calloc), so the number of extra
paragraphs could be reduced to something close to the 2-4K required by the
Microsoft runtime libraries. (I used 16K to put the issue of memory allocation
out of mind in the event of future enhancements.)
The fifth and sixth parameters to inittsr2() define the signature of the TSR
in memory; DOS maintains an internal list of resident programs by their
signature. resident_C checks the DOS signature list when we try to load the
TSR. If the TSR is already loaded, resident_C forces an exit without loading
the TSR a second time. The signature for our program is held in the variable
uiCCSig, defined as a random four-digit integer. The last parameter to
inittsr2() is not used for our purposes and should be set to NULL.
The inittsr2() function is important because it does so much to define the
behavior of the TSR. Not only does it set the primary hot-key and control
function, signature, and extra memory space; it also hooks the keyboard
interrupt and timer interrupts. Furthermore, the function actually loads our
program in the resident state. Before terminating resident, however, it calls
the user-defined function tsrpre(). This is the function in which we perform
any initializations before the TSR terminates resident--in this case, to
compute the dimensions of the screen and print a greeting message.
We also call another resident_C function, tsrhotkeys(), see Figure 1(c), which
tells resident_C to respond to two additional hot keys. These additional hot
keys are defined in an array, ahkHotKeys[], as LShift-Alt-C and LShift-Alt-R.
The first parameter to tsrhotkeys() is the TSR's signature, the second is the
array of hot keys, and the last is the number of hot keys defined in the
array. Now the TSR will get control when any of the three hot keys are
pressed.
When one of these key sequences is detected, resident_C takes control and
calls the TSR control function, vResprog(). (Recall that a pointer to this
function can be found in the third parameter to inittsr2().) When invoked,
vResprog() determines whether a hot key was pressed by checking the global
variable _hotkey_hit. This variable is set by the resident_C startup code; it
is 0 if no hot key was pressed, and non-zero otherwise.
vResprog() next determines which hot key was pressed by checking the global
variable _hotkey_num. The resident_C startup code sets _hotkey_num to a unique
number representing the hot key. A value of 0 indicates the primary hot key as
defined by parameters one and two in function inittsr2(). A non-zero value
represents one of the other hot keys, those defined in the call to function
tsrhotkeys(). The primary hot key (LShift-Alt-P) initiates stuffing of the
keyboard buffer with text from the TSR's internal buffer. A _hotkey_num value
of 1 indicates that hot key LShift-Alt-R was pressed, in which case the TSR
removes itself from memory. A _hotkey_num value of 2 means that LShift-Alt-C
was pressed, and the TSR proceeds to highlight text on the screen for copying.
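The decision logic just described can be condensed into a small dispatch routine (a sketch only: the enum and the function are illustrative stand-ins for the branches inside vResprog()):

```c
/* Sketch of the hot-key dispatch in vResprog(). The return value
   names the action taken; the real TSR calls its paste, unload,
   and copy routines at these points. */
enum action { NONE, PASTE, REMOVE, COPY };

enum action dispatch(int hotkey_hit, int hotkey_num)
{
    if (!hotkey_hit)           /* _hotkey_hit is 0 when no   */
        return NONE;           /* hot key was pressed        */
    switch (hotkey_num) {
    case 0:  return PASTE;     /* primary key: LShift-Alt-P  */
    case 1:  return REMOVE;    /* LShift-Alt-R: unload TSR   */
    case 2:  return COPY;      /* LShift-Alt-C: copy screen  */
    default: return NONE;
    }
}
```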


Pasting Text


Perhaps the most fascinating aspect of this TSR is how it handles the pasting
of text into an application. The standard method for pasting text into DOS
applications (which I use here) is stuffing the keyboard buffer. When
LShift-Alt-P is pressed, vResprog() sets some global variables and calls the
resident_C function kb_init(); see Figure 2.
Figure 2: Using the hook function kybdstuf2() to stuff all the bytes in the
buffer into the application.

 kb_str_ptr = szDataBuffer;
 kb_in_progress = strlen(szDataBuffer);
 kb_init();

Function kb_init() hooks the system timer to a resident_C internal function
called kybdstuf2(), which uses two global variables. The global variable
kb_str_ptr points to the TSR's internal buffer, which contains the array of
keystrokes to stuff. The global variable kb_in_progress is set to the number
of bytes in the array to stuff. The call to kb_init() tells the library
function kybdstuf2() to begin stuffing the keyboard buffer. As each byte is
stuffed, kybdstuf2() decrements kb_in_progress. When the whole buffer is done,
the value of kb_in_progress is 0. Because a maximum of 16 bytes can be fit
into the keyboard buffer at a time, kybdstuf2() stuffs only a few bytes at a
time. But because it is invoked repeatedly from the timer interrupt, function
kybdstuf2() eventually finishes the entire array.
Notice that this method for pasting text assumes that the DOS program reads
the keys from the keyboard buffer as they are stuffed. It is therefore
imperative that the TSR periodically relinquish control so that the foreground
program can get control to read the keystrokes. When all the text has been
stuffed, the TSR calls _kb_close() to disable the stuffing function
kybdstuf2().
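The chunking behavior of kybdstuf2() can be modeled in portable C (a sketch: stuff_tick() and the plain destination array are stand-ins for the library routine and the BIOS keyboard buffer):

```c
#include <string.h>

/* Stuff at most 16 bytes per timer tick, decrementing the count
   of bytes remaining; mirrors the two resident_C globals. */
#define KB_CHUNK 16

const char *kb_str_ptr;   /* next byte to stuff        */
int kb_in_progress;       /* bytes remaining to stuff  */

int stuff_tick(char *kbuf)  /* returns bytes stuffed this tick */
{
    int n = kb_in_progress < KB_CHUNK ? kb_in_progress : KB_CHUNK;
    memcpy(kbuf, kb_str_ptr, n);  /* copy one chunk */
    kb_str_ptr += n;
    kb_in_progress -= n;
    return n;
}
```

When kb_in_progress reaches 0, the real TSR calls _kb_close() to unhook the stuffing function.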


WINDOS


You may have noticed that very little of the TSR is dedicated to interfacing
with WINDOS. Only the two small interrupt service routines isr60() and isr61()
are needed for this purpose. Most of the code in the TSR is used to copy and
paste text. A similar imbalance holds for WINDOS; most of its code (see
Listing Three and Listing Four) is used to exchange data with the Windows
clipboard. Relatively little is used to interface with the TSR.
The initialization code in WINDOS is executed when the window procedure
MainWndProc() receives a WM_CREATE message. The initialization code tells
Windows to report WM_TIMER messages every half-second. It also registers a
proprietary "signature" format with the clipboard. Finally, it allocates the
transfer buffer using GlobalDosAlloc(). This buffer is where WINDOS and the
TSR swap text back and forth. The transfer buffer must reside in system memory
below the 1-Mbyte real-mode ceiling, making it addressable by the TSR.
With every timer message, WINDOS polls the clipboard to see if any new data is
waiting. It polls the clipboard by checking for the signature on the
clipboard. If the signature is present, the clipboard contains "stale" data
(data not to be copied to the TSR), either because it came from the TSR to
begin with, or because WINDOS copied it to the TSR during a previous timer
message. If the signature is missing from the clipboard, it means another
Windows application has overwritten the clipboard with new data. In this case,
WINDOS checks that the new data is in the CF_TEXT format, and if so, copies it
to the TSR by simulating real-mode interrupt 0x60.
Next WINDOS polls the TSR by simulating real-mode interrupt 0x61. If the TSR's
internal buffer contains new text copied from the screen, then isr61() copies
the text to the transfer buffer pointed to by DS:DX and then returns 1 in AX.
WINDOS checks AX when the simulated interrupt returns, and if AX is equal to
1, it copies text from the transfer buffer to the clipboard. Otherwise, it
closes the clipboard without doing anything.
Notice that if Windows is running in real mode, DPMI is not available, so
real-mode interrupts cannot (and need not) be simulated with it. WINDOS checks
the Windows mode with a call to GetWinFlags(); if Windows is running in real
mode, it issues the interrupts with the old Microsoft int86x() function
instead.


Conclusion


To set up for improved text exchange, run TSR.EXE before running Windows. Then
run Windows using windos on the command line. That's all there is to it.
Remember that the hot keys LShift-Alt-P and LShift-Alt-C are for use only
within DOS applications and have no effect in Windows applications; use the
standard cut/paste in Windows applications.


Products Mentioned


/*resident_C*/
South Mountain Software
76 South Orange Ave.
South Orange, NJ 07079
201-762-6965
$249
Requires a supported C compiler
DDJ


_BRIDGING THE GAP WITH RESIDENT_C_
by Charles Albert Mirho



[LISTING ONE]


/* tsr.h */
#define FALSE 0
#define TRUE 1

/* Some commonly used scan codes */
#define SC_ESCAPE 0x01
#define SC_PRTSC 0x37
#define SC_SPACE_BAR 0x39
#define SC_CAP_LOCK 0x3A
#define SC_NUM_LOCK 0x45
#define SC_SCROLL_LOCK 0x46

/* toggle key codes for inittsr() */
#define ALT_KEY 0x08
#define CTRL_KEY 0x04
#define LEFT_SHIFT 0x02
#define RIGHT_SHIFT 0x01

/* TSR signature */
#define SIG *((unsigned int *)"ESI")


extern unsigned char Tsr24Result, Tsr24Error; /* used for int24 */

struct HKEYS{
 unsigned char hotscancode;
 unsigned char hottoggle;
 unsigned char key_identifier;
 };

struct sharedata{ /* shared structure definition */
 unsigned oldpsp;
 unsigned ourpsp;
 unsigned ourextra;
 unsigned tsr_size;
 unsigned tsr_stat;
 unsigned char hottoggle;
 unsigned char hotscancode;
 unsigned char safewait;
 unsigned char oldbreak;
 unsigned char freeflag;
 unsigned char beepflag;
 unsigned char more_hotkeys;
 unsigned char cdummy;
 struct HKEYS far *hotkey_data;
 int far *indosflag;
 int far *critflag;
 int far *oldint8;
 int far *oldint9;
 int far *oldint10;
 int far *oldint13;
 int far *oldint16;
 int far *oldint1b;
 int far *oldint1c;
 int far *oldint21;
 int far *oldint23;
 int far *oldint24;
 int far *oldint28;
 int far *shareptr;
 unsigned int oldss;
 unsigned int oldsp;
 unsigned int ourss;
 unsigned int oursp;
 int far *resident;
 int far *dosstack;
 int far *olddta;
 int far *ourdta;
 unsigned int ourcs;
 unsigned long vec_table[256];
 unsigned long our_floats[44];
 unsigned long float_hold[44];
 };






[LISTING TWO]


/**************************************************************************
* File Name: tsr.c
* Description: Main body of the TSR. Traps hot keys LSHIFT-ALT-C (copy text),
* LSHIFT-ALT-P (paste text), LSHIFT-ALT-R (remove TSR from memory).
**************************************************************************/
#include <stdio.h>
#include <stdlib.h> /* calloc() */
#include <string.h> /* strlen() */
#include <dos.h>
#include "tsr.h"
#include "isr.h"
#include "clip.h"

extern unsigned long cculMode;
extern int cciColumns;
extern int cciRows;
extern BYTE far *ccfpVidMem;
extern ATTRIBUTE ccatDef25x80;
extern ATTRIBUTE ccatSecond25x80;
extern BYTE *ccabAttrBuffer; //Undo buffer of attributes
char szDataBuffer[3000] = "Yo de do de do."; //Holds transfer data
BOOL bNewData = FALSE;

#define SCAN_C 46 //popup scan code
#define SCAN_P 25 //popup scan code
#define SCAN_R 19 //menu scan code
unsigned int uiCCSig = 0x9191; //randomly selected signature code
extern int _hotkey_num; //will hold value of hot key pressed
 //additional hot keys to trap
extern char *kb_str_ptr; //points to buffer for the
 //keystuffer routine to use
extern unsigned kb_in_progress; //non-zero if more bytes waiting
 //in buffer to be stuffed
extern int _hotkey_hit; //true if a hot key was pressed
struct HKEYS ahkHotKeys[] =
{
 SCAN_R, LEFT_SHIFT | ALT_KEY, 1,
 SCAN_C, LEFT_SHIFT | ALT_KEY, 2,
};
void tsrpost ();
void tsrpre ();
void vResprog(); //function which gets control upon invocation
int isr60();
int isr61();

RECTANGLE rtScreen;
RECTANGLE rtRect = {20,5,20,5};

/****************************************************************************
 FUNCTION: Main
 PURPOSE: Main of TSR program
****************************************************************************/
main (int argc, char *argv[])
{
int irc;
 cciColumns = 80;
 cciRows = 25;
 ccfpVidMem = (unsigned char far *) 0xB8000000;
 MAKEATTRIBUTE (ccatDef25x80, COLOR_TEXT_BLUE, COLOR_TEXT_RED,
 INTENSITY_HIGH, NOBLINK);
 MAKEATTRIBUTE (ccatSecond25x80, COLOR_TEXT_RED, COLOR_TEXT_BLUE,

 INTENSITY_HIGH, NOBLINK);
 if ((ccabAttrBuffer = (BYTE *) calloc (cciColumns*cciRows,
 sizeof(BYTE)))==NULL)
 {
 printf ("Allocation error\n");
 } //End if (not enough memory)
 //Test if TSR already loaded. If not, do initialization code
 if (!tsrloaded (uiCCSig))
 {
 //initialization code
 //initializes a TSR that works off the system timer
 if ((irc=inittsr (LEFT_SHIFT | ALT_KEY, SCAN_P, vResprog, 1000, uiCCSig, SIG,
 (void far *) 0L)) != 0)
 {
 printf ("Error loading TSR: Error %d\n", irc);
 return 1;
 } //End if (failed to initialize TSR)
 } //End if (TSR not loaded already)

} //End function (main)

/* This function is called on every clock tick, approx. 18 times per sec.
 It should do its processing and exit as quickly as possible */
void vResprog ()
{
static int bStuffing = FALSE;
 //The variable kb_in_progress holds the number of bytes
 //remaining in the buffer that need to be stuffed into application.
 if(kb_in_progress)
 {
 return;
 } //End if (in the middle of stuffing keystrokes)
 if (!_hotkey_hit)
 {
 return;
 } //End if (no hot key pressed)
 //If we were previously stuffing but now we are done, clean up
 if (bStuffing && !kb_in_progress)
 {
 bStuffing = FALSE;
 kb_close ();
 } //End if (done stuffing)
 if (_hotkey_hit)
 {
 _hotkey_hit = 0;
 if (_hotkey_num == 0)
 {
 //Now initiate the hook function kybdstuf2() to stuff
 //all the bytes in the buffer into the application
 kb_str_ptr = szDataBuffer;
 kb_in_progress = strlen(szDataBuffer);
 kb_init();
 bStuffing = TRUE;
 return;
 } //End if (hot key to paste)
 if (_hotkey_num == 1)
 {
 if (freetsr())
 {

 printf ("Error - could not free tsr.\n");
 } //End if (failed to free)
 freeisr1 ();
 freeisr2 ();
 } //End if (hot key to free tsr)
 if (_hotkey_num == 2)
 {
 cciSaveAttrRect(&rtScreen);
 if (cciHilightRect(&rtRect) == ccSUCCESS)
 {
 cciSaveCharRect (&rtRect);
 bNewData = TRUE;
 } //End if (successful highlight)
 cciRestoreAttrRect(&rtScreen);
 } //End if (hot key to clip screen)
 } //End if (hot key hit)
} //End function (vRespProg)

void tsrpre ()
{
 tsrhotkeys (uiCCSig, (struct HKEYS far *)ahkHotKeys, 2);
 MAKERECTANGLE (rtScreen, 0, 0, cciColumns-1, cciRows-1);
 initisr1 (0x60, isr60, 1);
 initisr2 (0x61, isr61, 1);
 printf ("TSR Loaded.\n");
 printf ("LSHIFT-ALT-C to copy\n");
 printf ("LSHIFT-ALT-P to paste\n");
 printf ("LSHIFT-ALT-R to remove from memory\n");
 return;
} //End function (tsrpre)
void tsrpost ()
{
 if (ccabAttrBuffer)
 {
 free (ccabAttrBuffer);
 } //End if (attribute buffer defined)
 printf ("\n** TSR removed from memory. **");
 printf ("\n** Hit enter to continue. **\n");
 return;
} //End function (tsrpost)
char far *szTransferBuf;
int i;
isr60 ()
{
 FP_SEG (szTransferBuf) = isr1_ds;
 FP_OFF (szTransferBuf) = isr1_dx;
 i = 0;
 while (szTransferBuf[i] != 0)
 {
 szDataBuffer[i] = szTransferBuf[i];
 i++;
 } //End while (more bytes to transfer)
 szDataBuffer[i] = 0;
 return 0;
} //End function (isr60)
isr61 ()
{
 if (!bNewData)
 {
 isr2_ax = 0;

 return 0;
 } //End if (no new data to copy)
 bNewData = FALSE;
 FP_SEG (szTransferBuf) = isr2_ds;
 FP_OFF (szTransferBuf) = isr2_dx;
 i = 0;
 while (szDataBuffer[i] != 0)
 {
 szTransferBuf[i] = szDataBuffer[i];
 i++;
 } //End while (more bytes to transfer)
 szTransferBuf[i] = 0;
 isr2_ax = 1;
 return 0;
} //End function (isr61)





[LISTING THREE]

/* Windos.h */
int PASCAL WinMain(HANDLE, HANDLE, LPSTR, int);
long FAR PASCAL MainWndProc(HWND, unsigned, WORD, LONG);
int pmiIntX (int iIntNo, LPSTR szSwapData);

#define BUFFER_SIZE 3000L //size of transfer buffer
#define TEXT_INC 15 //space between lines of text
#define SIG "windos" //Signature to place on clipboard





[LISTING FOUR]

/****************************************************************************
 PROGRAM: WINDOS
 PURPOSE: Reads text from the clipboard and writes it to a DOS TSR using
 DPMI. Or, reads text from a DOS TSR and writes it to clipboard.
****************************************************************************/
#include "windows.h"
#include "windos.h"
#include "dpmi.h"
#include <dos.h>
#include <string.h>
#include <stdlib.h>

HWND hwndMain; //Global handle to main window
WORD cfSignature; //signature format (proprietary)
char szClipData[3000];
SELECTOR PMSelector; //protected mode selector
SEGMENT RMSegment; //real-mode segment
DWORD dwSegSelector; //seg:selector pair returned by
 //GlobalDosAlloc()
LPSTR lpTransferBuf; //Points to transfer buffer

/****************************************************************************

 FUNCTION: WinMain(HANDLE, HANDLE, LPSTR, int)
 PURPOSE: Calls initialization functions and processes message loop
 Only one instance of the program is allowed to run.
****************************************************************************/
int PASCAL WinMain(HANDLE hInstance, HANDLE hPrevInstance, LPSTR lpCmdLine,
 int nCmdShow)
{
MSG msg;

 if (!hPrevInstance)
 {
 if (!InitApplication(hInstance))
 {
 return (FALSE);
 } //End if (failed initializing application params)
 } //End if (no previous instance running)
 else
 {
 return FALSE;
 } //End if (previous instance already running)
 if (!InitInstance(hInstance, nCmdShow))
 {
 return (FALSE);
 } //End if (failed initializing instance data)
 while (GetMessage(&msg, NULL, NULL, NULL))
 {
 TranslateMessage(&msg);
 DispatchMessage(&msg);
 } //End while (not a quit message)
 return (msg.wParam);
} //End function (WinMain)
/****************************************************************************
 FUNCTION: InitApplication(HANDLE)
 PURPOSE: Initializes window data and registers window classes
****************************************************************************/
BOOL InitApplication(HANDLE hInstance)
{
WNDCLASS wc;
 wc.style = NULL;
 wc.lpfnWndProc = MainWndProc;
 wc.cbClsExtra = 0;
 wc.cbWndExtra = 0;
 wc.hInstance = hInstance;
 wc.hIcon = LoadIcon(hInstance, "SMILEY");
 wc.hCursor = LoadCursor(NULL, IDC_ARROW);
 wc.hbrBackground = GetStockObject(WHITE_BRUSH);
 wc.lpszMenuName = NULL;
 wc.lpszClassName = "WinDos";
 return (RegisterClass(&wc));
} //End function (InitApplication)
/****************************************************************************
 FUNCTION: InitInstance(HANDLE, int)
 PURPOSE: Saves instance handle and creates main window
****************************************************************************/
BOOL InitInstance(HANDLE hInstance, int nCmdShow)
{
HDC hDC;
TEXTMETRIC tm;
 hwndMain = CreateWindow(

 "WinDos",
 "WinDos",
 WS_OVERLAPPEDWINDOW,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 NULL,
 NULL,
 hInstance,
 NULL
 );

 if (!hwndMain)
 {
 return (FALSE);
 } //End if (error creating main window)
 ShowWindow(hwndMain, SW_SHOWMINIMIZED);
 UpdateWindow(hwndMain);
 return (TRUE);
} //End function (InitInstance)
/****************************************************************************
 FUNCTION: MainWndProc(HWND, unsigned, WORD, LONG)
 PURPOSE: Processes messages for WINDOS application.
****************************************************************************/
long FAR PASCAL MainWndProc(HWND hWnd, unsigned message, WORD wParam, LONG
lParam)
{
HANDLE hClipData, hSig;
LPSTR lpClipData, lpSig;
int i, j;
 switch (message)
 {
 case WM_CREATE:
 if (GetWinFlags() & WF_ENHANCED)
 {
 MessageBox (hWnd, "Cannot use WINDOS in 386 enhanced mode", "Error", MB_OK);
 PostQuitMessage(0);
 break;
 } //End if (running 386 enhanced mode)
 // Allocate the transfer buffer
 // Take selector from low word of return value
 // (real-mode segment is in high word)
 if ((dwSegSelector = GlobalDosAlloc (BUFFER_SIZE+1))==NULL)
 {
 MessageBox (hWnd, "Unable to allocate transfer buf", "Error", MB_OK);
 PostQuitMessage(0);
 break;
 } //End if (could not allocate transfer buf)
 PMSelector = LOWORD (dwSegSelector);
 RMSegment = HIWORD (dwSegSelector);
 // Set pointer to transfer buffer
 lpTransferBuf = (LPSTR) MAKELONG (0, PMSelector);
 //Register our proprietary format with the clipboard
 if (!(cfSignature = RegisterClipboardFormat("Sig")))
 {
 MessageBox (hWnd, "Unable to register clipbd format", "Error", MB_OK);
 PostQuitMessage(0);
 break;
 } //End if (error registering proprietary clipboard format)

 //Receive timer messages every 1/2 second
 if (!SetTimer (hWnd, 1, 500, NULL))
 {
 MessageBox (hWnd, "Unable to create timer", "Error", MB_OK);
 PostQuitMessage(0);
 break;
 } //End if (could not create timer)
 break;
 case WM_TIMER:
 //Check the clipboard; if signature not present, then copy
 //data.
 do
 {
 if (OpenClipboard(hWnd))
 {
 if (IsClipboardFormatAvailable (cfSignature))
 {
 CloseClipboard ();
 break;
 } //End if (signature present - stale data)
 else
 {
 if (IsClipboardFormatAvailable (CF_TEXT))
 {
 if (!(hClipData = GetClipboardData (CF_TEXT)))
 {
 CloseClipboard ();
 break;
 } //End if (error getting data from clipboard)
 if (!(lpClipData = GlobalLock (hClipData)))
 {
 CloseClipboard ();
 break;
 } //End if (error dereferencing clip handle)
 } //End if (text available on clipboard)
 else
 {
 CloseClipboard ();
 break;
 } //End if (no text on clipboard)
 } //End if (new data on the clipboard)
 CloseClipboard ();
 } //End if (able to open clipboard)
 else
 {
 break;
 } //End if (unable to open clipboard)

 //Now copy the data to the TSR
 //Do the real-mode interrupt to get the TSR's attention.
 pmiIntX (0x60, lpClipData);
 //Now copy signature to clipboard
 if (!(hSig = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, (DWORD)lstrlen(SIG)+1)))
 {
 break;
 } //End if (couldn't find the memory)
 if (!(lpSig = GlobalLock(hSig)))
 {
 GlobalFree(hSig);

 break;
 } //End if (couldn't lock the memory)
 lstrcpy (lpSig, SIG);
 GlobalUnlock(hSig);
 if (OpenClipboard(hWnd))
 {
 SetClipboardData(cfSignature, hSig);
 CloseClipboard();
 } //End if (clipboard open)
 } while (FALSE);
 do
 {
 if (!pmiIntX (0x61, szClipData))
 {
 if (OpenClipboard(hWnd))
 {
 //Now copy data to clipboard; allocate a few extra
 //bytes for CR - CR-LF combos.
 if (!(hClipData = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, (DWORD) 3300)))
 {
 CloseClipboard ();
 break;
 } //End if (couldn't find the memory)
 if (!(lpClipData = GlobalLock(hClipData)))
 {
 CloseClipboard ();
 GlobalFree(hClipData);
 break;
 } //End if (couldn't lock the memory)
 //Copy the data to clipboard buf, translating
 //CR to CR-LF combo; this is the clipboard
 //TEXT format.
 i=0;
 j=0;
 while (szClipData[j] != 0)
 {
 if (szClipData[j] == 0x0D)
 {
 lpClipData[i] = 0x0D;
 i++;
 lpClipData[i] = 0x0A;
 } //End if (CR found)
 else
 {
 lpClipData[i] = szClipData[j];
 } //End if (normal byte)
 i++;
 j++;
 } //End while (more bytes to copy)
 lpClipData[i] = 0;
 GlobalUnlock(hClipData);
 //Now copy signature to clipboard
 if (!(hSig = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, (DWORD)lstrlen(SIG)+1)))
 {
 CloseClipboard ();
 break;
 } //End if (couldn't find the memory)
 if (!(lpSig = GlobalLock(hSig)))
 {

 CloseClipboard ();
 GlobalFree(hSig);
 break;
 } //End if (couldn't lock the memory)
 lstrcpy (lpSig, SIG);
 GlobalUnlock(hSig);
 EmptyClipboard();
 SetClipboardData(CF_TEXT, hClipData);
 SetClipboardData(cfSignature, hSig);
 CloseClipboard();
 } //End if (clipboard opened)
 } //End if (got data from TSR)
 } while (FALSE);
 break;
 case WM_DESTROY: /* message: window being destroyed */
 KillTimer (hWnd, 1);
 GlobalDosFree (PMSelector);
 PostQuitMessage(0);
 break;
 default: /* Passes it on if unprocessed */
 return (DefWindowProc(hWnd, message, wParam, lParam));
 } //End switch (on message)
 return (NULL);
} //End function (MainWndProc)
/****************************************************************************
 FUNCTION: pmiIntX
 PURPOSE: High level: Performs real mode interrupt
****************************************************************************/
int pmiIntX (int iIntNo, LPSTR szSwapData)
{
 if (iIntNo == 0x60)
 {
 //Copy data to transfer buffer, including trailing NULL
 _fmemcpy (lpTransferBuf, (LPSTR) szSwapData, lstrlen (szSwapData)+1);
 } //End if (writing data TO TSR)
 //If running protected mode, simulate real mode interrupt
 //Otherwise do an int86x()
 if (GetWinFlags() & WF_PMODE)
 {
 RM386_INT rmInt; //structure for making real-mode
 //interrupts from protected mode
 memset (&rmInt, 0, sizeof(rmInt));
 rmInt.edx = 0x0000;
 rmInt.ds = RMSegment;
 dpmiRMInt (iIntNo, 0, &rmInt, 0);
 if (iIntNo == 0x61)
 {
 if (rmInt.eax == 0)
 {
 return -1;
 } //End if (no TSR data to read)
 } //End if (reading data FROM TSR)
 } //End if (running in protected mode)
 else
 {
 union REGS inregs, outregs;
 struct SREGS sregs;
 sregs.ds = RMSegment;
 inregs.x.dx = 0;

 int86x (iIntNo, &inregs, &outregs, &sregs);
 if (iIntNo==0x61 && outregs.x.ax == 0)
 {
 return -1;
 } //End if (no TSR data to read)
 } //End if (running in real mode)
 if (iIntNo == 0x61)
 {
 //Copy data from transfer buffer, including trailing NULL
 _fmemcpy ((LPSTR) szSwapData, lpTransferBuf, lstrlen (lpTransferBuf)+1);
 } //End if (reading data FROM TSR)
 return 0;
} //End function (pmiIntX)
/****************************************************************************
 FUNCTION: dpmiRMInt
 PURPOSE: Low level: Performs a simulated real-mode interrupt using DPMI
****************************************************************************/
BOOL dpmiRMInt(WORD wIntno, WORD wFlags, RM386_INT far *rmInt, WORD
wStackVals)
{
 if (wFlags)
 {
 wIntno = 0x100;
 } //End if (flags set)
 _asm
 {
 push di
 mov ax, 0300h
 mov bx, word ptr wIntno
 mov cx, word ptr wStackVals
 les di, dword ptr rmInt
 int 31h
 jc error
 mov ax, 1
 jmp short done
 } //End (low-level dpmi real-mode interrupt)
error:
 _asm xor ax, ax
done:
 _asm pop di
} //End function (dpmiRMInt)






















May, 1992
VISUAL BASIC AND WINDOWS 3.1 EXTENSIONS


Custom controls are key to pencentric development


 This article contains the following executables: PENCKBK.ARC


Moshe Lichtman


Moshe is a product manager for Microsoft and the coauthor of the Complete
Guide to the C Language (Hod-Ami Publishing, 1988), a textbook published in
Israel. He can be contacted at One Microsoft Way, Redmond, WA 98052.


You've undoubtedly noticed a number of recent DDJ articles on programming
Windows applications using various flavors of Basic--Visual Basic, GFA Basic,
and the like. In this article, I carry that discussion into the realm of
pen-based application development by presenting an overview of the pen
extensions to the Microsoft Windows 3.1 API. I then describe how custom
controls make this new API accessible to Visual Basic applications, and how
they can turn conventional Visual Basic programs into pencentric applications.


Windows 3.1 Pen Extensions


"Windows for Pens" is the name given to a set of extensions to Version 3.1 of
Microsoft Windows. Windows for Pens adds about 70 new functions that support
pen-based applications to the existing 600-plus functions in the Windows 3.1
API.
As Figure 1 illustrates, Windows for Pens functions can be divided into six
groups: Pen Interface, Pen Data, Recognizer, Pen module, Drivers, and
Dictionary.
Figure 1: Windows for Pens software architecture
The Pen Interface module includes most of the functions required to build
pencentric applications. It consists of 32 functions, the most important being
ProcessWriting(). This is a high-level recognition function which handles the
entire recognition process. Your application passes control to this function,
which tracks pen motion and then returns a set of recognized characters. For
flexibility, the Pen Interface module also allows low-level access to the
recognition and data-conversion functions. The module also includes training
functions that facilitate contextual training and virtual event handlers that
add pen compatibility to existing Windows applications.
Pen Data functions include all the logic for ink and pen data manipulation and
data transfer between applications. Functions in this module provide
device-specific information (for example, stylus pressure, angle, and
proximity to surface), data compression, and additional data-conversion
functions.
Most of the functions in the remaining four modules are used by specialized
applications and by Windows for Pens device drivers. The Custom Recognizer
module includes functions which must be supported by any Windows for Pens
recognizer: configuration and initialization, recognition, and training. The
Pen and Drivers modules include functions which pen and display drivers must
support or use in addition to the functionality required from standard Windows
drivers.
The Dictionary module includes a single function, DictionaryProc, which
enables application developers to build their own dictionaries in addition to
the standard dictionary of language words provided with Windows for Pens.
Dictionaries can significantly enhance recognition in situations in which the
input to be recognized can be constrained to a set of words.
The pen extensions (excluding the recognition module) are reasonably compact
in size, under 130 Kbytes. Windows for Pens can run on a variety of platforms,
spanning a range of form factors and hardware configurations. In conjunction
with the forthcoming ROMable Windows, Windows for Pens will run comfortably on
diskless platforms with a total memory size (both ROM and RAM) of less than 2
Mbytes.
Designing a windowing environment to comfortably accommodate small form
factors is not a simple task. The answer is not just shrinking everything to
fit the smaller screen. The standard Windows environment does not work well in
this situation. However, the solution is not to adopt a new rigid paradigm
that forces a specific metaphor on application developers. Rather, the system
software should be modular and open enough to allow developers to easily
tailor their applications to target markets and platforms. Windows for Pens
provides this flexibility, as well as support for power management and Flash
memory, both of which are essential to small tablet and hand-held computers.
Architectural flexibility is also required in the area of handwriting
recognition. The embryonic nature of handwriting-recognition technology and
the need to support a multitude of different languages and markets dictates a
modular design. The Windows for Pen API enables ISVs and
handwriting-recognition vendors to build custom recognizers as well as support
foreign languages. In the future, recognition of cursive writing and voice
input will also fit into this architecture.


Visual Basic and Pens


To create a standard Windows application using Visual Basic, you interactively
design its interface by dragging and placing GUI elements, known as
"controls," onto a layout form. You specify the behavior of your application
by attaching Visual Basic code to these controls. Programmers can extend the
default functionality in Visual Basic by means of custom controls. Custom
controls are similar to Windows controls and are typically written in C,
following the specifications in the Visual Basic Control Development Kit
(CDK). Once created, custom controls are basically indistinguishable from
controls included with the product.
The pen-oriented custom controls added to Visual Basic provide a high-level,
interactive way of creating pencentric programs. Although some pencentric
applications need to use the full Windows for Pens API (for example, a
calligraphy program that relies on device-specific information such as stylus
pressure and angle), the needs of most pencentric applications can be
satisfied with fewer functions. These commonly used functions are found in the
Windows for Pens custom controls and are contained in the file pencntrl.vbx.
Currently, this file is available only as part of the Windows 3.1 Pen SDK, but
future versions of Visual Basic will include these controls as an integral
part of the development environment.


The BEdit and HEdit Controls


The BEdit and HEdit custom controls replace the standard Visual Basic
text-input controls. HEdit is a pen-enhanced version of the text-box control.
It supports most of the regular text-box properties and adds the handwriting
and inking capabilities from the Windows for Pens API. When running, it
accepts handwriting input and displays the corresponding recognized text. Many
of the initial pen applications will be form based, so the HEdit has an
additional BorderStyle setting, Underline, which applies only to single-line
controls.
BEdit, or "boxed edit" control, is a superset of HEdit and provides the
application with comb or box style guides that accept pen input. Each segment
or box accepts only a single character of input. The BEdit control is used to
work around the limitations of today's recognition algorithms. BEdit provides
important baseline and segmentation information to the recognizer. By dividing
up handwritten text into single characters placed one at a time into box or
comb guides, users write more neatly, and recognition accuracy is
significantly enhanced. BEdit properties control the size of the BEdit cells,
their shape (box or comb), and the number of rows and columns in a certain
control. BEdit is for fields which accept a predefined text length. For
example, phone numbers will typically not exceed 15 characters.


Pencentric Properties


Unlike typed text, the size of which is controlled by the application,
handwriting size depends solely on user habits. Users tend to write larger
because of the limited resolution and display quality found in today's pen
computers. Pen applications address this problem by allowing inking in areas
that extend beyond the displayed input box. This area is sometimes called the
"gray area." Its size is controlled via the "Inflate" properties, of which
there are four: InflateBottom, InflateLeft, InflateRight, and InflateTop. The
gray area becomes active only when the associated control receives focus
(which indicates that the user has started inking inside the visible
boundary). In delayed-recognition mode, the gray area is not active: ink can
then be drawn only inside the visible boundary.
It's no secret that present handwriting-recognition engines, even with
accuracy levels as high as 95 percent, are far from perfect. Pencentric
applications should be designed to minimize text entry and to utilize
contextual information during the recognition process. One popular method for
enhancing recognition accuracy is to constrain the character set for a
particular field. For example, there is no point in allowing alphabetic
characters in a social-security number field. By constraining the allowed
character set in the social-security number field to numerical only, we avoid
misrecognizing "Z" for "2" and "O" for "0" (zero).
In Visual Basic, you can use the CharSet property, which accepts the settings:
ALC_DEFAULT, ALC_LCALPHA, ALC_UCALPHA, ALC_NUMERIC, ALC_PUNC, ALC_MATH,
ALC_MONETARY, ALC_OTHER, and ALC_WHITE. These settings can be ORed together
and are controlled either at design time via a custom dialog, or at run time.
Ink is an exciting feature unique to pen applications. The ability to capture,
store, manipulate, and display ink opens up new application categories as well
as adding a new dimension to existing "text-centric" applications. Visual
Basic supports inking via the DelayedRecog property. When set to True, all
writing in the control remains as ink until the property is set to False. When
it is changed from True to False, the OnTap property is examined. If OnTap is
True, recognition of the collected ink will take place when the user taps on
the control with the pen. The pen custom controls incorporate two new events:
RcResult and Update. The RcResult event occurs whenever the control
receives recognition results from the recognizer. The returned result is a
pointer to the RcResult structure (Recognition Context), which forms a central
part of the Windows for Pens API. The RcResult structure contains a symbol
graph that describes the possible results from the ink, a handle to the ink,
and the best guess at the symbols. If DelayedRecog is True, the RcResult
structure does not contain any information about recognized symbols.
The Update event occurs before the control redraws the data; the Change event,
by contrast, occurs only after the data has been redrawn. Update can therefore
be used to format data before it appears, so that flashes do not occur.
Aside from allowing quick erasing of the ink via the EraseInk method, Visual
Basic currently does not provide sophisticated ink-handling methods. To add
advanced functions such as ink selection, resizing, and editing you must
access the internal ink data structures and the underlying Windows for Pens
API. You can do this with Visual Basic's facility for declaring external DLL
calls and by using the handles to the ink and window structures provided by
the hInk and hWnd methods. I'll discuss these later.



An Example Application


PenCheckbook is a program that started out as a regular Visual Basic
application. I converted it into a pen application by replacing the standard
controls with pen custom controls.
The background consists of a bitmap of a check. This bitmap was first drawn
with Paintbrush, and cut-and-pasted into a picture field in the main program
form. In the nonpen version of the program, the main form includes five Edit
Controls to handle user input: PayToCtrl, AmountCtrl, AmountStrCtrl,
RemarksCtrl, and SignatureCtrl. Three other fields are display-only:
DateLabel, CheckNumLabel, and CurrencyLabel, which display, respectively, the
current date, a revolving check number, and the relevant currency. These
fields are controlled by the application but do not accept user input.
Figure 2 shows the main program display after conversion for pen input.
Captions have been added to identify the fields.
Figure 2: A PenCheckbook check
The application includes a global data structure that holds information about
the last check. In the conversion process, I modified this structure to hold
ink information and to be able to store and retrieve this information from a
file. However, I've not yet implemented file storage and retrieval. For now,
the items in the File and Edit menus (Save As, Open, Undo, and Clear) are
merely placeholders.
The Options menu includes the functions Balance, Hide, and Show Last. The
Balance function loads a second form, which displays the current balance and
enables the user to modify it. The current balance is automatically updated
when a check is signed by the user. If the user tries to write an amount that
exceeds the current balance, the application warns the user and cancels the
check. The menu bar can be removed via the Hide function. Tapping on the form
brings back the menu bar. This can be useful in pen applications which can be
controlled via gestures rather than menu functions. It supports a less
cluttered user interface while retaining the option to display additional
controls for training purposes or advanced functionality. Lastly, the Show
Last function displays the last check signed by the user.


Converting Checkbook to Use Pens


In modifying Checkbook to accept pen input, I replaced the standard edit
controls with pen custom controls. Consequently, the PayToCtrl, AmountStrCtrl,
and RemarksCtrl fields can process both handwriting and gestures. The original
AmountCtrl field was replaced by a boxed edit field which recognizes numbers
and gestures only. The BalanceCtrl field in the Balance form is converted to a
BEdit on the fly (when the user presses the Change button); the whole process
requires adding fewer than ten lines of code. The new SignatureCtrl
field accepts the user's signature and retains it as ink.
Replacing the standard controls with pen controls is a slightly tedious task,
because there is no automated way to do it. You have to delete the old
control, drag the new HEdit or BEdit control, and redefine various properties
such as BorderStyle, BackColor, and CtlName. During this process you may
notice a number of new properties. The CharSet property enables you to define
the characters accepted by the input field; restricting the character set
substantially improves recognition.
When adding the SignatureCtrl field, I set the DelayedRecog property to True.
This results in an ink field which captures stylus input and can later
recognize ink if the user taps on the field when the OnTap property is set to
True. The program uses the signature to "lock" a certain check so that users
cannot change the amount on the check once it has been signed.
Modifying the AmountCtrl and BalanceCtrl fields (located on the second form)
requires more changes. The single edit control is replaced by two BEdit
controls for Dollars and Cents. You will need to adjust the Cell and Comb
properties to achieve the best space utilization without compromising
usability. The names of the controls now become CentCtrl, DollarBalanceCtrl,
and CentBalanceCtrl. I used the CharSet property to restrict input to
numerical values (and gestures).
At this stage in its evolution, the Checkbook program is ready for pen input.
Users can write and sign checks on screen similar to the way they would in a
paper checkbook. Standard gestures can be invoked to edit and manipulate the
various fields.
After using the application for a while, you'll realize that certain layout
changes make life easier. As mentioned before, handwriting is typically larger
than printed text. The controls built for keyboard input may be too small and
result in loss of ink when the pen exits the border of the control. By using
the Inflate properties of HEdit and BEdit, you can adjust the size and
location of the fields to fit the new method of input. The Inflate properties
don't change the visible borders of the controls; they affect only the gray
area which surrounds them. Thus, the user can write beyond the visible borders
without losing ink.


Accessing the Underlying API


Visual Basic provides several hooks that enable developers to tap into the
full functionality supported by the Windows for Pens API. These include the
RcResult and Change events and the hInk and hWnd handles.
The RcResult event occurs when a BEdit or HEdit control attempts to recognize
the ink. This event can be used as a trigger to create locking form fields in
form-based applications where a signature validates the data in the form. Once
the user stops inking in the signature field, your program can lock all the
input fields in the form so no changes can be made. To lock a field, merely
set the Enabled property to False. In PenCheckbook, once the user has signed
the check, the only way to unlock the input fields is by pressing the Pay
button or selecting Clear in the Edit menu. In both cases, the input field
will be initialized.
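The locking pattern is easy to state outside Visual Basic as well. Here is a
minimal C sketch (all names are invented for illustration and are not part of
PenCheckbook): once the signature field's recognition fires, every input
field's enabled flag is cleared, and Pay or Clear turns the fields back on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a form field; "enabled" mirrors the VB
   Enabled property described in the text. */
typedef struct {
    const char *name;
    bool enabled;
} Field;

/* Called when the signature field's recognition event fires:
   disable every input field so the signed check cannot change. */
static void lock_fields(Field *fields, size_t n) {
    for (size_t i = 0; i < n; i++)
        fields[i].enabled = false;   /* Enabled = False in the VB version */
}

/* Called by Pay or Clear, which re-initialize the form. */
static void unlock_fields(Field *fields, size_t n) {
    for (size_t i = 0; i < n; i++)
        fields[i].enabled = true;
}
```

The point of the sketch is only the shape of the mechanism: the event handler
does nothing clever, it just toggles a per-field flag that the input layer
honors.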
Because the RcResult event occurs before the recognition results affect the
display, it can be used to trap custom gestures. Listing One contains
declarations from the global declarations file of PenCheckbook. This file is
basically the Visual Basic version of PENWIN.H. These declarations are used in
processing the RcResult event.
Listing Two shows the Process_Gestures subroutine. Process_Gestures() does the
same thing as the Hide function in the Options menu, but it is invoked with a
gesture: if the user draws the circle-H gesture in an edit control, the
routine hides the menu bar. Process_Gestures() is defined in the global
module and is called by each edit control when the RcResult event occurs.
As previously mentioned, Checkbook lets you store and display the last check
paid. When SignatureCtrl becomes an ink field, the CheckRec global structure
needs to be modified to hold a pointer to the ink structure rather than the
signature text. Listing Three contains declarations and code samples showing
how hInk, hWnd, and the Windows for Pens API enable the ink to be stored in
memory and retrieved later on.


Conclusion


By combining Visual Basic with pen custom controls, you get a tool that lets
you create simple pen applications quickly, and at the same time implement
truly pencentric programs via the underlying Windows for Pens API.
DDJ


_VISUAL BASIC AND WINDOWS 3.1_
by Moshe Lichtman


[LISTING ONE]

Type RcResult ' RcResult structure
 SYGraph As syg ' Symbol Graph element
 wResultsType As Integer ' Status of RcResult handling
 cSyv As Integer
 lpsyv As Long
 hSyv As Integer
 nBaseLine As Integer
 nMidLine As Integer
 hPenData As Integer
 rectboundink As RECTSHORT
 pntEnd As PointShort
 lprc As Long
End Type


Type syg ' Symbol Graph structure
 rgpntHotSpotsArray As PointArray ' "hot spot" array
 cHotSpot As Integer
 nFirstBox As Integer
 lRecogVal As Long
 lpSye As Long ' Pointer to first Symbol Element
 cSye As Integer ' Number of Symbol elements
 lpSyc As Long
 csyc As Integer
End Type

Type SYE ' Symbol element structure
 Syv As Long ' Symbol Value
 lRecogVal As Long
 cl As Integer
 iSyc As Integer
End Type

'The following two functions copy data to/from Visual Basic strings
'and from/to memory pointed by long pointers.
Declare Sub VBTypeToCPointer Lib "PENCNTRL.VBX" (lpSrc As Any, ByVal lpDest As Long, ByVal cb As Integer)
Declare Sub CPointerToVBType Lib "PENCNTRL.VBX" (ByVal lpSrc As Long, lpDest As Any, ByVal cb As Integer)





[LISTING TWO]

Sub Process_Gestures (ByVal RcResult As Long)
 Dim VBrc As RcResult
 Dim SyeTable() As SYE ' array of Symbol elements
 CPointerToVBType ByVal RcResult, VBrc, 80 'copy RcResult struct to VB var
 NumOfSymbols% = VBrc.SYGraph.cSye ' get number of Symbol elements
 lpSye& = VBrc.SYGraph.lpSye ' pointer to first Symbol Element
 ReDim SyeTable(NumOfSymbols%) ' re-define Symbol array
 ' copy Symbol elements
 CPointerToVBType ByVal lpSye&, SyeTable(0), NumOfSymbols% * 12
 If (NumOfSymbols% = 1) Then ' process only if single gesture
 Syv& = SyeTable(0).Syv
 SyvType& = Syv& \ &H10000 ' get Symbol type. Is it a gesture?
 If (SyvType& = SYVHI_GESTURE) Then ' if so, is it a circle-H gesture?
 If (Syv& = (SYV_CIRCLELOA + Asc("h") - Asc("a"))) Then
 HideCmd_Click ' hide menu-bar
 VBrc.wResultsType = VBrc.wResultsType Or RCRT_ALREADYPROCESSED
 End If ' mark to prevent further processing
 End If
 VBTypeToCPointer VBrc, ByVal RcResult, 80 ' update RcResult
 End If
End Sub





[LISTING THREE]


' DuplicatePenData() takes a handle to ink (hPenData or hInk) and returns a
' new hInk/hPenData that has a copy of the ink. NOTE: Whenever this function
' is used, a GlobalFree() call must be issued to release the memory.
Declare Function DuplicatePenData Lib "PENWIN" (ByVal hPenData As Integer, ByVal gMemFlags As Integer) As Integer

' SendMessage() is the message sending function of the Windows 3.x API.
Declare Function SendMessage Lib "USER" (ByVal hWnd As Integer, ByVal wMsg As Integer, ByVal wParm As Integer, ByVal lParam As Any) As Long

Sub PayButton_Click ()
 Amount! = Val(DollarCtrl.text) + Val(CentCtrl.text) / 100
 balance = balance - Amount!
 If balance < 0 Then
 Beep ' warn user if insufficient funds
 MsgBox "Insufficient Funds - Check Cancelled", 0, "ERROR!"
 balance = balance + Amount!
 Else
 check.NumField = CheckNumLabel.Caption ' store last check, remarks
 check.RemarksField = RemarksCtrl.text ' and store ink field
 check.SignatureField = DuplicatePenData(SignatureCtrl.hInk, ByVal 0)

 PayToCtrl.text = "" ' initialize input fields

 PayButton.Enabled = False
 End If
End Sub

Sub ShowLastCmd_Click () ' "Show Last" command function

 lParam = check.SignatureField ' Force integer into long
 CheckNumLabel.Caption = check.NumField ' restore last check and
 ' retrieve ink from memory
 lRet = SendMessage(SignatureCtrl.hWnd, WM_HEDITCTL, HE_SETINKMODE, lParam)
End Sub




























May, 1992
PROGRAMMING PARADIGMS


Frontier Wisdom




Michael Swaine


I wrote in this space a few months back about Dave Winer's plan to put a
system-level scripting language on the Mac. UserLand Frontier, which shipped
in January, is the result of that plan: a programming environment for writing
scripts that control the Macintosh operating system and/or applications. It
includes a C-like scripting language called "UserTalk," and can be used to
write scripts invocable from Frontier's own command-line-sized window or to
produce double-clickable files. Although Apple has long talked about something
called AppleScript and has been talking about it more openly this spring,
Frontier is the first and, at this writing, still the only system-level
scripting software for the Mac. Because it was developed more recently than
the batch-processing languages of UNIX and MS-DOS, it is not surprising that
Frontier is in many ways more powerful than these scripting predecessors.
Well, that's all good news for us Mac fanatics who have been waiting more or
less patiently during the past eight years for some sort of system scripting
language, but it isn't what I intend to discuss here. I do intend to discuss
three programming concepts embodied in Frontier's scripting language,
UserTalk--concepts that I think deserve consideration from anyone interested
in the design of development environments. These are: outlining as a model for
program structure, sharing the symbol table, and the concept of a
documentation server.


Collapse the Clutter


There's no great novelty in noticing that program structure is largely
hierarchical, nor in making that hierarchical structure visible by conventions
of indentation, nor in supplying a program editor that knows about and
enforces these conventions. What is novel about the use of outline structure
in Frontier is the thoroughness with which it has been integrated into every
aspect of the language, and the benefits that accrue from this no-compromise
approach.
Dave Winer and Doug Baron, Frontier's creators, wrote the book on outlining on
computers. Winer's entire career in software development has been an
occasionally detoured pursuit of the program editor that works the way he
thinks, and he thinks, apparently, in hierarchies. With his brother Peter and
Doug Baron, he created and owned the application category of outline
processors with ThinkTank, More, and Ready, until the concepts of outlining
were encompassed by word processors and other programs.
The Winers and Baron invented those outlining concepts. Or at least they
executed the first successful commercial implementations of them and
established how outlining should look to the user. Here are the basic ideas:
Each line of an outline is called a "heading" or "subheading." Top-level
headings are called "summits," and an outline can have more than one summit.
All other headings are subheadings of some heading, and subheadings can have
subheadings, in principle to any depth. Subheadings of a heading are indented
by a standard amount.
A heading can be collapsed so that its subheadings disappear, and any heading
with subheadings can be expanded to show them.
Navigation techniques such as special arrow-key definitions are provided to
let the user move through the outline either as a hierarchical structure or as
a flat block of text. Headings can be cut, copied, pasted, or moved, with or
without their subheadings, by standard click-and-drag methods.
A capability called "hoisting" lets the user focus on one heading as though it
and its subheadings were the whole universe. While collapsing hides headings
at a lower level, hoisting hides headings at the same or a higher level.
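The collapse rule by itself can be captured in a few lines. Here is a rough C
sketch of the data model (the names are mine, not Frontier's): a heading that
is collapsed contributes only itself to the visible view, hiding its whole
subtree, which is exactly how an if or for structure shrinks to its first
line.

```c
#include <stddef.h>

/* Hypothetical outline node: a heading with optional subheadings
   and a collapsed flag. */
typedef struct Heading {
    const char *text;
    int collapsed;                 /* nonzero: hide all subheadings */
    struct Heading *children;
    size_t nchildren;
} Heading;

/* Count the lines a reader would see: a collapsed heading shows
   only itself; an expanded one shows itself plus its subtree. */
static size_t count_visible(const Heading *h) {
    size_t n = 1;
    if (!h->collapsed)
        for (size_t i = 0; i < h->nchildren; i++)
            n += count_visible(&h->children[i]);
    return n;
}
```

Hoisting would be the complementary traversal: start the walk at the hoisted
node instead of the summit, so everything outside its subtree disappears.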
Nobody has focused on these capabilities as closely as Dave Winer has over the
past decade or so. So it is reasonable to assume that when Winer designed a
language and put into it everything he had learned about outlining, there
might be something to learn from looking at the result.
Outlining in Frontier serves the obvious purposes: to hide program detail and
to reveal program structure. We can see how well it succeeds by looking at the
way it handles the ubiquitous if and for structures. Example 1 shows a piece
of Frontier code.
Example 1: Typical Frontier code

 for i = 1 to 12 << zero the month table
  month[i] = 0
 if fileIsFolder() << recurse through its files
  getModDates(f)
 else << get its modification date
  modDate = fileModified(f)


Other structured code is handled similarly. The case structure involves three
levels of headings, with the case branches at the second level and the actions
at the third. The on verb, used to define other verbs, has the same basic
structure as for. (Frontier's vocabulary consists chiefly of functions, which
the documentation calls "verbs.")
Mere indentation may reveal some structure, but it doesn't hide anything. An
important feature of outlines is that all subheadings of a heading can be
collapsed. In Frontier you collapse or expand the subheadings of a heading by
double-clicking on the heading. So an if structure can be collapsed to the if
statement itself or expanded to the full structure.
The fact that an if structure can be collapsed in this way suggests a useful
commenting convention. Placing a comment on the same line as the if statement
that summarizes the collapsed body of the if makes it possible to follow the
logic of the program with the structure collapsed.
Although it's obvious, it may be worth pointing out that outlines break a
certain symmetry of most programming languages: Blocks of code are not
bracketed in Frontier by a structural statement at the beginning and end of
the block. There are no end statements in Frontier. It may not be an important
point, but I see it as desirable in eliminating a purely structural element
from the language.
On the other hand, Frontier introduces its own purely structural element in
the form of the bundle verb. Bundle is a device for grouping lines of code
into logical blocks when they are not parts of any actual common structure.
The idea in Example 2(a) isn't new with Frontier, but combined with the
ability to collapse subheadings, it's a powerful tool for hiding detail and
revealing structure. The initialization code in the example is an ideal
candidate for bundling.
Example 2: Tools for structuring initialization code: (a) using the bundle
verb; (b) syntax of a local verb; (c) a UserTalk declaration; (d) collapsing a
heading that has subheadings.

 (a)

 bundle << set path and menus
 Frontier.pathstring = file.getpath()
 menu.currentSuite = " "

 (b)

 local (cust = " ",amt = 0,custNo = 0)

 (c)

 local

 cust = " "
 amt = 0
 custNo = 0

 (d)

 local << customer data


Another tool for structuring initialization code is the local declaration. The
local verb has a conventional syntax: see Example 2(b). But you can also write
that declaration like that in Example 2(c). This is a heading with
subheadings, so it can be collapsed to look (with a comment added) like
Example 2(d).
Comments in Frontier don't have to be on the same line as code, of course.
When a comment appears on a separate line, it is a heading, and all the
outlining techniques apply to it. This suggests a commenting style in which
the first comment of a block is a one-line summary, and the following, more
detailed lines are subheads that can be collapsed or expanded. In fact, the
Frontier editor will force any subheading of a comment to be a comment. This
makes it easy to comment out a block of code, just by commenting its first
line.


Share the Symbol Table


Every programming language has a symbol table.
What distinguishes Frontier's symbol table, called the "Object Database," is
that it is permanent, browsable, editable, and hierarchical. But it's more
than a symbol table, really, because it can hold various objects not
necessarily having any connection with any code.
The fact that it is hierarchical is, of course, no surprise. A good deal of
Frontier is outline structured. You can edit the program's (hierarchical)
menus in an outline-processing window, create and save outlines using Frontier
as an outline processor, and edit scripts as outlines. Although the Object
Database isn't canonically presented as an outline, there is a facility for
examining it in an outline window.
But the canonical view of the Object Database is as a hierarchy of tables,
each invariably consisting of one or more rows and the same three columns:
name, value, and kind.
In the Object Database you can directly modify the name of any entry, the
value of any scalar entry, and in some cases, the kind (that is, the data
type). You can put pretty much any kind of data into the tables of the Object
Database: the time of last backup, a to do list, a formatted form-letter
template, the user name, a table of e-mail accounts, numbers in any standard
numeric type, strings, chars, points, rectangle coordinates, RGB values for
colors, or PICT-format pictures. The Object Database can also hold code; in
fact, all scripts live in it, as does much of UserTalk.
Frontier puts global variables into the Object Database, and you can watch a
variable change as a script runs, much as you would in a debugger window.
Although to a user browsing through it the Object Database looks like a
database, to a scripter it looks more like a symbol table or other internal
data structure. From scripts, you address the objects in the Object Database
by name, as in Example 3(a); by name and index, as in Example 3(b); or by
address, as in Example 3(c).
Example 3: Addressing objects in Frontier's Object Database, using (a) names;
(b) names and indexes; (c) addresses.

 (a)

 table.myStuff

 (b)

 table[1]

 (c)

 @table.myStuff


As Example 4 shows, the address operator @ can be undone by the dereference
operator ^. This gives x the value of the object at the Object Database
address table.myStuff.
Example 4: Using ^ to undo the address operator @.

 addr = @table.myStuff
 x = addr^


At least one table in the Object Database has a special purpose. The agents
table holds the scripts of agents, which are background scripts. They are
written just like any other script, except that (1) they must include the
sleepfor verb, which tells Frontier how long to wait before polling them, and
(2) they reside in the agents table. Some sample agents shipped with Frontier are
scripts that display the number of seconds since Frontier shipped, the current
time, and the path to Frontier.
Agents actually have to be placed in the right table to work. In some cases,
it is not obligatory to locate things in the right places in the Object
Database, but it is smart.
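The agent-polling idea can be sketched apart from Frontier. In this
hypothetical C model (every name is invented), each agent carries its sleepfor
interval and a countdown; a once-per-second scheduler tick fires any agent
whose countdown reaches zero and then rearms it.

```c
#include <stddef.h>

/* Hypothetical agent record: sleepfor is the polling interval the
   agent declares; countdown is the time left until its next run. */
typedef struct {
    int sleepfor;    /* seconds between runs, per the sleepfor verb */
    int countdown;   /* seconds remaining until the next run */
    int runs;        /* how many times this agent has fired */
} Agent;

/* One scheduler tick (call once per second): fire any agent whose
   countdown expires, then reset it to its sleepfor interval. */
static void tick(Agent *agents, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (--agents[i].countdown <= 0) {
            agents[i].runs++;                    /* run the agent's script */
            agents[i].countdown = agents[i].sleepfor;
        }
    }
}
```

This is only the polling discipline; the real system evaluates a UserTalk
script where the sketch merely increments a counter.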
You, as a Frontier third-party developer, would typically write not isolated
scripts, but "suites" of scripts. A suite is a collection of scripts intended
to be used together, along with appropriate menus, a means of invoking the
whole suite, ditto for dismissing it, and documentation. A suite is a package
of technology.
As a Frontier third-party developer, you should put your suites in the suites
table of the Object Database. It's not obligatory, but when you realize that a
Frontier user will have to move all the important stuff from one Object
Database to another when installing a new version of Frontier, you'll see that
keeping things in logical places is pretty important.
The Frontier documentation emphasizes the permanent and hierarchical nature of
the Object Database, but I think that its openness is at least as significant.


Free the Documentation


Along with Frontier comes a stand-alone application called DocServer--a tool
for displaying online documentation on Frontier verbs. All the basic and
advanced verbs of the language are already documented in it as Frontier comes
out of the box, but UserLand intends for developers to use it in documenting
their own verbs and suites of verbs.
That's not too hard to do. DocServer reads from its own database, which it can
update by importing data from a text file. So, you can create documentation
either within Frontier or in a word processor and save it in a text file. Tabs
separate the name of an item, such as "parameters," from the text of that
item, and returns separate items. You can also add verb documentation to the
database on-the-fly by sending Interapplication Communication (IAC) messages.
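Producing the import file programmatically is straightforward. The C sketch
below assumes only what the text describes: a tab between an item's name and
its text, and a return ending the item. The item name "parameters" is just an
example; consult DocServer's documentation for the real item names.

```c
#include <stdio.h>

/* Format one DocServer-style import record: name, tab, text, return.
   Returns 1 on success, 0 if the buffer was too small. */
static int write_doc_item(char *buf, size_t cap,
                          const char *item, const char *text) {
    int n = snprintf(buf, cap, "%s\t%s\n", item, text);
    return n > 0 && (size_t)n < cap;
}
```

A documentation generator would call this once per item ("syntax",
"parameters", "examples", and so on) and concatenate the records into the
text file that DocServer imports.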

DocServer implements a simple, straightforward format for documenting
commands. It's easy to follow the format and not so easy to violate it.
DocServer doesn't give you much access to tab and font and other formatting
decisions, but on the other hand it gives you great freedom in the chunks of
information you choose to present. You can supply examples, a BNF-style
definition, detailed information on required parameters, or whatever you think
constitutes documentation for a Frontier verb, in any combination and
quantity, and it'll all come out formatted pretty much the same. So there's
not much reason not to use DocServer to document your verbs, and if you do,
your documentation will have a familiar, standard look.
DocServer is linked to Frontier via IAC links, so when you control-click on a
verb in any script, Frontier sends a request to DocServer to display the
documentation for the verb. Hence the name DocServer. But this suggests a
broader use for DocServer. Because it is IAC aware, any other application that
can send messages ought to be able to drive DocServer, just as Frontier does.
This is particularly significant, given what UserLand says about the
licensing of DocServer:
DocServer is free. Like Apple's TeachText utility, DocServer should be on
every script writer's machine, regardless of what scripting software he or she
is using. So we will allow this product to be distributed at no cost with
other developers' products. Contact UserLand Software for details.
As I read it, that means that DocServer can be used to document any kind of
scripts on the Mac, including Quickeys macros, HyperTalk code, or AppleScript
scripts, if and when there is an AppleScript.
DDJ

























































May, 1992
C PROGRAMMING


The D-Flat Application Window


 This article contains the following executables: DFLT12.ARC D12TXT.ARC


Al Stevens


The APPLICATION window class is the foundation of any D-Flat application,
which must create an application window before doing anything further. The
application window hosts the menu bar, which hosts the pop-down menus, which
react to the user's commands to make the program do its job. You learned about
menus in an earlier "C Programming" column. The example Memopad program, which
I described earlier as well, shows you how to create an application window and
the command functions that its menu executes. This month, I'll explain how the
application window itself works.


Window Class Messages


Listing One is applicat.c, the code that implements the APPLICATION window
class. Its primary function is ApplicationProc, the window-processing module
for the class. A typical D-Flat application will include its own
application-window processing module, which intercepts the messages, processes
the ones it wants, and then calls this module through the DefaultWndProc
function. ApplicationProc uses a switch to interpret the messages. As with
other D-Flat window-processing modules, ApplicationProc calls functions to
process most of the messages; the names of those functions reflect the
message names.
The CREATE_WINDOW message. The CREATE_WINDOW message begins by modifying the
dialog box that lets the user change the characteristics of the display.
Depending on the video system, the user has different options. A VGA can
display 25, 43, or 50 lines, and that is how the dialog box is set up. If the
user has a VGA, no changes are made. An EGA system can display 25 and 43
lines, so the program modifies the dialog box to include only those options. A
CGA system can display only 25 lines, so the program modifies the dialog box
so that no line-selection options appear. Next, the CREATE_WINDOW message
checks the settings of the application's configuration-file variables that
control how the application window displays. These settings control whether
the window has a border, whether its data space is clear or textured, whether
there are title and status bars, how many lines are on the screen, and whether
the display is to be in color, monochrome, or reverse monochrome. The program
sets the various check-box and radio-button controls on the dialog box to
reflect the current settings. Then the program creates the menu- and
status-bar windows as child windows of the parent application window. Next,
the program calls a function to load the application's help database. Finally,
it sends the SHOW_MOUSE message to show the mouse cursor.
The HIDE_WINDOW message. The HIDE_WINDOW message normally tries to change the
focus if the window being hidden has the focus. If that window is the
application window, there will not be a window that can get the focus, because
a D-Flat application window is the ancestor of them all. Therefore, the
application window intercepts the HIDE_WINDOW message to itself and sets the
inFocus variable to NULL so that the program will not attempt to give the
focus to another window.
The ADDSTATUS message. D-Flat applications send the ADDSTATUS message to the
application window to write to or clear the text area of the status bar. The
ADDSTATUS message sends SETTEXT or CLEARTEXT messages to the status bar's
window handle.
The SETFOCUS message. The APPLICATION window class has its own version of the
SETFOCUS message. It does most of what the NORMAL window class does, but it
does not do the repainting of overlapping and underlapping windows that
document windows need.
The SIZE message. The SIZE message to the application window must do two
things besides change the size of the application window itself. The menu- and
status-bar windows are children of the application window, and their sizes
depend on its size. Therefore, when the user changes the size of the
application window, the program must automatically adjust the size of the menu
and status bars. It does that by calling the functions that create those two
windows. The functions close the windows if they already exist, and then
create them with sizes that reflect the size of the application window.
The MINIMIZE message. D-Flat does not allow you to minimize the application
window. Therefore, it intercepts the MINIMIZE message and simply returns
without doing anything.
The KEYBOARD message. There are several keystroke values that the application
window intercepts and processes. The Alt+F4 keystroke is the accelerator key
for the Close command on the application window's system menu. Therefore, the
program sends the application window a CLOSE_WINDOW message when the window
gets a KEYBOARD message with the Alt+F4 keystroke value. The user presses
Alt+F6 to step from window to window. The program calls the SetNextFocus and
SkipSystemWindow functions when the application window gets Alt+F6 in a
KEYBOARD message. Alt-Hyphen calls the BuildSystemMenu function to pop down
the system menu for the application window. The application window passes all
other keystrokes to the menu-bar window.
Keystroke events send KEYBOARD messages to the window that has the focus. If
that window does not process the keystroke, it sends the message to its
parent. The parent window does the same thing--it either processes the
keystroke or passes the message to its parent. Eventually, an unprocessed
KEYBOARD message will get to the application window because it is at the top
of the family tree. If the application window does not process the message, it
sends the message to the menu bar, which tests to see if the keystroke is a
command-accelerator key or a menu-shortcut Alt-key combination. This procedure
guarantees that every window gets the KEYBOARD messages that it needs.
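This parent-chain rule can be expressed compactly. Below is a hedged C sketch
(the types, handler names, and key codes are invented, not D-Flat's actual
ones): dispatch walks from the focus window toward the root until some window
claims the keystroke, and returns NULL when the message falls off the top.

```c
#include <stddef.h>

/* Hypothetical window with a parent link and a keystroke handler
   that returns 1 if it processed the key. */
typedef struct Window {
    struct Window *parent;
    int (*on_key)(struct Window *, int key);
} Window;

/* Bubble a keystroke up the parent chain, as the text describes. */
static Window *dispatch_key(Window *focus, int key) {
    for (Window *w = focus; w != NULL; w = w->parent)
        if (w->on_key && w->on_key(w, key))
            return w;              /* the window that processed the key */
    return NULL;                   /* unprocessed: hand it to the menu bar */
}

/* Sample handlers for illustration only. */
static int ignore_key(Window *w, int key) { (void)w; (void)key; return 0; }
static int app_handles_4(Window *w, int key) { (void)w; return key == 4; }
```

In D-Flat the NULL case corresponds to the application window forwarding the
keystroke to the menu bar for accelerator and shortcut checks.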
The SHIFT_CHANGED message. The D-Flat user interface allows the user to switch
between a document window and the menu bar by pressing the F10 key or by
pressing and releasing the Alt key without pressing another key. The
window-processing module for the menu bar intercepts the F10 key and handles
the switch between windows.
The SHIFT_CHANGED message is sent when the event-collection routines in the
message-processing code of D-Flat sense that the user has pressed or released
a Shift, Alt, or Ctrl key. As with unwanted KEYBOARD messages, all windows
pass the SHIFT_CHANGED message to their parent window. The application window
eventually intercepts this message to watch for changes in the Alt key. The
program uses the AltDown variable to indicate that the Alt key was seen to be
pressed. When another SHIFT_CHANGED message occurs with the Alt key released,
and when there has been no intervening keystroke while the Alt key was down,
the program senses that the Alt key has been pressed and released by itself.
The program then sends a KEYBOARD message with the F10 key value to
the menu bar.
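The Alt-alone detection amounts to a tiny state machine. Here is a
speculative C sketch of the logic just described (all names invented):
remember that Alt went down, let any intervening keystroke cancel the flag,
and treat an Alt release with the flag still set as the "Alt alone" event
that D-Flat converts into an F10 for the menu bar.

```c
#include <stdbool.h>

/* Hypothetical per-application state for Alt tracking. */
typedef struct {
    bool alt_down;   /* Alt seen pressed with no keystroke since */
} AltState;

static bool alt_pressed_event(AltState *s)  { s->alt_down = true;  return false; }
static bool keystroke_event(AltState *s)    { s->alt_down = false; return false; }

/* Returns true only when Alt was pressed and released by itself;
   the caller would then post an F10 KEYBOARD message to the menu bar. */
static bool alt_released_event(AltState *s) {
    bool alone = s->alt_down;
    s->alt_down = false;
    return alone;
}
```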
The PAINT message. When the application window gets the PAINT message, it
simply clears its data space by calling the ClearWindow function. The cleaning
character will be either a space or the value defined as APPLCHAR, depending
on whether the user has selected a clear or textured application window.
The COMMAND message. The application window processes several of the COMMAND
messages. A COMMAND message can originate from within the application program,
or it can be the result of the user choosing a menu selection with which the
command's identification is associated. All the commands that the application
window processes come from menu selections.
In fact, all menu commands come to the application window first. The popdown
menu sends the associated COMMAND message to its parent. the menu bar, when
the user chooses a menu selection. The menu bar sends the message to its
parent, the application window. If the application window does not process the
COMMAND message, it sends the message to whatever document window has the
focus. Otherwise, the application window processes the COMMAND messages
discussed next. A COMMAND message is identified by a code in its first
parameter.
There are several help commands associated with selections on the Help menu.
Each of these calls the DisplayHelp function to display the associated help
screen.
The ID_DOS command is associated with the DOS Shell command on the File menu.
This command calls the ShellDOS function, which invokes the DOS shell. First
the function hides the D-Flat application window and swaps the cursor and
screen-height configurations with what they were when the D-Flat application
program started. Then the function hides the mouse, displays a message telling
the user how to get back into the D-Flat application by typing the DOS EXIT
command, and spawns a new copy of the DOS command processor. When the command
processor returns, the program switches the screen and cursor back to their
D-Flat context and shows the application window and the mouse cursor.
The ID_EXIT and ID_SYSCLOSE commands both post the CLOSE_WINDOW message to the
application. The ID_EXIT command is associated with the Exit selection on the
File menu, and the ID_SYSCLOSE command is associated with the Close selection
on the application window's system menu.
The ID_DISPLAY command is the result of the user choosing the Display
selection on the Options menu. It executes the Display dialog box and then
calls the various functions that affect how the application window is
displayed.
The ID_SAVEOPTIONS command saves the user-selected options to a configuration
file.
The ID_WINDOW command is sent when the user selects a document window from the
Window menu. This menu is built dynamically when the user selects it on the
menu bar. It displays the titles of the first nine document windows in the
application window's data space. The Window menu provides one of the ways for
the user to select a new document window to have the focus. The message finds
the chosen document window from among those in the application window's list
of child windows and sends the SETFOCUS message to the document window. If the
document window is minimized, the program also sends it the RESTORE command.
The ID_CLOSEALL command is associated with the Close All selection on the
Window menu. It sends every document window in the application window's list
of children a CLOSE_WINDOW message.
The ID_MOREWINDOWS command is associated with the More Windows selection on
the Window menu. The menu includes that selection when there are more than
nine document windows. The command displays a dialog box with a list-box
control that has all the document-window titles. The dialog box serves as an
extension to the Window menu, listing all of the document windows instead of only
the first nine. The user can choose a window from the list box.
The CLOSE_WINDOW message closes all the application window's
document windows and posts the STOP message to the D-Flat message system.
After the application window closes, the program resets the screen height to
whatever it was when the program started and calls a function to unload the
help database.


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of Dr.
Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you the latest
version of D-Flat. The software is free, but if you'd care to, stuff a dollar
bill in the mailer for the Brevard County Food Bank. They help the homeless
and hungry. If you want to discuss D-Flat with me, use CompuServe. My ID is
71101,1262, and I monitor the DDJ Forum daily.


D-Sharp?


We're almost to the end of the D-Flat project. There are three installments
remaining that cover dialog boxes, message boxes, and the help system. Many of
you have asked if I have a C++ version of D-Flat in the works. I had not
considered such a project until recently. But now I think the time is right.
Several major C compiler vendors now have C++ compilers integrated into their
development environments. Microsoft's C++ compiler has been released. C++ is
poised to dominate software development. Bjarne Stroustrup has been predicting
that for years. Borland and Microsoft will make it happen.
The new project is in the planning stages, and it is forming into an
interesting case study on how the event-driven, message-based architecture of
D-Flat maps onto the object-oriented class system of C++. No new ground is
being broken here. There are a number of class libraries that implement the
Windows user interface. They, however, must wrap the Windows message-based API
into C++ classes. The next generation of D-Flat will implement the API with a
class library. There are a number of design issues to address.
I do not expect the project to take as long as D-Flat has taken. Instead of
publishing every line of code in monthly installments, I plan to address the
problems and their solutions with extensive code examples along the way. Of
course, the entire source-code package will be available, just as D-Flat is
available now.
But what to call it? D-Flat++? Naw. If you increment the musical D-Flat note
one whole tone, you get D-Sharp. Maybe. Any suggestions? No, I won't call it E



The Data Compression Book


Last year I published C programs in this column that implement the Huffman
data-compression algorithms. D-Flat uses a variation on those programs to
compress the help database. More recently, Dr. Dobb's ran the DDJ Data
Compression Contest, where readers submitted their own data-compression
programs. Mark Nelson refereed the contest and wrote about the results, and
you can get the source code for the entrants from M&T Online and CompuServe.
Now Nelson has written a book about data compression called The Data
Compression Book (M&T Books, 1991). M&T also publishes Dr. Dobb's, so I asked
them to send me a free copy to review. They did, but would you believe that
they sent me, their favorite C columnist, a rejected copy? It still had the
reject sticker on it. The spine was crinkled a little bit from being at the
bottom of the carton, I suspect. Didn't hurt the reading of it any, though.
I'll bet they don't send Ray Duncan any damaged goods.
Data compression has always been a fascinating study for programmers and
Nelson does justice to the subject. He covers many of the more popular
compression algorithms and explains them well, including C code to implement
them. The book includes a diskette with the code for the algorithms. He also
provides a running history of data compression that adds dimension to the
evolution of the technology.
I have often observed that those who write about Huffman, LZ77, LZ78, and LZW
compression do not explain those algorithms well enough. You need to visualize
what happens inside those processes to understand them, and nothing helps more
than pictures. Many explanations try to do it with words alone. Nelson uses
simple and effective graphs and C-code fragments to illustrate the compression
and decompression steps for each of the algorithms. There is a delightful
chapter on speech compression, and Nelson tops the book off with an archiving
program in C that uses the LZSS algorithm and that you can integrate into your
applications, extending it to add other compression algorithms if you want.
This book is a good addition to your technical library, notwithstanding an odd
acknowledgment of Robert X. Cringely, the pseudonym of an author that Ray
Duncan generously associates with yellow journalism in the March "Programmer's
Bookshelf."
There are three clues that Nelson wrote the book well before he published it.
One is the use of the old-style K&R prototypes and function parameter blocks.
He has included ANSI prototypes under the control of the __STDC__
compile-time variable, but the function declarations of some of the programs
use the obsolete K&R style. Nelson attempts to justify this by implying that
not everyone will be using ANSI compilers, but the book is inconsistent in
this regard--other programs use ANSI function declarations. The second clue is
his reference to Turbo C 1.0. The third clue is his use of the mid-eighties
catch-phrase "Where's the beef?" as a paragraph heading. Come on, Mark.
23-Skidoo.

_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]

/* ------------- applicat.c ------------- */

#include "dflat.h"

static int ScreenHeight;
static BOOL AltDown = FALSE;

extern DBOX Display;
extern DBOX Windows;

#ifdef INCLUDE_LOGGING
extern DBOX Log;
#endif

#ifdef INCLUDE_SHELLDOS
static void ShellDOS(WINDOW);
#endif
static void CreateMenu(WINDOW);
static void CreateStatusBar(WINDOW);
static void SelectColors(WINDOW);
static void SetScreenHeight(int);
static void SelectLines(WINDOW);

#ifdef INCLUDE_WINDOWOPTIONS
static void SelectTexture(void);
static void SelectBorder(WINDOW);
static void SelectTitle(WINDOW);
static void SelectStatusBar(WINDOW);
#endif

#ifdef INCLUDE_MULTI_WINDOWS
static void CloseAll(WINDOW, int);
static void MoreWindows(WINDOW);
static void ChooseWindow(WINDOW, int);
static int WindowSel;
static WINDOW oldFocus;
static char *Menus[9] = {
 "~1.                    ",
 "~2.                    ",
 "~3.                    ",
 "~4.                    ",
 "~5.                    ",
 "~6.                    ",
 "~7.                    ",
 "~8.                    ",
 "~9.                    "
};
#endif

/* --------------- CREATE_WINDOW Message -------------- */
static int CreateWindowMsg(WINDOW wnd)
{
 int rtn;
 static BOOL DisplayModified = FALSE;
 ScreenHeight = SCREENHEIGHT;
 if (!isVGA() && !DisplayModified) {
 /* ---- modify Display Dialog Box for EGA, CGA ---- */
 CTLWINDOW *ct, *ct1;
 int i;
 ct = FindCommand(&Display, ID_OK, BUTTON);
 if (isEGA())
 ct1 = FindCommand(&Display,ID_50LINES,RADIOBUTTON);
 else {
 CTLWINDOW *ct2;
 ct2 = FindCommand(&Display,ID_COLOR,RADIOBUTTON)-1;
 ct2->dwnd.w++;
 for (i = 0; i < 7; i++)
 (ct2+i)->dwnd.x += 8;
 ct1 = FindCommand(&Display,ID_25LINES,RADIOBUTTON)-1;
 }
 for (i = 0; i < 4; i++)
 *ct1++ = *ct++;
 DisplayModified = TRUE;
 }
#ifdef INCLUDE_WINDOWOPTIONS
 if (cfg.Border)
 SetCheckBox(&Display, ID_BORDER);
 if (cfg.Title)
 SetCheckBox(&Display, ID_TITLE);
 if (cfg.StatusBar)
 SetCheckBox(&Display, ID_STATUSBAR);
 if (cfg.Texture)
 SetCheckBox(&Display, ID_TEXTURE);
#endif
 if (cfg.mono == 1)
 PushRadioButton(&Display, ID_MONO);
 else if (cfg.mono == 2)
 PushRadioButton(&Display, ID_REVERSE);
 else
 PushRadioButton(&Display, ID_COLOR);
 if (cfg.ScreenLines == 25)
 PushRadioButton(&Display, ID_25LINES);
 else if (cfg.ScreenLines == 43)
 PushRadioButton(&Display, ID_43LINES);
 else if (cfg.ScreenLines == 50)
 PushRadioButton(&Display, ID_50LINES);
 if (SCREENHEIGHT != cfg.ScreenLines) {
 SetScreenHeight(cfg.ScreenLines);
 if (WindowHeight(wnd) == ScreenHeight ||
 SCREENHEIGHT-1 < GetBottom(wnd)) {
 WindowHeight(wnd) = SCREENHEIGHT-1;
 GetBottom(wnd) = GetTop(wnd)+WindowHeight(wnd)-1;

 wnd->RestoredRC = WindowRect(wnd);
 }
 }
 SelectColors(wnd);
#ifdef INCLUDE_WINDOWOPTIONS
 SelectBorder(wnd);
 SelectTitle(wnd);
 SelectStatusBar(wnd);
#endif
 rtn = BaseWndProc(APPLICATION, wnd, CREATE_WINDOW, 0, 0);
 if (wnd->extension != NULL)
 CreateMenu(wnd);
 CreateStatusBar(wnd);
 LoadHelpFile();
 SendMessage(NULL, SHOW_MOUSE, 0, 0);
 return rtn;
}

/* --------- ADDSTATUS Message ---------- */
static void AddStatusMsg(WINDOW wnd, PARAM p1)
{
 if (wnd->StatusBar != NULL) {
 if (p1 && *(char *)p1)
 SendMessage(wnd->StatusBar, SETTEXT, p1, 0);
 else
 SendMessage(wnd->StatusBar, CLEARTEXT, 0, 0);
 SendMessage(wnd->StatusBar, PAINT, 0, 0);
 }
}

/* -------- SETFOCUS Message -------- */
static void SetFocusMsg(WINDOW wnd, BOOL p1)
{
 if (p1)
 SendMessage(inFocus, SETFOCUS, FALSE, 0);
 /* --- remove window from list --- */
 RemoveFocusWindow(wnd);
 /* --- move window to end/beginning of list --- */
 p1 ? AppendFocusWindow(wnd) : PrependFocusWindow(wnd);
 inFocus = p1 ? wnd : NULL;
 SendMessage(wnd, BORDER, 0, 0);
}

/* ------- SIZE Message -------- */
static void SizeMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 BOOL WasVisible;
 WasVisible = isVisible(wnd);
 if (WasVisible)
 SendMessage(wnd, HIDE_WINDOW, 0, 0);
 if (p1-GetLeft(wnd) < 30)
 p1 = GetLeft(wnd) + 30;
 BaseWndProc(APPLICATION, wnd, SIZE, p1, p2);
 CreateMenu(wnd);
 CreateStatusBar(wnd);
 if (WasVisible)
 SendMessage(wnd, SHOW_WINDOW, 0, 0);
}


/* ----------- KEYBOARD Message ------------ */
static int KeyboardMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 AltDown = FALSE;
 if (WindowMoving || WindowSizing || (int) p1 == F1)
 return BaseWndProc(APPLICATION, wnd, KEYBOARD, p1, p2);
 switch ((int) p1) {
 case ALT_F4:
 PostMessage(wnd, CLOSE_WINDOW, 0, 0);
 return TRUE;
#ifdef INCLUDE_MULTI_WINDOWS
 case ALT_F6:
 SetNextFocus(inFocus);
 SkipSystemWindows(FALSE);
 return TRUE;
#endif
 case ALT_HYPHEN:
 BuildSystemMenu(wnd);
 return TRUE;
 default:
 break;
 }
 PostMessage(wnd->MenuBarWnd, KEYBOARD, p1, p2);
 return TRUE;
}

/* --------- SHIFT_CHANGED Message -------- */
static void ShiftChangedMsg(WINDOW wnd, PARAM p1)
{
 if ((int)p1 & ALTKEY)
 AltDown = TRUE;
 else if (AltDown) {
 AltDown = FALSE;
 if (wnd->MenuBarWnd != inFocus)
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 SendMessage(wnd->MenuBarWnd, KEYBOARD, F10, 0);
 }
}

/* -------- COMMAND Message ------- */
static void CommandMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 switch ((int)p1) {
 case ID_HELP:
 DisplayHelp(wnd, DFlatApplication);
 break;
 case ID_HELPHELP:
 DisplayHelp(wnd, "HelpHelp");
 break;
 case ID_EXTHELP:
 DisplayHelp(wnd, "ExtHelp");
 break;
 case ID_KEYSHELP:
 DisplayHelp(wnd, "KeysHelp");
 break;
 case ID_HELPINDEX:
 DisplayHelp(wnd, "HelpIndex");
 break;
#ifdef TESTING_DFLAT

 case ID_LOADHELP:
 LoadHelpFile();
 break;
#endif
#ifdef INCLUDE_LOGGING
 case ID_LOG:
 MessageLog(wnd);
 break;
#endif
#ifdef INCLUDE_SHELLDOS
 case ID_DOS:
 ShellDOS(wnd);
 break;
#endif
 case ID_EXIT:
 case ID_SYSCLOSE:
 PostMessage(wnd, CLOSE_WINDOW, 0, 0);
 break;
 case ID_DISPLAY:
 if (DialogBox(wnd, &Display, TRUE, NULL)) {
 SendMessage(wnd, HIDE_WINDOW, 0, 0);
 SelectColors(wnd);
 SelectLines(wnd);
#ifdef INCLUDE_WINDOWOPTIONS
 SelectBorder(wnd);
 SelectTitle(wnd);
 SelectStatusBar(wnd);
 SelectTexture();
#endif
 CreateMenu(wnd);
 CreateStatusBar(wnd);
 SendMessage(wnd, SHOW_WINDOW, 0, 0);
 }
 break;
 case ID_SAVEOPTIONS:
 SaveConfig();
 break;
#ifdef INCLUDE_MULTI_WINDOWS
 case ID_WINDOW:
 ChooseWindow(wnd, (int)p2-2);
 break;
 case ID_CLOSEALL:
 CloseAll(wnd, FALSE);
 break;
 case ID_MOREWINDOWS:
 MoreWindows(wnd);
 break;
#endif
#ifdef INCLUDE_RESTORE
 case ID_SYSRESTORE:
#endif
 case ID_SYSMOVE:
 case ID_SYSSIZE:
#ifdef INCLUDE_MINIMIZE
 case ID_SYSMINIMIZE:
#endif
#ifdef INCLUDE_MAXIMIZE
 case ID_SYSMAXIMIZE:
#endif

 BaseWndProc(APPLICATION, wnd, COMMAND, p1, p2);
 break;
 default:
 if (inFocus != wnd->MenuBarWnd && inFocus != wnd)
 PostMessage(inFocus, COMMAND, p1, p2);
 break;
 }
}

/* --------- CLOSE_WINDOW Message -------- */
static int CloseWindowMsg(WINDOW wnd)
{
 int rtn;
#ifdef INCLUDE_MULTI_WINDOWS
 CloseAll(wnd, TRUE);
#endif
 PostMessage(NULL, STOP, 0, 0);
 rtn = BaseWndProc(APPLICATION, wnd, CLOSE_WINDOW, 0, 0);
 if (ScreenHeight != SCREENHEIGHT)
 SetScreenHeight(ScreenHeight);
 UnLoadHelpFile();
 return rtn;
}

/* --- APPLICATION Window Class window processing module --- */
int ApplicationProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 return CreateWindowMsg(wnd);
 case HIDE_WINDOW:
 if (wnd == inFocus)
 inFocus = NULL;
 break;
 case ADDSTATUS:
 AddStatusMsg(wnd, p1);
 return TRUE;
 case SETFOCUS:
 if ((int)p1 == (inFocus != wnd)) {
 SetFocusMsg(wnd, (BOOL) p1);
 return TRUE;
 }
 break;
 case SIZE:
 SizeMsg(wnd, p1, p2);
 return TRUE;
#ifdef INCLUDE_MINIMIZE
 case MINIMIZE:
 return TRUE;
#endif
 case KEYBOARD:
 return KeyboardMsg(wnd, p1, p2);
 case SHIFT_CHANGED:
 ShiftChangedMsg(wnd, p1);
 return TRUE;
 case PAINT:
 if (isVisible(wnd)) {
#ifdef INCLUDE_WINDOWOPTIONS
 int cl = cfg.Texture ? APPLCHAR : ' ';

#else
 int cl = APPLCHAR;
#endif
 ClearWindow(wnd, (RECT *)p1, cl);
 }
 return TRUE;
 case COMMAND:
 CommandMsg(wnd, p1, p2);
 return TRUE;
 case CLOSE_WINDOW:
 return CloseWindowMsg(wnd);
 default:
 break;
 }
 return BaseWndProc(APPLICATION, wnd, msg, p1, p2);
}

#ifdef INCLUDE_SHELLDOS
static void SwitchCursor(void)
{
 SendMessage(NULL, SAVE_CURSOR, 0, 0);
 SwapCursorStack();
 SendMessage(NULL, RESTORE_CURSOR, 0, 0);
}

/* ------- Shell out to DOS ---------- */
static void ShellDOS(WINDOW wnd)
{
 SendMessage(wnd, HIDE_WINDOW, 0, 0);
 SwitchCursor();
 if (ScreenHeight != SCREENHEIGHT)
 SetScreenHeight(ScreenHeight);
 SendMessage(NULL, HIDE_MOUSE, 0, 0);
 printf("To return to %s, execute the DOS exit command.",
 DFlatApplication);
 fflush(stdout);
 spawnl(P_WAIT, getenv("COMSPEC"), NULL);
 if (SCREENHEIGHT != cfg.ScreenLines)
 SetScreenHeight(cfg.ScreenLines);
 SwitchCursor();
 SendMessage(wnd, SHOW_WINDOW, 0, 0);
 SendMessage(NULL, SHOW_MOUSE, 0, 0);
}
#endif

/* -------- Create the menu bar -------- */
static void CreateMenu(WINDOW wnd)
{
 AddAttribute(wnd, HASMENUBAR);
 if (wnd->MenuBarWnd != NULL)
 SendMessage(wnd->MenuBarWnd, CLOSE_WINDOW, 0, 0);
 wnd->MenuBarWnd = CreateWindow(MENUBAR,
 NULL,
 GetClientLeft(wnd),
 GetClientTop(wnd)-1,
 1,
 ClientWidth(wnd),
 NULL,
 wnd,

 NULL,
 0);
 SendMessage(wnd->MenuBarWnd,BUILDMENU,
 (PARAM)wnd->extension,0);
 AddAttribute(wnd->MenuBarWnd, VISIBLE);
}

/* ----------- Create the status bar ------------- */
static void CreateStatusBar(WINDOW wnd)
{
 if (wnd->StatusBar != NULL) {
 SendMessage(wnd->StatusBar, CLOSE_WINDOW, 0, 0);
 wnd->StatusBar = NULL;
 }
 if (TestAttribute(wnd, HASSTATUSBAR)) {
 wnd->StatusBar = CreateWindow(STATUSBAR,
 NULL,
 GetClientLeft(wnd),
 GetBottom(wnd),
 1,
 ClientWidth(wnd),
 NULL,
 wnd,
 NULL,
 0);
 AddAttribute(wnd->StatusBar, VISIBLE);
 }
}

#ifdef INCLUDE_MULTI_WINDOWS
/* -------- return the name of a document window ------- */
static char *WindowName(WINDOW wnd)
{
 if (GetTitle(wnd) == NULL) {
 if (GetClass(wnd) == DIALOG)
 return ((DBOX *)(wnd->extension))->HelpName;
 else
 return "Untitled";
 }
 else
 return GetTitle(wnd);
}

/* ----------- Prepare the Window menu ------------ */
void PrepWindowMenu(void *w, struct Menu *mnu)
{
 WINDOW wnd = w;
 struct PopDown *p0 = mnu->Selections;
 struct PopDown *pd = mnu->Selections + 2;
 struct PopDown *ca = mnu->Selections + 13;
 int MenuNo = 0;
 WINDOW cwnd;
 mnu->Selection = 0;
 oldFocus = NULL;
 if (GetClass(wnd) != APPLICATION) {
 int i;
 oldFocus = wnd;
 /* ----- point to the APPLICATION window ----- */
 while (GetClass(wnd) != APPLICATION)

 if ((wnd = GetParent(wnd)) == NULL)
 return;
 /* ----- get the first 9 document windows ----- */
 for (i = 0; i < wnd->ChildCt && MenuNo < 9; i++) {
 cwnd = *(wnd->Children + i);
 if (GetClass(cwnd) != MENUBAR &&
 GetClass(cwnd) != STATUSBAR) {
 /* --- add the document window to the menu --- */
 strncpy(Menus[MenuNo]+4, WindowName(cwnd), 20);
 pd->SelectionTitle = Menus[MenuNo];
 if (cwnd == oldFocus) {
 /* -- mark the current document -- */
 pd->Attrib = CHECKED;
 mnu->Selection = MenuNo+2;
 }
 else
 pd->Attrib &= ~CHECKED;
 pd++;
 MenuNo++;
 }
 }
 }
 if (MenuNo)
 p0->SelectionTitle = "~Close all";
 else
 p0->SelectionTitle = NULL;
 if (MenuNo >= 9) {
 *pd++ = *ca;
 if (mnu->Selection == 0)
 mnu->Selection = 11;
 }
 pd->SelectionTitle = NULL;
}

/* window processing module for the More Windows dialog box */
static int WindowPrep(WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2)
{
 switch (msg) {
 case INITIATE_DIALOG: {
 WINDOW wnd1;
 WINDOW cwnd = ControlWindow(&Windows,ID_WINDOWLIST);
 WINDOW pwnd = GetParent(wnd);
 int sel = 0, i;
 if (cwnd == NULL)
 return FALSE;
 for (i = 0; i < pwnd->ChildCt; i++) {
 wnd1 = *(pwnd->Children + i);
 if (wnd1 != wnd && GetClass(wnd1) != MENUBAR &&
 GetClass(wnd1) != STATUSBAR) {
 if (wnd1 == oldFocus)
 WindowSel = sel;
 SendMessage(cwnd, ADDTEXT,
 (PARAM) WindowName(wnd1), 0);
 sel++;
 }
 }
 SendMessage(cwnd, LB_SETSELECTION, WindowSel, 0);
 AddAttribute(cwnd, VSCROLLBAR);
 PostMessage(cwnd, SHOW_WINDOW, 0, 0);

 break;
 }
 case COMMAND:
 switch ((int) p1) {
 case ID_OK:
 if ((int)p2 == 0)
 WindowSel = SendMessage(
 ControlWindow(&Windows,
 ID_WINDOWLIST),
 LB_CURRENTSELECTION, 0, 0);
 break;
 case ID_WINDOWLIST:
 if ((int) p2 == LB_CHOOSE)
 SendMessage(wnd, COMMAND, ID_OK, 0);
 break;
 default:
 break;
 }
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}

/* ---- the More Windows command on the Window menu ---- */
static void MoreWindows(WINDOW wnd)
{
 if (DialogBox(wnd, &Windows, TRUE, WindowPrep))
 ChooseWindow(wnd, WindowSel);
}

/* ----- user chose a window from the Window menu
 or the More Windows dialog box ----- */
static void ChooseWindow(WINDOW wnd, int WindowNo)
{
 int i;
 WINDOW cwnd;
 for (i = 0; i < wnd->ChildCt; i++) {
 cwnd = *(wnd->Children + i);
 if (GetClass(cwnd) != MENUBAR &&
 GetClass(cwnd) != STATUSBAR)
 if (WindowNo-- == 0)
 break;
 }
 if (wnd->ChildCt) {
 SendMessage(cwnd, SETFOCUS, TRUE, 0);
 if (cwnd->condition == ISMINIMIZED)
 SendMessage(cwnd, RESTORE, 0, 0);
 }
}

/* ----- Close all document windows ----- */
static void CloseAll(WINDOW wnd, int closing)
{
 WINDOW wnd1;
 int i;
 SendMessage(wnd, SETFOCUS, TRUE, 0);
 for (i = wnd->ChildCt; i > 0; --i) {

 wnd1 = *(wnd->Children + i - 1);
 if (GetClass(wnd1) != MENUBAR &&
 GetClass(wnd1) != STATUSBAR) {
 ClearVisible(wnd1);
 SendMessage(wnd1, CLOSE_WINDOW, 0, 0);
 }
 }
 if (!closing)
 SendMessage(wnd, PAINT, 0, 0);
}

#endif /* #ifdef INCLUDE_MULTI_WINDOWS */

static void DoWindowColors(WINDOW wnd)
{
 WINDOW cwnd;
 int i;
 InitWindowColors(wnd);
 for (i = 0; i < wnd->ChildCt; i++) {
 cwnd = *(wnd->Children + i);
 InitWindowColors(cwnd);
 if (GetClass(cwnd) == TEXT && GetText(cwnd) != NULL)
 SendMessage(cwnd, CLEARTEXT, 0, 0);
 }
}

/* ----- set up colors for the application window ------ */
static void SelectColors(WINDOW wnd)
{
 if (RadioButtonSetting(&Display, ID_MONO))
 cfg.mono = 1;
 else if (RadioButtonSetting(&Display, ID_REVERSE))
 cfg.mono = 2;
 else
 cfg.mono = 0;
 if ((ismono() || video_mode == 2) && cfg.mono == 0)
 cfg.mono = 1;

 if (cfg.mono == 1)
 memcpy(cfg.clr, bw, sizeof bw);
 else if (cfg.mono == 2)
 memcpy(cfg.clr, reverse, sizeof reverse);
 else
 memcpy(cfg.clr, color, sizeof color);
 DoWindowColors(wnd);
}

/* ---- select screen lines ---- */
static void SelectLines(WINDOW wnd)
{
 cfg.ScreenLines = 25;
 if (isEGA() || isVGA()) {
 if (RadioButtonSetting(&Display, ID_43LINES))
 cfg.ScreenLines = 43;
 else if (RadioButtonSetting(&Display, ID_50LINES))
 cfg.ScreenLines = 50;
 }
 if (SCREENHEIGHT != cfg.ScreenLines) {
 int FullScreen = WindowHeight(wnd) == SCREENHEIGHT;

 SetScreenHeight(cfg.ScreenLines);
 if (FullScreen || SCREENHEIGHT-1 < GetBottom(wnd))
 SendMessage(wnd, SIZE, (PARAM) GetRight(wnd),
 SCREENHEIGHT-1);
 }
}

/* ---- set the screen height in the video hardware ---- */
static void SetScreenHeight(int height)
{
 if (isEGA() || isVGA()) {
 SendMessage(NULL, SAVE_CURSOR, 0, 0);
 switch (height) {
 case 25:
 Set25();
 break;
 case 43:
 Set43();
 break;
 case 50:
 Set50();
 break;
 default:
 break;
 }
 SendMessage(NULL, RESTORE_CURSOR, 0, 0);
 SendMessage(NULL, RESET_MOUSE, 0, 0);
 SendMessage(NULL, SHOW_MOUSE, 0, 0);
 }
}

#ifdef INCLUDE_WINDOWOPTIONS

/* ----- select the screen texture ----- */
static void SelectTexture(void)
{
 cfg.Texture = CheckBoxSetting(&Display, ID_TEXTURE);
}

/* -- select whether the application screen has a border -- */
static void SelectBorder(WINDOW wnd)
{
 cfg.Border = CheckBoxSetting(&Display, ID_BORDER);
 if (cfg.Border)
 AddAttribute(wnd, HASBORDER);
 else
 ClearAttribute(wnd, HASBORDER);
}

/* select whether the application screen has a status bar */
static void SelectStatusBar(WINDOW wnd)
{
 cfg.StatusBar = CheckBoxSetting(&Display, ID_STATUSBAR);
 if (cfg.StatusBar)
 AddAttribute(wnd, HASSTATUSBAR);
 else
 ClearAttribute(wnd, HASSTATUSBAR);
}


/* select whether the application screen has a title bar */
static void SelectTitle(WINDOW wnd)
{
 cfg.Title = CheckBoxSetting(&Display, ID_TITLE);
 if (cfg.Title)
 AddAttribute(wnd, HASTITLEBAR);
 else
 ClearAttribute(wnd, HASTITLEBAR);
}

#endif




May, 1992
STRUCTURED PROGRAMMING


The Triumph of the Black Box




Jeff Duntemann, KG7JF


God help us, I've held out as long as I could, but there is now no escape: I
have to buy a car. The Magic Van is doing fine at eight years and 95,000
miles, but Carol's poor 12-year-old Colt met its maker not long ago, and we
gotta have something else that moves pretty soon. So I've been looking under
hoods and rolling my eyes.
In January of 1974 I replaced the carburetor of my 1968 Chevelle, freezing my
tail in 5 degree temperatures and a windchill cold enough to liquefy nitrogen,
out in the open on a Chicago street--and it worked. I didn't know what I was
doing and didn't have the right tools to do it with, but poor Shakespeare
turned right over with a roar once I tightened the last bolt, and ran like a
champ through several more Chicago winters until the rust uglies ate him.
Now, when I look under the hood of what they call a Chevy, I couldn't even
tell you where the carburetor is. I don't even know if they still have them.
The engine compartment looks like the set of a bad Frankenstein movie. They
tell me there are five or six computers in there somewhere, faithfully keeping
today's Chevies running clean, straight, and on the road. Five or six
computers, running five or six different pieces of software, are fighting for
control of the tin can I'm hurtling down the interstate in at more than a mile
a minute.
This is supposed to inspire confidence?
Yeek. I know far too much about software to allow it under the hood of my car.
I want Shakespeare back! (If anybody has but would part with a 1968 Chevelle
two-door hardtop in good shape, puh-leez let me know.)


The Black Box Reconsidered


The essence of a black box is that you don't have to understand its internals
and you don't have to fool with it. In the best of all worlds, a car would be
an indestructible black block of science-fiction synthetic with wheels that
turned. You get in, and it goes where it should and stops when it must.
Failing that, a car should be simple enough so that any knucklehead with a
wrench can keep it running. Complexity should not be subject to corrosion.
And this, I think, is what bothers me about Turbo Vision. It's too complex to
really understand, but by the same token you still have to fool with it,
sometimes at a fairly deep level. As I suggested last month, it's a black box,
but not nearly a black enough one. This is a problem that will bedevil vendors
of event-driven frameworks until we finally hit upon some sort of golden mean
between power and complexity.
So where's the pointer to that golden mean?
I can think of at least two, from the battling titans of the compiler world:
Microsoft's Visual Basic and Borland's Object Vision. One, like Turbo Vision,
isn't quite black enough, and the other is maybe a hair too black. One is a
relatively conservative vision, and the other is a totally radical one. But
hey, we're in the ballpark. I'm going to tell you about Object Vision first,
because it sets a far limit on the concept of "visual development"--it's
essentially 100 percent visual. After a close look at OV, Visual Basic will
seem as familiar as an old pair of shoes.


Borland's Nonesuch


Object Vision just sort of came out of nowhere, and for the longest time
nobody was quite sure what to make of it. Products that fit no established
category have that trouble a lot. Most people I spoke to described OV as a
graphical front end for Paradox. This is dead wrong. OV doesn't require that
you own Paradox, nor does it insist on working with Paradox database files. It
can create Paradox tables all by itself--just as well as it can create dBase
tables, or Btrieve tables. If you'd prefer to store data in ASCII--or if you
already have tabular data stored in ASCII--OV is perfectly happy there, too.
If it isn't a front end to a database, neither is it a database manager as
we've come to know the term. To use a table in OV you have to create an
application. There's no easy "browse table" mode that simply lets you select a
table and poke around in it.
No. Despite Borland's protestations to the contrary, Object Vision is a
programming language, and probably the most remarkable one to appear in some
time. It isn't the best programming language to appear recently, by any means
(Visual Basic beats it hands down in utility), but it's the wildest conceptual
ride I've taken in a while.


Visual Syntax


The trick is this: Object Vision is a Windows-based, event-driven programming
language with a totally visual syntax. There's so little of our familiar
one-step-ahead-of-another sequential statement coding that it might as well
not be there at all. Instead, virtually all the elements of a program are
drawn on the screen with interactive tools.
An OV application consists of four major elements:
Forms and Fields. Object Vision calls individual data items fields. There are
no variables per se; every data item both holds data and displays it somewhere
on the screen. Organizing the screen is done using forms, windows within which
fields are arranged.
Creating forms and fields is done from the OV form tool, a mouse-essential
draw program that should seem familiar and comfortable to anyone who has ever
used a paint or draw program under Windows. The form tool lets you create a
new form and make it the size you want, and then lets you pick fields from a
tool bar of field types and arrange them on the form. Once the fields are
positioned, the form tool helps you apply properties to the fields on the
forms: attributes such as color, labels, label fonts, picture strings, date
formats, numeric formats, and so on. Fields may be marked as noneditable, and
as such are skipped when the user moves from field to field within a form.
From a height, defining forms and fields is equivalent to defining your
variables and your user interface, all at the same time. It's by far the
easiest and most straightforward part of OV work.
Value Trees. It's uphill from there. To apply values to the fields you create
using the form tool, you must define a value tree for each field not connected
to a link. (See "Links," further on.) A value tree is a rule, drawn as a
graphical diagram in a special window, that dictates the value a field has at
any given time. It's a little like a formula attached to a spreadsheet cell,
except that value trees can be much more complex than spreadsheet formulas.
A value tree is a tree because there is only one final selected value (the
root of the tree) but many possible paths between that root and different
potential values. These values (called conclusions) are the leaves of the
tree. The path from the root to one of the leaves depends on one or more
conditions, tests composed of logical operators such as = or <> and @
functions.
Each time the value of a field is recalculated, OV follows the logic from the
root outward, making tests along the way, until it reaches one of the
conclusions. This conclusion (a value reached as the result of some logical
decision) becomes the new value of the field. The conclusions (values) may be
defined in terms of literal constants, other fields, @ functions, or
combinations of these, combined in expressions via the common arithmetic
operators.
This is a difficult concept to put across in only a few words. Figure 1 shows
a value tree from one of the Borland example applications. The trick in
understanding such diagrams is to avoid the temptation to think of them as
flowcharts, readable from the top down. Start at the root, and follow the
logic out to each one of the leaves. After a time or two the lights will come
on.
Event Trees. At the core of event-driven programming is the idea that code
exists to respond to events occurring outside the running application. OV
allows you to define responses to various events and attach them to both forms
and fields. The supported events include mouse clicks, selection and
deselection, opening or closing of a form, any change to a field or a form, or
an "artificial event" created through an @ function called @EVENT.
You attach responses to events by building an event tree for a specific event
and associating it with some object in the application. Like a value tree, an
event tree is a graphical diagram displayed in a special window. Also, in a
fashion similar to a value tree, an event tree begins at a root--the event in
question--and expands through one or more logic paths to a conclusion, which
for an event tree is some action rather than the assignment of a value to a
field. (However, you can assign a value to a field in an event tree, using the
@ASSIGN function, which got me in trouble at one point.) For each occurrence of
an event, only one of those conclusions is chosen and executed, depending on
how the sequences of tests work out along the way.
The set of components for building an event tree is pretty rich. Some are
explicit, such as @ functions for performing the AND, OR, and NOT logical
functions; others can be synthesized, such as creating a CASE construct from a
multiple branch and the predefined OTHERWISE clause.
Figure 2 is an event tree from one of the Borland example applications. See if
you can follow the logic, again beginning at the event block and following
each logic path to a final conclusion.
Links. All OV file I/O is done through links, logical connections to physical
files. A link contains buffer variables that may be written to prior to the
actual physical update of the desired record in a file. When the link is
defined, different fields in the physical record may be connected to different
fields in the OV application. The same physical field, in fact, may be
connected to one OV field on write and a different OV field on read. A suite
of @ functions such as @TOP, @NEXT, @PREVIOUS, @LOCAL, @UPDATE, and so on
allows you to move around in the physical table file and manipulate records.
A universe of subtlety underlies OV links, and I confess I've barely scratched
the surface. Inexact matches on search are supported, as are filters, logical
tests a physical record must pass before being delivered into the OV link. For
example, you can define a filter on a Product Code field, and only display
records with product codes greater than 100, or beginning with "B," and so on.
Virtual fields are also supported. This allows you to add a read-only field to
a database record, the value of which is calculated through some sort of
expression. A Sales Tax virtual field could be created by multiplying a Sales
Amount field by a Sales Tax Percent field. You can read from the Sales Tax
virtual field as though it were present in the physical database, but not
write to it. Virtual fields are recalculated whenever any of the component
fields in their expressions change.
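In conventional code, a virtual field amounts to a derived, read-only
accessor. Here's a minimal sketch of the Sales Tax example; the struct and
names are mine, not OV's, and I've assumed the percent field holds a fraction
such as 0.05:

```c
/* Sketch of a "virtual field": the value is derived on every read
   from two physical fields, so it can be read but never written,
   and it can never go stale when its components change. */
typedef struct {
    double sales_amount;   /* physical field, read/write              */
    double sales_tax_pct;  /* physical field, a fraction (e.g., 0.05) */
} Record;

/* Read-only accessor standing in for the Sales Tax virtual field. */
double sales_tax(const Record *r)
{
    return r->sales_amount * r->sales_tax_pct;
}
```

Because the value is computed on read rather than stored, there is nothing to
recalculate and nothing to write back.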
Links may also be created through the Windows DDE channel. There are some wild
possibilities here, because OV has the power to load and run a Windows
application if an OV application needs to link to that other application via
DDE.
The documentation failed me on DDE links. While the manuals probably include
most or all of the detailed information, they aren't much help in suggesting
just what one can do with DDE links.



No, No! Another Paradigm Change!


Thinking in Object Vision takes some getting used to. Gosh, it almost feels
like another paradigm change! I'll stop short of saying that--Esther Dyson
might repeat it, and it could catch on. But consider conceptual wrenches like
this: Every variable in OV is a field--and is displayed on some form,
whether you want to display it or not. This drove me nuts until the lights
started popping in my head, and I simply defined a form to hold all those
variables that I didn't need to keep on display. Then I simply chose not to
display that particular form! You have to see the variables, but you don't
have to see the form they're attached to. Ergo, you don't really have to see
the variables.
Then I realized that I could bring the hidden form up on demand as a kind of
watch window to see what my "invisible" variables were doing. More lights
started to come on. This is really, truly, visual programming!
As Cliff Secord said after his first wild ride as the Rocketeer, "I... like
it!"


Only Square One


I suppose it's possible that Borland doesn't truly realize what they have
here. They seem obsessed with the notion that it's programming with the
programming squeezed out, as though programming were still universally seen as
an unapproachable black art, 1975-style.
The problem is, with all the programming squeezed out, OV has to do back flips
to get certain things done, in a way reminiscent of Prolog, which would sooner
die than admit to being a procedural programming language. (It died.) Given
that I'm only a beginner, it's possible that I'm not really thinking in OV
terms yet, but I feel that OV is actually harder to use than it would be if it
allowed a little more procedural control in certain areas. The suite of @
functions is quite rich, but I'd like something similar to Turbo Pascal
waiting in the background to give me something--even if it has to be couched
in @ function form--that isn't supported out of the box. You can write your own @
functions as DLLs, but the documentation is sparse, and the process looks kind
of forbidding.
Maybe that's a symptom of a broader problem: documentation that simply isn't
all there. This very subtle product has only two thinnish manuals: a tutorial
and a reference guide. They do an adequate job explaining the individual
pieces (events, @ functions, and so on), but there is very little "cookbook"
advice on how to hook the pieces together into useful applications. Such
advice is critical because OV isn't like anything most of us are used to
using.
I killed a lot of time trying to figure out something as simple as how to
return control to the top field of a form once I had written the fields in the
form out to a table through a link. The button that updated the table cleared
the form, but clearing a form doesn't return control to the top. A great deal
of trying this-and-that went on before I realized that @FORMSELECT would
return control to the form top. Because control was in the form, the form was
already selected, and it was by no means obvious that selecting the form again
would bring control to the top without adverse effects.
It worked. The manuals should say so. They don't.
A third volume devoted to cookbook advice is necessary. Until it appears (or
until some solid third-party books on the package appear) you'd better plan on
figuring a lot of the "big picture" stuff out on your own.


Global Warfare


Other improvements would help as well. For a product that hints at an
object-oriented nature, OV data storage looks amazingly like FORTRAN COMMON.
All fields in an OV app are global! I at first assumed that fields in a form
were known only within that form, but wrongo. If you have a customer address
form and a vendor address form, you need to name one address field Cust
Address and the other Vendor Address, because a single field named Address
would cause gross confusion. Fixing this would be no harder than providing a
"Form.Field" notation; for example, Vendor Form.Address.
I got in trouble because I copied a button named "Save To Database" from a
completed form and pasted the copy into a new form that I was designing. I
changed the new button's event tree to reflect the database served by the form
it was in. Alas, the copied button, as it happened, was the same button as the
one in the first form! It was one button present in two database update forms,
and its event tree updated only one database.
Simply allowing a button to display an arbitrary text label instead of its
True Name would help a lot. As it is, you end up with wordy (and physically
large) buttons with names such as "Save Vendor Info to Database" or "Change a
Product Record." Such elaborately self-documenting buttons might be considered
a feature rather than a bug if it weren't so hard to fit them into an
already-crowded data-entry form.
As they exist today, forms manage flow of control but not data scoping. Some
sort of locality would really help.


The View from a Height


But what Object Vision needs most desperately is something remarkably simple:
a perusable hard-copy description of the application under development. That
handy printout of the current state of the source code is such a fundamental
need that we generally don't think of it as a "feature." Lordy, if you can't
print out the source, how do you ever keep things straight once an application
grows out of toy mode into something complex enough to do useful work?
Alas, there is no way to print out a single, concise summary of which fields
have been defined and what forms they exist in, nor which @ functions they
invoke, nor what their value trees draw on to generate their values. You
either keep all this in the back of your head, or you're reduced to inspecting
fields manually, one by one until you find what you need to know.
I got in trouble here too. During my early experiments in OV, I used the
@ASSIGN function in a button's event tree to assign an arbitrary test value to
a data field. I reused that button when I actually began building a working
application, and forgot about its back door access to the "Part Number" field.
Then when I tried completing a form containing Part Number, I found it stuck
with the value 1003. No matter what I did, the value 1003 kept reappearing at
unpredictable moments. I quickly remembered using @ASSIGN the week before to
put the value 1003 in "Part Number"--but I forgot which button's event tree
was the one that invoked @ASSIGN! By then I had 30-odd buttons in eight
different forms, and had to hunt through five of those forms before I found
the culprit.
Eventually I discovered that pictures of forms can be sent to the printer, as
can pictures of event and value trees. This is helpful, but it's by no means
enough, and it must be done piecemeal. What we need (and this cannot be
difficult for Borland to do) is a dialog box that controls generation of a
"hard-copy application summary." I want to be able to use check boxes to
select what sort of summary information I want (specific named forms vs. all
forms; types of fields or named fields vs. all fields; alphabetical listing of
fields and their properties; cross-references of fields by value tree and
event tree; and so on) and then let the system printer produce a hard-copy
summary of the app.
In addition to that, an Actor- or Smalltalk-like field browser would be
wonderful. In one window I would like to see the name of a field, which forms
it is used in, what its properties are, and zoomable/scrollable panes
displaying its event and value trees.
Apart from its error messages, there are no debugging tools of any kind in
Object Vision. I won't carp too loudly, because this is a problem with
event-driven environments generally, and I myself don't have anything
brilliant to suggest. Event-driven debugging is an area that still needs a lot
of research, and I hope Borland and other firms offering event-driven tools
are putting some money and mental horsepower into that research.


Window on the Future


Don't let me leave you with the impression that Object Vision is unusable, or
that I don't care for it. On the contrary, it's a work of brilliance that I
like a lot, and I'm building a fairly sophisticated mail-order retailing
system with it. Borland's claim that you can be building Windows applications
with OV in half an hour is absurd, unless a two-dimensional version of "Hello
world!" is all you're interested in. You'll need a day or two to read the
manuals closely, and another day or two just to explore and mess around. But
after a couple of days of study, you'll be able to start building a useful
application, and might even be able to finish it and perfect it inside of a
couple of weeks.
That ain't half an hour, but it's still damned amazing. We're talking Windows
here, and the time required to learn a new Windows development environment and
produce useful results is usually measured in months.
I recommend that you try Object Vision if you get the chance, in part because
it's a quick learn and lots of fun, and in part because it's a peek at the
future of programming. As GUIs get more and more complex, we may not be able
to afford the luxury of to-the-metal, SDK-style development for any but the
most lucrative horizontal applications.
Visual development is one possible answer to compressing the Windows
development cycle. Object Vision's event and value trees are seminal stuff,
and while the package as a whole needs some maturing (and a whole new set of
manuals) the idea is solid gold. The box is pretty black, but unlike Turbo
Vision, you don't have to fool with what's under the hood--you just have to
figure out the controls by trial and error.
This visual development business is fascinating. We'll take a look at Visual
Basic next month.


A Better Source Code Lister


Back to Turbo Vision. Using the DIRLIST.PAS unit I presented last month, I
rewrote my long-listed JLIST9.PAS source-code lister yet again, producing
JLIST10--and losing several hundred lines of code in the process. Listing One
(page 146) is the whole thing: a simple source-code lister that takes any
reasonable number of file specs on the command line (such as *.ASM or
?LIST.PAS) and puts all matching files out neatly to a LaserJet II or
compatible printer, with a summary banner and line numbers.
I was able to can about 300 lines of list-manager code by incorporating Turbo
Vision collections. Furthermore, the code I canned was necessarily specific to
lists of just one type, and needed source code changes to apply to lists of
other types. Not serious changes, but changes nonetheless. JLIST10 is a much
better example of the power of TV collections than I was able to present last
time.
You can probably modify JLIST10 to run on other printers. All of the
printer-specific code is gathered toward the front. Mostly, your printer needs
to be able to print in a 16.66 pitch font. Apart from that requirement, any
printer, laser or otherwise, should work. Just splice in the appropriate
printer-control codes.



How Irish Hackers Keep Warm


The latest J. Peterman's mail-order catalog has a page devoted to the "Irish
Hacking Jacket," an expensive, tweedy, rumpled sort of overcoat that looks as
though it comes with simulated moss already growing on it. The copy on the
page begins this way: "Hacking? What does it mean? It means fooling around."
So far so good. I thought maybe they were onto something. But then, the next
line: "On a horse."
It's a whole new twist on mobile computing. Maybe I don't need a car. Maybe I
should just move to Ireland.


_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann

[LISTING ONE]

{--------------------------------------------------------------}
{ JLIST10 }
{ Multifile source code lister with 8-char tab expansion }
{ by Jeff Duntemann }
{ Turbo Pascal V6.0 }
{ Last update 1/1/92 }
{--------------------------------------------------------------}

PROGRAM JList10;

USES DOS,CRT,Printer, { Standard Borland units }
 DirList, { From DDJ for 4/92 }
 When2; { From DDJ for 1/92 }
CONST
 Up = True;
 Down = False;
 Single = True;
 Double = False;
 SingleRule = Chr(196); { IBM single-line box character }
 DoubleRule = Chr(205); { IBM double-line box character }

 JLogo : ARRAY[1..4] OF STRING =

 (' DBDD ZDDDDDDDBDDD ZDD?',
 ' 3 3 B @DDD? 3 ? 3 3',
 ' 3 3 3 3 3 3 3 3',
 ' DY @D A DDDDY A A @DDY');
 ESC1 = Chr($1B);
 ESC2 = ESC1+Chr($5B);

 LinesPerPage = 75; { 75 assumes 8 lines per inch }

TYPE
 String80 = STRING[80];
VAR
 InChar : Char;
 PrintPage : Boolean;
 Space10 : String80;
 ListLine : String;
 I,J : Integer;
 FileSpecs : String80;
 FileInfo : String;
 PrintCommand : String80;
 FilesToPrint : PDirEntryCollection;
 FileTime,Now : When; { "When" stamps for time/date processing }


{---------------------------------------------------------------}
{ PRINTER CONTROL ROUTINES }
{ These routines are all, to some extent, printer dependent. }
{ Here, the control codes are specific to the HP LJII/III. }
{---------------------------------------------------------------}
PROCEDURE PrinterReset;
BEGIN
 Write(LST,ESC1+'E');
END;

PROCEDURE PrinterToXY(X,Y : Integer);
BEGIN
 Write(LST,ESC1+'&a',Y-1,'R');
 Write(LST,ESC1+'&a',X-1,'C');
END;

PROCEDURE SetPrinterLinesPerInch(Lines : Integer);
BEGIN
 Write(LST,ESC1+'&l',Lines,'D');
END;

PROCEDURE SetLinePrinterFont;
BEGIN
 Write(LST,ESC1+'(s16.66H'); { Select Lineprinter font }
END;

PROCEDURE SetIBMCharacterSet;
BEGIN
 Write(LST,ESC1+'(10U'); { Select IBM PC symbol set }
END;

{-----------------------------------------}
{ END PRINTER-DEPENDENT CODE }
{-----------------------------------------}

PROCEDURE SendFormFeed;
BEGIN
 Write(LST,Chr(12))
END;

FUNCTION ForceCase(Up : BOOLEAN; Target : String) : String;
CONST
 Uppercase : SET OF Char = ['A'..'Z'];
 Lowercase : SET OF Char = ['a'..'z'];
VAR
 I : INTEGER;
BEGIN
 IF Up THEN FOR I := 1 TO Length(Target) DO
 IF Target[I] IN Lowercase THEN
 Target[I] := UpCase(Target[I])
 ELSE { NULL }
 ELSE FOR I := 1 TO Length(Target) DO
 IF Target[I] IN Uppercase THEN
 Target[I] := Chr(Ord(Target[I])+32);
 ForceCase := Target
END;

PROCEDURE PrintRule(ShowSingle : Boolean; StartColumn,EndColumn : Integer);
VAR

 RuleChar : Char;
 I : Integer;
BEGIN
 IF ShowSingle THEN RuleChar := SingleRule ELSE RuleChar := DoubleRule;
 FOR I := 1 TO StartColumn-1 DO Write(LST,' ');
 FOR I := StartColumn TO EndColumn DO Write(LST,RuleChar);
END;

PROCEDURE PrintStartBanner(FilesToPrint : PDirEntryCollection);
VAR
 TotalFiles : Integer;
 TotalBytes : LongInt;

PROCEDURE ShowSpecs(Target : PDirEntry); FAR;
BEGIN
 TotalFiles := Succ(TotalFiles);
 TotalBytes := TotalBytes + Target^.Entry.Size;
 Writeln(LST,Target^.DirLine);
END;
BEGIN
 TotalFiles := 0; TotalBytes := 0;
 SetPrinterLinesPerInch(12);
 FOR I := 1 TO 7 DO
 BEGIN
 PrintRule(Double,1,134); Writeln(LST);
 END;
 FOR I := 1 TO 4 DO Writeln(LST,JLogo[I]);
 FOR I := 1 TO 7 DO
 BEGIN
 PrintRule(Double,1,134); Writeln(LST);
 END;
 SetPrinterLinesPerInch(6);
 PrinterToXY(1,12);
 Write (LST,'Printer job initiated at '+Now.GetTimeString+'m');
 Writeln(LST,' on '+Now.GetLongDateString);
 PrintRule(Single,1,134); Writeln(LST);
 Writeln(LST,'Requested filespec: ',FileSpecs);
 Writeln(LST,'Files to be printed:');
 Writeln(LST);

 FilesToPrint^.ForEach(@ShowSpecs);

 PrintRule(Single,1,134); Writeln(LST);
 Writeln(LST,'Total number of files to be printed: ',TotalFiles);
 Writeln(LST,'Total number of bytes to be printed: ',TotalBytes);
 SendFormFeed;
END;

{->>>>PrintFile<<<<-}
PROCEDURE PrintFile(ToBePrinted : PDirEntry);
VAR
 LineNumber,PageNumber : Integer;
 ListFileName : String80;
 ListFile : Text;

PROCEDURE PrintLine(LineToPrint : String; LineNumber : Integer);
CONST
 TabChar = Chr(9);
VAR

 I,J,LinePos,UpstreamPos,AddBlanks : Integer;
 Space8 : String80;
BEGIN
 Space8 := '        '; { eight spaces }
 Write(LST,Space8,LineNumber : 4,' ');
 LinePos := 1;
 FOR I := 1 TO Length(LineToPrint) DO
 IF LineToPrint[I] = TabChar THEN { Expand tabs }
 BEGIN
 UpstreamPos := (((LinePos + 7) DIV 8) * 8) + 1;
 AddBlanks := UpstreamPos - LinePos;
 FOR J := 1 TO AddBlanks DO Write(LST,' ');
 LinePos := UpstreamPos
 END
 ELSE
 BEGIN
 Write(LST,LineToPrint[I]);
 LinePos := Succ(LinePos)
 END;
 Writeln(LST)
END;

PROCEDURE PrintHeader;
VAR
 I : Integer;
 Space8 : String80;
BEGIN
 Space8 := '        '; { eight spaces }
 Writeln(LST,Space8,'FILE: ',ForceCase(Up,ListFileName),
 ' Version of ',FileTime.GetDateString,' ',
 FileTime.GetTimeString,'m Printed on ',
 Now.GetLongDateString,' at ',Now.GetTimeString,'m.',
 ' Page ',PageNumber);
 Write(LST,Space8);
 FOR I := 1 TO 116 DO Write(LST,Chr(196)); Writeln(LST);
 Writeln(LST);
 Writeln(LST);
END;

BEGIN { PrintFile }
 LineNumber := 1; PageNumber := 1; Space10 := '          '; { ten spaces }
 ListFileName := ToBePrinted^.Path+ToBePrinted^.Entry.Name;
 Assign(ListFile,ListFileName);
 Reset(ListFile);

 IF NOT EOF(ListFile) THEN PrintHeader;
 WHILE NOT EOF(ListFile) DO
 BEGIN
 Readln(ListFile,ListLine);
 PrintLine(ListLine,LineNumber);
 LineNumber := Succ(LineNumber);
 IF ((LineNumber-1) DIV LinesPerPage) > (PageNumber - 1) THEN
 BEGIN
 PageNumber := Succ(PageNumber);
 SendFormFeed;
 PrintHeader;
 END
 END;
 IF (LineNumber MOD LinesPerPage) > 1 THEN SendFormFeed;

 Close(ListFile);
END; { PrintFile }

PROCEDURE SetupPrinter;
BEGIN
 SetLinePrinterFont;
 SetIBMCharacterSet;
END;

PROCEDURE PrintAllFiles(FilesToPrint : PDirEntryCollection);
{ This is the FAR local routine passed to the iterator method. }
{ It's called once for each item in the collection: }
PROCEDURE PrintOneFile(Target : PDirEntry); FAR;
BEGIN
 FileTime.PutWhenStamp(Target^.Entry.Time);
 PrintFile(Target);
END;

BEGIN
 { This is how you iterate a procedure over a collection: }
 FilesToPrint^.ForEach(@PrintOneFile);
END;

BEGIN { JLIST10 Main }
 IF ParamCount = 0 THEN
 BEGIN
 Writeln('>>>JLIST10<<< by Jeff Duntemann');
 Writeln(' Multifile listing utility');
 Writeln(' for the HP Laserjet Series II');
 Writeln(' Version of 12/31/91 -- Expands fixed 8-char tabs...');
 Writeln(' WARNING: Emits printer control strings that are');
 Writeln(' *highly* specific to the HP Laserjet II!');
 Writeln;
 Writeln('Invocation syntax:');
 Writeln;
 Writeln(' JLIST10 <filespec>,[<filespec>..] CR');
 Writeln;
 Writeln('where <filespec> is the file or files to be printed,');
 Writeln('using the DOS filespec conventions, including wildcard');
 Writeln('characters * and ?. A banner will be printed initially');
 Writeln('with a summary of all files to be printed IF any wildcard');
 Writeln('characters were entered as part of the file specification.');
 END
 ELSE
 BEGIN
 Now.PutNow; { Fill a When stamp with today's time and date }
 FileSpecs := ''; { Concatenate all file specs into 1 string: }
 FOR I := 1 TO ParamCount DO FileSpecs := FileSpecs+' '+ParamStr(I);
 FilesToPrint := New(PDirEntryCollection, InitCommandLine(128,16,1));
 IF FilesToPrint^.Count > 0 THEN
 BEGIN
 Writeln;
 Write('>>>Jlist10 is printing ',FilesToPrint^.Count,' file(s)...');
 SetupPrinter;
 IF FilesToPrint^.Count > 1 THEN PrintStartBanner(FilesToPrint);
 SetPrinterLinesPerInch(8);

 PrintAllFiles(FilesToPrint);


 PrinterReset; { Reset printer at job end }
 Writeln;
 END
 ELSE
 Writeln('No files match that file spec.');
 END;
END.


May, 1992
GRAPHICS PROGRAMMING


Potato Heads, Fast VGAs, and More




Michael Abrash


I'm not a doomsayer who thinks American education lags hopelessly behind the
rest of the Western world, but a recent experience did make me wonder. Not so
long ago, I received a letter from one Melvyn J. Lafitte requesting that I
spend some time in this column describing fast 3-D animation techniques.
Melvyn hoped that I would be so kind as to discuss, among other things, hidden
surface removal and perspective projection, performed in real time, of course,
and preferably in mode X. Sound familiar?
Melvyn shared with me a hidden surface approach that he had developed. His
technique involved defining polygon vertices in clockwise order, as viewed from
the visible side. Then, he explained, one can use the cross-product equations
found in any math book to determine which way the perpendicular to the polygon
is pointing. Better yet, he pointed out, it's necessary to calculate only the
Z component of the perpendicular, and only the sign of the Z component need
actually be tested.
What Melvyn described is, of course, backface removal, a key hidden-surface
technique that we've used heavily over the past three months. In general,
other hidden surface techniques must be used in conjunction with backface
removal, but backface removal is nonetheless important and highly efficient.
Simply put, Melvyn had devised for himself one of the fundamental techniques
of 3-D drawing.
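Melvyn's test reduces to just a few lines of code. What follows is a generic
sketch rather than anything from X-Sharp, and which sign of Z means
"front-facing" depends on your winding order and axis conventions:

```c
/* Backface test from screen-space vertices: compute only the Z
   component of the cross product of two polygon edges. With
   vertices defined in a consistent (say, clockwise) order as seen
   from the visible side, the sign of Z tells you which way the
   face points. */
typedef struct { double x, y; } Pt2;

double cross_z(Pt2 v0, Pt2 v1, Pt2 v2)
{
    /* Z of (v1 - v0) x (v2 - v1); X and Y components are never needed */
    return (v1.x - v0.x) * (v2.y - v1.y) - (v1.y - v0.y) * (v2.x - v1.x);
}

/* Here we adopt the convention that positive Z means front-facing. */
int is_backface(Pt2 v0, Pt2 v1, Pt2 v2)
{
    return cross_z(v0, v1, v2) <= 0.0;
}
```

Reversing the vertex order flips the sign, which is why a consistent winding
order for all polygons is the whole trick.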
Melvyn is from Moens, France. At the time he wrote me, Melvyn was 17 years
old. Try to imagine any American 17-year-old of your acquaintance inventing
backface removal. Try to imagine any teenager you know even using the phrase
"the cross-product equations found in any math book." Not to mention that
Melvyn was able to write a highly technical letter in English; and if Melvyn's
English was something less than flawless, it was perfectly understandable,
and, in my experience, vastly better than an average, or even well-educated,
American's French. Please understand, I believe we Americans excel in a wide
variety of ways, but I worry that when it comes to math and foreign languages,
we are becoming a nation of ttes depomme de pomme de terre.
Anyway, Melvyn has gotten his wish over the last few months; we've been busy
with some pretty intense 3-D animation for quite a while now. It's time to
take a break and catch up with the mail.


16-bit VGA and a Monochrome Screen


A dual monitor system--one VGA, one monochrome adapter--is virtually mandatory
for serious PC graphics development, because the debugger output has to have
somewhere to show up without disturbing the graphics program being debugged.
Sure, you can use a remote system, but that requires two computers and tends
to be annoyingly slow. A two monitor setup is cheaper and more convenient, and
just about every graphics developer I know goes the two-monitor route.
Unfortunately, it's well documented that the presence of an 8-bit monochrome
adapter forces any and all 16-bit VGAs to revert to 8-bit operation. (See my
May, 1990 DDJ article, "Demystifying 16-Bit VGA.") Each access to the VGA can
then transfer only 1 byte, just as if the VGA had an 8-bit connector. At the
very least, that halves the performance of word-sized accesses to VGA display
memory, as in the case of REP STOSW, which is typically used to fill
rectangles. At the worst, it also causes some buses to think the adapter was
designed for the PC, and therefore to insert additional wait states to slow
accesses down to the PC's miserably slow bus speed, typically halving the
performance of all accesses to VGA display memory. Worse, the two speed
reductions are multiplicative, so access to VGA display memory can take as
much as four times longer if an 8-bit monochrome card is installed. The end
result is that if an 8-bit monochrome adapter is installed, your souped-up 386
or 486 16-bit Super-VGA system can wind up with 8-bit VGA performance that's
scarcely faster than IBM's original adapter.
The solution is obvious: Buy a 16-bit monochrome card. Except that there
aren't any that I know of. Monochrome cards are low-end, low-margin items, and
no one is going to bother making a 16-bit version for the tiny fraction of
users who want two monitors and both know and care about the speed
implications of 8-bit monochrome adapters. (I'd gladly pay an extra $50 for a
16-bit monochrome adapter, in case anyone's listening.)
So a hardware solution isn't feasible, but a software solution is. Listing One
(page 148) shows that solution, in the form of a utility, SETBUS, contributed
by STB BIOS guru Charles Marslett. SETBUS can force any VGA based on the
Tseng Labs ET4000 chip into 16-bit mode even if a monochrome adapter is
installed. Just run the utility, and your VGA will once again provide the
unfettered 16-bit performance you enjoyed before you installed a monochrome
adapter. Both John Bridges's VIDSPEED and my Zen timer measure raw memory
speed on the ET4000-based VGA in my dual-monitor system at 4.3 times faster
after SETBUS 16 than it normally is. Before you get too excited, bear in mind
that the improvement you'll actually see with real-world applications will
vary from one VGA-based package to another, depending on how intensively
display memory is accessed and whether word operands are used, and will
generally be less--sometimes much less--than four times. Windows, for example,
clearly spends most of its time doing things other than accessing display
memory; it's visibly faster after SETBUS 16, but not even two times as fast.
Still, SETBUS makes all VGA software at least somewhat faster, and it's a lot
cheaper than faster video hardware.
Is there a downside to SETBUS? There sure is. When SETBUS 16 is in effect,
your monochrome screen will do screwy things whenever you access it, because
the VGA interferes with those memory accesses. I can't even guarantee that you
won't damage something if you use the monochrome adapter while SETBUS 16 is in
effect, although I've been doing that myself with no problems. In short, use
SETBUS at your own risk.
So what good is SETBUS if it means you can't use your monochrome adapter? Oh,
you can still use your monochrome adapter just fine, you just can't get 16-bit
VGA at the same time. The trick is to use batch files so that you get 16-bit
VGA when you need it, and a functional monochrome display when you need that;
that is, use batch files to put your VGA in the right state at the right time.
For example, you could put SETBUS 16 at the start of the batch files you use
to start Windows and other color display-only programs, and SETBUS 8 at the
end; that way, graphics programs would run at full speed, but the monochrome
adapter would still work fine the rest of the time.
Yes, SETBUS is a bit of a nuisance, but it's worth it; what's the point of
paying for a 16-bit VGA if it's always an 8-bit VGA? SETBUS isn't perfect, but
it's given me back the full graphics performance of my system, and it'll do
just fine until someone comes out with a 16-bit monochrome adapter.
Thanks, Charles!


X-Sharp Bug Fixes


Tom Moran, of Saratoga, Calif., decided to port the X-Sharp 3-D animation
package we've been developing over the past four months to Ada, and caught
three bugs in the process. Two of the bugs have no effect, but the third is
significant, because it's a bug in the matrix concatenation code. This bug
shows up only when the concatenation involves a translation as well as a
rotation, but when that happens, things go bad in a hurry. The bug rears its
ugly head when performing translation from world space to view space; I didn't
catch it myself because I'd performed only rotation to view space, not
translation. In any case, Listing One from last month fixes this problem (I
snuck the fix in at the last minute), and the current distribution version of
X-Sharp fixes all known bugs.
X-Sharp is available as the file X-SHARPn.ARC in the DDJ Forum on CompuServe,
on M&T Online, and in the graphic.disp conference on Bix. Alternatively, you
can send me a 360K or 720K formatted diskette and an addressed, stamped
diskette mailer, care of DDJ, 411 Borel Ave., San Mateo, CA 94402, and I'll
send you the latest copy of X-Sharp. There's no charge, but it'd be very much
appreciated if you'd slip in a dollar or so to help out the folks at the
Vermont Association for the Blind and Visually Impaired. I'm available on a
daily basis to discuss X-Sharp on M&T Online and Bix (user name mabrash in
both cases).


Keep the BIOS Informed


Bill Lindley, of Mesa, Ariz., wrote to suggest that when programming the VGA
to a nonstandard mode such as the 640x400 page-flipped 16-color mode I
discussed in the December column, it's a good idea to tell the BIOS about the
new screen size, for a couple of reasons. For one thing, pop-up utilities
often use the BIOS variables; Bill's memory-resident screen printer, EGAD
Screen Print, determines the number of scan lines to print by multiplying the
BIOS "number of text rows" variable times the "character height" variable. For
another, the BIOS itself may do a poor job of displaying text if not given
proper information; the active text area may not match the screen dimensions,
or an inappropriate graphics font may be used. (Of course, the BIOS isn't
going to be able to display text anyway in highly nonstandard modes such as
mode X, but it will do fine in slightly nonstandard modes such as 640x400
16-color mode.) In the case of the 640x400 16-color mode discussed in
December, Bill suggests that the code in Listing Two (page 148) be called
immediately after putting the VGA into that mode, in order to tell the BIOS
that we're working with 25 rows of 16-pixel-high text. I think this is an
excellent suggestion; it can't hurt, and may save you from getting aggravating
tech support calls down the road.


Bitblt Compiling


Performance whiz David Stafford, of Borland Japan, sent along the strange but
effective idea of bitblt compiling. Bitblts, as you likely know, are
operations that combine parts of one or more source bitmaps and a destination
bitmap, using logical operations such as AND, OR, and replace. Bitblts are at
the core of bitmapped windowing graphics, and therefore deserving of
considerable attention to performance.
Normally, a standard, two-operand bitblt is processed by picking up each
source byte in turn, processing it if necessary (for example, for transparency
or monochrome-to-color expansion), and combining it appropriately with the
destination. The processing for transparency or color expansion can take a
fair amount of time, and even operations such as straight replace require that
the source be read, which takes at least some time.
Now imagine this. You, the programmer, have an 8x8 monochrome icon that you'll
repeatedly draw, color-expanded, with transparency. (That is, 1 bits in the
icon turn into the foreground color, which is then drawn to the destination,
and 0 bits in the icon cause the destination to be preserved.) You could do
this with code such as that in Listing Three (page 148), which color expands,
checks for transparency, and draws in a single step, as shown in Figure 1.
That works fine, but something seems a little out of kilter from a performance
perspective, because the color expansion and transparency check are done
exactly the same way every time the icon is drawn. Why do all that work n
times rather than once?
The compiled bitblt alternative, which does expansion and transparency just
once, is shown in Listing Four (page 150). Function CompileReplaceXpar() turns
the icon's data not into other data, but rather into code that performs a
bitblt with the transparent replace raster op. In other words, this function
compiles the icon's data, together with the raster op (replace), the color
expansion, and the need for transparency, into code that does exactly what a
standard blt would have done, as shown in Figure 2. Function ExecuteCompiled()
then executes the compiled code and produces precisely the same results as the
equivalent blt. There's one important difference, though: ExecuteCompiled()
doesn't have to do any testing, branching, or even reading of the source, so
it's much faster than Listing Three: three times faster, in fact, when the
sample test shown in Listing Five (page 150) is run on a fast 16-bit VGA.
Listing Four could be optimized further in a number of ways, such as rewriting
the bitblt compiler function in assembler, putting the compiled code in the
code segment to avoid far branches, and using 16-bit MOVs and 0- or 1-byte
addressing displacements whenever possible. For now, though, I just want to
give you a taste of the power of compiled bitblts--and a three-times speed-up
should do the trick.
Usually, it pays to turn code into data, in the form of jump tables, lookup
tables, state machines, and the like. Occasionally, though, it pays to turn
data into code, as in the case of compiled bitblts. Strange concept,
impressive results.
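The same trade-off can be demonstrated in portable C without generating machine code: "compile" the icon once into a list of destination offsets (a display list, one step short of the column's true compiled code, which emits actual MOV instructions), then replay that list on each draw with no testing, shifting, or source reads. All names here are hypothetical illustrations:

```c
#include <stddef.h>

/* "Compile" an 8x8 monochrome icon once: record the destination offset
   of every 1 bit, given the destination bitmap width. Returns how many
   offsets were recorded (at most 64). */
size_t compile_icon(const unsigned char icon[8], size_t dest_width,
                    size_t offsets[64])
{
    size_t count = 0, row, col;
    for (row = 0; row < 8; row++)
        for (col = 0; col < 8; col++)
            if (icon[row] & (0x80u >> col))       /* test each icon bit once */
                offsets[count++] = row * dest_width + col;
    return count;
}

/* Replay the precomputed offsets: no per-pixel testing, shifting, or
   source reads remain in the inner loop. */
void draw_compiled(unsigned char *dest, unsigned char color,
                   const size_t offsets[], size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        dest[offsets[i]] = color;
}
```

Listing Four goes one step further than this display list, turning the offsets directly into a sequence of store instructions, which also eliminates the loop overhead.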


Suggested Reading



If you want a broad understanding of the math that underlies computer
graphics, I highly recommend Mathematical Elements for Computer Graphics,
Second Edition, by David F. Rogers and J. Alan Adams (McGraw-Hill, 1990, ISBN
0-07-053529-9). Unlike Foley and van Dam's Computer Graphics, this is not an
encyclopedic graphics reference, nor does it mean to be; rather, it pulls
together the mathematical theory behind several fundamental areas of computer
graphics. After the traditional and largely pointless first chapter on
graphics hardware, the book covers two-dimensional transformations,
three-dimensional transformations, plane curves, space curves, and surface
description and generation, all in a straightforward and thorough fashion.
This is not light reading, although I found it easier going than Foley and van
Dam; the tone is that of a textbook (albeit without exercises for the reader),
and an amazing volume of information is dispensed, in the form of clear,
concise explanations and examples, over the course of about 500 pages.
Particularly noteworthy are the 130 pages on space curves, including Béziers
and B-splines, and the 100 pages on surfaces. In short, this book is an
excellent and serious introduction to the mathematics of computer graphics.


Next Time


Next month, we'll do some more 3-D animation, and maybe something else as
well. There's way more in the topic backlog than I'll ever have room to cover
in this column, so please write and let me know what you'd like to see most.
Don't follow the lead of Gonzalo Medina, though, who wrote me from Venezuela
in Spanish. Although I had six years of Spanish in high school, I can't read
it for beans. So call me Mr. Potato Head, if you want--but write me in
English.
_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash


[LISTING ONE]

/* Utility to force an ET4000-based VGA into 16- or 8-bit operation even if
a monochrome adapter is in the system. (Note that only 16-bit memory access,
not 16-bit I/O, is enabled; the I/O state is not altered.) The monochrome
adapter won't work properly while SETBUS 16 is in effect. Tested with
Borland C++ 3.0. Courtesy of Charles Marslett of STB. Commented and
reformatted by Michael Abrash. */

/*****************************************************************************
 * This utility isn't known to cause problems, but use it at your own risk, as
 * the monochrome and VGA adapters will both respond to accesses to monochrome
 * display memory while SETBUS 16 is in effect, resulting in bus contention.
 ****************************************************************************/
#include <dos.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void main(int argc, char *argv[])
{
 int crtc, val;
 union REGS regset;

 if (inp(0x3CC) & 0x01) /* decide where to address the CRT */
 crtc = 0x3D4; /* Controller by reading the I/O */
 else /* Address Select bit of the */
 crtc = 0x3B4; /* Miscellaneous Output register */

 outportb(0x3BF, 0x03); /* key sequence to enable access to */
 outportb(crtc+4, 0xA0); /* ET4000-specific registers */
 outportb(crtc, 0x36); /* get the current setting of the Video */
 val = inp(crtc+1); /* System Configuration 1 register */

 /* Decide whether 16- or 8-bit access desired, and configure accordingly */
 if (argc == 2) {
 if (strcmp(argv[1], "16") == 0) {
 outportb(crtc+1, val | 0x40); /* 16-bit memory access */
 goto ModeSet;
 } else if (strcmp(argv[1], "8") == 0) {
 outportb(crtc+1, val & 0xBF); /* 8-bit memory access */
ModeSet:
 regset.x.ax = 0x0003;
 int86(0x10, &regset, &regset); /* do a text mode mode set */
 exit(0);
 }
 }
 fprintf(stderr, "Usage: SETBUS 16\n");
 fprintf(stderr, " or SETBUS 8\n");

 exit(1);
}



[LISTING TWO]

/* Function to tell the BIOS to set up properly sized characters for 25 rows
 of 16-pixel-high text in 640x400 graphics mode. Call immediately after mode set.
 Based on a contribution by Bill Lindley. */

#include <dos.h>

void Set640x400()
{
 union REGS regs;

 regs.h.ah = 0x11; /* character generator function */
 regs.h.al = 0x24; /* use ROM 8x16 character set for graphics */
 regs.h.bl = 2; /* 25 rows */
 int86(0x10, &regs, &regs); /* invoke the BIOS video interrupt
 to set up the text */
}




[LISTING THREE]


/* Draws the 8x8 monochrome icon pointed to by IconPtr at coordinates (X,Y) in
 the DestWidth-wide bitmap starting at DestSeg:0, using the transparent
 replace raster op and color Color. Destination bitmap must be an
 8-bit-per-pixel bitmap. 1-bits in the pattern are converted to drawing color
 and drawn, and 0-bits are skipped over, preserving destination (that is, are
 transparent). Tested with Borland C++ 3.0; when USE_C is 0, uses
 BC++ dependent inline assembly. */

#define USE_C 0 /* set to 1 to compile C code, 0 to compile assembler */
#if !USE_C
#pragma inline /* tell the compiler there's inline ASM code */
#else
#include <dos.h>
#endif

void DrawReplaceXpar(unsigned int X, unsigned int Y,
 unsigned int DestSeg, unsigned int DestWidth,
 unsigned int Color, unsigned char *IconPtr)
{
#if USE_C
 unsigned char far *ScreenPtr, Temp;
 int i,j;

 /* Point to the first destination pixel */
 ScreenPtr = MK_FP(DestSeg, Y*DestWidth+X);
 for (i=0; i<8; i++) { /* do the 8 icon rows */
 Temp = *IconPtr++; /* get the next icon row */
 for (j=0; j<8; j++) { /* do the 8 pixels per icon row */
 if (Temp & 0x80) /* draw this pixel if the */
 *ScreenPtr = Color; /* corresponding icon bit is 1 */
 ScreenPtr++; /* point to the next destination pixel */
 Temp <<= 1; /* shift the next icon bit into place */
 }
 /* Point to the start of the next row */
 ScreenPtr += DestWidth - 8;
 }
#else
 asm cld /* make LODSB increment SI */
 asm mov es,DestSeg /* point ES to the bitmap */
 asm mov ax,Y
 asm mov bx,DestWidth
 asm mul bx /* DestWidth*Y+X = offset of first */
 asm add ax,X /* dest pixel */
 asm mov di,ax /* point ES:DI to the first dest pixel */
 asm mov si,IconPtr /* point DS:SI to the first icon byte */
 asm mov dx,8 /* we'll do 8 rows */
 asm mov ah,byte ptr Color /* color we'll draw with */
 asm sub bx,8 /* offset from end of one dest row to start of next */
RowLoop:
 asm lodsb /* get the next icon row */
 asm mov cx,8 /* we'll do 8 pixels on each row */
PixelLoop:
 asm shl al,1 /* shift the next icon bit into CF */
 asm jnc NoDraw /* bit is 0, so don't draw */
 asm mov es:[di],ah /* bit is 1, so draw this pixel */
NoDraw:
 asm inc di /* point to the next destination pixel */
 asm loop PixelLoop /* do the next pixel */
 asm add di,bx /* point to the start of the next dest row */
 asm dec dx /* count down rows */
 asm jnz RowLoop /* do the next row */
#endif
}



[LISTING FOUR]

/* Compiled bitblit code for drawing the 8x8 monochrome icon pointed to by
IconPtr at coordinates (X,Y) in the DestWidth-wide bitmap starting at
DestSeg:0, using the transparent replace raster op and color Color.
CompileReplaceXpar() generates code to perform desired bitblit, and
ExecuteCompiled() executes that code to perform bitblit. Generally faster than
a standard approach when drawing the same icon many times, because this way
color expansion, transparency, and reading the source are only performed once,
at expansion time, rather than every time an icon is drawn. Destination bitmap
must be an 8-bit-per-pixel bitmap. 1-bits in pattern are converted to drawing
color and drawn, and 0-bits are skipped over, preserving destination (that is,
are transparent). Tested with Borland C++ 3.0; uses BC++ dependent inline
assembly. */

#pragma inline /* tell the compiler there's inline ASM code */

/* Generates far-callable code to bitblit the specified icon, and stores code
in BufferToCompileInto. Code is simply a series of MOV [DI+xxxx],AL
instructions, where xxxx is the offset from upper left corner of icon of
each pixel to be drawn. */
void CompileReplaceXpar(unsigned int DestWidth,
 unsigned char *IconPtr, unsigned char *BufferToCompileInto)
{
 unsigned int i, j, PixelOffset = 0;
 unsigned char Temp;

 for (i=0; i<8; i++) { /* do the 8 icon rows */
 Temp = *IconPtr++; /* get the next icon row */
 for (j=0; j<8; j++) { /* do the 8 pixels per icon row */
 if (Temp & 0x80) { /* generate the code to draw this pixel if the
 corresponding icon bit is 1. Code is the hex bytes
 for the instruction MOV [DI+PixelOffset],AL */
 *BufferToCompileInto++ = 0x88; /* MOV opcode */
 *BufferToCompileInto++ = 0x85; /* mod-reg-rm byte */
 *((unsigned int *)BufferToCompileInto)++ = PixelOffset;
 /* addressing displacement */
 }
 PixelOffset++; /* point to the next destination pixel */
 Temp <<= 1; /* shift the next icon bit into place */
 }
 /* Point to the start of the next row in the destination */
 PixelOffset += DestWidth - 8;
 }
 /* Put a RET at the end to return to the calling code, and done */
 *BufferToCompileInto = 0xCB; /* RETF instruction = 0xCB */
}
void ExecuteCompiled(unsigned int X, unsigned int Y,
 unsigned int DestSeg, unsigned int DestWidth, unsigned int Color,
 unsigned char far *BufferToExecute)
{
 asm push ds /* preserve the default data segment */
 asm mov ds,DestSeg /* point DS to the bitmap */
 asm mov ax,Y
 asm mul word ptr DestWidth /* DestWidth*Y+X = offset of */
 asm add ax,X /* first dest pixel */
 asm mov di,ax /* point DS:DI to the first dest pixel */
 asm mov al,byte ptr Color /* color with which to draw */
 asm call dword ptr BufferToExecute
 /* perform a far call to execute the
 compiled bitblit code, and done */
 asm pop ds /* restore the default data segment */
}




[LISTING FIVE]

/* Sample program to tile screen with an 8x8 monochrome icon.
Tested with Borland C++ 3.0. */

#include <dos.h>
#include <conio.h>

#define USE_COMPILED_BITBLITS 1 /* set to 1 and link to Listing 4 to use
 compiled bitblits, set to 0 and link to
 Listing 3 to use conventional bitblits */
#define SCREEN_WIDTH 320
#define SCREEN_HEIGHT 200
#define SCREEN_SEGMENT 0xa000


#if USE_COMPILED_BITBLITS
extern void CompileReplaceXpar(unsigned int, unsigned char *, unsigned char *);
extern void ExecuteCompiled(unsigned int, unsigned int, unsigned int,
 unsigned int, unsigned int, unsigned char far *);
#else
extern void DrawReplaceXpar(unsigned int, unsigned int,
 unsigned int, unsigned int, unsigned int, unsigned char *);
#endif

static unsigned char TestIcon[8] =
 {0x88, 0x44, 0x22, 0x11, 0x11, 0x22, 0x44, 0x88};
#if USE_COMPILED_BITBLITS
/* Storage for the compiled icon-drawing code; must be large enough for the
largest possible code size, because no error checking is performed */
static unsigned char CompiledBuffer[1000];
#endif

void main()
{
 unsigned int x, y;
 union REGS regset;
#if USE_COMPILED_BITBLITS
 unsigned char far *CompiledBufferPtr;
#endif
 regset.x.ax = 0x0013; /* AL = 0x13 selects 320x200 */
 int86(0x10, &regset, &regset); /* 256-color graphics mode */
#if USE_COMPILED_BITBLITS
 /* Generate the code for drawing the icon with the transparent
 replace raster op, and store the code in CompiledBuffer */
 CompileReplaceXpar(SCREEN_WIDTH, TestIcon, CompiledBuffer);
 CompiledBufferPtr = MK_FP(_DS, CompiledBuffer);
#endif
 /* Tile TestIcon over the entire screen */
 for (y=0; y<SCREEN_HEIGHT; y += 8) {
 for (x=0; x<SCREEN_WIDTH; x += 8) {
#if USE_COMPILED_BITBLITS
 /* Draw the icon by executing the code in CompiledBuffer */
 ExecuteCompiled(x, y, SCREEN_SEGMENT, SCREEN_WIDTH, 14,
 CompiledBufferPtr);
#else
 /* Draw the icon with the transparent replace raster op, color
 expanding it and handling transparency on the fly */
 DrawReplaceXpar(x, y, SCREEN_SEGMENT, SCREEN_WIDTH, 14,TestIcon);
#endif
 }
 }
 getch(); /* wait for a key press */
 regset.x.ax = 0x0003; /* AL = 3 selects 80x25 text mode */
 int86(0x10, &regset, &regset); /* return to text mode */
 exit(1); /* done */
}










May, 1992
PROGRAMMER'S BOOKSHELF


The Grand Master of System Software




 Ray Duncan


Breathes there the computer science undergraduate with soul so dead, who never
to himself hath said, "Knuth is God!"? (Apologies to Sir Walter Scott.)
Donald Knuth has become a legend in his own time for, among other things, his
multivolume algorithms reference The Art of Computer Programming and his
reinvention of computer typesetting. Even the average corporation-cubbyhole
applications programmer, who (according to Yourdon at least) has probably
never read any programming textbook at all, will be moved to genuflect when
Knuth's name is mentioned--Knuth's books are so lucid, so comprehensive,
and so entertaining that they have become the standard against which all other
computer books are judged. Well, it's about time to expand the pantheon.
Andrew S. Tanenbaum, a professor of computer science at the Vrije Universiteit
in Amsterdam, is building up a library of texts on systems programming that
rivals Knuth's work in both quantity and quality. In this month's
installment, I want to introduce you to Dr. Tanenbaum's four most important
efforts.
Structured Computer Organization's title makes it sound dry, formal, and
concrete, but the title is just Tanenbaum's way of warning you that he proposes
to describe the digital computer at every level, from the electrons to the
command prompt. The book begins with a brief history of mechanical and
electronic computing, from Babbage to the present. It then embarks on a grand
tour of computer architecture, beginning with the fundamentals of digital
logic and progressing through registers, memory, buses, microprogramming,
conventional instruction sets, addressing schemes, virtual memory, flow of
control, and operating systems. After a brief digression into the workings of
language translators, linkers, and loaders, Structured Computer Organization
finishes up with a survey of cutting-edge technology in RISC machines and
parallel processing. This book, while cosmic in scope, is a pleasure to read
and is an excellent starting point for any programmer who seeks a basic
understanding of the underlying hardware and its relationship to system
software.
Computer Networks attacks the intimidating topic of networking and
communications with much the same thoroughness and tenacity that we saw in
Structured Computer Organization. If there's any stone that Tanenbaum left
unturned in this book, I certainly didn't notice it. And just in time,
too--with the rapid decline in price of high-performance, twisted pair
Ethernet and even higher performance, fiber optic backbones, Local Area
Networks (LANs) and Wide Area Networks (WANs) are springing up like mushrooms
after a Redmond rainstorm, so we must all learn to speak the language of the
network gurus. Have you ever wondered about the significance of the seven
layers of the OSI reference model, or what ISDN, X.25, SNA, SDLC, and ASN.1
really stand for? Do you want to understand the differences between a
repeater, a router, a gateway, and a bridge? Do you feel perplexed and
alienated when people glibly talk about TCP/IP, UseNet, token ring, and the
X.400 standard for electronic mail? Computer Networks will straighten you out.
I daresay I've at least browsed through nearly every general text on systems
software published during the last 20 years, but Tanenbaum's Operating
Systems: Design and Implementation is by far my favorite. All the fundamental
issues of operating-systems architecture and programming, from device drivers
to security and protection mechanisms, are beautifully explained in this
superb book. Unlike most of its competitors, Operating Systems: Design and
Implementation is not strong on theory but weak on practice, nor is it
scornful of the personal computer. In the latter part of the book, Tanenbaum
illustrates nearly every operating-system concept that he has discussed in the
form of full source code for a UNIX clone called MINIX that will run on the
IBM PC and the Macintosh. If you are not in the mood to type in the 250 pages
of MINIX source code, Tanenbaum thoughtfully makes it available on diskette,
together with all the development tools needed to rebuild the system. We're
definitely talking Hackers' Heaven here.
Tanenbaum originally intended his newest book, Modern Operating Systems, to
be an updated edition of Operating Systems: Design and Implementation, but (as
he says in the introduction) the book pulled him in another direction and
ended up with a distinctly different emphasis. Although Modern Operating
Systems shares some material with its predecessor, the theoretical discussions
are interspersed with lengthy case studies of UNIX, MS-DOS, Mach, and a
University of Amsterdam experimental operating system called "Amoeba." The
special strength of Modern Operating Systems is its extensive treatment of
distributed operating systems, and distributed file systems, while its special
weakness is the alarming number of typographical errors. Surprisingly,
Tanenbaum barely mentions several operating systems that you might think
would merit coverage in this book, such as VAX/VMS and the proprietary IBM
mainframe systems. (Perhaps he doesn't consider them modern enough!) OS/2 is
ignored, except in the context of MS-DOS:
What does the future of MS-DOS hold? Technically, it is completely obsolete.
Programming it is a nightmare. To the user, it is idiosyncratic and
unfriendly. IBM and Microsoft realized this years ago, and spent millions of
dollars producing a modern, powerful, and easy-to-use replacement, OS/2, only
to discover that the users were not interested.
But I don't want to leave you with the impression that Tanenbaum is singling
out MS-DOS for unfair criticism. In another place, he comments:
Many users, especially beginners, find the MS-DOS command line interface
cryptic at best and downright hostile at worst, probably even worse than UNIX,
whose shell has rarely been accused of being friendly to novices.
Indeed, Tanenbaum mostly takes a generous attitude toward MS-DOS, considering
his academic background, and takes care to point out its good features, such
as the extreme configurability that it gains from installable device drivers
and the CONFIG.SYS file.
When you consider all four of Tanenbaum's books together, the range of topics
this fellow can write about clearly and authoritatively is nothing short of
astounding. Every one of his books is an excellent investment that will serve
you well, and--if you are a serious programmer--at least three out of the four
books should find a place in your library.






































May, 1992
OF INTEREST


Tami Zemel


ImageSoft is shipping IconoClass, a C++ application organizer jointly
developed by Glockenspiel and Coopers & Lybrand. IconoClass allows you to view
code by class, component, module, or application and logically group together
related classes, members, modules, and functions. Classes and member functions
can be a part of either single or multiple applications. You can import any
class library without modification and browse all the entities in C++.
IconoClass allows integration of work by several developers, while providing
locking to prevent accidental modifications.
IconoClass is portable across Windows, Presentation Manager, and Motif. The
price is $699 for Windows and $899 for Presentation Manager. Reader service
no. 20.
ImageSoft Inc. 2 Haven Avenue Port Washington, NY 11050 516-767-2233
Forth Dimensions, published by the Forth Interest Group (FIG), has announced a
contest for authors writing about the Forth programming language. The theme,
"Forth on a Grand Scale," applies both to projects requiring many programmers
and to applications or systems with large amounts of code and/or significant
complexity.
Entries should include a hard copy and a diskette. The deadline is August 3,
1992.
Forth Interest Group P.O. Box 8231 San Jose, CA 95155 408-277-0668
Emu Net, a host/slave communications development tool that enables development
with only one side of the network present, has been released by Wm.
Christensen Communications Systems. Emu Net lets you simulate either the
master or the slave side of a network when that hardware is not available,
enabling development of a communications interface for connection to data
collectors, industrial controllers, and so on.
A database file is defined for the missing device to configure the hardware
setup and software protocol. Multiple databases can be created to build an
unlimited protocol library, and sample master and slave databases are included
with the software.
You can define up to 20 send and 20 receive strings per database, including up
to 160 characters per string. A master system sends a string, waits for a
receive string, runs a compare function, and reports results; a slave system
receives character strings, runs a compare function, reports results, and
responds with a send string to the master.
Emu Net retails for $129.95. Reader service no. 21.
Wm. Christensen Communications Systems 30 Silverdome Industrial Park Pontiac,
MI 48342 313-858-2200
Programmer's "Bag of Tricks" is a new library from DataPak Software designed
to complement the Mac Toolbox. Bag of Tricks includes sections for list boxes,
modem and communications, offscreen drawing, scroll bars, dialog support, and
other routines. The library includes Think C source code, with interface files
for MPW and Think Pascal.
Programmer's "Bag of Tricks" sells for $134.50. Reader service no. 22.
DataPak Software 9317 NE Highway Vancouver, WA 98665-8900 800-327-6703 or
206-573-9155
Double Click Software has announced Winix, a visual UNIX toolkit for Windows.
The development environment, which runs under Windows, presents a visual,
point-and-click interface for each UNIX tool in the toolkit that makes it easy
for experienced UNIX programmers to develop Windows 3 apps. The tools allow
you to: point and click between directories, perform operations on text files,
and use the notation of regular expressions to search for, filter, and replace
information.
Winix also aids in advanced operations that require stringing commands
together via the UNIX scripting language. The graphic symbols for each tool
are linked with a click of the mouse. The flow of text generated by each
operation can be visually directed to files, windows, or subsequent
operations; when text is directed to a window, Winix automatically calls its
text browser so that the results can be reviewed.
Winix is available for $35; a free copy of Double Click's DCedit text editor
for Windows is included as an introductory offer. Reader service no. 23.
Double Click Software Inc. 3833 Washburn Avenue South Minneapolis. MN 55410
612-920-7829
FLL 3.2, Fifo Elektronik's serial communications analyzer, uses standard
serial ports COM1 and COM2 to debug, log, and generate asynchronous data.
Incoming bidirectional data is logged in hex/ASCII format, allowing detection
of normally unprintable characters, while a programmable trigger facility
avoids unnecessary data sampling.
Data can be transmitted from the keyboard, files, or user-programmable
strings. Together, the trigger and the string-transmission feature can
transmit a string as a response to received data. Control signal inputs are
monitored continuously on screen, and outputs are set as needed.
The FLL BreakOut Box (FLLBOB) is an additional option that can be used to
simplify the physical connection to the serial line. FLL 3.2 supports standard
baud rates of up to 115,200.
FLL 3.2 retails for $95; FLLBOB costs $195. Reader service no. 24.
Fifo Elektronik Slttervgen 4 203 51 Halmstad Sweden +46-35-101230
Knowledge Shop 2.0 is a development tool and C-code generator newly released
from Decision System Software. Written in C++, Knowledge Shop was designed for
building rule-based systems into applications. Features include a graphical,
direct-manipulation interface, immediate testing at any stage of completion,
color-coded rule tracing, automatic elimination of rule logic gaps, automatic
documentation, automatic ANSI C source-code generation, and more.
With Knowledge Shop, you visually link nodes to construct a data flow diagram
that models the problem, import or create rules to define the data flow
diagram's processing, then automatically generate ANSI C source code that can
be embedded in any C or C++ program.
Knowledge Shop supports full math and string functions, as well as both
goal-driven and data-driven processing methodologies.
Knowledge Shop costs $495; no proprietary libraries or runtime modules are
required. Reader service no. 25.
Decision System Software 160 West Street Cromwell, CT 06416 203-632-7570
The Trace Research and Development Center at the University of
Wisconsin-Madison, in conjunction with IBM, has developed AccessDOS, a TSR
program that helps circumvent many problems encountered by disabled computer
users. Significant features include: StickyKeys, which allows performance of
multiple-key operations using a single finger or typing stick; MouseKeys,
which allows the numeric keypad to perform all mouse functions; ToggleKeys,
which provides audible tones for toggle-key status; RepeatKeys, which controls
auto-repeat, including the rate at which keys repeat and when auto-repeat
commences; SlowKeys, which controls the amount of time a key must be pressed
before it is accepted as input; BounceKeys, which controls the amount of time
a key must be released to prevent problems with accidental multiple key
presses; SerialKeys, which allows performance of keyboard and mouse functions
using external computer interfaces designed for people with disabilities; and
ShowSounds, which gives a visual indication of warning beeps and alerts.
AccessDOS is free and can be ordered through IBM by calling 800-426-7282.
Reader service no. 26.
Trace Research & Development Center University of Wisconsin-Madison 5-151
Waisman Center 1500 Highland Avenue Madison, WI 53705 608-262-6966
New from Intel is the C860 development toolkit, a DOS-based C cross-compiler
kit that generates code for the 64-bit i860 RISC processor. The kit includes a
C compiler, assembler, linker, utilities, and source-level debugger.
Features include: software pipelining and instruction-scheduling optimization
to generate code that uses the i860 CPU's dual operation mode and
dual-instruction mode; and built-in vectorizer that rearranges code and data
dependencies to take advantage of on-chip pipelines and caches for maximum
performance in matrix calculations. The compiler conforms to the ANSI standard
and generates code in COFF format. The debugger has a windowed interface and
allows you to load, execute, and debug code on i860 CPU target systems at the
source code level as well as view and modify memory and registers.
The C860 tool kit costs $4000. Reader service no. 27.
Intel Corp. 3065 Bowers Avenue Santa Clara, CA 95052-8065 800-874-6835
AnSoft has announced the PGL ToolKit, a set of graphics libraries for
producing high-resolution printer output that supports C, Basic, Pascal,
Fortran, Clipper, and assembly language.
The libraries may be used by themselves or integrated with any screen graphics
library. They support high-resolution, black-and-white and color output; PCX
file format; the ability to preview drawings on screen; and printing through
parallel or serial port interfaces. Programming is device independent, and
more than 80 functions are included for creating vector drawings and/or
printing user-generated bitmap images.
The PGL ToolKit costs $195. Reader service no. 28.
AnSoft Inc. 8254 Stone Trail Ct. Laurel, MD 20723 301-470-2335
SPLICER, a genetic-algorithm tool used to solve search and optimization
problems, has been released by COSMIC. SPLICER provides framework and
structure for building genetic-algorithm applications.
Genetic algorithms iteratively apply genetically inspired operators to
populations of potential solutions, and create new populations while solving
the problem at hand. SPLICER's Genetic-Algorithm (GA) kernel comprises all
functions necessary to manipulate populations, including creation of
populations and population members, the iterative population model, fitness
scaling, and general parent selection and sampling. Fitness functions are
defined and stored in interchangeable fitness modules created in C. Within a fitness
module, you can create a fitness function, set the initial values for various
SPLICER control parameters, create a function to graphically display the best
solutions, and describe the problem. SPLICER is written in Think C, has an
event driven user interface, and provides graphic output in windows.
The price for SPLICER, including example executables and source code, is $200.
Reader service no. 29.
COSMIC University of Georgia 382 E. Broad Street Athens, GA 30602 404-542-3265
Microsoft has released version 7.0 of their C/C++ compiler. C/C++ 7.0
contains a comprehensive set of tools for Windows 3.1 and conforms to the C++
AT&T 2.1 specification. The Microsoft Foundation Classes provide objects for
Windows, with over 60 C++ classes, including classes for the Windows graphics
system (GDI), Object Linking and Embedding (OLE), and menus. C++ source code
is included for all foundation classes.
The suggested retail price is $499; upgrades are $139. Reader service no. 30.
Microsoft Corp. 1 Microsoft Way Redmond, WA 98052-6399 206-882-8080
John Wiley & Sons has announced a new series of programming books from The
Coriolis Group. The series includes such titles as Practical C++ Algorithms
and Data Structures, by Bryan Flamig. This volume applies to a variety of
compilers and development environments and includes the following: guidelines
for building abstract data types from data structures and algorithms; coverage
of quick sort, merge sort, hash tables, B-trees, and string searching and
parsing; and examples of array, linked list, stack, queue, and tree-structure
implementations.
Also in the series is Advanced Graphics Programming Using C, by Loren Heiny.
Subjects dealt with include: ray tracing; manipulation of large
geometric-object libraries; patterned, textured, and mirrored surfaces;
fractal landscapes; natural-object rendering; and tips on storing and printing
image files.
To receive a list of additional titles, contact the Coriolis Group. Reader
service no. 31.
The Coriolis Group 7721 E. Gray Road, Suite 204 Scottsdale, AZ 85260
602-483-0192
Release 3.0 of XVT's GUI Toolkit is now available. New to this version are a
more efficient and flexible event-dispatching model and several new event types.
Both dynamic and resource based window definition and creation are now
possible, hierarchical menus are supported, and new portable control types are
included. Many functions, variables, and macros have been rewritten to support
the new features, and the 2.0 API has been re-created with an upward-compatibility
layer of portable code. Thus, existing programs will run as before or better.

Prices for XVT 3.0 start at $1450, depending on the GUI you need to support.
Source code is available. Reader service no. 32.
XVT 4900 Pearl East Circle Boulder CO 80308 303-443-4223


May, 1992
SWAINE'S FLAMES


Irish Facts




Michael Swaine


Such of my prose as sees the light of publication about this time every year
tends to brocade its sentences, bleed about the heart, and bay at the moon.
I can see it coming, but can't seem to stop it; it affects anything written
within about a week of Saint Patrick's Day. This though I'm only a quarter
Irish, don't know from what county said quarter came, and have only the most
cookie-cutter conception of what it is to be Irish.
My idea of, say, County Cork, for example, is Blarney Castle, stone-fenced
farms, ruddy-faced children, and cheery wakes, while in fact County Cork is
now Silicon Glen, and Cork itself the site of that model of automation,
Apple's European manufacturing plant. Leprechaun-height robots rolling across
the stone floors, I fancy, as I sit here typing on one of those same Cork
computers.
Well, not really. My Mac II was probably built in California, but it makes a
better story if it's a Cork box, that being what is called an Irish fact.
Called that at least by Hugh Kenner, whom you more likely know for his book
reviews in Byte magazine than for his books about the Irish soul, and he no
more Irish than myself.
The Irish soul, Kenner says, or actually it was University of Cork professor
John Montague, although Kenner said something similar, is heavy with loss and
language, and loss of language. Irish writers changed from Gaelic to English
in the 19th century, a language loss that has expressed itself most often in
the metaphor of blindness, according to Montague, and I merely mention that
the latest studio CD from those Dublin dropouts, U2, features a song called
"Love is Blindness."
Citing U2 as emblematic of the Irish soul shows how superficial is my grasp of
said soul, as does my mentioning, with a meaningful eyebrow raised at CD-ROM
vendors, that said CD is packaged in recyclable and biodegradable cardboard.
But U2's medium is vocal, and as Hugh Kenner points out, and it is Kenner this
time, all language has its roots in vocalizations. There's surely a Master's
thesis in the question of whether computer languages are more popular when
their lexicons are pronounceable. Or in the case of Forth, their dyslexicons.
Which, by way of preamble, more or less introduces a story I recently read by
a more or less Irish writer on the subjects of automation and that precise and
slippery language, the law. San Francisco Examiner reporter Kathleen Sullivan,
who was born in Oregon but is more Irish than Kenner and I put together, wrote
in the March 15th issue a story that spoke to the Irish in me. Postal workers
in the United States, Sullivan reports, are suffering from the automation of
the 1960s. The late '60s-vintage letter sorting machines, it seems, are
injuring workers right and left, with up to 90 percent of workers in Peoria
showing some symptoms of injury from the equipment, and 21.5 percent of San
Francisco workers showing clear carpal-tunnel syndrome symptoms. The machines
don't meet OSHA standards for key travel distance, and employees are required
to produce far more keypresses per minute than is regarded as safe. The Postal
Service has been repeatedly cited by OSHA for willful violations of employee
safety standards and has repeatedly failed to comply.
It doesn't have to, you see. As a quasi-governmental agency, the Postal
Service is not subject to the fines or shutdowns or legal actions that OSHA
would by now have taken against a purely private-sector firm. But the
government is, in this case, exempt from the law. That's an American fact.
The Postal Employee Safety and Health Act, expected to be introduced in
Congress this month, may solve the problem, but the whole mess underscores the
fact that user-interface design is not just an aesthetic issue. It can be a
matter of health and safety.
The Irish whose soul Hugh Kenner knows would have known how to deal with the
Postal Service. When Ireland's government was making the transition to English,
although still keeping Irish as the official language, Irish being what Gaelic
is called by the Irish, few of whom spoke it even then, many Irish patriots
staged demonstrations at their post offices, demanding to be served in their
native language, notwithstanding they didn't understand it. Afterward, they
wrote letters to their local papers about the matter, in impassioned English.
It's a fact.
Michael Swaine editor-at-large



June, 1992
EDITORIAL


Yes, You Can Make a Difference




Jonathan Erickson


Those of you who have sent in your disks and dollars over the past few months
have shown that you do care and that we can make a difference. Thanks to you,
the DDJ "Careware" project has already helped hundreds of people throughout
the country and we hope to help many, many more. Briefly, "careware" is
software developed by Dr. Dobb's authors and distributed to readers. Any money
sent in for the software is donated by the authors to charitable organizations
of their choosing.
Credit goes foremost to contributing editor Al Stevens who launched the
Careware project with D-Flat, the C library he developed and sends to anyone
who mails in a formatted diskette and self-addressed stamped mailer. All Al
asks in return is that if you care, enclose a dollar or two that he can give
to the Brevard County Food Bank in Florida. Over the past year, you have
generously donated nearly $3000 to the Food Bank.
With his X-Sharp 3-D animation library, contributing editor Michael Abrash
followed Al's lead. Michael sends his toolkit to anyone who mails in a
diskette and SASE mailer and, if you include a couple of dollars, Michael
forwards it to the Vermont Association for the Blind and Visually Impaired. In
the past month alone, you've donated over $100 to this fine organization.
Now that 386BSD is available, Bill and Lynne Jolitz are the most recent
participants in the Careware project. Bill and Lynne will be providing
"Tiny386BSD," a minimal implementation of a UNIX-like operating system for
386/486 PCs. With Tiny386BSD, you can boot your PC and experiment with a
UNIX-like operating system by performing some minimal UNIX-like operations. In
reality, Tiny386BSD is the 386BSD boot disk--a single floppy that lets you
configure the PC's hard disk and load the binary floppy distribution (if you
have it).
To get a copy of Tiny386BSD, send a high-density 3.5- or 5.25-inch diskette
and SASE mailer to Tiny386BSD, DDJ, 411 Borel Ave., San Mateo, CA 94402.
Again, if you care to enclose a dollar or two, Bill and Lynne will donate it
to the Children's Support League of the East Bay, a clearing house that
benefits disadvantaged children. Some of the agencies receiving assistance
from the League have taught disabled children to learn and play through
computers, given respite weekends to abused children, and set up after-school
programs in depressed neighborhoods.
The complete 386BSD system is available over the Internet at both
reyes.stanford.edu and agate.berkeley.edu in the /pub directory as well as numerous
other ftp sites. We're also planning to make 386BSD available via M&T Online
and the DDJ Forum on CompuServe.


Speaking of Internet Access...


For those of you who have been asking for DDJ source code listings via the
Internet, they're available for anonymous ftp from site ftp.mv.com in the
/pub/ddj directory. Login as "anonymous" and give your network address as the
password.


Win a PowerBook! Enter Our Handprinting Recognition Contest


The DDJ handwriting recognition contest is rapidly moving forward. By the time
you receive next month's issue, the test harness with sample data will be
complete and available. Currently the code has been tested on the Macintosh,
SPARC, and DOS platforms.
But the really exciting news is that, thanks to Apple Computer, the
first-place prize for the contest will be a Macintosh PowerBook 100 notebook
computer.
Remember--you don't need a pen computer to participate in the contest. The
test harness is platform-independent. You simply plug in your C function and
check the results.
Turn to page 168 of this issue for more details on the contest and watch next
month for an in-depth description of the harness and rules.


The Kent Porter Scholarship


Once again, I'm pleased to announce that we're accepting applications for our
annual Kent Porter Scholarship, an award for computer science majors enrolled
in accredited colleges and universities. Kent was a longtime DDJ columnist,
editor, and programmer who passed away in 1989. In his memory, we established
a scholarship and, over the past couple of years, have awarded half a dozen
grants.
The purpose of the scholarship is to recognize academic achievement and
potential, and to financially assist continuing students in the pursuit of
their educational goals. At the request of Kent's family, special
consideration is given (but not limited) to students raising children while
attending school. Scholarships will be awarded in increments of $500 for the
coming year.
To apply for a scholarship, request in writing an application from: The Kent
Porter Scholarship, Dr. Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402.



June, 1992
LETTERS







GNU Notes


Dear DDJ,
I read Ray Duncan's historical book review ("Programmer's Bookshelf," March
1992) with amazement. He feels Richard Stallman is an angry iconoclast tilting
at windmills. Mr. Stallman writes the finest C compiler I've ever used (and
I've used more than a dozen). In the GNU Manifesto, Mr. Stallman writes, "I
consider that the golden rule requires that if I like a program I must share
it with other people who like it." I agree with this attitude: It is much
easier and more productive to give away software I've found useful to other
computer users.
Mr. Duncan says, "At present, he is furiously rewriting a 20-year-old
operating system so that he can give it away to spite AT&T--while the rest of
the world moves on to new operating system architectures, new programming
paradigms, and new user interfaces." But with low-cost workstations, everyone
is adopting UNIX and going away from proprietary architectures, which Mr.
Duncan seems to be endorsing. What is newer is not necessarily better. X
Windows, a networked graphical standard, runs well on modern processors
running UNIX. X Windows is freely available in source form, and is an
excellent example of programming.
I do agree with Jeff Duntemann's March "Structured Programming" column. It is
fundamental that the problem be understood in the analysis. (It is useful if
the implementor has the problem.) I agree that clear text is always more
useful than a lot of meaningless diagrams. I've also found very good software
writers are normally quite good English writers--both share language.
Arpad Elo's letter in the March "Letters" column also raised a useful point.
The point was that current technology is worse (I think it is much better),
but in order to reuse something, one has to know it exists. While the
complaint is registered that linking an empty program requires 30 seconds,
this is only the case with this system, not necessarily the general case.
In the same issue, Michael Swaine raises the point in his "Programming
Paradigms" column that "anything a small company can do, a well-managed large
company can also do." I disagree. Companies don't write programs, people do.
Most successful software companies are started by excellent programmers. As
they grow, it is much more difficult to find these talented programmers.
Marty Leisner
Rochester, New York
Dear DDJ,
While Ray Duncan has a right to his opinion of the GNU project, I would like
to correct certain statements that are simply inaccurate.
The purpose of the GNU project has nothing to do with spiting AT&T. Even AT&T
does not regard GNU in this light; in fact, Bell Labs has provided support to
the Free Software Foundation. We are angry at AT&T now because of its patent
threats against the users of X Windows, but until this began, GNU developers
felt no particular enmity for that company.
The actual purpose of the GNU project is a positive one--to give users
freedom. Specifically, free software gives users the freedom to study, share,
change, and improve the software that they use. In 1983, I wanted to have
those freedoms when using software. It seemed that the only way I could
achieve them was to write the software myself--so I set to work. Others who
used my programs felt inspired to join in the effort, and we have produced a
large body of useful software.
But why write a UNIX-compatible system rather than something completely new?
Because this was the most reliable way to write a system that users would find
usable. Writing an entire software system is a large enough task (nearing
completion after eight years) even if we do not always explore uncharted
territory. Today, users' demand for systems to be UNIX-compatible seems to be
ever-increasing, and our Mach-based multiserver system promises to be among
the most powerful and clean.
If we had chosen to add to proprietary software rather than replacing it, we
could have gone "farther" in a purely technical sense. Whether this would have
been better is a matter of values. I value freedom more than technical
advances, so I'm happy with the choice I made. Others who value mainly
material things may see the GNU project as pointless (though some do value GNU
software purely for its technical quality, or because they appreciate being
able to fix problems when they wish).
Duncan's disappointment with the GNU project reveals his own values. He
doesn't share our values, but he should recognize the difference between
succeeding at our own goals and failing at his.
Richard Stallman,
Cambridge, Massachusetts


Neil Loves Bob


Dear DDJ,
I was quite excited to read David Betz's article about Bob ("Your Own Tiny
Object-Oriented Language," September 1991). Bob is an object-oriented language
that is small enough to wrap my arms around, and I don't have to go learn
weird syntax like Smalltalk.
I have Bob running on my 386 at home and on my Sparc at work. Pretty cool!
I don't really understand how strings work or what string functions are
available (none except concatenation?) or how to build them. Can I index into
strings like an array?
Was a console going to be added to Bob? (Good old read-eval-print!) I guess
I'm looking for the ability to load Bob source files and to execute arbitrary
expressions. (If load were a Bob command, it might be simpler.) A console
would greatly aid in debugging Bob programs.
My eventual goal is to make Bob callable from Visual Basic. I'd rather code in
Bob than Basic. To be useful, I would have to send Bob strings (expressions)
to execute just like in a console.
Please let me know what your development plans are for Bob. I wouldn't want to
duplicate efforts. Are there any good books on interpreters I can read?
Neil Galarneau
Malden, Massachusetts
David responds: I'm glad you've enjoyed Bob! I figured there were people who
might enjoy a C-like language rather than my usual fare of Lisp-like
languages.
I'll start with your first question. Strings are implemented as arrays of
characters. You can use literal strings by including them in double quotes in
your source file (just like in C). You can dynamically allocate strings at run
time by using the newstring() function. Once you've got a string, you can
index it just like a regular array. The only difference is that you can only
store characters into a string, whereas you can store an object of any type in
an array. You're right that the only built-in string function is
concatenation, but you can write your own string functions using indexing and
concatenation.
I'd love to add a Lisp-style listener window to Bob. The reason that I didn't
do that right off the bat was that it isn't trivial to allow either
expressions or definitions at the top level in C. It wasn't really designed to
be an interactive language and requires a fair amount of lookahead to tell the
difference between a function call and a function definition. My parser wasn't
able to back up and retry when it failed to build a function definition from
the input. I'm planning on solving this problem eventually. Also, a Visual
Basic interface would be relatively easy to build once Bob has an interactive
mode.
Most of the books on interpreters that I've read have been about Lisp
interpreters. I'm sure you didn't miss the fact that Bob is really a Lisp-like
language in disguise. I started out with an old book by John Allen called The
Anatomy of Lisp and have also used Abelson and Sussman's book Structure and
Interpretation of Computer Programs.


Swap Savvy


Dear DDJ,
I was interested to see Greg Renzelman's suggestion for a generic swap macro
in the April 1992 "Letters" column. The problem is one of a general category
of problems in which one needs to perform the same operation on different
types of data. Common arithmetic operators fall into this category, but the C
compiler selects the correct function or generates the correct inline code for
the data type automatically.
It is a class of problems at which C++ excels. So my answer to "C Q&A" #36 is
to use C++. C++ allows you to overload functions and operators explicitly. For
example, the effect of the generic swap macro can be obtained in C++ by
defining functions such as that in Example 1(a) for each data type.
Example 1

 (a)


 inline void swap (int &a, int &b) {int t; t=a; a=b; b=t;}
 inline void swap (float &a, float &b) {float t; t=a; a=b; b=t;}

 (b)

 #define swap(a,b) ((a)^=(b), (b)^=(a), (a)^=(b))

 (c)

 #define Swap_(A, B) MemSwap (&A, &B, sizeof (A))
 void MemSwap (void *A, void *B, unsigned Len);

Whenever you use the swap function, C++ will select the appropriate version
based on the data type of the arguments. The inline specification causes C++
to generate inline code--just like a macro--rather than a function call.
It is possible, by the way, to write a semigeneric swap macro in C without
using a temporary variable, as shown in Example 1(b). This works for char,
int, and long values, but not for pointer or float values: Compilers tend to
balk at performing the exclusive OR operation on pointers and floating-point
values. (It also fails when both arguments refer to the same object, which the
three exclusive ORs reduce to zero rather than leaving unchanged.)
Walter Williams
Enfield, Connecticut
Dear DDJ,
Greg Renzelman's letter talks about writing a generic swap macro. The main
problem I see with his approach is that a large amount of code will be
generated for each macro invocation. It would be more space efficient to write
a function combined with a macro to accomplish the task, as in Example 1(c),
where the body of MemSwap corresponds, more or less, to the body of the macro
Greg defines. If speed is at a premium (as it usually is), then the MemSwap
function can be optimized or written in assembly language.
Stuart Downing
Dexter, Michigan


Link Clipper and Sail, or The American Way


Dear DDJ,
In the March 1992 "Letters" column, Arpad Elo, Jr. expressed opinions about
libraries, calling conventions, etc. The statements made perfect sense and
reflect my beliefs as well. However, I also found some comments which I have
mixed feelings about, specifically concerning Nantucket's Clipper.
First of all, Clipper is not a Nantucket database language. Clipper is
actually a derivative of dBase, marketed by Ashton-Tate, which is now owned by
Borland. To further complicate the issue, its syntax is similar to JPLDIS,
which is reportedly the role model.
Next is the fact that Clipper produces a 160K EXE for a NULL program. This is
true, but Clipper is meant to work with databases. As a consequence, it must
also support indexes, relations between databases, file I/O, error handling,
keyboard handling, etc. As a matter of fact, Clipper is quite possibly the
easiest language in the world for coding pop ups. These pop ups can give the
user vital information during the execution of an application and also give
the program the capability for on-line help with a minimum of coding. I have
used these features to give help at the press of the F1 key and allow changes
to the help system at run time. All of that in a little over 20 lines. With
this "gorilla" you get a host of valuable functions.
My third observation concerns Compile/Link speed. In every version of Clipper
the culprit is the linker. How do you improve this? Replace the linker. I code
heavily in C and assembly and as such, I own several Borland products. In
these packages comes a great little linker, TLINK 1.0. It may be slightly
limited (it can't generate overlays), but it is only 10K and can link faster
than any other linker I have tried. Next on the list is the linker that comes
with Borland C++. This linker is also fast and is extremely capable. If you
don't have these, Microsoft's linker and also BLINKER claim fast linking.
With TLINK 1.0, Clipper Summer '87 can compile/link tons of source and custom
libraries and produce a full app in 18 seconds on a 16-MHz 386SX.
You could also enhance linking by using caching programs or creating a huge
RAM disk and putting your libraries on it. There's more than one way to make
Clipper fast.
Also I have heard that Hello World can be reduced to approximately 5K with
Clipper 5.0, using its runtime capabilities.
The point of this letter is this: Pick the right language for the job. Don't
use Clipper to do non-database jobs, and don't think about writing a database
app in C or assembly language. It can be done, but the time it would take to
code a multiuser, multiple-database application with record- and file-locking
features, online help and the myriad of other features it would need, would be
unbearable.
I believe the best programmer is the one who knows several languages, realizes
their benefits and drawbacks, and has the sense to code using the optimum
language. If there isn't one, then roll your own. Isn't that what America's
about?
John McKnight
Bad Axe, Michigan


Calling Aunt Rose


Dear DDJ,
Congratulations on a consistently superb publication. Dr. Dobb's Journal
continues to set a mark all computer-related magazines would do well to
emulate.
I can only echo the sentiments in Jonathan Erickson's excellent February 1992
Editorial on the RBOC/BBS confrontation. As he implied, the issues may be
wider than just bulletin boards. Essentially, the local telephone companies
are desperately seeking to maximize their take from any use of telephone
service beyond a voice call to Aunt Rose. Effective data use of the wires by
the mass of computer users is being held hostage--to the detriment of our
society and economy. The stand-off between RBOC and BBSs is simply the most
visible and current confrontation.
Jack Rickard Editor, Boardwatch Magazine
Lakewood, Colorado



June, 1992
PERSONAL SUPERCOMPUTING


Cray's ideas turn a PC into a virtual-memory 64-bit supercomputer




Ian Hirschsohn


Ian holds a BSc in Mechanical Engineering and an MS in Aerospace Engineering.
He is the principal author of DISSPLA and cofounder of ISSCO. He can be
reached at Integral Research, 249 S. Highway 101, Suite 270, Solana Beach, CA
92075.


For any number of reasons, scientific and engineering computing has
historically been the domain of mini- and supercomputers. For starters,
scientists and engineers were the early computer users, and mainframes were
the first system on the scene. Furthermore, the number-crunching needs of
scientific applications require huge programs to manipulate and analyze vast
amounts of data, and it takes big systems to provide the necessary horsepower.
While microcomputers have made incredible in-roads in virtually every field,
they've come up short in satisfying the computing needs of scientists and
engineers. However, recent advances in multiprocessing architectures may
finally be starting to tip the scales in favor of PCs in the engineering
arena.
You can now, for instance, assemble EISA 486-based PCs with plug-in,
multiprocessor 64-bit RISC cards and gigabyte disks that provide you with the
effective performance of conventional mini- or mainframe computers--and for
well under $20,000. In effect, you can have on your desktop a "personal
supercomputer" that performs number-crunching tasks formerly relegated to big
systems. But even though you can build a multiple-processor personal
supercomputer, you'll still find your work cut out for you if you try to
download multi-megabyte mainframe applications. Why? Because DOS and UNIX, the
dominant PC operating systems, were never oriented toward the mainframe
environment. DOS has yet to tap 32-bit protected mode--let alone 64-bit RISC
processors--and multiprocessor support is still in UNIX's future.
In short, mainframe applications involve more than just compiling existing
programs with Lahey Fortran using a Phar Lap 32-bit DOS-Extender--many apps
are so big that they require virtual memory. These applications are typically
floating-point intensive. Consequently, the five- to six-digit slide-rule
precision of 32-bit floating point produces results that are pure fantasy
after a few iterations. Typically, they are also data intensive, demanding
maximum bandwidth disk and device I/O with binary record-oriented data
handling. (DOS and UNIX byte-stream I/O was not designed for that.) DOS and
XENIX have yet to speak 9-track tape and IBM 3480 cassette. So how can you
port mainframe data to a PC?
To execute mainframe, you have to think mainframe; and to think mainframe, you
must be familiar with the ideas of Seymour Cray, the genius behind today's
supercomputing and mainframe architectures. In this and future articles, I'll
discuss a PC-based system called PORT that adheres to Cray's design
principles, thereby enabling what I call, "personal supercomputing." PORT is a
software environment somewhat analogous to, say, Desqview plus Phar Lap. A key
difference is that PORT is portable, so its host system and processor(s) are
determined by the author of its CP/PP interface programs (discussed later).
The computation- and I/O-intensive applications I will examine demand a
386/486 PC (or clone) "muscle machine" with 16+ Mbytes RAM, 200+ Mbytes hard
disk, TI340x0 graphics accelerator with one or more plug-in i860 cards--under
DOS 4.0 or higher. i860 cards are available from CSPI, Microway, Alacron, and
others--I used a Hyperspeed D860. In contrast, my next article will use a
386SX laptop "wimp" with 4 Mbytes RAM and 60+ Mbytes disk--sans i860.
Thanks to Cray and his approach, PORT is able to implement almost any
multi-megabyte mainframe application on a PC with plug-in RISC coprocessor(s).
PORT's ability has been demonstrated with seismic cross-sections, real-time,
photographic-image manipulation, projected world maps, large printed-circuit
board autorouting, complex-image three-dimensional solid modeling, NC tooling,
and other compute-/data-intensive applications.
PORT represents a "super DOS extender" that turns a 386/486 PC into a
virtual-memory, 64-bit, mainframe-capable machine, while maintaining full
compatibility with DOS and Windows. Any system which presumes to supplant the
wealth of PC applications is a bit naive; PORT augments--not
replaces--existing apps.


Supercomputing


Supercomputing conjures up visions of inverting massive matrices, fluid
dynamics, and other esoteric applications. But many workaday applications
defy the most loaded PCs and even workstations. Consider Figure 1(a), which
shows an original photograph and Figure 1(b), which has been electronically
retouched by adding in the eye detail. This sort of retouch is routinely done
on PCs (or on Macs, via Photoshop or Colorlab). The difference here is that
the photograph was manipulated as an 8x8-inch, 1000-lines-per-inch image at 24 bits
per pixel. This represents 8000x8000x3, or 192 Mbytes as a single frame in
real time. By comparison, anything beyond a 10-Mbyte TIFF file tends to be
impractically slow on a PC or Macintosh. Photographic-image processing is an
extreme test of a system's hardware and software ability to handle massive I/O
fast, yet there is minimal computation and negligible floating point.
Figure 2 shows a photograph of a seismic cross section. Seismic processing
consumes more supercomputer dollars than any other single application. (It
costs $30 million to drill a hole, whether it has oil or not. Saving just one
dry hole pays for a Cray Y-MP.) Thousands of acoustic traces are mapped into
geologic-depth cross sections, requiring extensive floating-point operations
on the data to account for varying sonic velocity through different rock
strata. Just the ability to read the 1- to 5-Mbyte individual 9-track field
tape records is nontrivial. Geophysicists prefer to study the "big picture" on
paper media, where they can scratch away with felt-tip pens. The hard copy is
typically 4 feet wide by 10 to 20 feet long and is produced on 300-dpi
electrostatic plotters. These plotters must be fed data continuously at full
speed, otherwise they halt and leave ugly toner bands. Advanced Technologies'
PC Micromax excels at seismic cross sections on a VGA screen, but the area
covered is limited and it cannot handle the field tapes.
Supercomputing is not a different set of applications--it is simply a matter
of scale. For example, Wordperfect is a capable tool for indexed manuscripts,
but what if you have to cross-reference the Encyclopedia Britannica or the
service manuals for a Boeing 747? All these examples are easily within the
capability of a 386/486 PC with plug-in RISC card(s).


Thinking Cray


The DOS/UNIX/Windows trend is toward running as many concurrent tasks as
possible on the same processor. Cray realized that no single processor can be
all things to all tasks. (See the accompanying textbox entitled, "CDC 6600:
Anatomy of a Supercomputer.") The structure of the ideal computation processor
and ideal I/O processor are at loggerheads. The 386/486, with its small
register set and powerful interrupt handling, is an excellent I/O processor.
RISC processors such as the i860--with 64 registers, four on-chip
processors, and blistering pipelined floating point--are ideal computation
processors but poor I/O vehicles. (See the textbox entitled, "RISC: Rhetoric
and Reason.")
In what I call a "PC+i860" (that is, a 386/486 PC with one or more i860
processors) we have the hardware model of a supercomputer that can adhere to
the Cray philosophy: It can dedicate all the resources of a multiple-processor
configuration to the single application--the diametric opposite of
multitasking systems. The Cray elegance is that if the 386/486 and the i860(s)
are all focused on the current application, managing them is simplified
considerably.


In the Steps of the Masters


Although PORT has only recently been implemented on the PC, its
predecessor--the Superset system--has been in active service since 1979. The
Superset 48-bit system was developed starting in 1977 to move the DISSPLA
graphics library off mainframes. (DISSPLA is a proprietary Fortran package
formerly of ISSCO, now Computer Associates, and one of the most widely used on
mainframes.) DISSPLA was too large for the 16-bit PDP 11 and Data General
Nova--the prevalent minicomputers of the mid-seventies. Consequently, we
designed a custom 48-bit RISC machine using AMD 2903 bit-slice processors,
which required us to develop a complete virtual-memory Fortran system with all
utilities and libraries from scratch. The 48-bit custom hardware was
overshadowed by the workstations of the '80s, but Superset still uses it for
its photo-retouch system, which outperforms competing products on standard
workstations even today.
The hardware may be outdated, but the system's ability to handle
multi-megabyte applications in a virtual-memory, multiple RISC environment is
timely. In 1989 I set about converting the 1,000,000 or so lines of code to a
machine-independent, virtual-memory, 64-bit version. The result was PORT.
Key to personal supercomputing is that PORT was developed expressly for
mainframe applications from the outset, not as an afterthought. From
1967-1976, as part of ISSCO, we had to support DISSPLA on two dozen IBM,
UNIVAC, DEC, CDC, Burroughs, and Honeywell mainframes with their diverse
operating systems. As applications programmers, our criterion when we designed
the 48-bit RISC system was to execute the application as efficiently as
possible. The nuances of forks, environments, parents, and children just added
to the overhead. Not being systems programmers, we simply copied the best
features of each mainframe system: the architecture of the CDC 6600/6400/7600,
the UNIVAC Exec 8 file system, the IBM OS/370 command syntax, Burroughs'
Master Control Program and integrity checking, and others. PORT, therefore,
represents an incarnation of systems developed for machines where money was no
object and the proven ideas of some of the finest minds. (Developed, in other
words, by PhDs, not MBAs.)


The Cray Approach Implemented


PORT religiously adheres to Cray's CDC 6600 schema. The 386/486 becomes the
Peripheral Processor (PP) and the RISC coprocessor is turned into the
Computation Processor (CP). All data transfers between the processors are
memory-mapped through common mailboxes, as in the CDC 6600. The i860 card
local memory becomes the shared memory.
To see how this all integrates, consider the Fortran code in Figure 3. The
Fortran program executes entirely on the i860 together with all libraries and
other utilities. The PRINT statement causes a call to the FORMAT I/O handler
in the PORT kernel (a Dynamically Linked Library), which formats the string
Value is 12345.678901 via the i860. The string is passed to the central PP
interface, which composes a 5x64-bit word mailbox containing the code for
"Write Text To Screen" and a pointer to the string in i860 card memory. The
i860 now sets a semaphore flag, or "rings the PP's bell" via an interrupt,
depending on the i860 card. Upon detecting the semaphore or interrupt, the
386/486 PP copies the mailbox contents to its own memory and executes the I/O
request. It then copies the string and proceeds to display the line on the
screen. Once the mailbox contents and string are transferred, the PP releases
the i860, enabling the 386/486 and i860 to process in parallel.
Figure 3: Executing a Fortran program on the i860

 print 101,VALUE with VALUE=12345.678901
 101 format ('Value is ',F12.6)

While the i860 is computing, the PP is free to do other chores such as
servicing RS-232 COM I/O, network transfer, disk/tape caching, and tape
streaming. PORT turns all available PC extended memory into a disk/tape cache
pool (a 31- to 95-Mbyte cache). Large cache is invaluable for streaming-tape
I/O, electrostatic plotters, and other devices requiring a sustained,
high-speed data flow.

PORT runs almost entirely on the i860, so the 386/486 is free to run DOS
multitasking (including Windows) provided the 386/486 PP is available when
needed.


RISC From Day One


Most RISC compilers originated as CISC compilers with code generators modified
to output RISC code. This strategy tends to overwhelm the on-board RISC
instruction cache, resulting in "cache thrash." (Again, see "RISC: Rhetoric
and Reason.")
Designed for a custom bit-slice RISC board from the outset, the PORT Fortran/C
compiler outputs metacode--not direct RISC instructions. The resident RISC
program decodes the metacode and performs the requisite operations. This
approach is tantamount to defining a custom Fortran/C instruction set, using
the RISC instructions as programmable microcode. The entire i860 version of
the decoder is only 18 Kbytes, so it rarely cache misses. Like CISC
microcodes, the PORT decoder was carefully handcoded in RISC assembly to
maximize performance. It plugs almost every free cycle, uses all of the 64
i860 registers, trips off memory references as many cycles ahead as possible,
and uses few internal subroutines.
The metacode decoder turns the RISC processor into a "Fortran/C engine." With
the front-end PP to field all I/O, the metacode has no I/O instructions
whatsoever (just a "PP Interrupt and Wait"), thereby sidestepping the single
most complicated section of CISC microcodes.
Although the metacode approach is superior, it is avoided by almost all RISC
systems because of the perceived overhead of decoding each meta-instruction.
PORT uses 15 to 20 i860 instructions to decode each meta-instruction--a heavy
penalty for A = B. What is overlooked is that just one cache miss consumes the
equivalent of eight to ten RISC instructions: In practice the efficiency of a
handcoded decoder more than makes up for the overhead. Because the overhead to
decode A = B is the same as for A = SIN(B), A = SQRT(B), and A(I,J) = B(J +
20)**I(L), it becomes apparent why it is not a dominating factor in actual
applications.
Also overlooked is that the sheer volume of RISC code generated by native RISC
compilers is prohibitive unless many basic operations--divide, multiple
subscripts, modulus, and others--are executed by subroutines. The
call/return overhead (usually involving memory and often a stack) for this
so-called "threaded code" is substantially greater than the register ops used
to decode a meta-instruction. The efficiency of generating native RISC code is
illusory--even with infinite cache. I believe that most systems programmers
are new to high-performance RISC; understandably, they still apply CISC
methods and prejudices.
A key aspect of the metacode approach is its indispensability to
multiprocessor operation. The decoder tests each meta-instruction for
semaphore bits from ancillary processors (and the PP) as part of the decode
sequence. Thus the multiprocessor handling is all under software control,
simplifying it and providing a direct mechanism for the application to manage
it. (Significantly, it enables PORT to be machine independent.)


Metacode Custom to Fortran/C


A detailed description of the PORT metacode is left to a subsequent article.
Because it is central to the potential for supercomputer performance and to
multiprocessing, I'll highlight its salient features.
All PORT metacode instructions are of the form: A = B op C. Examples of this
form are A = B + C, I = J/K, if(B = C) go to A, and call A(Blist,Count). All
meta-instructions are 64-bit words, with the identical format to speed
decoding. Array references are integral operand modes. For example, A(J) =
B(K,L)* C(M + N) is a single instruction. (Part of the power of the metacode
approach is incorporating Fortran/C indirect addressing modes as native.)
Like the mainframes that PORT emulates, all operands are 64 bit, including
integer, floating point, and pointers; strings are 64-bit aligned. The i860
and other RISC processors have a 64-bit memory path, so the time savings for
shorter operands is minimal. On the other hand, 64 bits can pack an awful lot.
A feature of this metacode approach is that the decoder can choose a more
efficient algorithm, depending on the value of the operands. For example, the
i860 has no divide instruction, integer or floating point. An integer divide
involves converting to floating point, iterating a Newton-Raphson
approximation, and converting back to integer--just like the CDC 6600. (The
i860 multiply is so fast, the time is not much more than a typical CISC IDIV.)
Internally, the PORT decoder uses fewer steps and 32-bit register operations
if the operands are found to be less than 32 bits, which is the usual case.
Most important to our focus on supercomputing, the PORT metacode defines many
high-level operatives to be "direct instructions." (SQRT, LOG, SIN, ATAN2,
B**I, B**C, type COMPLEX ops, and most intrinsics are meta-instructions.)
Furthermore, block ops (copy, initialize, search, and checksum) are also
direct meta-instructions, as are type CHARACTER ops. Current work on the
metacode is largely focused on expanding the block meta-instructions. These
include vector/matrix multiply, sort, vector scale+translate, and others.
The user can extend the metacode to incorporate operatives specific to his own
application, such as 3-D transform, polar coordinates, map projections,
specialized sorts, and even string search, ignoring spaces and case.
Experience has shown that implementing such operatives in direct RISC assembly
typically produces orders of magnitude performance improvement. These added
intrinsics are referenced as if they were subroutines. For instance, CALL
ARYMOV(A(J),N,B(K)) copies N 64-bit words from A(J) to B(K) as a direct
meta-instruction.


Virtual Memory Without Virtual Memory


A shortcoming of Cray's model is its impracticality for traditional "virtual
memory," which allows executing programs larger than real memory by swapping
pages to and from disk--transparently to the program. This limits program size
even on a Cray Y-MP. The obvious reason for avoiding virtual memory is the
virtual-to-real address-translate overhead on every memory reference. Less
obvious, but more serious, is that the real-memory pages end up scattered all
over memory. Thus, real memory becomes fragmented. To use memory-mapped I/O
between the peripheral processor and computation processors would require
going through the same translate table. The bookkeeping, coherence protocol,
and overhead make multiprocessor virtual memory a nightmare. To be practical
and allow high-speed burst DMA, shared memory areas should be contiguous
blocks.
PORT implements virtual memory by observing that 85 percent of memory
references are local and don't require address translation in the first place.
Measuring local addresses from the start of each subroutine takes care of
these 85 percent. This leaves only four instances in which virtual memory is
actually required--array/COMMON references, pointers, arguments, and
call/returns. PORT takes care of these as part of the metacode decoder via
software "microcode." The RISC overhead for this 15 percent of addresses is
not severe, and much of it can be buried between memory references and in free
cycles. As a software scheme, it is hardware-independent, which makes it
portable.
Memory overhead being the bane of RISC, a significant feature of this scheme
is that 85 percent of memory references use 13-bit fields. Thus A = B op C can
be specified in a single 64-bit RISC word. This utilizes RISC cache more
efficiently and speeds decoding, thereby improving performance.
Key to multiprocessor supercomputing is that the scheme uses massive pages:
currently 32 Kbytes, soon to be 64 Kbytes. Fewer pages make it practical to
exchange pages to form the requisite contiguous memory blocks and lock them in
real memory. Thus other processors sharing the data perceive it as contiguous
memory. Finally, PORT circumvents the chief limitation of the Cray model by
being tailored specifically to Fortran/C.


Multiprocessing


Multiprocessing is commonly viewed as a collection of identical,
self-sufficient processors on a common bus, each executing its own "thread."
Such symmetrical organization requires the application to be broken into
self-sufficient tasks, but this is not always possible. Even when it is
feasible, breaking a massive program into free-standing threads, complete with
all intercommunication, can be as much work as writing the application in the
first place. The operating-system overhead on each processor can neutralize
the performance benefits. Lastly, no matter how fast the bus, the rush-hour
traffic jams tend to degrade the system. The processors invariably domino
until all are waiting in line for the bus. Although symmetric multiprocessing
is expounded in many articles, it has yet to see widespread commercial use.
In almost every application my colleagues and I have studied--CAD, seismic,
image processing, 3-D modeling, and even editors and compilers--80 to 95
percent of the computation is concentrated in less than 5 percent of the code.
Experience has shown that multiprocessing just one or two subroutines
typically produces an order of magnitude performance improvement. In cases
such as 3-D rendering, seismic wiggle-trace fills (Figure 2), Fast Fourier
Transforms, RGB-to-CMYK transformation (Figure 1), critical-path routing,
vector-to-raster conversion, and so on, transferring the data back and forth
to symmetric processors across a bus can take more time than the processing
itself. On a similar note, I/O overhead has proved the nemesis of array
processors, causing them to fall into disfavor of late.
Based on the way most applications operate, PORT extends Cray's model to
pragmatic multiprocessing. The multiple ancillary RISC processors have access
to the memory of the RISC Computation Processor. This hardware configuration
is widely available. For example, the Hyperspeed D860 PC/AT card has two i860s
sharing a common memory pool, and multiple cards can be interconnected via
64-bit, memory-to-memory flat cable across the top of the cards. Mercury,
DuPont, and CSPI have similar solutions for workstations. Thus the data
resides in shared or commonly accessible memory, eliminating the need for bus
transfers. (At the 1991 ACM Siggraph conference, Hyperspeed exhibited ten
i860s in a PC popping up Mandelbrot fractals faster than a dedicated Cray
Y-MP. At the 1992 NCGA conference, they demonstrated an eight-i860 PC ray-tracing
images of 25 transparent spheres with 25 levels of reflection in roughly three
seconds--about 400 Mflops!)
The body of the application executes in the Computation Processor, but PORT
provides the mechanism for the application to access subprograms running in
the ancillary RISC processors. The subprograms are typically a few hundred
lines of critical RISC code; data is shared via COMMON block arrays and
communication is via mailboxes. The application controls the sequencing and
synchronization of the ancillary processors using calls to PORT provided
system subroutines. This hands-on pragmatic approach has proved remarkably
effective and application programmers appreciate having the control. For
example, in Figure 1 and Figure 2, the application typically uses two to four
auxiliary RISC processors.


Proof of the Pudding


Table 2 shows the performance of PORT on a 20-MHz PC with a plug-in 33-MHz
i860 card vs. the HP 9000 series 720--today's superworkstation. The tests used
were the DISSPLA User Manual sample plots, a set of large graphics examples
running on dozens of mainframes/superminis and not slanted to any machine.
DISSPLA is supplied by Computer Associates as CA-DISSPLA and the equivalent
library under PORT as its Graphics Subroutine Library.
These results show that the PC+i860 under PORT can, in the case of DM7004,
match the HP 720 at its full 50 MHz. Note that in the short plot cases
involving minimal computation per vector where the HP 720 outperforms the
PC+i860, the latter is bound by the speed of the 16-bit PC ISA bus. (The times
are identical for 386/20 and 486/33 host PCs. Upcoming EISA i860 cards and the
Hauppauge 4860 should eliminate this mini/micro bottleneck.)
Interestingly, according to Table 1, the PC+i860 is four to ten times slower
than the HP 720, which in turn outperforms the Sparc and RS/6000. Arguably,
the code for GSL has diverged from DISSPLA over the years and may be more
efficient in many instances. On the other hand, PORT runs all 64 bit with
software virtual memory and avoids a globally optimizing compiler. The bottom
line is that both packages produce the identical output.
Table 1: RISC performance under popular benchmarks as provided by Personal
Workstation (June 1991). Values are presented for rough comparison only
because the performance on actual large-scale applications may be different
for the RISC processors. (Higher numbers are faster.)

 Processor Dhrystone Linpack
 2.0/2.1 with Single Double
 register (32-bit) (64-bit)
 --------------------------------------------------------------------

 CISC
 486/25 via DOS Extender (typical) 26,300 1.16 1.08

 486/33 via DOS Extender (typical) 34,000 1.50 1.40
 RISC
 i860/33 (Microway Number Smasher) 29,819 1.23 1.11
 SPARCstation SLC 18,255 2.25 1.20
 Silicon Graphics Iris 25D 24,630 2.62 1.35
 Motorola 88000/25 (Everex 8825) 50,033 1.67 1.02
 MIPS 3000/33 (Magnum 3000/33) 56,012 6.48 4.80
 IBM RS/6000 (POWERstation 320) 45,454 8.15 7.29
 HP 9000 series (model 720, 50 MHz) 86,335 17.0 14.4

My purpose is not to present a horse race, but to vindicate the "obsolete"
mainframe methodology of PORT. The PORT results are also more consistent with
the spec-sheet timings for both processors.
These results illustrate the performance of a single i860 as the Computation
Processor. Experience has shown that the introduction of ancillary RISC
coprocessor(s) improves throughput so dramatically that there is no
comparison. An analogous example is the performance of a Silicon Graphics
Indigo rendering 3-D models via a MIPS 3000 coprocessor.


Supercomputing by Low Entropy


PORT strives to achieve performance by low entropy rather than brute force: It
focuses on minimizing overhead and presenting RISC processors with the maximum
information in the minimum bits. It takes the view that the application
programmer can maximize resource use more effectively than a big-brother
system. FoxPro 2.0, Turbo C, and Norton Backup also testify to the
low-entropy approach.
To illustrate PORT's bit efficiency, the basic PORT system--including the
extended Fortran/C compiler, linker, editor, virtual-memory manager, file
manager, and all system libraries--packs onto a single 1.44-Mbyte floppy, the
distillation of several hundred thousand lines of Fortran source code. For
mainframe users, the entire GSL (DISSPLA) library, plus drivers for 150
devices and utilities, packs onto two 1.44-Mbyte floppies. (On average there
are 1.2 meta-instructions per executable source statement.) The importance of
bit efficiency is placed not on program exchange, but on reducing RISC memory
access and thereby on performance.
We have seen the future, and it is multiprocessor RISC. Current systems will
have to come to terms with this, sooner or later. The methodology incarnated
into PORT is field proven and catholic. Whether PORT ever becomes a factor or
not, hopefully it will help keep Microsoft and Sun honest. In a future
article, I'll show that the PORT approach can be implemented on almost any
platform, including UNIX and Sparc. Although the PORT approach works best with
add-on RISC processor(s), I will show that it works surprisingly well in the
single-processor environment of a stock 386SX laptop.


Bibliography


Brooks, F.P. The Mythical Man-Month. Reading, Mass.: Addison-Wesley, 1975.
Lundstrom, D.E. A Few Good Men from UNIVAC. Cambridge, Mass.: MIT Press, 1987.
Margulis, N. i860 Microprocessor Architecture. Berkeley, Calif:
Osborne/McGraw-Hill, 1990.
Siewiorek, D.P. and J.P. Koopman. The Architecture of Supercomputers. San
Diego, Calif.: Academic Press, 1991.


RISC: Rhetoric and Reason


Reduced Instruction Set Computers (RISC) are touted as the panacea of
computing, promising one instruction per cycle and multi-megaflop floating
point. Yet in many benchmarks, RISC performance isn't much better than Complex
Instruction Set Computers (CISC) such as the 80386/486 and 680x0.
Paradoxically, both are correct. RISC is capable of phenomenal performance,
but most current systems do not utilize its full potential. Almost all CISC
processors are RISC internally: a RISC core driven by microcode burned into
on-chip ROM. Explicit RISC processors, on the
other hand, allow their RISC code to be loaded into on-chip RAM cache. Hence,
any RISC strategy functionally similar to a CISC processor is unlikely to
achieve a spectacular speed improvement, which explains why the 486 in 32-bit
protected mode is vexing Sparc and other RISC processors (see Table 1).
The i860 incorporates four independent processors on the same silicon wafer:
integer unit, floating-point adder, floating-point multiplier, and graphics
unit. These processors can operate in parallel and can be fed one instruction
each cycle. Thus, a 40-MHz i860 is theoretically capable of 80 Mflops. By
definition, however, RISC instructions are primitive, and it takes many of
them to perform the same function as a CISC instruction. For example, the
80x86 instruction: ADD Value, 10 requires the i860 sequence shown in Figure 4.
Although the i860 needs five 32-bit instructions, the 5+1 cycles to execute
them is roughly the same as the CISC instruction. Indeed, the cycle count for
CISC processors largely represents the count of RISC microcode instructions.
Figure 4: Equivalent i860 instructions for the 80x86 instruction ADD
Value, 10

 ORH Value_HI,r0,r3 Place upper and lower 16
 OR Value_LO,r3,r3 bits of VALUE addr in r3
 LD.L 0 (r3),r4 Load [0+r3] into r4
 ADDS 10,r4,r4 r4 = r4 + 10
 ST.L r4,0 (r3) Store r4 in [0+r3]

The example in Figure 4 illustrates several of the features and failings of
RISC. On the plus side, once register r4 is loaded, it can be manipulated at
the rate of one operation per cycle. (The sequence can be modified to perform
an Add, Shift, and Test as if it were a single custom instruction in almost
the same time as a simple ADD. The i860 can perform a 64-bit, floating-point
multiply in four cycles, a 32-bit in three cycles, and an add in two, so the
example could include floating-point operations at a speed far beyond the
386/486. Furthermore, the units can be "pipelined"--with a new multiply/add
initiated every cycle in single precision.)
The minus side is more subtle. The i860 is not telepathic; the instruction
sequence must be loaded in instruction cache and ready to go. If it is not,
the i860 "cache misses," requiring the RISC instructions to be loaded from
memory. A cache miss is expensive. On any 40-MHz processor it costs at least
three to four cycles, and on the i860 it is much worse. The i860 minimizes
memory overhead by transferring four consecutive 64-bit words as a block. The
address lines are loaded only on the first word, and the next three words load
in half the time. The down side is that any cache miss costs eight to ten
cycles. Thus unless the RISC program can reside in i/cache with few misses,
the performance is usually lackluster.
The LD.L does not complete the load to r4--it merely initiates it. The ADDS
waits until r4 is loaded before proceeding to add. Here, a data-cache miss
again costs eight to ten cycles. If you can find eight to ten instructions to
insert between the LD.L and ADDS, there is no wait. Therein lies part of
RISC's power. A clever compiler could theoretically find eight to ten
instructions to plug the wait. In real life, only the programmer understands
his algorithm well enough to do the substantial reorganization needed.
Globally optimizing compilers also have the nasty habit of reorganizing when
no reorganizing is desired--often producing wrong code.
This example also alludes to why the traditional compiler approach of
producing direct RISC object code is not the most effective strategy. It takes
at least five times as many RISC instruction bytes to do the same thing as a
CISC instruction, yet the real-estate expense of on-chip i/cache limits its
size. The i860 has only 4 Kbytes i/cache (optimistically about 70 lines of
Fortran or C). If you have a loop of 71 lines, it will cache miss every eight
RISC instructions. If the loop has the typical numerous branches and calls, it
cache misses almost every other RISC instruction ("cache thrash").
As evidence of these observations, Table 1 shows the performance of the 33-MHz
i860 on the Dhrystone and Linpack benchmarks. According to these benchmarks
the i860 is barely able to best a 25-MHz 486 and is supposedly 14 to 17 times
slower than an HP 9000/720 in floating point. Even adjusting for the 50-MHz
clock rate of the HP 720, it is supposedly an order of magnitude faster than
the i860. Yet the timings of the i860 operations are comparable to the HP 720.
Why? Because the HP 720 has 256 Kbytes cache to the 4 Kbytes of the i860. I
contend the timings are measuring i860 cache thrash!
Bigger cache is not necessarily the solution. The new i860XP has 16 Kbytes of
i/cache, but many large, compute-intensive applications have critical loops
that easily exceed that if the compiler outputs RISC code. Even if the i/cache
were infinite, it would take several cycles to load instructions from memory.
This explains why the i860 can do somersaults on small hand-coded assembly
functions such as Fast Fourier Transforms and the CSPI array processor
library, yet be a dog as a general-purpose processor.
The most desirable RISC strategy is to implement only one instance of RISC
code for each high-level instruction (one instance of A = B+C, one of A = B*C,
and so on, rather than a copy for every A = B+C in the program). This strategy
loads i/cache only once. Such a "metacode" approach is a CISC! Interestingly,
on the i860, these CISC meta-instructions, regarded as data, go into its 8
Kbytes of data cache so that the four-word memory load acts as a "32-byte
prefetch queue."


CDC 6600: Anatomy of a Supercomputer


The CDC 6600 was Seymour Cray's seminal supercomputer design and its
architecture is fundamental to today's Cray machines and most superminis.
Studying the CDC 6600 architecture enables us to focus on the salient
supercomputer features without bogging down in later refinements.

The CDC 6600 revolutionized mainframe design in the early '60s by implementing
a 60-bit CPU that had no I/O capability. Instead it was surrounded by ten
Peripheral Processor Units (PPUs), the sole function of which was to service
I/O requests. One PPU was devoted to the banks of tape drives, another to
disks, and yet another to the operator console. Other PPUs were assigned to
servicing the hundreds of user terminals, remote job-entry stations, card
readers, and so on.
Also revolutionary was that PPU to CPU communication was memory mapped--the
PPUs had direct access to the main memory of the CPU. The PPUs would load
requests or even entire jobs in areas of main memory and cause the CPU to jump
to the prepared areas. Thus there was none of the bus overhead or
time-consuming ACK/NACK protocols of other, bus-oriented mainframe
architectures.
Because the CPU was unencumbered with I/O constraints, its internal
architecture could focus on expediting computation. The CDC 6600 had a
blistering-fast floating-point unit, and even its 60-bit integer ops were far
beyond any competitor. Today, 30 years later, the CDC 6600 is still a machine
to beat.
The Chippewa Falls Operating System (CHOPS) also broke ground in
operating-system design, delivering performance far beyond the bloated IBM
OS/360 and other systems. (CHOPS was superseded by MACE, which later became
KRONOS, then NOS.) The saving grace of the IBM System 360 was its Virtual
Memory; the CDC 6600 memory allocation was claustrophobic. Although the
efficacy of this low-entropy system and architecture were central to the
design of PORT, the lessons of its memory limitations and lack of execution
checks were also important. (The follow-on CDC 7600 was so unreliable that
programs were often run twice to check the results.)


June, 1992
FINDING SIGNIFICANCE IN NOISY DATA


Roy E. Kimbrell


Roy is a senior staff member at Planning Research Corp. (PRC). He has managed
the software development of a large military intelligence project. In his
spare time he is building a simulation development system for the PC. Roy can
be reached at PRC, 1410 Wall St., Bellevue, NE 68005.


Data containing cyclic variations can be treacherous. Despite our ability to
find the frequencies of the variations, certain features of the data remain
elusive. Suppose you are counting cars traversing a well-traveled (but
hypothetical) section of road. You will find the usual daily variations, but
you also will find that taking advantage of your knowledge of the traffic's
typical behavior is tricky. The question, "Is today's traffic significantly
heavy or light?" is hard to answer.
If you have been driving long, you have come to expect the following kind of
behavior from traffic: During weekdays, there will be a high point in the
morning and in the evening, the rush hour you've grown to hate. There will be
another peak about noon and a low point in the late night and early morning
hours. Weekends have their own cycles. The peaks and valleys are at different
times and the traffic density is probably lower. Looking at the data's weekly
cycle, the traffic density is heavy during the week and light on weekends. We
might find monthly variations as well--heavier traffic on the first weekend of
the month (after payday). There will be yearly variations, perhaps due to
weather, and even longer trends. Despite these cycles and trends, there will
be random, probably unexplained, variations in the traffic density--this is
noise. It is an unavoidable and even necessary feature of the kind of data we
are examining. Now suppose you were to ask questions about specific days: For
example, is this day's traffic heavier or lighter than normal? You wish to
decide this despite the noise in the data.
For example, a city engineer might want to know if a newly opened shopping
center has changed the local traffic patterns. The engineer would probably
decide that the relative density of the traffic depended on the day of the
week and the season of the year. If the city asked him to write a program to
flag days with significantly high or low traffic, he might approach the problem
head on. He might establish high and low norms at each collection point for
weekdays and weekends for each season of the year. Then, if a day's traffic
count exceeded a high point or failed to reach a low point, the program would
flag that day. With enough data, the engineer might even establish an average
and a standard deviation for the weekdays and weekends in each season. With an
average and a standard deviation he could calculate the significance of a
particular day's variation from the average.
Unfortunately, this approach ties the program to a particular cycle (weekly)
and fails to capture trends. There may be unrecognized monthly cycles where
some weeks have heavy traffic and others light. Thus, a day in a normally
heavy week might be marked as significant when it really isn't. A new business
in the neighborhood could destroy the statistics you've captured and require
new analyses. If unrecognized changes occur, two kinds of errors might result:
The program might fail to flag significant days, or it might flag days with no
particular significance.
Problems with finding significance in noisy data occur everywhere: Traffic
(vehicular, pedestrian, message, telephone, animal, and aircraft), work loads
(in computers, assembly lines, and repair shops), and events (such as crimes,
accidents, occurrences of disease, insect invasions, and animal population
counts) are just a few examples. These problems have at least one feature in
common. They have a cyclic character but the cycles change in amplitude and
frequency and noise overlays these fundamental frequencies. The simple methods
of solving these problems may miss significant events or mistakenly flag
others. As the cycles change over time, the solutions only get worse.
Adjusting to changing cycles requires continuous work. More complicated
methods may be statistically sophisticated but still have difficulty adjusting
to the fundamental changes in the data. Unless the statistical basis is
somehow continually updated, eventually significant events will be missed or
insignificant ones flagged.
Fortunately, there is a better way. There is a class of methods known as
Digital Signal Processing (DSP). Speech analysis, generation, and filtering
use digital signal processing techniques heavily. The filtering techniques are
of special interest in solving our problem. In particular, a subset of digital
filtering called "adaptive filters" can give us an excellent solution to our
noisy data problem.


Adaptive Filtering


The particular adaptive filter I'll use is a lattice filter, so named because
of the shape of its schematic; see Figure 1(c). A lattice filter has one or
more stages. Each stage computes an error or deviation from the expected value
and passes it on to the next. To do this it correlates the present error value
with previous error values at that stage. At each stage there is feedback
making the filter self-correcting. Up to a point, the more stages there are in
a lattice filter, the better the filtering action. The multiple stages in the
filter capture the cycles in the data. More stages in the filter allow the
filter to capture lower-frequency cycles. The effective limit to the number of
stages is the composition of the data itself. (If there aren't any
low-frequency cycles, it makes little sense to have many stages.)
Figure 1 is a schematic of an adaptive filter. Each stage accepts "forward"
errors (e{f}[i]) and "backward" errors (e{b}[i]) from previous stages and
emits new forward and backward errors. The initial forward and backward errors
are the input data point itself. Eventually, the last forward error corrects
the original value to produce a filtered value. (The filtered value is the
value expected in the absence of noise. Notice that we base the definition of
"noise" solely on the history of previous values.)
In the lattice filter in Figure 1(a) and in the filter element in Figure 1(b)
there are filter components labeled "prior." Here the filter uses the prior
value rather than the current value; that is, we simply store the current
value and use the one we stored previously. In each stage of the filter, we
are using prior values to influence the forward error and current values to
influence the backward error. Finally, at the output of the filter, we use the
prior value to correct the input.
Inside each stage of the filter, the input forward and backward errors get
multiplied by a number derived from the trend in changes of the data values.
After the multiplication, the backward error subtracts from the forward error
and vice versa. The results are the new forward and backward errors created by
the stage. Selecting a value for the multiplier--k[i] in Figure 1(b)--is
somewhat arbitrary. We want a value that reflects the trend of the previous
inputs to the stage. A new value of k[i] may be computed every time there are
new inputs or less frequently. How often you need to compute k[i] depends on
how responsive the filter must be to changes in the data.
One way to compute k[i] requires a history of forward- and backward-error
values at each stage. Suppose we have these arrays, indexed from now through
last: now is the index of the current value, prior(t) returns the index
immediately prior to t, and last indexes the oldest value in the array. We
index in this fashion because
later we will want to implement a circular array to avoid copying values. We
would like the value of k to reflect trends in the relative differences
between forward-and (prior) backward-error values. The approach in Example 1
will work fairly well. Note that the method of computing the output values in
each stage of the filter seen in Figure 1(b) is linear; essentially,
output=k*input. You could devise nonlinear schemes for both output and k
computation.
This method of computing k requires several multiplications--many, if the
history of values saved is long. But because we are continuously adding new
values to the E{fb}[s] and E{ff}[s], we should only have to add new values and
subtract the oldest values, thus avoiding many of the multiplications. We use
this method in the final algorithm given later.
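The add-the-newest, subtract-the-oldest idea can be sketched on its own with a small circular buffer (a hypothetical helper, not the article's code):

```c
#define W 4  /* window length, chosen arbitrarily for illustration */

/* Maintain the sum of the last W samples incrementally: O(1) work
 * per new sample instead of re-summing the whole window. */
typedef struct {
    double buf[W];
    int head;       /* slot the next sample overwrites (the oldest) */
    double sum;
} RunSum;

void rs_init(RunSum *r)
{
    int i;
    for (i = 0; i < W; i++) r->buf[i] = 0.0;
    r->head = 0;
    r->sum = 0.0;
}

double rs_update(RunSum *r, double x)
{
    r->sum += x - r->buf[r->head];  /* add new value, drop oldest */
    r->buf[r->head] = x;
    r->head = (r->head + 1) % W;    /* circular index, as in Table 3 */
    return r->sum;
}
```

Feeding the samples 1, 2, 3, 4, 5 returns running sums 1, 3, 6, 10, and then 14, since the fifth update drops the 1 that has fallen out of the window.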
At some point, addition of stages or an increase in the number of historical
values kept at each stage adds nothing to the effectiveness of the filter. To
find the optimum number of stages, use a representative set of data. Then
compute the average magnitude of the forward error generated by the last stage
of the filter. Increase the number of stages and continue to compute this
average. The average will decrease as you add stages; when it levels off, new
stages are no longer yielding an advantage. You can compute the optimum
history size in the same way. Start with a short history and gradually
increase it as you repetitively compute the average forward error generated by
the last stage. At some point there will be no benefit to increasing the size
of the history.


Determining Significance


The filtered values are the expected values based on the cycles in the data
captured by the filter stages. The difference between the raw and filtered
values is the noise or error in the value. When is this error significant?
Here we can rely on statistics for help. We can compute the average and the
standard deviation of the errors. The standard deviation is a measure of the
variability of the errors. If the random noise in the data has a high
amplitude, the standard deviation will be large; otherwise it will be small.
(Notice that the frequency of the noise will be the frequency of occurrence of
the data values themselves.)
We subtract the current error from the average error and divide the result by
the standard deviation to obtain the value z. z is measured in standard
deviations, so we can speak of the difference between the current error and
the average in terms of the number of standard deviations between them.
It won't hurt to assume a normal distribution for the errors. We can then
estimate about how often we can expect to find errors of various magnitudes.
Errors within one standard deviation of the average will occur about 68
percent of the time. Errors within two standard deviations of the average will
occur about 95 percent of the time. If we take these frequencies as
probabilities, we can use them as a determiner of significance.
We would call an error significant if it fell more than a few standard
deviations from the average error. The exact number of standard deviations
would depend on the level of significance (probability) we desire.
Thus we can automatically compute the significance of any error based on our
choice of the probability. In our algorithm, the chosen number of standard
deviations (probability) is called the confidence. Table 1 shows some
confidence values from the normal distribution for useful probabilities.
Table 1: Confidence values from the normal distribution.

 Probability that the Confidence=number
 value is significant of standard deviations
 ----------------------------------------------

 68.3% 1.0
 90.0% 1.64485
 95.0% 1.95996
 95.4% 2.0
 99.0% 2.57580

The final algorithm, given next, contains the method for computing the
significance of the input value.


The Final Algorithm


The basic lattice-filter algorithm is simple (simple enough that you can get
it in silicon); see Table 2. The e{f}[st] and e{b}[st] arrays are circular
over the t{st} index. We use a circular array to avoid having to copy values
down as new values enter the array; see Table 3. With these definitions and
initial values we calculate the filtered value as shown in Example 2. This
algorithm is implemented in Listing One, page 90.

Table 2: The basic lattice-filter algorithm.

 Value Meaning
 -------------------------------------------------------------------------

 Y Input data value
 Y_hat Output (filtered) value
 S Number of stages in the filter
 T Number of prior forward and backward error values saved at
 each stage
 e{f}[st] Array of forward error values: 0<=s<=S is the stage index,
 0<=t<T is the history-values index.
 e{b}[st] Array of backward error values, 0<=s<=S, 0<=t<T
 E{ff}[s] Array of sums of the squares of the forward errors at each
 stage, 0<=s<=S. The sum is over all the prior values saved, 0<=t<T.
 E{fb}[s] Array of sums of the products of the forward and backward
 errors at each stage, 0<=s<=S. The sum is over all the prior
 values saved, 0<=t<=T.
 E{f} Sum of the forward errors from the output of the last stage. The
 sum is over all the prior values saved, 0<=t<=T.
 priorE{ff} Prior E{ff}[S], the E{ff}[s] at the last stage
 priorE{f} Prior E{f}
 k Multiplier
 sd Standard deviation of the forward-error values produced by
 the final stage
 confidence Number of standard deviations used as a measure of
 significance.
 iteration Count of the number of times the filter has been executed;
 also a count of the number of data values entering the
 filter.

Table 3: Using a circular array to avoid having to copy values down as new
values enter the array.

 Value Meaning
 ------------------------------------------------------------------------

 now Index of the current time:
 int now;
 prior Index of the immediately prior time:
 #define prior (now==0? T+1:now-1)
 last Index of the last time in the array:
 # define last \
 (now==T?0:(now==T+1?1:now+2))
 plast Index of the time prior to the last time in the array:
 #define plast (now==T+1?0:now+1)
 cycle Increments the current time index in preparation for new values:
 #define cycle\
 (now==T+1?now=0:now++)
 These variables take on the following initial values:
 e{f}[st]=e{b}[st]=0, 0<=s<=S, 0<=t<=T;
 E{ff}[s]=E{fb}[s]=0, 0<=s<=S;
 E{f}=priorE{ff}=priorE{f}=sd=0;
 iteration=0;
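To see how the circular indices rotate through the T+2 slots, the macros from Table 3 can be exercised by themselves (T shrunk to 3 purely for illustration):

```c
#define T 3  /* history length, reduced for illustration */

static int now;  /* index of the current time slot */

/* The circular-index macros from Table 3 */
#define prior (now == 0 ? T+1 : now-1)
#define last  (now == T ? 0 : (now == T+1 ? 1 : now+2))
#define plast (now == T+1 ? 0 : now+1)
#define cycle (now == T+1 ? now = 0 : now++)
```

Starting from now = 0 the indices read prior = 4, last = 2, plast = 1; each cycle advances now through 0..4 while prior stays one slot behind now and last (the oldest usable value) stays two slots ahead of it in the ring.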



An Example



To test the ability of the lattice filter to catch significant events, we
applied the filter to a year's count of daily flights at Eppley Airfield, the
Omaha, Nebraska city airport. All commercial airlines use this airport, as
well as many company "executive" and private aircraft. The greatest number of
flights to and from Eppley are by these private and company aircraft. Weather
at the airport is good most of the year: dry but not arid. Days with low
ceilings caused by rain, snow, or fog are rare. High winds are a bigger
problem for the light company and private aircraft.
The Federal Aviation Administration graciously summarized for us the total
daily flights (inbound and outbound) for all of 1989. We had 365 data points.
We set the number of stages in the filter at 30 and the number of historical
values used also at 30. (In retrospect, this was a considerable overkill--15
for each would have been more than sufficient. The number of stages must span
the longest cycle you wish to capture. At Eppley there appeared to be a weekly
cycle and a yearly cycle, but no monthly cycle. The 15 stages would have
nicely spanned the weekly cycle, always keeping one whole cycle in the
filter.)
We fed the flight counts to the filter in sequence and plotted the actual and
expected values. We set the confidence at two standard deviations (about 95
percent) and the filter returned a significant/not_significant flag for each
day. Figure 2 shows the graphs of the actual and filtered counts for the last
three months of the year. The bars are the actual counts, the small diamonds
mark the filtered counts and the small black squares mark significant
deviations. The first few days marked significant are due only to start-up
transients.
We checked the Omaha World Herald, the local newspaper, for those days flagged
as significant. This was not a study of the aircraft traffic at Eppley
Airfield so we did not attempt to correlate all events with the traffic.
Instead, we were looking for reasonable explanations for the significant
counts. The events we found that might have affected the aircraft traffic were
weather and holidays. Generally, high winds reduced traffic, sometimes for
several days. There would then be a significantly higher count on the first
good-weather weekday. In December we noticed that the traffic was considerably
higher on a Friday preceding a forecast storm--so forecasts of bad weather
could cause significant counts as well. A significant event occurred close to
Labor Day (September 4) and two significant events occurred near New Year's
Eve. Traffic generally tapered off toward Christmas, so the low traffic over
Christmas wasn't noticed. The weather over the Christmas holidays was
extremely cold (record setting). Perhaps that contributed to the low traffic.
When we changed the number of historical values from 30 to 15, the number of
days flagged as significant increased considerably. This is because with the
smaller number of historical values, the filter reacted more quickly to
changes in the data, making the filtered data vary more.


Using Adaptive Filters to Flag Significant Events


We have shown how adaptive filters can identify significant deviations from
expected values even in highly variable data. Allowing for and filtering the
cycles in the data lets us find deviations from the expected. When the
deviations are large, we suppose something significant may have happened to
cause them.
Adaptive filters can catch significant events, but because they adapt to
changing conditions, they will miss trends. Slow changes, despite their
significance, will fool a single adaptive filter. There are two ways to catch
the slow changes. One is to use one adaptive filter with a history window and
number of stages sized to watch short cycles, and another with a larger
history window and more stages to watch longer cycles. The other way is to use
a filter designed to watch the short cycles, then sum the data over intervals
and use another filter to watch the summed data.
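The second approach, summing the data over intervals before feeding it to a slower filter, needs only a small aggregation step; a sketch (the helper name is invented):

```c
/* Sum consecutive groups of 'interval' samples, e.g. daily counts
 * into weekly totals for a second, slower filter. Returns the number
 * of summed values written to 'out'; any trailing partial group is
 * discarded. */
int sum_intervals(const double *in, int n, int interval, double *out)
{
    int i, j, m = 0;
    for (i = 0; i + interval <= n; i += interval) {
        double s = 0.0;
        for (j = 0; j < interval; j++)
            s += in[i + j];
        out[m++] = s;
    }
    return m;
}
```

With daily counts and interval set to 7, each output value is one week's total traffic, ready to run through a second lattice filter tuned to the longer cycles.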
The advantages of using adaptive filters outweigh their drawbacks. They select
significant events with precision. The level of significance can be changed
using a single parameter. It is necessary to keep only the raw data; the
parameter data used for other significance-determining systems is no longer
necessary.


_FINDING SIGNIFICANCE IN NOISY DATA_
by Roy Kimbrell


[LISTING ONE]

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

void I_Lfilter(void);
unsigned Lfilter(double Y, double confidence);

#define MAIN

#define T 15 /* number of historical values */
#define S 30 /* number of stages */
#define SIGNIFICANT 1
#define NOT_SIGNIFICANT !SIGNIFICANT

/* The following macros define the circular buffer. There are T+2 values in
the buffer to allow exactly T historical values. The "last" and "plast" values
are additional to the T values and are not used in the computations--they
exist in the buffer to allow them to be subtracted from current values.
*/

#define prior (now == 0 ? T+1 : now-1)
#define last (now == T ? 0 : (now == T+1 ? 1 : now+2))
#define plast (now == T+1 ? 0 : now+1)
#define cycle (now == T+1 ? now = 0 : now++)

static int now; /* index of the current historical value */

static double Ef; /* final error sum */
static double priorEf, priorEff; /* prior error sums */
static double ef[S+1][T+2]; /* array of forward errors */
static double eb[S+1][T+2]; /* array of backward errors */
double Y_hat; /* expected Y */
static double Efb[S+1], Eff[S+1];

void I_Lfilter(void){

 int s, t;

 now = 0;
 Ef = priorEf = priorEff = Y_hat = 0.0;
 for(s=0; s<=S; s++){
 Efb[s] = Eff[s] = 0;
 for(t=0; t<=T+1; t++) ef[s][t] = eb[s][t] = 0.0;
 }
 } /* I_Lfilter */

unsigned Lfilter(double Y, double confidence){

 double k, z=0.0, sd=0.0;
 int s;
 static unsigned iteration;

 ef[0][now] = eb[0][now] = Y;
 for (s=1; s<=S; s++){
 Efb[s] += ef[s-1][now] * eb[s-1][prior] - ef[s-1][last] * eb[s-1][plast];
 Eff[s] += ef[s-1][now] * ef[s-1][now] - ef[s-1][last] * ef[s-1][last];
 k = (Eff[s] != 0.0) ? Efb[s] / Eff[s] : 0.0; /* guard against a zero error sum */
 ef[s][now] = ef[s-1][now] - k * eb[s-1][prior];
 eb[s][now] = eb[s-1][prior] - k * ef[s-1][now];
 }
 Y_hat = Y - ef[S][prior];
 Ef += ef[S][now] - ef[S][last];
 if (iteration != 0){
 sd = sqrt((T * priorEff - priorEf * priorEf)/(T*(T-1)));
 z = (ef[S][prior] - Ef / T) / sd;
 }
 priorEff = Eff[S]; /* use the output of the last stage */
 priorEf = Ef;
 iteration++;
 cycle;
 if (fabs(z) > confidence)
 return SIGNIFICANT;
 else
 return NOT_SIGNIFICANT;
 } /* Lfilter */

#ifdef MAIN

#define CONFIDENCE 2.0

int main(int ac, char **av){

 char buf[80];
 unsigned i=0, significant;
 double Y;
 extern double Y_hat;

 while (fgets(buf,sizeof(buf),stdin)){
 Y = atof(buf);
 significant = Lfilter(Y,CONFIDENCE);
 printf("%d\t%.4f\t%.4f",i++,Y,Y_hat);
 if (significant)
 printf("\tSIGNIFICANT\n");
 else
 printf("\n");
 }
 return 0;
 } /* main */


#endif


Example 1: The value of k[s] reflects trends in the relative
differences between forward and (prior) backward error values.


At stage s,
 for (t = now; t != last; t = prior(t)){
  Efb[s] += ef[s-1][t] * eb[s-1][prior(t)];
  Eff[s] += ef[s-1][t] * ef[s-1][t];
 }
 k[s] = Efb[s] / Eff[s];



Example 2: Calculating the filtered value.

ef[0][now] = eb[0][now] = Y;
for (s = 1; s <= S; s++){
 Efb[s] += ef[s-1][now] * eb[s-1][prior] - ef[s-1][last] * eb[s-1][plast];
 Eff[s] += ef[s-1][now] * ef[s-1][now] - ef[s-1][last] * ef[s-1][last];
 k = Efb[s] / Eff[s];
 ef[s][now] = ef[s-1][now] - k * eb[s-1][prior];
 eb[s][now] = eb[s-1][prior] - k * ef[s-1][now];
/* Comment: The following happens to be one way of computing
 Y_hat, but we actually do it a different way:
 Y_hat += k * eb[s-1][prior];
*/
 }
Y_hat = Y - ef[S][prior];
Ef += ef[S][now] - ef[S][last];
if (iteration != 0){
 sd = sqrt((T * priorEff - priorEf * priorEf) / (T * (T-1)));
 z = (ef[S][prior] - Ef/T) / sd;
 }
priorEff = Eff[S];
priorEf = Ef;
iteration++;
cycle;
if (fabs(z) > confidence)
 return SIGNIFICANT;
else
 return NOT_SIGNIFICANT;

















June, 1992
CONTOURING DATA FIELDS


Maps are simply an application of gridded data formats




Bruce (Bear) Giles


Bear, a programmer for National Systems & Research, can be reached at 325
Broadway, MC 425, R/E/FS4, Boulder, CO 80303 or bear@fsl.noaa.gov.


A great deal of scientific and engineering data is available in a gridded (or
regularly sampled) data format. For example, the Nested Grid Model (NGM) used
by the National Weather Service consists of a series of atmospheric
measurements at every 1.25 degrees of latitude and 2.5 degrees of longitude
across the continental United States. This data is generated by supercomputers
in Suitland, Maryland every 12 hours.
The most common way to view such data is through contouring. An example of
contoured data familiar to many people is the topographic maps used by hikers
and skiers. Such maps are much easier to use than books specifying the
elevation of thousands of different locations.


Properties of Contour Lines


Contour lines of gridded data have several properties: They never cross, never
split or join other contour lines, and never stop, except at the edge of the
data field.
While it is easy to imagine situations where the physical contour lines
violate one or more of these rules, such situations never arise in sampled
data. The closest possibility is the unlikely event of adjoining data points
having precisely the same value as the contour level. This could split the
contour line, but because this causes a division-by-zero error, in these cases
the contour line is simply stopped. In practice this rarely occurs.
We are using a linear interpolation between data points, so no more than one
contour line ever crosses any grid cell-face for any contour level. (This is
not guaranteed if more powerful interpolation techniques are used.)
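The interpolation itself is one line: the contour at level L crosses the edge between values v0 and v1 at the fraction (L - v0)/(v1 - v0) of the way along it. A sketch (the helper name is invented):

```c
/* Fraction of the way from v0 to v1 at which the contour level
 * crosses, assuming v0 != v1. (Equal values would divide by zero,
 * which is why the contour is simply stopped in that rare case.) */
double crossing(double v0, double v1, double level)
{
    return (level - v0) / (v1 - v0);
}
```

For instance, a contour at level 15 crosses an edge running from 10 to 20 exactly halfway along it.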
Allowing for rotations, it is easy to show that there are only four possible
orientations of a contour line within a single grid cell; see Figure 1. The
final possibility has a symmetrical alternative, as in Figure 2. Either of
these last two possibilities may be used as long as a consistent choice is
made for each contour level.
As mentioned, a simple linear interpolation provides the precise entry and
exit points of the grid cell. A straight line is then drawn between these
points to form the contour. While it may be tempting to apply curve-smoothing
functions (such as cubic splines) to the contour lines, this should be
avoided. If curve smoothing is done, there is an excellent chance of erroneous
results (such as crossed contour lines).


The Algorithm


The contouring algorithm is quite simple: Ignoring boundary conditions for the
moment, we simply find any cell with a contour line within it and successively
apply the preceding rules. The contour line can only cross a cell-face once,
so maintaining a simple bitmap of all crossed cell-faces can prevent endless
processing of closed contours.
Of course, in the real world contours often cross the edge of the data field.
To handle this, each border is scanned for inward contour lines, and these
contours are followed until another border is encountered. The aforementioned
bitmap of cell-faces is not cleared as these contour lines are generated. All
interior cell-faces are then scanned for unmarked contour lines. These form
closed contour lines and are followed until the contour returns to its initial
point.
After all interior points have been checked, the bitmap is cleared and the
next level (if any) is contoured.


The Contouring Code


Listing One (page 91) shows the contouring routine. Listing Two (page 95)
contains a simple PostScript file generator illustrating the requirements of
Polyline() and Text(). Finally, Listing Three (page 95) contains a sample
application that generates a data field and then contours it.
The contour routine is called with four arguments: a pointer to the data field
itself, its width and height, and the contouring interval. In the accompanying
code floats are used to save space, although when contouring a nearly flat
field it may be necessary to promote all floats to doubles for accuracy.
The data field is assumed to be stored in row-major order, starting at the
upper-left corner. All data points must contain valid data, although the
extensions to allow for missing data are straightforward.
The application writer must provide two extra routines, called by the
contouring module. The first is Polyline(), which is called with a list of
points on the contour line. All coordinates are within the range 0 to 1.
The second routine is Text(), which is used to label the contour lines. It is
called with a text string and two coordinates, once again within the range 0
to 1.
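Stand-in versions of the two callbacks might look like this (the argument lists shown here are our assumptions about the module's calling convention, and this version writes plain text rather than the PostScript of Listing Two):

```c
#include <stdio.h>

typedef struct { float x, y; } LIST;

/* Minimal stand-ins for the two application-supplied callbacks.
 * Real versions must match the contour module's actual calling
 * convention; these simply echo their arguments to stdout. */
void Polyline(LIST *pts, int n)
{
    int i;
    printf("polyline (%d points):", n);
    for (i = 0; i < n; i++)
        printf(" (%.3f,%.3f)", pts[i].x, pts[i].y);
    printf("\n");
}

void Text(char *label, double x, double y)
{
    printf("label \"%s\" at (%.3f,%.3f)\n", label, x, y);
}
```

Because all coordinates arrive normalized to the range 0 to 1, a real version only has to scale them into device space before drawing.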


Extensions


As mentioned before, modifying the code to allow for missing data and/or
irregular edges is relatively straightforward. The primary headache is
modifying startEdge() to follow a jagged border.
In many cases, nonrectangular grids are trivial to support. As long as each
contour cell has four corners, the precise topology of the data points is
irrelevant; simply remap the coordinates within Polyline() and Text(). For
triangular cells the algorithm is even simpler than for rectangular cells, as
there are fewer cases or rotations to deal with.
Filled contours, similar to the weather map in USA Today, are straightforward
in concept. Perform a polygon fill (as described in Michael Abrash's February
1991 "Graphics Programming" column) on the polygon formed by the current
contour line connected to all interior contour lines. (For example, contour
lines A, B, and C would form the larger polygon A..A B..B A C..C A, where X..X is
all points within contour line X.) Unfortunately, it requires substantial
analysis to determine which contour lines lie "within" another contour line;
see Figure 3 and Figure 4.
Determining the area within a contour line can be accomplished with a
variation on the fill just described. Simply count the number of
pixels that would have been set had the fill been performed, and divide by
the total number of pixels within the display.

_CONTOURING DATA FIELDS_
by Bruce (Bear) Giles



[LISTING ONE]

#include <stdio.h>
#include <math.h>
#include <malloc.h>
#include <values.h>

#if defined (NEVER)
#include <ieeefp.h>
#else
#define NaN 0xFFFFFFFF
#define isnanf(x) ((x) != (x)) /* NaN is the only value unequal to itself */
#endif

typedef unsigned short ushort;
typedef unsigned char uchar;

#define DEFAULT_LEVELS 16

/* Mnemonics for contour line bearings */
#define EAST 0
#define NORTH 1
#define WEST 2
#define SOUTH 3

/* Mnemonics for relative data point positions */
#define SAME 0
#define NEXT 1
#define OPPOSITE 2
#define ADJACENT 3

/* Bit-mapped information in 'map' field. */
#define EW_MAP 0x01
#define NS_MAP 0x02

typedef struct
 {
 float x, y;
 } LIST;
typedef struct
 {
 short dim_x; /* dimensions of grid array... */
 short dim_y;
 float max_value; /* statistics on the data... */
 float min_value;
 float mean;
 float std;
 short contour_mode; /* control variable */
 float first_level; /* first (and subsequent) */
 float step; /* contour level */
 char format[20]; /* format of contour labels */
 float *data; /* pointer to grid data */
 char *map; /* pointer to "in-use" map */
 LIST *list; /* used by 'Polyline()' */
 ushort count;
 } GRID;
typedef struct
 {

 short x;
 short y;
 uchar bearing;
 } POINT;
#define MXY_to_L(g,x,y) ((ushort) (y) * (g)->dim_x + (ushort) (x) + 1)
#define XY_to_L(g,x,y) ((ushort) (y) * (g)->dim_x + (ushort) (x))

extern void Text();
extern void Polyline();

/* Contour generation. */
void Contour();

int scaleData();
void startLine();
void startEdge();

void startInterior();
void drawLine();

void markInUse();
uchar faceInUse();
float getDataPoint();

void initPoint();
void lastPoint();
uchar savePoint();

/* inc > 0 increment to use between contour levels.
 * < 0 number of contour levels to generate [abs(inc)].
 * = 0 generate default number of contour levels.
 */
void Contour (data, dim_x, dim_y, inc)
float *data;
int dim_x;
int dim_y;
double inc;
 {
 GRID grid;

 grid.data = data;
 grid.dim_x = dim_x;
 grid.dim_y = dim_y;

 /* Allocate buffers used to contain contour information */
 if ((grid.map = malloc ((dim_x + 1) * dim_y)) == NULL)
 {
 fprintf (stderr, "Contour(): unable to allocate buffer! (%d bytes)\n",
 (dim_x + 1) * dim_y);
 return;
 }
 grid.list = (LIST *) malloc (2 * dim_x * dim_y * sizeof (LIST));
 if (grid.list == (LIST *) NULL)
 {
 fprintf (stderr, "Contour(): unable to allocate buffer! (%d bytes)\n",
 2 * dim_x * dim_y * sizeof (LIST));

 free ((char *) grid.map);

 return;
 }
 /* Generate contours, if not a uniform field. */
 if (scaleData (&grid, inc))

 startLine (&grid);
 /* Release data structures. */
 free ((char *) grid.map);
 free ((char *) grid.list);
 }

/* scaleData--Determine necessary statistics for contouring data set: global
 * maximum & minimum, etc. Then initialize items used by rest of module. */
int scaleData (grid, inc)
GRID *grid;
double inc;
 {
 ushort i;
 float step, level;
 float sum, sum2, count;
 float p, *u, *v, r;
 char *s;
 short n1, n2;
 int first, n;
 long x;

 sum = sum2 = count = 0.0;

 first = 1;
 s = grid->map;
 u = grid->data;
 v = u + grid->dim_x * grid->dim_y;
 for (i = 0; i < grid->dim_x * grid->dim_y; i++, u++, v++, s++)
 {
 r = *u;
 sum += r;
 sum2 += r * r;
 count += 1.0;

 if (first)
 {
 grid->max_value = grid->min_value = r;
 first = 0;
 }
 else if (grid->max_value < r)
 grid->max_value = r;
 else if (grid->min_value > r)
 grid->min_value = r;
 }
 grid->mean = sum / count;
 if (grid->min_value == grid->max_value)
 return 0;
 grid->std = sqrt ((sum2 - sum * sum /count) / (count - 1.0));
 if (inc > 0.0)
 {
 /* Use specified increment */
 step = inc;
 n = (int) (grid->max_value - grid->min_value) / step + 1;


 while (n > 40)
 {
 step *= 2.0;
 n = (int) (grid->max_value - grid->min_value) / step + 1;
 }
 }
 else
 {
 /* Choose a reasonable number of levels and normalize
 * increment to a reasonable value. */
 n = (inc == 0.0) ? DEFAULT_LEVELS : (short) fabs (inc);

 step = 4.0 * grid->std / (float) n;
 p = pow (10.0, floor (log10 ((double) step)));
 step = p * floor ((step + p / 2.0) / p);
 }
 n1 = (int) floor (log10 (fabs (grid->max_value)));
 n2 = -((int) floor (log10 (step)));

 if (n2 > 0)
 sprintf (grid->format, "%%%d.%df", n1 + n2 + 2, n2);
 else
 sprintf (grid->format, "%%%d.0f", n1 + 1);
 if (grid->max_value * grid->min_value < 0.0)
 level = step * floor (grid->mean / step); /* odd */
 else
 level = step * floor (grid->min_value / step);
 level -= step * floor ((float) (n - 1)/ 2);

 /* Back up to include add'l levels, if necessary */
 while (level - step > grid->min_value)
 level -= step;

 grid->first_level = level;
 grid->step = step;
 return 1;
 }
/* startLine -- Locate first point of contour lines by checking edges of
 gridded data set, then interior points, for each contour level. */
static
void startLine (grid)
GRID *grid;
 {
 ushort idx, i, edge;
 double level;
 for (idx = 0, level = grid->first_level; level < grid->max_value;
 level += grid->step, idx++)
 {

 /* Clear flags */
 grid->contour_mode = (level >= grid->mean);
 memset (grid->map, 0, grid->dim_x * grid->dim_y);

 /* Check edges */
 for (edge = 0; edge < 4; edge++)
 startEdge (grid, level, edge);
 /* Check interior points */
 startInterior (grid, level);
 }

 }
/* startEdge -- For a specified contour level and edge of gridded data set,
 * check for (properly directed) contour line. */
static
void startEdge (grid, level, bearing)
GRID *grid;
float level;

uchar bearing;
 {
 POINT point1, point2;
 float last, next;
 short i, ds;

 switch (point1.bearing = bearing)
 {
 case EAST:
 point1.x = 0;
 point1.y = 0;
 ds = 1;
 break;
 case NORTH:
 point1.x = 0;
 point1.y = grid->dim_y - 2;
 ds = 1;
 break;
 case WEST:
 point1.x = grid->dim_x - 2;
 point1.y = grid->dim_y - 2;
 ds = -1;
 break;
 case SOUTH:
 point1.x = grid->dim_x - 2;
 point1.y = 0;
 ds = -1;
 break;
 }
 switch (point1.bearing)
 {
 /* Find first point with valid data. */
 case EAST:
 case WEST:
 next = getDataPoint (grid, &point1, SAME);
 memcpy ((char *) &point2, (char *) &point1, sizeof (POINT));
 point2.x -= ds;

 for (i = 1; i < grid->dim_y; i++, point1.y = point2.y += ds)
 {
 last = next;
 next = getDataPoint (grid, &point1, NEXT);
 if (last >= level && level > next)
 {
 drawLine (grid, &point1, level);
 memcpy ((char *) &point1, (char *) &point2, sizeof (POINT));
 point1.x = point2.x + ds;

 }
 }
 break;

 /* Find first point with valid data. */
 case SOUTH:
 case NORTH:
 next = getDataPoint (grid, &point1, SAME);
 memcpy ((char *) &point2, (char *) &point1, sizeof (POINT));
 point2.y += ds;

 for (i = 1; i < grid->dim_x; i++, point1.x = point2.x += ds)
 {
 last = next;
 next = getDataPoint (grid, &point1, NEXT);
 if (last >= level && level > next)
 {
 drawLine (grid, &point1, level);

 memcpy ((char *) &point1, (char *) &point2, sizeof (POINT));
 point1.y = point2.y - ds;
 }
 }
 break;
 }
 }
/* startInterior -- For a specified contour level, check for (properly
directed) contour line for all interior data points. Do _not_ follow contour
lines detected by the 'startEdge' routine. */
static
void startInterior (grid, level)
GRID *grid;
float level;
 {
 POINT point;
 ushort x, y;
 float next, last;
 for (x = 1; x < grid->dim_x - 1; x++)
 {
 point.x = x;
 point.y = 0;
 point.bearing = EAST;
 next = getDataPoint (grid, &point, SAME);
 for (y = point.y; y < grid->dim_y; y++, point.y++)

 {
 last = next;
 next = getDataPoint (grid, &point, NEXT);
 if (last >= level && level > next)
 {
 if (!faceInUse (grid, &point, WEST))
 {
 drawLine (grid, &point, level);
 point.x = x;
 point.y = y;
 point.bearing = EAST;
 }
 }
 }
 }
 }
/* drawLine -- Given an initial contour point by either 'startEdge' or
'startInterior', follow contour line until it encounters either an edge

or previously contoured cell. */
static
void drawLine (grid, point, level)
GRID *grid;
POINT *point;
float level;
 {
 uchar exit_bearing;
 uchar adj, opp;
 float fadj, fopp;

 initPoint (grid);

 for ( ;; )
 {
 /* Add current point to vector list. If either of the points is
 * missing, return immediately (open contour). */
 if (!savePoint (grid, point, level))
 {
 lastPoint (grid);
 return;
 }
 /* Has this face of this cell been marked in use? If so, then this is
 * a closed contour. */
 if (faceInUse (grid, point, WEST))
 {
 lastPoint (grid);
 return;
 }
 /* Examine adjacent and opposite corners of cell; determine
 * appropriate action. */
 markInUse (grid, point, WEST);

 fadj = getDataPoint (grid, point, ADJACENT);
 fopp = getDataPoint (grid, point, OPPOSITE);

 /* If either point is missing, return immediately (open contour). */
 if (isnanf (fadj) || isnanf (fopp))
 {
 lastPoint (grid);
 return;
 }
 adj = (fadj <= level) ? 2 : 0;
 opp = (fopp >= level) ? 1 : 0;
 switch (adj + opp)
 {
 /* Exit EAST face. */
 case 0:
 markInUse (grid, point, NORTH);
 markInUse (grid, point, SOUTH);
 exit_bearing = EAST;
 break;
 /* Exit SOUTH face. */
 case 1:
 markInUse (grid, point, NORTH);
 markInUse (grid, point, EAST);
 exit_bearing = SOUTH;
 break;
 /* Exit NORTH face. */

 case 2:
 markInUse (grid, point, EAST);

 markInUse (grid, point, SOUTH);
 exit_bearing = NORTH;
 break;
 /* Exit NORTH or SOUTH face, depending upon contour level. */
 case 3:
 exit_bearing = (grid->contour_mode) ? NORTH : SOUTH;
 break;
 }
 /* Update face number, coordinate of defining corner. */
 point->bearing = (point->bearing + exit_bearing) % 4;
 switch (point->bearing)
 {
 case EAST : point->x++; break;
 case NORTH: point->y--; break;
 case WEST : point->x--; break;
 case SOUTH: point->y++; break;
 }
 }
 }
/* markInUse -- Mark the specified cell face as contoured. This is necessary
 * to prevent infinite processing of closed contours. see also: faceInUse */
static
void markInUse (grid, point, face)
GRID *grid;
POINT *point;
uchar face;
 {
 face = (point->bearing + face) % 4;
 switch (face)
 {
 case NORTH:
 case SOUTH:
 grid->map[MXY_to_L (grid,
 point->x, point->y + (face == SOUTH ? 1 : 0))] = NS_MAP;
 break;
 case EAST:
 case WEST:
 grid->map[MXY_to_L (grid,
 point->x + (face == EAST ? 1 : 0), point->y)] = EW_MAP;

 break;
 }
 }
/* faceInUse -- Determine if the specified cell face has been marked as
 * contoured. This is necessary to prevent infinite processing of closed
 * contours. see also: markInUse */
static
uchar faceInUse (grid, point, face)
GRID *grid;
POINT *point;
uchar face;
 {
 uchar r;
 face = (point->bearing + face) % 4;
 switch (face)
 {

 case NORTH:
 case SOUTH:
 r = grid->map[MXY_to_L (grid,
 point->x, point->y + (face == SOUTH ? 1 : 0))] & NS_MAP;
 break;
 case EAST:
 case WEST:
 r = grid->map[MXY_to_L (grid,
 point->x + (face == EAST ? 1 : 0), point->y)] & EW_MAP;
 break;
 }
 return r;
 }


/* initPoint -- Initialize the contour point list.
 * see also: savePoint, lastPoint */
static
void initPoint (grid)
GRID *grid;
 {
 grid->count = 0;
 }

/* lastPoint -- Generate the actual contour line from the contour point list.
 * see also: savePoint, initPoint */
static
void lastPoint (grid)
GRID *grid;
 {
 if (grid->count)
 Polyline (grid->count, grid->list);
 }

/* savePoint -- Add specified point to the contour point list.
 * see also: initPoint, lastPoint */
static
uchar savePoint (grid, point, level)
GRID *grid;
POINT *point;
float level;
 {
 float last, next;
 float x, y, ds;
 char s[80];

 static int cnt = 0;

 last = getDataPoint (grid, point, SAME);
 next = getDataPoint (grid, point, NEXT);

 /* Are the points the same value? */
 if (last == next)
 {
 fprintf (stderr, "(%2d, %2d, %d) ", point->x, point->y,
 point->bearing);
 fprintf (stderr, "%8g %8g ", last, next);
 fprintf (stderr, "potential divide-by-zero!\n");
 return 0;

 }
 x = (float) point->x;
 y = (float) point->y;

 ds = (float) ((last - level) / (last - next));


 switch (point->bearing)
 {
 case EAST : y += ds; break;
 case NORTH: x += ds; y += 1.0; break;
 case WEST : x += 1.0; y += 1.0 - ds; break;
 case SOUTH: x += 1.0 - ds; break;
 }

 /* Update to contour point list */
 grid->list[grid->count].x = x / (float) (grid->dim_x - 1);
 grid->list[grid->count].y = y / (float) (grid->dim_y - 1);

 /* Add text label to contour line. */
 if (!(cnt++ % 11))
 {
 sprintf (s, grid->format, level);
 Text (s, grid->list[grid->count].x, grid->list[grid->count].y);
 }
 /* Update counter */
 grid->count++;

 return 1;
 }
/* getDataPoint -- Return the value of the data point in the specified corner
 * of the specified cell (the 'point' parameter contains the address of the
 * top-left corner of this cell). */
static
float getDataPoint (grid, point, corner)
GRID *grid;
POINT *point;
uchar corner;
 {
 ushort dx, dy;
 ushort offset;

 switch ((point->bearing + corner) % 4)
 {
 case SAME : dx = 0; dy = 0; break;
 case NEXT : dx = 0; dy = 1; break;
 case OPPOSITE: dx = 1; dy = 1; break;
 case ADJACENT: dx = 1; dy = 0; break;

 }
 offset = XY_to_L (grid, point->x + dx, point->y + dy);
 if ((short) (point->x + dx) >= grid->dim_x ||
 (short) (point->y + dy) >= grid->dim_y ||
 (short) (point->x + dx) < 0 ||
 (short) (point->y + dy) < 0)
 {
 return NaN;
 }
 else

 return grid->data[offset];
 }






[LISTING TWO]

#include <stdio.h>

typedef struct
 {
 float x, y;
 } LIST;
void Polyline (n, list)
int n;
LIST *list;
 {
 if (n < 2)
 return;
 printf ("newpath\n");
 printf ("%.6f %.6f moveto\n", list->x, 1.0 - list->y);
 list++;

 while (--n)
 {
 printf ("%.6f %.6f lineto\n", list->x, 1.0 - list->y);
 list++;
 }
 printf ("stroke\n\n");

 }
void Text (s, x, y)
char *s;
float x, y;
 {
 printf ("%.6f %.6f moveto (%s) show\n", x, 1.0 - y, s);
 }
void psOpen ()
 {
 printf ("%%!\n");
 printf ("save\n");
 printf ("\n");
 printf ("/Helvetica findfont 0.015 scalefont setfont\n");
 printf ("\n");

 printf ("72 252 translate\n");
 printf ("468 468 scale\n");
 printf ("0.001 setlinewidth\n");
 printf ("\n");
 printf ("newpath\n");
 printf (" 0 0 moveto\n");
 printf (" 0 1 lineto\n");
 printf (" 1 1 lineto\n");
 printf (" 1 0 lineto\n");
 printf (" closepath\n");

 printf ("stroke\n");
 printf ("clippath\n");
 printf ("\n");
 printf ("0.00001 setlinewidth\n");
 printf ("\n");
 }
void psClose ()
 {
 printf ("restore\n");
 printf ("showpage\n");
 }





[LISTING THREE]

#include <stdio.h>
#include <math.h>

float data[400];

void main (argc, argv)
int argc;
char **argv;
 {
 float *s;
 int i, j;
 double x, y, r1, r2;

 for (i = 0, s = data; i < 20; i++)
 for (j = 0; j < 20; j++)
 {
 x = ((double) i - 9.5) / 4.0;
 y = ((double) j - 9.5) / 4.0;

 *s++ = (float) (10.0 * cos(x) * cos(y));
 }
 psOpen ();
 Contour (data, 20, 20, 2.0);

 psClose ();
 }

















June, 1992
 SCULPTING ON SILICON: AN INTERVIEW WITH CHUCK MOORE


A wide-ranging conversation on Forth--and more!




Jack Woehr


Jack is a senior project manager at Vesta Technology in Wheat Ridge, Colo. He
is a member of the ANS/ASC X3J14 Technical Committee for ANS Forth and is
currently chapter coordinator for the Forth Interest Group. Jack can be
reached as jax@well.UUCP, as JAX on GEnie, or as the Sysop of the RCFB BBS,
303-278-0364.


Chuck Moore, inventor of the Forth programming language, got his BS in Physics
from MIT in 1960 while working at the Smithsonian Astronomical Observatory in
satellite tracking and orbit determination. He then proceeded to Stanford,
where he worked on stack-beam transport magnets. From 1964 to 1970 he worked a
variety of programming jobs in the state of New York, working in Fortran,
Algol, and eventually in his own language, Forth. Moving in 1970 to the
National Radio Astronomy Observatory led to the implementation of Kitt Peak
Forth, and Moore's reputation was assured.
Moore cofounded Forth Inc. in 1973 and worked there for ten years before
moving on to the field of microprocessor design. The Novix project ran
approximately from 1980 to 1986 and resulted in the NC4000 (and later in the
Harris RTX2000), a chip optimized for the dual-stack architecture and
well-factored instruction set implied by the Forth virtual machine. After
that, Moore says, he "had built the ultimate computer," and he kicked back to
plan his next move. In 1989, using a computer based on the Novix as one design
tool, Chuck Moore designed ShBoom, a 32-bit processor with a dedicated I/O
coprocessor which controls 1 Mbyte of DRAM. Silicon was reached in 1990, and
using his ShBoom, Moore built himself a graphic CAD workstation to design his
next idea: muP20.
Moore says, "I have a half a dozen more designs already. We're on the
threshold of an explosion: silicon being made available to ordinary people. A
lot of interesting things are going to be happening in the next decade."
Moore recently joined several of his friends at a quintessential California
lunch of pasta marinara. Present were Chuck Moore (CM), proprietor of Computer
Cowboys; Elizabeth Rather (ER), CEO of Forth Inc., "the second Forth
programmer," taught the language by Chuck Moore in the early 1970s, now Chair
of ANS X3J14 Technical Committee for Forth; George Shaw (GS), Shaw
Laboratories, close associate of Chuck Moore on the ShBoom project, member ANS
X3J14 Technical Committee for Forth; John Stevenson (JS), independent Forth
consultant, member ANS X3J14 Technical Committee for Forth; Jack Woehr (JW),
professional Forth project manager, member ANS X3J14 Technical Committee for
Forth, and frequent contributor to Dr. Dobb's Journal; Mitch Bradley (MB), Sun
Microsystems, creator of the Open Boot PROM, a Forth implementation present in
the ROM of every Sun SPARCStation, member ANS X3J14 Technical Committee for
Forth; and Ray Valdes (RV), Senior Technical Editor, Dr. Dobb's Journal.
JW: Can you go back over the genealogy and inheritance of your boot-strapping
process to reach the ideal CAD machine, starting with the Novix? I understand
one processor has designed the next one; how far back does it go?
CM: Let's start with the PDP-11. I did a lot of graphic work on the PDP-11.
Some of that led to the Novix [a NC4016 dual-stack machine; see "Forth
Machines," Embedded Systems Programming, November, 1990]. More of it led to
the circuit boards on which the Novix resided. And then the Novix took over
the design activity, certainly for its own circuit boards.
RV: Now when you say design, you're talking design in the very specific sense
of circuit design, or do you mean it in the broad sense?
CM: Anything that deals with layers of traces and things on boards, or
two-dimensional...two-and-a-half-dimensions. If I look back on it, I see some
very strong traditions--that the things I'm doing now, I did a very long time
ago: I'm just doing them on a different scale.
For instance, the circuit boards [in the PDP-11 days] were 128x80 elements, or
something like that. The chips [I'm now designing] are now 600x600 elements.
So there has been an increase in complexity of two orders of magnitude, but
very little else has changed.
Then from Novix, [the Harris Semiconductor] RTX sprung off, but I wasn't
involved in that.
RV: How many elements were in the Novix chip?
CM: It was a 4000-gate array, but I didn't design that graphically. That was
one of the problems. It had to be done in high level--HDL. All of the Novix's
unsatisfactory characteristics derived from that.
ShBoom was designed on Sun workstations and Valid software. Again, it was not
geometric, it was schematic-based, and all of its unsatisfactorinesses derived
from that.
RV: Can you say what those unsatisfactory qualities were?
CM: Auto-place and -route.
RV: So, it just took up a lot of real estate?
CM: Well, my designs are like the design of a crystal: very regular, very
orthogonal. The contemporary software does not permit you to do that...does
not encourage you to do that. In fact, it tends to randomize any kind of
design. I have a drawing of Novix in which there is no structure whatsoever in
the layout.
In the case of ShBoom, it could not be routed until I had placed about 80
percent of the elements. I tried placing about 50 percent and it came out with
300 no-connects. The advice from the Oki engineers was, "You placed too much,
you over-constrained it." I said, "No, I didn't place enough."
I had left little holes, and I said, "Any reasonable person would understand
that the only thing that can go in this hole is a row of inverters." But of
course, the auto-place didn't understand that. It ended up that I had to do
all the placing anyway.
JW: I'm surprised that the software let you do so. Most of them won't even let
you get that far with it.
CM: I know. It was real painful, because I had to specify all these gates with
a three-letter identifier based on the netlist and position on an IBM
mainframe screen. It took a week just to place a few thousand gates. There's
no assistance at all in that process.
All of the auto-routing people I talked to say, "Oh yes, that was dreadful,
dreadful. The situation is much better now." And they sound just like the
Fortran compiler people, who say, "It can generate much better code than you
are familiar with."
RV: So what is the negative impact, aside from the aesthetic one, in terms of
the functionality of the resulting product? If they are placed differently,
does that impact the speed?
CM: As geometries get smaller, the interconnect capacitance starts dominating.
Unless you know where things are placed, you don't know where the interconnect
is going to be, and you have to wait for a second pass through the
place-and-route. And that second pass never happens.
JW: Are you suggesting that the "famous" Novix multiplication bug was the
result of the auto-routing software?
CM: I'm exaggerating; in that case, we didn't adequately run test vectors.
There was a workaround for Novix multiplication, so that wasn't a "bug": It
was a "design feature." [Laughter at the table.]
The interrupt problem [with the Novix] was a bug. It was very difficult to
test the effect of an interrupt coming in at an unpredictable time...
JW: ...in the middle of a two-cycle instruction.
CM: Yeah, so we underestimated the difficulty of testing asynchronous
interrupts. The only interrupts we tested happened at benign times. So that
was our fault.
MB: I'd like to hear the justification for why multiplication not generating
the right answer is not a bug.
CM: It never affected any of my applications. [General laughter.]
MB: In other words, this chip was designed just for your applications, and
that is why they were never made to sell to anybody.
CM: In fact, if it hadn't been for Greg Bailey [of Athena Programming in
Oregon, Chair of the ANS X3J14 Technical Standing Committee] nobody might have
known about the bug, because he discovered it by exhaustively testing all the
boundary conditions, which we should have done, but weren't that clever. Or
weren't that methodical.
I've got the same [design constraint] problem now with muP20. Now ShBoom is
the first microprocessor I've had access to with enough memory to actually do
the layout on a 600x600 grid.
JW: Addressing a 1-meg DRAM array.
CM: It actually uses about one-half million words to represent my new design.
I could have done it on a 386, but I couldn't have done it on an Apple, or on
a PDP-11.
So now I have a tool where I could do this graphic-level design on a
microprocessor. And indeed I do have all of these graphic elements laid out in
four planes. I have 15 elements: horizontal, vertical, corners, tees,
contacts...I just stack those up until I've got transistors and interconnects.
I run my simulator directly against this layout, with the honest capacitance,
with the real inverting logic, exactly the way it is.
JW: Did you write this simulator?
CM: It's an adaptation of code which again dates all the way back to the
PDP-11, through the Novix, to the ShBoom.
MB: The technology is very similar to spreadsheets: essentially, a
spreadsheet-like rectangular grid, and you evaluate each cell in the grid.
CM: That's right, and also it's a little bit like cellular automata. Exactly
what it does depends on its neighbors.
JW: A pachinko machine! The ball is launched and goes down...

CM: Yeah, I do...I stick in five volts and watch it propagate.
GS: This simulates at an analog level, doesn't it?
CM: I have two levels, one strictly digital with gate delay timing, and an
analog level. One is faster, and one is more precise. I'm going to combine
them into one for the next design beyond muP20. What I find is that I need the
precision everywhere, to have any confidence in the design. Neither of those
capabilities are available with conventional design techniques. You do netlist
simulations and various degrees of approximations. They are really crude and
hokey and bear very little resemblance to what you get out at the end. But if
you don't know the layout, that's about the best you can do. You literally
have a factor of two uncertainty that the layout introduces in your
simulation.
JW: Because of propagation delays?
CM: Yes, call them that. The match between the transistor and the loaded drive
is critical. And the load is now mostly...at least 50 percent and
increasing...the interconnect.
Very often when I am laying this thing out, I have this little NAND gate, and
it's going to drive another NAND gate, if those gates aren't real close
together, both for reasons of remembering where they are and for reasons of
keeping interconnects small and routing easy...
JW: ...the capacitance of the interconnect can overload the circuit.
MB: Or dominate the propagation, or switching time.
CM: And it's very hard to know with accuracy what those transistors are going
to do until you build one and measure it.
This design is 1.2 micron CMOS. That's what everybody I know is working in.
0.8 is real nice...that I would prefer. 0.5 is state-of-the-art. So I'm a
factor of four in speed away from cutting-edge technology. This is very
conservative...and "very conservative" means 200 MHz.
JW: The muP20 is going to have a 200-MHz clock?
CM: On-chip clock. You don't want that kind of frequency off-chip.
The muP20 has four 5-bit instructions per 20-bit word. One reason I picked
20-bit words is that I had never heard of a 20-bit computer: It's an
unoccupied niche in the world of computers, so let's stick something in it and
see if it flies.
JW: And data memory is...?
CM: Twenty bits. Five 1Mx4 DRAM chips populate a single-board computer based
on muP20. And that's the advantage of 20 bits instead of 32: It only takes
five chips instead of eight.
JW: What kind of DRAM is going to run at this speed?
CM: The new DRAMs have 30-nanosecond page-mode access. It's designed to use
those memories, specifically, and it has on-chip DRAM timing. But it also is
designed to use the memory cards, so you don't have an onboard ROM--you plug
in a boot memory card, power up from that, either unplug it, leave the card
plugged in, whatever you prefer.
The two new technologies are high-density DRAM and the new form-factor memory
cards.
JW: You have wait-state circuitry so that you can boot from your ROM?
CM: Yes. ROM is very slow, it's 250 nanoseconds, and that is...I have a
five-nanosecond clock, so it takes 50 clock cycles to read ROM.
MB: You can get 35-nanosecond ROM from Cypress if you are willing to pay for
it.
CM: Yeah, but I'm really thinking of these cards, and I have on-chip timing
circuitry.
RV: How long have you been working on muP20?
CM: Intensely, since last winter, and it will take a few more months.
JW: Who is financing muP20?
CM: Dr. C.H. Ting, of Forth note. I had gotten really discouraged, in that I
had this capability and nobody was interested. He filled that gap, as one
level of interest.
Nevertheless, the problem remains to find a customer in non-one quantities.
ER: If you were going to go out and look for a customer, what would you be
offering? What is the wonderful thing that this chip does?
CM: It generates NTSC video. So take this chip which costs me, in quantity, a
dollar, and plug it in the back of your television set and go.
MB: What about HDTV?
CM: Fine. I'll modify it, or we'll fill the gap somehow. But this is the first
of a family of computers that have different capabilities, different word
lengths, different memory interfaces, different instruction sets, all of them
sharing a number of features.
RV: You're implying more the commonality of the design approach as opposed to
chunks that get put together, like libraries or bitslices.
CM: Well, the chunks I have are registers, are memory interface, ALU. If I
have to change the instruction set, it will slow things down a whole lot. If I
can just perturb the instruction set a little bit, it'll be a quick design
cycle.
RV: How far up do you think this can be scaled? To workstations?
CM: It can be, but that's a tough market, it's so thoroughly occupied. But I
would like an excuse to do a 64-bit chip one time. I don't know of any
applications which need 64 bits, but if you can stretch the design like they
stretch the 747...
JW: Are you endorsing the philosophy of the application-specific
microprocessor?
CM: Absolutely.
JW: Do you believe it is going to come down to a cottage-industry level?
CM: The fact that I can do it means that a whole lot of other people can do
it, too. I think there is going to be a great radiation of microprocessors.
It's not going to be dominated by Intel, Motorola any longer.
MB: I disagree, because the cost of putting a microprocessor into systems is
dominated by what it takes to create and maintain the software. There is so
much inertia in that system that creating a custom microprocessor, even if
it's a factor of two better than what you can buy off the shelf, is not going
to be compelling. There may be a small number of applications where you can
justify creating a microprocessor architecture and the software to support it.
CM: Mitch, Mitch, Mitch...it takes a week to put Forth on any of these
processors!
MB: I know, but people don't want to write Forth code. I've been fighting that
for years.
CM: Yes, I would say that the workstation is not the market, because
workstations are operating-system intensive and low volume. It's going to be
the new widgets of the future that benefit from this.
JW: To what extent will this device be able to be viewed as a general-purpose
microprocessor?
CM: It's really very fast, so if people need a processor, they'll probably
make some trade-offs and use it in a high-performance application, rather than
design their own. So in that sense, it's general purpose, but not in the sense
of the 68000 which goes into a lot of products and has a massive support base.
JW: What about seeing it as a general-purpose embedded control processor?
Could you build a single-board computer around the muP20 that different people
could use in different applications, provided that it came with a ROM Forth?
CM: Yes, but a 20-bit computer is not going to be ideal for word processing,
unless you like 10-bit bytes.
RV: Comparing the scale of the design effort here with the design of the
conventional microprocessor, in terms of the number of engineers, number of
elements, what would be a comparable chip, and what would be the scale of
effort for the nonminimalist approach?
MB: The new Sparc chip that Sun and Texas Instruments are developing has three
million transistors, unbelievably complicated, scads of people working on it.
It's not the same class of thing, so it's really hard to even talk about them
in the same breath.
It's clear that the simple approach can give you tremendous bang per buck of
engineering. But it seems like this very rarely is factored into decisions.
Decisions are based upon market momentum. Market momentum is generated by big
companies spending humongous amounts of money in advertising and marketing and
exaggerating and fighting, whatever it takes to generate lots of people
knowing about your product. Most people buy stuff just because they have heard
of it.
CM: Don't underestimate the small market. I don't advertise, and I have all
that I can handle. muP20 has three or four successors committed already,
variants, expansions, speed improvements...I can see a one-man design house.
I'm busy. I can make a business of this.
MB: But can you hire a secretary?
CM: Double my business and I'll hire a secretary. I don't know to what degree
it will scale up. But there are a lot of people out there who would like to
have their own microprocessor and see now a chance to do it. Whether their
reasoning is correct or not, whether the economics work for them, is their
problem.
RV: Do you have plans to make your development environment available, to go
into the tool business?
CM: Yes, that's the next level of generalization. People want it, sight
unseen; if I can do it, they can do it too. It is a very peculiar development
package, and it certainly is not marketable at the moment. I have a three-key
keyboard. Currently, these keys are taped to the ends of my fingers, so I can
touch my fingers to my thumb and select one of seven menu entries. Now, the
neat thing about this...a lot of people do menus...but my menus are invisible!
Because it's perfectly obvious to me what these fingers do, and I don't need
something on the screen to tell me.
So it's a bit of black magic. I have to make this a whole lot more comfortable
to the unskilled user.
MB: You and your computer are symbiotic...
CM: I hope so; we've lived together long enough!
JW: How deeply nested are the menus?

CM: Four, maybe. Not very.
JW: And at one menu level you can pick a character from the alphabet.
CM: Better put, I can scroll through the characters so I can pick the one I
want.
JW: Then select, so you can insert text labels by searching, selecting, moving
one position to the right...
CM: Just like on the video arcade games...and it's equally clumsy, but I don't
do that much [text]. Mostly, I'm scrolling through graphics characters or
words, if you like, if you really want to write you want to use words, not
characters.
JW: So you can save entities composed of these basic elements...
CM: Right.
JW: ...and recall that entity to insert it into the diagram, such as a complex
selection of gates?
CM: I can pick a region of interest and move it or flip it or whatever...or
replicate it twenty times to get a register.
RV: Are these stored by name, or just visually accessed?
CM: Visually.
RV: So you don't do much specification of alphanumeric characters, then.
CM: Very, very little. The most characters I see are in memory dumps. And as
sophistication increases, the memory dumps get interpreted not as hex, but as
a decompiler generating Forth. So you can have quite readable object code.
On the other hand, I don't typically take it to that nice level. I don't need
to. So again, a marketable tool would have to have a larger level of
refinement.
RV: Is there any pointing device?
CM: Just the cursor moving on the screen: up, down, left, right.
JW: Can you move, say, southwest by holding two fingers down?
CM: No, I don't have enough fingers. Typically, [gesturing with fingers] this
one is up, this one is down, these two are left, these two are right, and that
leaves me three to actually change menus or something else. Four of my keys
are immediately gone, off the top, for motion.
JW: How much Forth source code went into writing this system?
CM: No source. But about 4 Kbytes of object, and it was all constructed on
ShBoom. ShBoom doesn't have any compilers...[turning to George Shaw] Does it
have a Forth compiler?
GS: Not yet.
CM: Or a C compiler. So ShBoom is all programmed in machine code.
JW: You sat there and entered the hex bytes.
CM: That's an easy instruction set, compared to the 386.
MB: When I read the 486 manual, I was amazed that they had got this all
working. Then I got to the bug list in the back, and said, "Oh, they didn't
get it all working."
RV: I just wonder if there is a point where it [the Intel architecture] will
collapse under its own weight. I'm surprised it's gotten this far, it's a
tremendous engineering feat.
GS: It's amazing, if any of the technology applied to increasing the hardware
had been applied to increasing the software, it would have grown by leaps and
bounds.
MB: I don't think so. Managing this complexity, you can just throw amazing
amounts of people and computing power at it. You can't really do that for
making leaps and bounds of improvement in design. Those kind of things only
happen when there is one person who has a better idea and the skill to do
something about it. Managing these humongously complicated projects, you can
throw endless amounts of money at them, and eventually succeed in some
fashion.
CM: But that's the thing that's bankrupt, the humongously complicated project.
If I can come up with a processor that is even vaguely comparable to the 386,
and I'm sure I can, this undermines their charter. You're going to have a
hundred people out there producing microprocessors at one percent of the
overhead that Intel has.
MB: By that token, IBM would be out of business, yet we know that they are
not.
CM: No, Intel won't be out of business, but they will have forfeited the
future, just like IBM and DEC and all the others have. Apple and IBM, by their
combination, have committed mutual suicide. They have thrown away a marketplace that is
now accessible to people like me, because they decided to build bigger
operating systems, hardware-independent everythings, slower, more
ponderous...it's a boon! The Forth community profits tremendously from that
conjunction.
JS: Except that with them goes the common wisdom that bigger is better.
JW: But is it true that that is the common wisdom nowadays?
ER: Is there any indication at all that there is a groundswell of people that
are appalled by that trend? I haven't seen it.
MB: It's marketing. There is so much positive feedback in the economy in terms
of the fact that you can't be successful until people perceive that you are
already successful. It's a chicken/egg situation, you can't sell product
unless you have already sold product, a lot of people have heard about it,
have a good, warm feeling about it, feel safe.
JW: There's a self-limiting factor in that these megacorporations very quickly
drive out the people who created the product. [Several noted organizations]
for example are slowly self-destructing internally by driving out people who
will not work in those environments where there are ten levels of managers
above one.
CM: I recently encountered the magazine, Midnight Engineering.
RV: A great magazine.
CM: And that is eye-opening. Here is a whole support industry for the very,
very low-scale industry. So that indicates to me that the growth is all going
to come from below. These midnight engineers are the resource that is going to
dominate the next decade.
JW: They may dominate the creativity, but usually the reward for invention is
preempted by the marketeers.
CM: Entry costs are going down, sophistication is going up. These people are
becoming an increasingly potent force.
MB: The engineering entry costs are going down, but the business entry costs
are going way up.
CM: You learn to be an effective entrepreneur, instead of a victim.
GS: One of the reasons that the United States has been innovative in
technology, is that you have the ability as an individual to start your own
company and make a bunch of bucks. In other countries, it's not nearly so
easy.
RV: Take the Japanese model, where it's not one grand flash of insight leading
to one medium-sized innovation, it's building a team of people, each one
contributes an incremental insight over a long period of time. You get
something like the Sony HandyCam, which I don't believe could have been built
in this country, because one person, no matter how brilliant, could not have
built that by themselves. It would require a 20-year incremental process to
get to that point.
GS: In an article in Midnight Engineering, the guy was marvelling at some of
these very interesting circuits that are in some of the very small Japanese
electronic items. He ran into the fellow who designed the stuff that went into
these Japanese products: an American.
ER: I think that there are some developments that are amenable to the anthill
approach, and some that are not. There are achievements that come about as a
result of a breathtaking insight that only comes from an individual. There is
nothing in, say, the HandyCam that isn't intrinsically technologically
evolutionary. But there are a number of breakthroughs in the history of
television technology per se that were single, unique breakthroughs, and those
are the ones that come from the creative entrepreneur.
RV: History does prove that this happens again and again. Here we have an
example where someone could have said, "This can't be done with 4K of object
code and a three-key keyboard."
CM: I see [chip design] as an art form. I think that sculpting on silicon is
going to be an increasingly important process for an increasing number of
people in the next decades. Maybe I'm one of the first to approach it this
way, that here I have my piece of silicon and I'm doing what I want on it. I
want to make a buck, maybe, I want to do some things that are neat, nice, or
pretty.
So there's this whole aesthetic involved, as well as pragmatism, and it
doesn't need to be as difficult as people make it out to be. Mature
technologies never are. They may be black boxes when they start, but they
eventually become part of common knowledge and trivial, like circuit board
design is now.
JW: Like farming, or breaking a horse to be ridden, something that is shared
by a society.
CM: The first time is black magic, then it becomes routine.
JS: Some people, like Henry Ford for example, start off as innovators and end
up at the end of their life punishing those who innovate and saying that
innovation is not important.
ER: Chuck seems not to have fallen into that particular pitfall.
CM: Not yet!


An Update on ANS Forth



The spring equinox found the ANS/ASC X3/X3J14 Technical Committee in
Hillsboro, Oregon, mulling over feedback from the first public-review period
of dpANS (draft-proposed ANS Standard) Forth. This step continued the process
described in "Forth: A Status Report" (DDJ, October 1991).
Although the few substantive alterations in the document must now result in
another public-review period, it is pleasingly obvious that the matter is down
to the proverbial jots and tittles. Secure is the overarching concept: It is
possible to propound an architecturally independent semantic description of
Forth, traditionally the most hardware-wedded of all high-level languages.
While my wife Eleonore cooked breakfast waffles each day for committee
members, the committee was not above serving up a few waffles of its own. The
definition of DOES> was altered. The implementation now defines whether DOES>
reveals or leaves hidden the name header of the definition in which it occurs.
The concise definitions of ALLOT and related constructs were expanded in the
interest of clarity, rendering them less concise and arguably less clear. In a
rush to cover a previous omission, eight words of critical functionality were
entered into the floating-point word set, somewhat loosely specified.
Quibbles like these notwithstanding, even the most rancorously debated issues
involved extremely minor points. No one who has authored Forth systems based
on previous drafts of the document will have to spend more than a few hours
implementing these new changes to the standard.
The smallish membership of X3J14 has spent person-decades and well over
$200,000 on this effort. We're tired but content that we have nearly fulfilled
our five-year mission to seek out new worlds for Forth to conquer. The next
meeting of X3J14 may promulgate true ANS Forth.
-- J.W.



June, 1992
PORTING UNIX TO THE 386: THE MISSING PIECES, PART II


Completing the 386BSD kernel




William Frederick Jolitz and Lynne Greer Jolitz


Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect
of National Semiconductor's GENIX project, the first virtual-memory,
micro-processor-based UNIX system. Prior to establishing TeleMuse, a market
research firm, Lynne was vice president of marketing at Symmetric Computer
Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions
or comments to ljolitz@cardio.ucsf.edu. (c) 1992 TeleMuse.


Last month, we began the final steps of a journey that will lead to a bootable
running 386BSD kernel that provides a self-supporting development environment.
The source code presented over the last two installments--plus a small set of
bug fixes and a recent copy of the NET/2 tape--will enable you to build an
operational kernel.
This month, we'll implement a "bare-bones" execve() system call that allows
386BSD to provide basic operation; a block-I/O buffer cache used to reduce the
cost of UNIX file operations; and ring buffers that reduce the cost of
tty-character buffer management.


Leveraging the POSIX Definition


POSIX 1003.1 describes the semantics of file execution and the requirements of
the implementation. It defines, from the point of view of a C program's main()
procedure, just what needs to be delivered to the new process image. Aside
from the arguments passed to the new process image, it also describes the
structure of the environment that is passed and the treatment of other systems
facilities (such as files, signals, and process credentials). It also defines
the possible error conditions that exec() should return if it cannot correctly
complete the request. All the various function calls of the exec() family are
implemented in the NET/2 object library, and they all eventually translate
down to an execve() system call that actually does the work.


Choices of Implementation


While POSIX says nothing about system-call semantics (because it's entirely an
object-library based standard), both The UNIX System V Programmer's Reference
Manual and the BSD Programmer's Manual fill in the minor details lacking in
the standard. In fact, we could even implement the execve() function from user
code that manipulates the address space with special system calls. During
early discussions at Berkeley concerning the 4.2BSD design process,
"RISCizing" the system calls was strongly considered, and the system call that
topped the lists for this treatment was exec(). (It was to be built out of
segment-creation calls, piece by piece.) Recent versions of Mach have
implemented a program loader that works somewhat like this, in that the
understanding of various executable formats can be exported to a user program
loader. In MINIX, the responsibility is sensibly split between user and
kernel: The user exec() subroutine gathers a buffer of argument pointers, and
the kernel copies the buffer into the new process and then fixes up the
argument pointers by relocating them (now that the new location is known in
the new process's image). Unfortunately, we cannot implement these approaches
because the Intel BCS (Binary Compatibility Standard) specifies that execve()
is a system call, and a system call implements the full semantics of
execve()'s operation.
The costs in execve() arise primarily from collecting all the argument (and
environment) strings, saving them temporarily, and then depositing them in the
new process. The representation of argument strings is dense; for example, the
strings are adjacent at the top of the new stack with their corresponding
pointers. Therefore, we generally use a two-pass algorithm that determines the
size of the space to hold the strings and the number of strings that exist.
Thus, we can reserve space and determine the relocation of the strings and
arguments in the new process's image--remember, we're packing things together
at the top of the stack, working from the top down. The principal reason for
tightly packed arguments is so that a program knows that it can have (at most)
ARG_MAX bytes of incoming arguments, and that those bytes can be saved away
(by a bcopy, perhaps) to another region without chasing each of them down.
If we don't mind wasting some space, we can make the algorithm single pass, by
assuming where the arguments and strings will go, copying them into position,
and doing the relocation all at the same time. We can't destroy the old
process's stack (who knows, an argument might point at an invalid location, or
worse yet, at the location we are copying to for the new image), so we need to
save our results elsewhere. By forcing this buffer to be aligned on an
appropriate page boundary instead of copying it to the new address space, we
can map the page accordingly and maintain our single-pass objective.


execve()


Our implementation (see Listing One, page 96) contains a bare-bones execve()
system call that allows 386BSD to provide basic operation. This minimal
version is compact enough to discuss easily, yet complete enough to allow a
system using it to rebuild the kernel. In other words, this is not a toy
implementation!
As a system call in the kernel, this procedure obeys certain conventions that
all system-call code in our 386BSD kernel uses. The arguments to all
system-call code are identical in number and use. The first parameter is
always a pointer to the process structure of the process requesting the system
call, and all process-relative facilities, resources, and state information
are referenced solely through this pointer. (See Figure 2 in our August 1991
386BSD article.) The next
argument is a pointer to the user's arguments for this system call. Each
system call itself has different numbers and types of arguments, so it is a
pointer to a structure that usually differs for each system call. The
execve system call itself has three arguments: the filename of the file to be
executed, a pointer to a vector of string arguments for the new process, and a
pointer to a vector of string environment variables. The last parameter of the
execve() procedure is the return value pointer, which can be used to return a
data value to the caller of the system call. Independently, an error can be
passed back to the system-call management function syscall() (which dispatches
all system-call functions in the kernel) via execve()'s return value.
This implementation can be broken down into five separate steps: file
validation, executable format recognition and consistency checking, reading
and processing argument strings, building a new process image, and preparing
the new image for execution.


File Validation


We first check whether the file we've been asked to execute actually exists,
and if so, the internal name by which to refer to it (the vnode). We do this
by employing the file-lookup function called namei(). This function has so
many arguments that it actually requires a structure (nameidata) to detail all
the related matters that will occur as a result of its use. We instruct
namei() to LOOKUP the filename and return a pointer to an internal file
reference (a vnode or abstract file node). We then LOCK it so that others
don't alter it while we process our execve() system call, and then interpret
any symbolic links we encounter as we incrementally evaluate the filename.
namei is also told that the filename to be requested is in the user process's
address space, located in the address specified by the first argument of the
execve() system call. When namei() succeeds, we check whether the file is
"regular" (that is, not a directory, socket, device, fifo, and so on), and
whether it's an executable file about which we can obtain status information.
All these calls occur relative to our "virtual," or abstract file-system
level, and the calls hide access to the actual mechanisms that implement the
file system's operations. The main result of this step is a pointer to a vnode
(ndp ->ni_vp) that points to a qualified file.


Executable Format Recognition and Consistency Checking


Now that we have a file we can read, we need to divine its format and check
this against what we can execute.
vn_rdwr() is a vast, kitchen-sink styled internal kernel procedure that
implements the general scheme for reading a desired amount of data from a
vnode. It places this in our prototype header structure (hdr), so we can then
dig through it to validate that it's a recognizable format with a sane
request. We don't want to execute if the file is too small or the parameters
are not aligned with page boundaries. We check sizes, first separately and
then together, to avoid the chance that together they overflow our 4-gigabyte
address space. The result is that we now have a vnode that contains an image
we can load and execute.


Reading and Processing Argument Strings


We need to collect argument and environment strings prior to loading the new
image. In this implementation, we take advantage of the "floating" location of
the user stack (no absolute reference has been assumed in 386BSD) and our
4-Gbyte process address space. We create a "new" stack within the "old" image
in its new process-image location, and then build it in place. Thus, we
consume 32 Mbytes of virtual address space but allocate only the three to ten
pages of memory actually touched ("sparse" allocation). Vector by vector, we
obtain the address of argument strings and use copyinoutstr() to obtain a
string from the user's address space that does not exceed the size of our
temporary buffer. (Unlike applications programming, kernel work is loaded with
obscurely arranged procedures that service special-purpose needs ideally.) We
build a pointer to the string that corresponds to its location in the new
process's image. After doing this for all arguments, we repeat the procedure
on the environment vector, thus reusing the code for a similar purpose. Due to
sparse allocation, three objects (argument vectors, strings, and stack) each
take up at least a page. (These could be condensed to a single page if speed
considerations are subordinate to space considerations.)



Building a New Process Image


We can do no more at this point with the old image--we must destroy the old to
build the new. We can no longer return to the image we were called from and
are now committed. We must ask the virtual memory system to erase the old
process image and map the new executable file in its place, taking care to
leave the new stack present. Note that no I/O is yet done, and that the pages
will be demand-loaded on access. If, following the vm_mmap(), we referenced
the bottom of the virtual address space where this file is mapped, a fault
would be generated. The page in the file associated with the address would be
read in and our reference would be satisfied. MAP_COPY and VM_PROT_ALL specify
that the pages may be referenced for all purposes (read, write, and execute),
and that any changes will affect this process's copy only. We want
instructions to be protected from modification by the program, so vm_protect()
allows us to restrict valid references to read and execute (not write) over
the extent of the text (instruction) portion of the address space. Next, we
allocate any remaining uninitialized data address space with anonymous paged
memory from the virtual memory system. The virtual memory system has simply
been instructed in building data structures that it can consult to decide how
to handle faults in specific portions of the processor's address space. No
pages of memory have been allocated, nor have any parts of the processor's
address-mapping hardware been touched in building the new process image, other
than to invalidate the address range.


Preparing the New Image for Execution


Before we return from execve(), we must inform the system of the new image's
characteristics, close off any files as needed, and reset caught signals. We
set the stack pointer and other registers (setregs for the PC), unlock and
release the file so others can mess with it, and return to the new image we
have just effectively loaded. We have not done a shred of I/O to read in the
instruction pages, so the first return to start the user process is guaranteed
to generate a page fault. The virtual memory system will then consult the
information from the previous step to satisfy a "page in" request from our
executable file. This is how the cycle of life begins for our new incarnation
of the process.


What's Not Finished with execve?


To be POSIX compliant, we should implement the famous setuid/setgid features
of UNIX, by extracting information out of the file-attribute buffer and
altering the process's credentials (for instance, user id and group id). We've
neglected any details concerning statistics updating/gathering and accounting.
Another intentional oversight was neglecting the interface to the user-process
debugging mechanism. Each of the items results in small additions to this
implementation. These are good exercises for the enthusiast but are outside
the scope of this article.
Importantly, the 386BSD system will now operate, and allow us to recompile the
kernel, even without these additional changes, but the fully fleshed-out
execve() does allow programs like su(1) to work.
Another useful functionality not discussed is the ability to execute more than
one kind of file format. We are unsure how far to extend 386BSD in this
direction, as there are literally dozens of potential formats. One possibility
is to create a new format that just addresses the weaknesses we have seen.
This may be necessary for the multiprocessor, multithreaded version of 386BSD.


Block I/O Cache


Much of the file-system code and various other facilities of our BSD kernel
use the ancient UNIX buffer-cache interface. This buffer cache, splendidly and
generously described in Maurice Bach's The Design of the UNIX Operating
System, implements a file-system, block-oriented cache of I/O operations.
Since the original UNIX file system, block buffers have been used to reduce
the cost of UNIX file operations, in which partial block reads and writes
occur frequently. By retaining a small cache of those frequently accessed
buffers, the UNIX kernel could avoid unnecessary redundant I/O operations. The
mechanisms of delayed writes avoid writing a block until the buffer is reused:
read-aheads ensure that the successive block will be available by "prereading"
it.
The principal interfaces to the rest of the 386BSD kernel are the
procedures getblk(), bread(), breada(), bwrite(), bdwrite(), bawrite(), and
brelse().
getblk() sifts through the buffer cache, looking for a matching buffer that it
can make busy. Failing that, it allocates a new buffer out of those currently
not busy, makes it busy, and returns it to its caller. The caller of getblk()
can tell if the contents were obtained from the cache or if the block needs to
be filled because it's not cached or valid. If getblk() ever needs to pause to
wait for something to become available, it needs to restart its algorithm on
the off chance that it is working with stale pointers. If getblk() attempts to
contend for a free buffer, it needs to ensure unique access by issuing a
splbio(), thus blocking out all asynchronous events that could intrude.
bread() uses getblk() to obtain the appropriate buffer. If the contents of the
buffer are not appropriate, it issues a logical read of the buffer,
VOP_STRATEGY(), whereby the logical-to-physical mapping occurs and the I/O
operation is passed to the driver. A wait for the operation to complete
(biowait()) is then entered. With appropriate contents, the buffer is returned
for unique access by caller.
breada() is like bread(), but it overlaps the possible first read operation
with the second read operation, in an attempt to force the read-ahead block
into the cache, in anticipation of it being read by the process "soon." If the
read-ahead block is already in the cache, then the block is merely moved to
the tail of the LRU chain so it won't be reused so soon. breada() is naive
about cache flooding and relies on the wait for subsequent blocks being high
because the blocks are not contiguous. Thus, its concept of "double buffering"
works best in a file system with rotational delays (unlike, for example, a
log-based file system).
bwrite() accomplishes a synchronous write of a block obtained from one of the
previously mentioned sources (or indirectly, such as through a delayed write).
The only magic here, other than being almost symmetrical with bread(), is that
delayed writes need to inform the vnode layer that they are no longer delayed.
After the write completes, the block buffer is returned to the freelist, ready
for others to use.
bdwrite() does not actually do a write; it just marks the block and tells the
vnode layer of its special significance. Tape drives can never have delayed
(and possibly unordered) writes, so we enforce a synchronous write.
bawrite() is much like the synchronous case; however, it marks the block to be
released when output is finished using the ASYNC flag on the block. Note that
the read-ahead block will also be released in the same manner when its read
completes.
brelse() is how a block buffer is returned for use by the rest of the system.
To prevent congestion, other processes waiting for this or any other block are
notified of the change in state. The WANTED flags merely reduce the number of
spurious wakeups that might otherwise be generated. We then categorize the
block and put it on the appropriate list for reuse. (This is where
buffer-cache policies are instituted.) It then is marked no longer busy, and
may be reallocated.
getblk() and others use incore() and getnewbuf() to do the dirty work of
locating blocks in the cache and obtaining them. Buffers are allocated space
with malloc(), and if the size changes, allocbuf() adjusts the size
accordingly. The file systems themselves are responsible for upgrading the
size of blocks that are cached, because new data might need to be read in.
Note that as buffers gain and leave association with a given file (vnode
pointer), they must inform the virtual file-system layer of the event.
Likewise, a block buffer must gain and lose association with a given freelist
(search for a freeblock), and hashlist (so it can be located by search for
contents).
biowait() is used to wait for a buffer to be finished with I/O, and biodone()
signals the end of I/O to interested biowait() calls; see Listing Two, page
101. Both are used for dealing with the drivers. biodone(), in particular,
needs to specially handle cases for the virtual memory system (B_CALL) and
asynchronous I/O (deallocation).


4.4BSD Demands


The current BSD kernel uses the block cache quite differently from older
versions. It is now a logical cache: Its contents are relative to a logical
file rather than to a physical disk-sector address. As a result, the virtual
file-system layer must translate between the two on demand, and do the
necessary I/O operation, VOP_STRATEGY().
Also, the vnode layer must track delayed writes with a list of dirty blocks
per vnode. bgetvp(), brelvp(), and reassignbuf() track the assignment of
blocks to clean and dirty block lists in each vnode. This information is
consulted with a file commit (fsync()). File-system commit (sync()) operation
is done by the file-system layers.
Because the file system may work with variable block sizes, the buffer
contents and sizing are actually the province of the file-system code.
Surprisingly, the buffer cache has no knowledge of the scaling of the logical
blocks it manages, and limited knowledge on the size of the cached block
itself. (Only the file system associated with the block knows how much is
valid at any time.) This makes it particularly difficult to advance the
state-of-the-art of a file-system page cache.


4.4BSD Weaknesses


As a result, there are many weaknesses in this design. First, it creates a
synchronous logjam on I/O, as blocks are sequenced out to the disk driver.
(Everything is done in small, synchronous transfers that don't allow the
modern disk subsystems to maintain high data rates.) The buffer cache is
privately managed space, so it competes for resources with the virtual memory
system, with which it's not on speaking terms. Worse yet, both tend to cache
the same data, so they are at odds with duplicated effort and state
information. Finally, the cache policies implied by the buffer cache usurp the
kinds of policies a file system might wish to make. An example of this is when
an NFS file server wishes to offer "leases" on buffered contents of
file-system information to reduce client-server cache coherence cost.


Terminal Ring Buffers


The final missing piece for an operable system is the code to manipulate the
character buffers used by the tty driver, called "clists." clists are just
another privately managed buffer mechanism that relies on the reallocation of
a pool of small (32-byte) blocks of character data that can be viewed
logically as a FIFO queue of characters. The tty driver uses these queues to
implement a general-purpose terminal interface for consoles, serial ports, and
network sessions. clists are an elegant mechanism to allow numerous terminal
sessions to share a region of buffer memory, and they were ideal for a
timesharing system with small memory and a large number of competing sessions.
However, they are cumbersome to maintain and inconvenient for mass transfers
(such as painting a bitmap screen). Therefore, we have written code to
implement ring buffers in their place. These reduce the cost of character
buffer management, especially in the mass transfer case. Instead of making an
analogue of the interface of clists, we modified the BSD tty driver and
related code to take advantage of large, contiguous buffer regions of
characters that this approach afforded.


Character-by-Character Operations



For the drivers themselves, written with character-by-character code, we kept
the traditional getc/putc operations and their inverse operations
ungetc/unputc; see Listing Three, page 103. These work by means of successor
and predecessor macros (Listing Four, page 104) that topologically make a ring
buffer's data region contiguous. A side effect of this is that the operations
and inverses are valid for any underlying method of storage. So if, for
example, we wanted to use another buffering mechanism (such as BSD mbufs), we
could do so just by modifying the macros.


Block Operations


Block operations are afforded by the contiguous transfer-length macros that
allow inline code to manipulate ring-buffer contents in contiguous sections.
This means that code to replace clist-to-block (and its inverse) must be
generated on a case-by-case basis, but that code is exactly where the transfer
rate bottleneck is anyway, so this is appropriate.
This scheme requires more space for each active terminal session, because the
blocks don't share buffer space but still have private ring-buffer contents.
Also, at the moment, this code is faster than the buffering policies
anticipate, so the higher layers of the tty driver suspend operation for too
long, anticipating the usual transfer rate. As such, much work needs to be
done to tune the system for this mechanism.


386BSD: Other Portions Beyond Basic Operation


With the code presented in this article and a list of trivial bug fixes
available from DDJ, the system becomes bootable (using the MS-DOS bootstrap)
and can rebuild itself. However, before the system is fully complete, two
areas remain. One is the "raw" I/O facility for mass-storage devices, which
allows block transfers directly to a user process. This is used in
file-system integrity checks, and
file-system dump and backup procedures. In addition, no user-process debugging
can be done, because the process-tracing facility is not yet present (although
process core dumps are available).


Lessons Learned


We were loath to proceed in the manner outlined here because we ended up
creating some backward-looking portions of code. We worried about the waste of
time and loss of focus such a diversion might cause. However, in retrospect,
it was the fastest way to clearly outline the problems to be considered while
working on more grandiose or innovative schemes. Our enforced realism exposed
many weaknesses lying dormant and taken for granted.
William Saroyan once said that re-reading a good book was never a waste of
time, because in every great work lay little things that had been unnoticed or
forgotten. It seems that the same holds true for systems programming and
design.


_PORTING UNIX TO THE 386: THE MISSING PIECES, PART II_
by William Jolitz and Lynne Greer Jolitz


[LISTING ONE]

/* Copyright (c) 1989, 1990, 1991, 1992 William F. Jolitz, TeleMuse
 * All rights reserved.
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 * must display the following acknowledgement:
 * This software is a component of "386BSD" developed by
 * William F. Jolitz, TeleMuse.
 * 4. Neither the name of the developer nor the name "386BSD" may be used to
 * endorse or promote products derived from this software without specific
 * prior written permission.
 * THIS SOFTWARE IS A COMPONENT OF 386BSD DEVELOPED BY WILLIAM F. JOLITZ
 * AND IS INTENDED FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. THIS
 * SOFTWARE SHOULD NOT BE CONSIDERED TO BE A COMMERCIAL PRODUCT.
 * THE DEVELOPER URGES THAT USERS WHO REQUIRE A COMMERCIAL PRODUCT
 * NOT MAKE USE OF THIS WORK. THIS SOFTWARE IS PROVIDED BY THE DEVELOPER
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPER BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * This procedure implements a minimal program execution facility for
 * 386BSD. It interfaces to the BSD kernel as the execve system call.
 * Significant limitations and lack of compatibility with POSIX are
 * present with this version, to make its basic operation more clear.
 */

#include "param.h"
#include "systm.h"
#include "proc.h"
#include "mount.h"
#include "namei.h"
#include "vnode.h"
#include "file.h"
#include "exec.h"
#include "stat.h"
#include "wait.h"
#include "signalvar.h"
#include "mman.h"
#include "malloc.h"

#include "vm/vm.h"
#include "vm/vm_param.h"
#include "vm/vm_map.h"
#include "vm/vm_kern.h"

#include "machine/reg.h"
extern int dostacklimits;

/* execve() system call. */
/* ARGSUSED */
execve(p, uap, retval)
 struct proc *p;
 register struct args {
 char *fname;
 char **argp;
 char **envp;
 } *uap;
 int *retval;
{
 register struct nameidata *ndp;
 struct nameidata nd;
 struct exec hdr;
 char **argbuf, **argbufp, *stringbuf, *stringbufp;
 char **vectp, *ep;
 int needsenv, limitonargs, stringlen, addr, size, len,
 rv, amt, argc, tsize, dsize, bsize, cnt, foff;
 struct vattr attr;
 struct vmspace *vs;
 caddr_t newframe;
 /* Step 1. Lookup filename to see if we have something to execute. */
 ndp = &nd;
 ndp->ni_nameiop = LOOKUP | LOCKLEAF | FOLLOW;
 ndp->ni_segflg = UIO_USERSPACE;
 ndp->ni_dirp = uap->fname;
 /* is it there? */
 if (rv = namei(ndp, p))
 return (rv);
 /* is it a regular file? */
 if (ndp->ni_vp->v_type != VREG) {
 vput(ndp->ni_vp);
 return(ENOEXEC);
 }
 /* is it executable? */
 rv = VOP_ACCESS(ndp->ni_vp, VEXEC, p->p_ucred, p);
 if (rv)
 goto exec_fail;
 /* does it have any attributes? */
 rv = VOP_GETATTR(ndp->ni_vp, &attr, p->p_ucred, p);
 if (rv)
 goto exec_fail;
 /* Step 2. Does file contain a format we can understand and execute */
 rv = vn_rdwr(UIO_READ, ndp->ni_vp, (caddr_t)&hdr, sizeof(hdr),
 0, UIO_SYSSPACE, IO_NODELOCKED, p->p_ucred, &amt, p);
 /* big enough to hold a header? */
 if (rv)
 goto exec_fail;
 /* ... that we recognize? */
 rv = ENOEXEC;
 if (hdr.a_magic != ZMAGIC)
 goto exec_fail;
 /* sanity check "ain't not such thing as a sanity clause" -groucho */
 if (hdr.a_text > MAXTSIZ ||
 hdr.a_text % NBPG || hdr.a_text > attr.va_size)
 goto exec_fail;
 if (hdr.a_data == 0 || hdr.a_data > DFLDSIZ ||
 hdr.a_data > attr.va_size ||
 hdr.a_data + hdr.a_text > attr.va_size)
 goto exec_fail;
 if (hdr.a_bss > MAXDSIZ)
 goto exec_fail;
 if (hdr.a_text + hdr.a_data + hdr.a_bss > MAXTSIZ + MAXDSIZ)
 goto exec_fail;
 /* Step 3. File and header are valid. Now, dig out the strings
 * out of the old process image. */
 /* We implement a single-pass algorithm that builds a new stack
 * frame within the address space of the "old" process image,
 * avoiding the second pass entirely. Thus, the new frame is
 * in position to be run. This consumes much virtual address space,
 * and two pages more of 'real' memory, such are the costs.
 * [Also, note the cache wipe that's avoided!] */
 /* create anonymous memory region for new stack */
 vs = p->p_vmspace;
 if ((unsigned)vs->vm_maxsaddr + MAXSSIZ < USRSTACK)
 newframe = (caddr_t) USRSTACK - MAXSSIZ;
 else
 vs->vm_maxsaddr = newframe = (caddr_t) USRSTACK - 2*MAXSSIZ;
 /* don't do stack limit checking on traps temporarily XXX*/
 dostacklimits = 0;
 rv = vm_allocate(&vs->vm_map, &newframe, MAXSSIZ, FALSE);
 if (rv) goto exec_fail;
 /* allocate string buffer and arg buffer */
 argbuf = (char **) (newframe + MAXSSIZ - 3*ARG_MAX);
 stringbuf = stringbufp = ((char *)argbuf) + 2*ARG_MAX;
 argbufp = argbuf;
 /* first, do args */

 vectp = uap->argp;
 needsenv = 1;
 limitonargs = ARG_MAX;
 cnt = 0;
do_env_as_well:
 if(vectp == 0) goto dont_bother;
 /* for each envp, copy in string */
 do {
 /* did we outgrow initial argbuf, if so, die */
 if (argbufp == (char **)stringbuf) {
 rv = E2BIG;
 goto exec_dealloc;
 }
 /* get a string pointer */
 ep = (char *)fuword(vectp++);
 if (ep == (char *)-1) {
 rv = EFAULT;
 goto exec_dealloc;
 }
 /* if not a null pointer, copy string */
 if (ep) {
 if (rv = copyinoutstr(ep, stringbufp,
 (u_int)limitonargs, (u_int *) &stringlen)) {
 if (rv == ENAMETOOLONG)
 rv = E2BIG;
 goto exec_dealloc;
 }
 suword(argbufp++, (int)stringbufp);
 cnt++;
 stringbufp += stringlen;
 limitonargs -= stringlen;
 } else {
 suword(argbufp++, 0);
 break;
 }
 } while (limitonargs > 0);
dont_bother:
 if (limitonargs <= 0) {
 rv = E2BIG;
 goto exec_dealloc;
 }
 /* have we done the environment yet ? */
 if (needsenv) {
 /* remember the arg count for later */
 argc = cnt;
 vectp = uap->envp;
 needsenv = 0;
 goto do_env_as_well;
 }
 /* At this point, one could optionally implement a second pass to
 * condense strings, argument vectors, and stack to fit fewest pages.
 * One might selectively do this when copying was cheaper
 * than leaving allocated two more pages per process. */
 /* stuff arg count on top of "new" stack */
 argbuf[-1] = (char *)argc;
 /* Step 4. Build the new process's image. At this point, we are
 * committed -- destroy old executable! */
 /* blow away all address space, except the stack */
 rv = vm_deallocate(&vs->vm_map, 0, USRSTACK - 2*MAXSSIZ, FALSE);

 if (rv)
 goto exec_abort;
 /* destroy "old" stack */
 if ((unsigned)newframe < USRSTACK - MAXSSIZ) {
 rv = vm_deallocate(&vs->vm_map, USRSTACK - MAXSSIZ, MAXSSIZ,
 FALSE);
 if (rv)
 goto exec_abort;
 } else {
 rv = vm_deallocate(&vs->vm_map, USRSTACK - 2*MAXSSIZ, MAXSSIZ,
 FALSE);
 if (rv)
 goto exec_abort;
 }
 /* build a new address space */
 addr = 0;
 /* screwball mode -- special case of 413 to save space for floppy */
 if (hdr.a_text == 0) {
 foff = tsize = 0;
 hdr.a_data += hdr.a_text;
 } else {
 tsize = roundup(hdr.a_text, NBPG);
 foff = NBPG;
 }
 /* treat text and data in terms of integral page size */
 dsize = roundup(hdr.a_data, NBPG);
 bsize = roundup(hdr.a_bss + dsize, NBPG);
 bsize -= dsize;
 /* map text & data in file, as being "paged in" on demand */
 rv = vm_mmap(&vs->vm_map, &addr, tsize+dsize, VM_PROT_ALL,
 MAP_FILE | MAP_COPY | MAP_FIXED, (caddr_t)ndp->ni_vp, foff);
 if (rv)
 goto exec_abort;
 /* mark pages r/w data, r/o text */
 if (tsize) {
 addr = 0;
 rv = vm_protect(&vs->vm_map, addr, tsize, FALSE,
 VM_PROT_READ | VM_PROT_EXECUTE);
 if (rv)
 goto exec_abort;
 }
 /* create anonymous memory region for bss */
 addr = dsize + tsize;
 rv = vm_allocate(&vs->vm_map, &addr, bsize, FALSE);
 if (rv)
 goto exec_abort;
 /* Step 5. Prepare process for execution. */
 /* touchup process information -- vm system is unfinished! */
 vs->vm_tsize = tsize/NBPG; /* text size (pages) XXX */
 vs->vm_dsize = (dsize+bsize)/NBPG; /* data size (pages) XXX */
 vs->vm_taddr = 0; /* user virtual address of text XXX */
 vs->vm_daddr = (caddr_t)tsize; /* user virtual address of data XXX */
 vs->vm_maxsaddr = newframe; /* user VA at max stack growth XXX */
 vs->vm_ssize = ((unsigned)vs->vm_maxsaddr + MAXSSIZ
 - (unsigned)argbuf)/ NBPG + 1; /* stack size (pages) */
 dostacklimits = 1; /* allow stack limits to be enforced XXX */
 /* close files on exec, fixup signals */
 fdcloseexec(p);
 execsigs(p);

 /* setup initial register state */
 p->p_regs[SP] = (unsigned) (argbuf - 1);
 setregs(p, hdr.a_entry);
 vput(ndp->ni_vp);
 return (0);
exec_dealloc:
 /* remove interim "new" stack frame we were building */
 vm_deallocate(&vs->vm_map, newframe, MAXSSIZ, FALSE);
exec_fail:
 dostacklimits = 1;
 vput(ndp->ni_vp);
 return(rv);
exec_abort:
 /* sorry, no more process anymore. exit gracefully */
 vm_deallocate(&vs->vm_map, newframe, MAXSSIZ, FALSE);
 vput(ndp->ni_vp);
 exit(p, W_EXITCODE(0, SIGABRT));
 /* NOTREACHED */
 return(0);
}





[LISTING TWO]

/* Copyright (c) 1992 William Jolitz. All rights reserved.
 * Written by William Jolitz 1/92
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 * must display the following acknowledgement:
 * This software is a component of "386BSD" developed by
 William F. Jolitz, TeleMuse.
 * 4. Neither the name of the developer nor the name "386BSD"
 * may be used to endorse or promote products derived from this software
 * without specific prior written permission.
 * THIS SOFTWARE IS A COMPONENT OF 386BSD DEVELOPED BY WILLIAM F. JOLITZ
 * AND IS INTENDED FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. THIS SOFTWARE
 * SHOULD NOT BE CONSIDERED TO BE A COMMERCIAL PRODUCT. THE DEVELOPER URGES
 * THAT USERS WHO REQUIRE A COMMERCIAL PRODUCT NOT MAKE USE OF THIS WORK. THIS
 * SOFTWARE IS PROVIDED BY THE DEVELOPER ``AS IS'' AND ANY EXPRESS OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
 * EVENT SHALL THE DEVELOPER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Block I/O Cache mechanism, a la malloc(). */
#include "param.h"
#include "proc.h"
#include "vnode.h"
#include "buf.h"
#include "specdev.h"
#include "mount.h"
#include "malloc.h"
#include "resourcevar.h"

/* Initialize buffer headers and related structures. */
void bufinit()
{
 struct bufhd *bh;
 struct buf *bp;

 /* first, make a null hash table */
 for(bh = bufhash; bh < bufhash + BUFHSZ; bh++) {
 bh->b_flags = 0;
 bh->b_forw = (struct buf *)bh;
 bh->b_back = (struct buf *)bh;
 }
 /* next, make a null set of free lists */
 for(bp = bfreelist; bp < bfreelist + BQUEUES; bp++) {
 bp->b_flags = 0;
 bp->av_forw = bp;
 bp->av_back = bp;
 bp->b_forw = bp;
 bp->b_back = bp;
 }
 /* finally, initialize each buffer header and stick on empty q */
 for(bp = buf; bp < buf + nbuf ; bp++) {
 bp->b_flags = B_HEAD | B_INVAL; /* we're just an empty header */
 bp->b_dev = NODEV;
 bp->b_vp = 0;
 binstailfree(bp, bfreelist + BQ_EMPTY);
 binshash(bp, bfreelist + BQ_EMPTY);
 }
}
/* Find the block in the buffer pool. If buffer is not present, allocate a new
 * buffer and load its contents according to the filesystem fill routine. */
bread(vp, blkno, size, cred, bpp)
 struct vnode *vp;
 daddr_t blkno;
 int size;
 struct ucred *cred;
 struct buf **bpp;
{
 struct buf *bp;
 int rv = 0;
 bp = getblk (vp, blkno, size);
 /* if not found in cache, do some I/O */
 if ((bp->b_flags & B_CACHE) == 0 || (bp->b_flags & B_INVAL) != 0) {
 bp->b_flags |= B_READ;
 bp->b_flags &= ~(B_DONE | B_ERROR | B_INVAL);
 VOP_STRATEGY(bp);
 rv = biowait (bp);
 }
 *bpp = bp;

 return (rv);
}
/* Operates like bread, but also starts I/O on the specified read-ahead block.
 * [See page 55 of Bach's Book] */
breada(vp, blkno, size, rablkno, rabsize, cred, bpp)
 struct vnode *vp;
 daddr_t blkno; int size;
 daddr_t rablkno; int rabsize;
 struct ucred *cred;
 struct buf **bpp;
{
 struct buf *bp, *rabp;
 int rv = 0, needwait = 0;
 bp = getblk (vp, blkno, size);
 /* if not found in cache, do some I/O */
 if ((bp->b_flags & B_CACHE) == 0 || (bp->b_flags & B_INVAL) != 0) {
 bp->b_flags |= B_READ;
 bp->b_flags &= ~(B_DONE | B_ERROR | B_INVAL);
 VOP_STRATEGY(bp);
 needwait++;
 }
 rabp = getblk (vp, rablkno, rabsize);
 /* if not found in cache, do some I/O (overlapped with first) */
 if ((rabp->b_flags & B_CACHE) == 0 || (rabp->b_flags & B_INVAL) != 0) {
 rabp->b_flags |= B_READ | B_ASYNC;
 rabp->b_flags &= ~(B_DONE | B_ERROR | B_INVAL);
 VOP_STRATEGY(rabp);
 } else
 brelse(rabp);
 /* wait for original I/O */
 if (needwait)
 rv = biowait (bp);
 *bpp = bp;
 return (rv);
}
/* Synchronous write. Release buffer on completion. */
bwrite(bp)
 register struct buf *bp;
{
 int rv;
 if(bp->b_flags & B_INVAL) {
 brelse(bp);
 return (0);
 } else {
 int wasdelayed;
 wasdelayed = bp->b_flags & B_DELWRI;
 bp->b_flags &= ~(B_READ | B_DONE | B_ERROR | B_ASYNC | B_DELWRI);
 if(wasdelayed) reassignbuf(bp, bp->b_vp);
 bp->b_flags |= B_DIRTY;
 VOP_STRATEGY(bp);
 rv = biowait(bp);
 if (!rv)
 bp->b_flags &= ~B_DIRTY;
 brelse(bp);
 return (rv);
 }
}
/* Delayed write. The buffer is marked dirty, but is not queued for I/O. This
 * routine should be used when the buffer is expected to be modified again
 * soon, typically a small write that partially fills a buffer. NB: magnetic
 * tapes can't be delayed; must be written in order writes are requested. */
void bdwrite(bp)
 register struct buf *bp;
{
 if(bp->b_flags & B_INVAL)
 brelse(bp);
 if(bp->b_flags & B_TAPE) {
 bwrite(bp);
 return;
 }
 bp->b_flags &= ~(B_READ | B_DONE);
 bp->b_flags |= B_DIRTY | B_DELWRI;
 reassignbuf(bp, bp->b_vp);
 brelse(bp);
 return;
}
/* Asynchronous write. Start I/O on a buffer, but do not wait for it to
 * complete. The buffer is released when the I/O completes. */
bawrite(bp)
 register struct buf *bp;
{
 if(!(bp->b_flags & B_BUSY))panic("bawrite: not busy");
 if(bp->b_flags & B_INVAL)
 brelse(bp);
 else {
 int wasdelayed;
 wasdelayed = bp->b_flags & B_DELWRI;
 bp->b_flags &= ~(B_READ | B_DONE | B_ERROR | B_DELWRI);
 if(wasdelayed) reassignbuf(bp, bp->b_vp);
 bp->b_flags |= B_DIRTY | B_ASYNC;
 VOP_STRATEGY(bp);
 }
}
/* Release a buffer. Even if the buffer is dirty, no I/O is started. */
brelse(bp)
 register struct buf *bp;
{
 int x;
 /* anyone need a "free" block? */
 x=splbio();
 if ((bfreelist + BQ_AGE)->b_flags & B_WANTED) {
 (bfreelist + BQ_AGE) ->b_flags &= ~B_WANTED;
 wakeup(bfreelist);
 }
 /* anyone need this very block? */
 if (bp->b_flags & B_WANTED) {
 bp->b_flags &= ~B_WANTED;
 wakeup(bp);
 }
 if (bp->b_flags & (B_INVAL | B_ERROR)) {
 bp->b_flags |= B_INVAL;
 bp->b_flags &= ~(B_DELWRI | B_CACHE);
 if(bp->b_vp)
 brelvp(bp);
 }
 /* enqueue */
 /* buffers with junk contents */
 if(bp->b_flags & (B_ERROR | B_INVAL | B_NOCACHE))
 binsheadfree(bp, bfreelist + BQ_AGE)
 /* buffers with stale but valid contents */
 else if(bp->b_flags & B_AGE)
 binstailfree(bp, bfreelist + BQ_AGE)
 /* buffers with valid and quite potentially reusable contents */
 else
 binstailfree(bp, bfreelist + BQ_LRU)
 /* unlock */
 bp->b_flags &= ~B_BUSY;
 splx(x);
 return;
}
int freebufspace = 20*NBPG;
int allocbufspace;
/* Find a buffer which is available for use. If there is free memory for
 * buffer space and an empty header on the empty list, use that. Otherwise,
 * select something from a free list. Preference is to AGE list, then LRU list. */
struct buf *
getnewbuf(sz)
{
 struct buf *bp;
 int x;
 x = splbio();
start:
 /* can we constitute a new buffer? */
 if (freebufspace > sz
 && bfreelist[BQ_EMPTY].av_forw != (struct buf *)bfreelist+BQ_EMPTY) {
 caddr_t addr;
 if ((addr = malloc (sz, M_TEMP, M_NOWAIT)) == 0) goto tryfree;
 freebufspace -= sz;
 allocbufspace += sz;
 bp = bfreelist[BQ_EMPTY].av_forw;
 bp->b_flags = B_BUSY | B_INVAL;
 bremfree(bp);
 bp->b_un.b_addr = (caddr_t) addr;
 goto fillin;
 }
tryfree:
 if (bfreelist[BQ_AGE].av_forw != (struct buf *)bfreelist+BQ_AGE) {
 bp = bfreelist[BQ_AGE].av_forw;
 bremfree(bp);
 } else if (bfreelist[BQ_LRU].av_forw != (struct buf *)bfreelist+BQ_LRU) {
 bp = bfreelist[BQ_LRU].av_forw;
 bremfree(bp);
 } else {
 /* wait for a free buffer of any kind */
 (bfreelist + BQ_AGE)->b_flags |= B_WANTED;
 sleep(bfreelist, PRIBIO);
 splx(x);
 return (0);
 }
 /* if we are a delayed write, convert to an async write! */
 if (bp->b_flags & B_DELWRI) {
 bp->b_flags |= B_BUSY;
 bawrite (bp);
 goto start;
 }
 if(bp->b_vp)
 brelvp(bp);

 /* we are not free, nor do we contain interesting data */
 bp->b_flags = B_BUSY;
fillin:
 bremhash(bp);
 splx(x);
 bp->b_dev = NODEV;
 bp->b_vp = NULL;
 bp->b_blkno = bp->b_lblkno = 0;
 bp->b_iodone = 0;
 bp->b_error = 0;
 bp->b_wcred = bp->b_rcred = NOCRED;
 if (bp->b_bufsize != sz) allocbuf(bp, sz);
 bp->b_bcount = bp->b_bufsize = sz;
 bp->b_dirtyoff = bp->b_dirtyend = 0;
 return (bp);
}
/* Check to see if a block is currently memory resident. */
struct buf *incore(vp, blkno)
 struct vnode *vp;
 daddr_t blkno;
{
 struct buf *bh;
 struct buf *bp;
 bh = BUFHASH(vp, blkno);
 /* Search hash chain */
 bp = bh->b_forw;
 while (bp != (struct buf *) bh) {
 /* hit */
 if (bp->b_lblkno == blkno && bp->b_vp == vp
 && (bp->b_flags & B_INVAL) == 0)
 return (bp);
 bp = bp->b_forw;
 }
 return(0);
}
/* Get a block of requested size that is associated with a given vnode and
 * block offset. If it is found in block cache, mark it as found, make it busy
 * and return it. Otherwise, return empty block of the correct size. It is up
 * to caller to insure that the cached blocks be of the correct size. */
struct buf *
getblk(vp, blkno, size)
 register struct vnode *vp;
 daddr_t blkno;
 int size;
{
 struct buf *bp, *bh;
 int x;
 for (;;) {
 if (bp = incore(vp, blkno)) {
 x = splbio();
 if (bp->b_flags & B_BUSY) {
 bp->b_flags |= B_WANTED;
 sleep (bp, PRIBIO);
 continue;
 }
 bp->b_flags |= (B_BUSY | B_CACHE);
 bremfree(bp);
 if (size > bp->b_bufsize)
 panic("now what do we do?");

 } else {
 if((bp = getnewbuf(size)) == 0) continue;
 bp->b_blkno = bp->b_lblkno = blkno;
 bgetvp(vp, bp);
 x = splbio();
 bh = BUFHASH(vp, blkno);
 binshash(bp, bh);
 bp->b_flags = B_BUSY;
 }
 splx(x);
 return (bp);
 }
}
/* Get an empty, disassociated buffer of given size. */
struct buf *
geteblk(size)
 int size;
{
 struct buf *bp;
 int x;
 while ((bp = getnewbuf(size)) == 0)
 ;
 x = splbio();
 binshash(bp, bfreelist + BQ_AGE);
 splx(x);
 return (bp);
}
/* Exchange a buffer's underlying buffer storage for one of different size,
 * taking care to maintain contents appropriately. When buffer increases in
 * size, caller is responsible for filling out additional contents. When buffer
 * shrinks in size, data is lost, so caller must first return it to backing
 * store before shrinking the buffer, as no implied I/O will be done.
 * Expanded buffer is returned as value. */
struct buf *
allocbuf(bp, size)
 register struct buf *bp;
 int size;
{
 caddr_t newcontents;
 /* get new memory buffer */
 newcontents = (caddr_t) malloc (size, M_TEMP, M_WAITOK);
 /* copy the old into the new, up to the maximum that will fit */
 bcopy (bp->b_un.b_addr, newcontents, min(bp->b_bufsize, size));
 /* return old contents to free heap */
 free (bp->b_un.b_addr, M_TEMP);
 /* adjust buffer cache's idea of memory allocated to buffer contents */
 freebufspace -= size - bp->b_bufsize;
 allocbufspace += size - bp->b_bufsize;
 /* update buffer header */
 bp->b_un.b_addr = newcontents;
 bp->b_bcount = bp->b_bufsize = size;
 return(bp);
}
/* Patiently await operations to complete on this buffer. When they do,
 * extract and return any error value associated with the I/O. If an invalid
 * block, force it off the lookup hash chains. */
biowait(bp)
 register struct buf *bp;
{

 int x;
 x = splbio();
 while ((bp->b_flags & B_DONE) == 0)
 sleep((caddr_t)bp, PRIBIO);
 if((bp->b_flags & B_ERROR) || bp->b_error) {
 if ((bp->b_flags & B_INVAL) == 0) {
 bp->b_flags |= B_INVAL;
 bremhash(bp);
 binshash(bp, bfreelist + BQ_AGE);
 }
 if (!bp->b_error)
 bp->b_error = EIO;
 else
 bp->b_flags |= B_ERROR;
 splx(x);
 return (bp->b_error);
 } else {
 splx(x);
 return (0);
 }
}
/* Finish up operations on a buffer, calling an optional function (if
 * requested), and releasing the buffer if marked asynchronous. Then mark this
 * buffer done so that others biowait()'ing for it will notice when they are
 * woken up from sleep(). */
biodone(bp)
 register struct buf *bp;
{
 int x;
 x = splbio();
 if (bp->b_flags & B_CALL) (*bp->b_iodone)(bp);
 bp->b_flags &= ~B_CALL;
 if (bp->b_flags & B_ASYNC) brelse(bp);
 bp->b_flags &= ~B_ASYNC;
 bp->b_flags |= B_DONE;
 wakeup(bp);
 splx(x);
}





[LISTING THREE]

/* Copyright (c) 1992 William F. Jolitz. All rights reserved.
 * Written by William Jolitz 1/92
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met: 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 * must display the following acknowledgement:
 * This software is a component of "386BSD" developed by
 William F. Jolitz, TeleMuse.
 * 4. Neither the name of the developer nor the name "386BSD"

 * may be used to endorse or promote products derived from this software
 * without specific prior written permission.
 * THIS SOFTWARE IS A COMPONENT OF 386BSD DEVELOPED BY WILLIAM F. JOLITZ
 * AND IS INTENDED FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. THIS SOFTWARE
 * SHOULD NOT BE CONSIDERED TO BE A COMMERCIAL PRODUCT. THE DEVELOPER URGES
 * THAT USERS WHO REQUIRE A COMMERCIAL PRODUCT NOT MAKE USE OF THIS WORK. THIS
 * SOFTWARE IS PROVIDED BY THE DEVELOPER ``AS IS'' AND ANY EXPRESS OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
 * EVENT SHALL THE DEVELOPER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Ring Buffer code for 386BSD. */
#include "param.h"
#include "systm.h"
#include "buf.h"
#include "ioctl.h"
#include "tty.h"

putc(c, rbp) struct ringb *rbp;
{
 char *nxtp;
 /* ring buffer full? */
 if ( (nxtp = RB_SUCC(rbp, rbp->rb_tl)) == rbp->rb_hd) return (-1);
 /* stuff character */
 *rbp->rb_tl = c;
 rbp->rb_tl = nxtp;
 return(0);
}
getc(rbp) struct ringb *rbp;
{
 u_char c;
 /* ring buffer empty? */
 if (rbp->rb_hd == rbp->rb_tl) return(-1);
 /* fetch character, locate next character */
 c = *(u_char *) rbp->rb_hd;
 rbp->rb_hd = RB_SUCC(rbp, rbp->rb_hd);
 return (c);
}
nextc(cpp, rbp) struct ringb *rbp; char **cpp; {
 if (*cpp == rbp->rb_tl) return (0);
 else { char *cp;
 cp = *cpp;
 *cpp = RB_SUCC(rbp, cp);
 return(*cp);
 }
}
ungetc(c, rbp) struct ringb *rbp;
{
 char *backp;
 /* ring buffer full? */
 if ( (backp = RB_PRED(rbp, rbp->rb_hd)) == rbp->rb_tl) return (-1);
 rbp->rb_hd = backp;
 /* stuff character */

 *rbp->rb_hd = c;
 return(0);
}
unputc(rbp) struct ringb *rbp;
{
 char *backp;
 int c;
 /* ring buffer empty? */
 if (rbp->rb_hd == rbp->rb_tl) return(-1);
 /* backup buffer and dig out previous character */
 backp = RB_PRED(rbp, rbp->rb_tl);
 c = *(u_char *)backp;
 rbp->rb_tl = backp;
 return(c);
}
#define peekc(rbp) (*(rbp)->rb_hd)
initrb(rbp) struct ringb *rbp; {
 rbp->rb_hd = rbp->rb_tl = rbp->rb_buf;
}
/* Example code for contiguous operations:
 ...
 nc = RB_CONTIGPUT(&rb);
 if (nc) {
 if (nc > 9) nc = 9;
 bcopy("ABCDEFGHI", rb.rb_tl, nc);
 rb.rb_tl += nc;
 rb.rb_tl = RB_ROLLOVER(&rb, rb.rb_tl);
 }
 ...
 ...
 nc = RB_CONTIGGET(&rb);
 if (nc) {
 if (nc > 79) nc = 79;
 bcopy(rb.rb_hd, stringbuf, nc);
 rb.rb_hd += nc;
 rb.rb_hd = RB_ROLLOVER(&rb, rb.rb_hd);
 stringbuf[nc] = 0;
 printf("%s", stringbuf);
 }
 ...
 */
/* Concatenate ring buffers. */
catb(from, to)
 struct ringb *from, *to;
{
 int c;
 while ((c = getc(from)) >= 0)
 putc(c, to);
}





[LISTING FOUR]

/* [Excerpted from tty.h, 386BSD Release 0.0 - wfj] */
/* Ring buffers provide a contiguous, dense storage for character data used
 * by the tty driver. */

#define RBSZ 1024
struct ringb {
 char *rb_hd; /* head of buffer segment to be read */
 char *rb_tl; /* tail of buffer segment to be written */
 char rb_buf[RBSZ]; /* segment contents */
};
#define RB_SUCC(rbp, p) \
 ((p) >= (rbp)->rb_buf + RBSZ - 1 ? (rbp)->rb_buf : (p) + 1)
#define RB_ROLLOVER(rbp, p) \
 ((p) > (rbp)->rb_buf + RBSZ - 1 ? (rbp)->rb_buf : (p))
#define RB_PRED(rbp, p) \
 ((p) <= (rbp)->rb_buf ? (rbp)->rb_buf + RBSZ - 1 : (p) - 1)
#define RB_LEN(rp) \
 ((rp)->rb_hd <= (rp)->rb_tl ? (rp)->rb_tl - (rp)->rb_hd : \
 RBSZ - ((rp)->rb_hd - (rp)->rb_tl))
#define RB_CONTIGPUT(rp) \
 (RB_PRED(rp, (rp)->rb_hd) < (rp)->rb_tl ? \
 (rp)->rb_buf + RBSZ - (rp)->rb_tl : \
 RB_PRED(rp, (rp)->rb_hd) - (rp)->rb_tl)
#define RB_CONTIGGET(rp) \
 ((rp)->rb_hd <= (rp)->rb_tl ? (rp)->rb_tl - (rp)->rb_hd : \
 (rp)->rb_buf + RBSZ - (rp)->rb_hd)








































June, 1992
PROGRAMMING THE I²C INTERFACE


When intelligent devices need to communicate




Mitchell Kahn


Mitch is a senior strategic development engineer for Intel and can be
contacted at 5000 W. Chandler Blvd., Chandler, AZ 85226 or at
mkahn@sedona.intel.com.


The Inter-Integrated Circuit bus ("I²C bus" for short) is a two-wire,
synchronous, serial interface designed primarily for communication between
intelligent IC devices. The I²C bus offers several advantages over
"traditional" serial interfaces such as Microwire and RS-232. Among the
advanced features of I²C are multimaster operation, automatic baud-rate
adjustment, and "plug-and-play" network extensions.
Mention the I²C bus to a group of American engineers and you'll likely get
hit with an abundance of blank stares. I say American engineers because until
recently the I²C bus was primarily a European phenomenon. Within the last
year, however, interest in I²C in the United States has risen dramatically.
Embedded systems designers are realizing the cost, space, and power savings
afforded by robust serial interchip protocols.
The idea of serial interconnect between integrated circuits is not new. Many
semiconductor vendors offer devices designed to "talk" via serial links with
other processors. Current examples include Microwire (National Semiconductor),
SPI (Motorola), and most recently Echelon's Neuron chips. In all cases, the
goal is the same: to reduce the wiring and pincount necessary for a parallel
data bus. It simply does not make economic sense to route a full-speed
parallel bus to a slow peripheral.
Unfortunately for most serial-bus-capable devices, the choice of a bus
protocol will dictate the CPU architecture. For example, only two CPU
architectures implement an on-chip I²C port. If your choice of architecture
precludes use of these architectures, then your only option is to implement
the protocol in software.
The software implementation of the I²C protocol discussed in this article
came about as a result of an implicit challenge during a staff meeting. One of
our managers proposed that we hire a consultant to write a software I²C
driver for the Intel 80C186EB embedded processor. Being somewhat new to the
group, I took exception (although not verbally!) to his suggestion. A weekend
of intense hacking later, I presented the first prototype of the driver. My
reward? I got to write a generic version of the driver for general
distribution.


Design Trade-offs


Three distinct tasks are involved in implementing the I²C protocol: watching
the bus, waiting for a specific amount of time, and driving the bus. This
became apparent when I flowcharted 1 byte of a typical bus transaction; see
Figure 1. The time delays associated with creating the bus waveforms would
normally have been relegated to the 80C186EB's on-chip timers. I could not,
however, assume that the end users of my code would be able to spare a timer
for the software I²C port. I had to forego the elegance (and to some extent
accuracy) of the on-chip timers for the sledgehammer approach of software
timing loops. Luckily, the I²C protocol is extremely forgiving with regard
to timing accuracy. The decision to use assembly instead of a high-level
language stemmed directly from the need to control program-execution time. I
had neither the time nor the inclination to handtune high-level code.
Having made the decision to use assembly language, I faced my next problem:
Could I make the code portable? Intel offers a plethora of CPU and
embedded-controller architectures. Would it be possible to make the code
somewhat portable between disparate assembly languages? I found my answer in
the use of macros.
All the basic building blocks of the I²C protocol (watching, waiting, and
doing) can be compartmentalized into distinct macros. The algorithms that make
up the I²C driver are written with these macros as the framework. You don't
need to understand the intricacies of the I²C protocol to port these
routines -- you just need to know how to make your CPU watch, wait, and do.
For example, a 4.7-µs delay is a common event during a transfer. The macro
%Wait_4_7_uS implements just such a delay by using the 8086 LOOP instruction
with a couple of NOPs for tuning; see Example 1(a). Total execution time is
readily calculated from instruction timing tables. The same macro is ported to
the i960 architecture in Example 1(b). Although I am a neophyte when it comes
to i960 programming, I had no problems porting the core macros.
Example 1: (a) 80C186 implementation of 4.7-µs wait macro; (b) 80960CA
implementation of 4.7-µs wait macro.

 (a)

 %*DEFINE(Wait_4_7_uS)(
 mov cx, 5 ; 4 clocks
 loop $ ; 4*15+5 = 65 clocks
 nop ; 3 clocks
 nop ; 3 clocks
 ; total = 75 clocks
 ; 75 * 62.5ns = 4.69uS (close enough)
 )

 (b)

 define(Wait_4_7_uS,'

 lda 0x17, r4 # instruction may be issued in parallel
 # so assume no clocks.
 Ob: cmpdeco 0, r4 # compare and decrement counter in r4
 bne.t Ob # if !=0 branch back (predict taken
 # branch)
 #
 # The cmpdeco and bne.t together take 3
 # clocks in parallel minimum.
 #
 # 0x17 (25 decimal) * 3 = 75 clocks
 # at 16MHz this is 4.69uS
 ')




Hardware Dependencies


A few words about the target hardware are in order before I discuss the code.
Any implementation of the I²C protocol requires two open-drain (or
open-collector), bidirectional port pins for the Serial Clock (SCL) and Serial
Data (SDA) lines. The code in this article was designed for the 80C186EB
embedded processor, which has two open-drain ports on-chip. The two pins, P2.6
(SCL) and P2.7 (SDA), are part of a larger 8-bit port. Processors without
open-drain I/O ports can easily implement I²C with the addition of an
external open-collector latch.
Two special-function registers, P2PIN and P2LTCH, are used to read and write
the state of the port pins. The 80C186EB allows the special-function registers
to be located anywhere in either memory or I/O space. For this implementation,
I chose to leave the registers in I/O space, even though this limited my
choice of instructions. The 80186 architecture does not provide for
read-modify-write instructions in I/O space (an AND to I/O, for example); it
can only load and store (IN and OUT). So why did I limit myself? Again, I had
to assume the lowest common denominator for our customers when designing my
code.


Building the Framework


Early on in development, I decided to partition my code macros according to
physical processes involved in the I²C protocol. Code not directly involved
in mimicking the actions of a hardware I²C port was not written as macros.
For example, the code necessary to access the stack frame is not written as a
macro, whereas the code needed to toggle the clock line is. This was done to
isolate architecture-dependent code sequences from the more generic I²C
functions. Macros were also not used for "gray areas" such as the shifting of
serial data, which is both architecture dependent and physical in nature. The
I²C functions that passed the litmus test fell into the three aforementioned
categories of watching, waiting, and doing.
The "waiting" macros provide a fixed-minimum time delay. They are implemented
using a simple LOOP $ delay. The LOOP instruction decrements the CX register,
then branches to the target (in this case itself) if the result is non-zero.
The delay is (n-1)*15+5 clocks, where n is the starting value in the CX
register. All the delays were calculated assuming a 16-MHz clock rate (62.5
nanoseconds per clock). The code still works at lower CPU speeds because the
I²C protocol only specifies minimum timings. In fact, the delay macros are
only "accurate enough," providing timings as close as I could get to the
specified minimum without undue tuning.
The "watching" macros are "spin-on-bit" polling loops. These pieces of code
wait for a transition on the appropriate I{2}C line to occur before allowing
execution to continue. There are two polling macros for each of the two I²C
signal lines: one for high-to-low transitions and one for low-to-high
transitions. The polling of the SCL line gives rise to an important
feature of I²C: automatic, bit-by-bit baud-rate adjustment. Any device on
the I²C bus may hold the clock line low in order to stall the bus for more
time (a serial wait state). The other devices on the bus are then forced to
poll the SCL line until the slow device releases control of the clock.
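The clock-stretching behavior can be sketched in C. read_port() here is a stand-in stub for the 80186 port read (an assumption for illustration); it simulates a slave that holds SCL low for a few polls before releasing it:

```c
#include <assert.h>

#define SCL_MASK 0x40   /* SCL is bit 6 of the simulated port */

static int polls_left = 3;   /* slave stretches the clock for 3 polls */

/* Stub standing in for "IN AL,DX" on the Port 2 pins. */
static unsigned char read_port(void)
{
    return (polls_left-- > 0) ? 0x00 : SCL_MASK;  /* low, then high */
}

/* A "watching" macro in C: spin until SCL goes high, counting how
   long we were stalled by the slow device's serial wait state. */
static int wait_for_scl_high(void)
{
    int spins = 0;
    while (!(read_port() & SCL_MASK))   /* the "je %wait" loop */
        spins++;
    return spins;
}
```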
The %Get_SDA_Bit macro also falls under the category of "watching." Its
function is simply to return the state of the SDA line without waiting for a
transition. %Get_SDA_Bit is used primarily to pull the serial data off the bus
when the clock is valid.
The "doing" macros control the state of the clock and data lines. As with the
polling macros, there are four types -- one for each transition of the SCL or
SDA lines. The "doing" macros are named to reflect the physical operations
they perform. For example, %Drive_SCL_Low always drives the SCL line to a low
state. %Release_SCL_High, on the other hand, relinquishes control of the SCL
line, which may then be pulled high or driven low by another device on the
bus. A read-modify-write operation is used for the bit manipulation so that
the other 6 bits of Port 2 are not affected by the I{2}C operations.
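The read-modify-write idea can be shown in a few lines of C; 'latch' stands in for the P2LTCH register, and the masks match those used in Listing One:

```c
#include <assert.h>

/* Read-modify-write sketch: clear or set bit 6 (SCL) of the 8-bit
   port latch without disturbing the other bits, mirroring the
   %Drive_SCL_Low and %Release_SCL_High macros. */
static unsigned char latch = 0xFF;   /* all lines released */

static void drive_scl_low(void)    { latch &= 0xBF; }  /* AND 10111111B */
static void release_scl_high(void) { latch |= 0x40; }  /* OR  01000000B */
```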


Getting on the Bus


Three procedures were created using the macro framework. I'll describe only
the master transmit (Listing One, page 106) and master receive functions
(Listing Two, page 108), as they represent the needs of most I{2}C users. The
slave procedure is long and intricate and will not be described here.
An I{2}C master transmission proceeds as follows:
1. The master polls the bus to see if it is in use.
2. The master generates a start condition on the bus.
3. The master broadcasts the slave address and expects an acknowledge (ACK)
from the addressed slave.
4. The master transmits 0 or more bytes of data, expecting an ACK following
each byte.
5. The master generates a stop condition and releases the bus.
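The five steps above can be sketched as C pseudocode. The bus primitives here are hypothetical stubs (always succeeding) so that only the control flow is shown; the return codes follow Listing One's convention:

```c
#include <assert.h>
#include <stddef.h>

/* Error codes per Listing One's header comment. */
enum { I2C_OK = 0x00, I2C_BUSY = 0x01, I2C_NO_ACK = 0x02 };

/* Hypothetical bus primitives; stubbed to succeed for illustration. */
static int  bus_free(void)                    { return 1; }
static void start_condition(void)             { }
static void stop_condition(void)              { }
static int  send_byte_get_ack(unsigned char b){ (void)b; return 1; }

static int i2c_xmit(unsigned char addr, const unsigned char *msg, size_t n)
{
    if (!bus_free())
        return I2C_BUSY;                 /* step 1: bus in use? */
    start_condition();                   /* step 2 */
    if (!send_byte_get_ack(addr)) {      /* step 3: address + ACK */
        stop_condition();
        return I2C_NO_ACK;
    }
    for (size_t i = 0; i < n; i++)       /* step 4: data + ACKs */
        if (!send_byte_get_ack(msg[i])) {
            stop_condition();
            return I2C_NO_ACK;
        }
    stop_condition();                    /* step 5 */
    return I2C_OK;
}
```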
The stack frame for the master transmit procedure, I2CXA.A86, includes a far
pointer to the message for transmission, the byte count for the message, and
the slave address. Far pointers and far procedure calls are used in all the
procedures. No attempt was made to conform to a specific high-level language
calling convention, although such a conversion would be trivial. The
procedures save only the state of the modified segment registers.
The master transmit procedure performs error checking on the passed parameters
before attempting to send the message. The maximum message length is set at 64
Kbytes by the segmentation of the 80186 memory space. This restriction could
be removed by including code to handle segment boundaries. The transmit
procedure also checks the direction bit in the slave address to ensure that a
reception was not erroneously indicated. Errors are reported back to the
calling procedure through the AX register. (The exact code is in Listing One.)
The first step in sending a message is getting on the I{2}C bus. The macro
%Check_For_Bus_Free simply polls the bus to determine if any transactions are
in progress. If so, the transmit procedure aborts with the appropriate error
code. If the bus is free, a start condition is generated. The start condition
is defined as a high-to-low transition of SDA with SCL high followed by a
4.7-uS pause. These waveforms are easily generated with the %Drive_SDA_Low and
%Wait_4_7_uS macros.
All communication on the I{2}C bus between the stop and start conditions,
including addressing and data, takes place as an 8-bit data value followed by
an acknowledge bit. This led to the natural nested loop structure for the
body of the procedure; see Figure 2.
The inner loop is responsible for transmitting the 8 bits of each data byte.
Each transmitted bit generates the appropriate data (SDA) and clock (SCL)
waveforms while checking for both serial wait states and potential bus
collisions. A bus collision occurs when two masters attempt to gain control of
the bus simultaneously. The I{2}C protocol handles collisions with the simple
rule: "He who transmits the first 0 on the SDA line wins the bus." To ensure
that we (the master transmit procedure) own the bus, the SDA line is checked
whenever transmitting a 1. If a 0 is present, then a collision has occurred
(because another master is pulling the line low), and the transfer must be
aborted.
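A minimal model of this arbitration rule: on an open-drain bus the wire carries the AND of all drivers, so a master that writes a 1 but reads back a 0 has lost. The helper below (an illustration, MSB first) pits two address bytes against each other:

```c
#include <assert.h>

/* arbitrate() returns 0 if master A keeps the bus, 1 if master B
   does. The wire is the AND of both outputs (open-drain wired-AND);
   the first master to send a 1 while the wire reads 0 loses. */
static int arbitrate(unsigned char a, unsigned char b)
{
    for (int i = 7; i >= 0; i--) {
        int bit_a = (a >> i) & 1, bit_b = (b >> i) & 1;
        int wire  = bit_a & bit_b;            /* wired-AND of SDA */
        if (bit_a == 1 && wire == 0) return 1;  /* A lost */
        if (bit_b == 1 && wire == 0) return 0;  /* B lost */
    }
    return 0;   /* identical bytes: no one has lost yet */
}
```

A side effect is that the numerically lower byte always wins, giving I{2}C a built-in priority order among contending masters.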
Control is turned over to the outer loop after the 8 bits of data (or address)
have been transmitted. The outer loop immediately checks for an acknowledge
from the addressed slave. The transfer is aborted if an acknowledge is not
received. At the end of the ACK bit the message length counter is decremented.
Control is returned to the inner loop if more data remains, otherwise a stop
condition is generated and the master transmit procedure terminates.
Registers are used for intermediate result storage throughout the body of the
procedure. For example, the AH register is used to hold the current value
(either address or data) being shifted onto the SDA line. This eliminates the
need for local data storage within the procedure.


On the Receiving End


The steps involved in an I{2}C master receive transaction are almost identical
to those in transmission:
1. The master polls the bus to see if it is in use.
2. The master generates a start condition on the bus.
3. The master broadcasts the slave address and expects an ACK from the
addressed slave.
4. The master receives 0 or more bytes of data and sends an ACK to the slave
after each byte. The master signals the last byte by not sending an ACK.
5. The master generates a stop condition and releases the bus.
A far pointer to the receive buffer is passed on the stack to the master
receive procedure. The remaining parameters--slave address and message
count--are identical between the two procedures. The received message length
is again limited to 64 Kbytes because of segmentation. The error-checking,
bus-availability sensing, and start-condition generation sections of the
receive procedure are lifted verbatim from the transmit code.
The structure of the receive procedure differs slightly once the start
condition has been generated; see Figure 3. The slave address is transmitted
using one iteration of the transmit procedure's outer loop. Control is passed
to the receive loop once the slave acknowledges its address.
The receive loop structure is patterned after that of the transmit procedure.
The inner loop controls the clocking of the SCL line and the shifting of the
serial data off the SDA line into the CPU. Eight iterations of the inner loop
are performed to receive each byte. The outer loop stores the received byte in
the buffer, decrements the byte count, then sends an ACK to the slave. The
last data byte is signalled by not sending an ACK.
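The inner receive loop reduces to a shift-and-or accumulation, mirroring the procedure's use of the BL register; sda_bits here stands in for eight successive %Get_SDA_Bit samples taken while SCL is high:

```c
#include <assert.h>

/* Eight clocks of the inner receive loop: sample SDA each clock and
   shift the bit into the accumulator, MSB first. */
static unsigned char recv_byte(const int sda_bits[8])
{
    unsigned char b = 0;
    for (int i = 0; i < 8; i++)
        b = (unsigned char)((b << 1) | (sda_bits[i] & 1));
    return b;
}
```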


Using the Procedures


Listing Three (page 110) shows a short program that uses both the master
transmit and master receive procedures. The call to procedure I2C_XMIT
displays the word "bUS-" on a four-character, seven-segment display controlled
by the SAA1064 I{2}C-compatible display driver. The time of day is read from
the PCF8583 real-time clock by the call to procedure I2C_RECV.
Please note that interrupts must be disabled during the execution of both
procedures. An interruption at an inopportune time (when the master is not in
control of the clock) could cause the bus to hang. If you need to service
interrupts periodically, then enable them only when the clock is driven low.
These procedures have been tested on a wide array of I{2}C devices ranging
from serial EEPROMs to voice synthesizers. No compatibility problems have been
seen to date.



Enhancing the Code


I've kicked around many ideas for enhancing the I{2}C procedures. You could,
for example, replace the timing loops with timed interrupts. That way, the CPU
could perform useful work during the pauses. Along the same lines, the pauses
could be scheduled using a real-time kernel, again improving CPU throughput.
Finally, you could add a high-level language calling structure.
The use of timed interrupts adds an order of magnitude to the complexity of
the code, but would be worth it for high-performance, real-time systems.


Conclusion


I{2}C is not the only game in town when it comes to serial protocols.
Hopefully, some of the techniques presented here will carry over into the
development of other "simulated" serial protocols, such as those targeted at
the home-automation market. Who knows, maybe someday a snippet of my code will
find its way into a truly intelligent dishwasher. I'll be waiting....


References


I{2}C Bus Specification, Philips Corporation (undated).


_PROGRAMMING THE I2C INTERFACE_
by Mitchell Kahn



[LISTING ONE]

$pagelength (30)
$mod186
$debug
$xref

NAME i2c_transmit;

$include (\include\pcp_io.inc)

PUBLIC i2c_xmit

;****** EQUates ******
BUS_FREE_MIN EQU 2 ; Loop counter for free bus delay.
MAXIMUM_MESSAGE_LEN EQU 255

CODE_ILLEGAL_ADDR EQU 020H
CODE_MSG_LEN EQU 040H

;****** STACK FRAME STRUCTURE ******
stack_frame STRUC
ret_ip DW ?
ret_cs DW ?
buffer_offset DW ?
buffer_segment DW ?
count DW ?
address DW ?
stack_frame ENDS

%*DEFINE(Drive_SCL_Low)(
 mov dx, P2LTCH
 in al, dx

 and al, 10111111B ; SCL is bit 6
 out dx, al
 )
%*DEFINE(Release_SCL_High)(
 mov dx, P2LTCH
 in al, dx
 or al, 01000000B
 out dx, al
 )
%*DEFINE(Drive_SDA_Low)(
 mov dx, P2LTCH
 in al, dx
 and al, 01111111B ; SDA is bit 7
 out dx, al
 )
%*DEFINE(Release_SDA_High)(
 mov dx, P2LTCH
 in al, dx
 or al, 10000000B
 out dx, al
 )
%*DEFINE(Wait_4_7_uS)(
 mov cx, 5
 loop $
 nop
 nop
 )
%*DEFINE(Wait_Half_Bit_Time)(
 mov cx, 3
 loop $
 )
%*DEFINE(Wait_SCL_Low_Time)(
 mov cx, 5
 loop $
 nop
 nop
 )
%*DEFINE(Wait_SCL_High_Time)(
 mov cx, 5
 loop $
 nop
 nop
 )
%*DEFINE(Wait_For_SCL_To_Go_Low)LOCAL wait(
 mov dx, P2PIN
%wait: in al, dx
 test al, 01000000B
 jne %wait
 )
%*DEFINE(Wait_For_SCL_To_Go_High)LOCAL wait(
 mov dx, P2PIN
%wait: in al, dx
 test al, 01000000B
 je %wait
 )
%*DEFINE(Wait_For_SDA_To_Go_High)LOCAL wait(
 mov dx, P2PIN
%wait: in al, dx
 test al, 10000000B
 je %wait
 )
%*DEFINE(Get_SDA_Bit)(
 mov dx, P2PIN
 in al, dx
 and al, 0080H
 )
%*DEFINE(Check_For_Bus_Free)(
 mov dx, P2PIN
 in al, dx
 mov bl, 0C0H ; Mask for SCL and SDA.
 and al, bl ; If SCL and SDA are high
 xor al, bl ; this sequence will leave a zero in AX.
 )

;*****************************************************************************
;** Revision History: 0.0 (7/90): First frozen working version. No slave wait
;** timeout. No arbitration turn around. Inefficient register usage.
;** 0.1 (7/16/90): 8-bit registers used (improves 80C188EB). Use STRUCT for
;** stack frame clarity. Implements slave wait timeout. Saves ES.
;*****************************************************************************

;*****************************************************************
;** Procedure I2C_XMIT **
;** Call Type: FAR **
;** Uses : All regs. **
;** Saves : DS and ES only. **
;** Stack Frame: **
;** [bp]= ip **
;** [bp+2]= cs **
;** [bp+4]= message offset **
;** [bp+6]= message segment **
;** [bp+8]= message count **
;** [bp+10]= slave address **
;** Return Codes in AX register: **
;** XX00 = Transmission completed without error **
;** XX01 = Bus unavailable **
;** XX02 = Addressed slave not responding **
;** nn04 = Addressed slave aborted during xfer **
;** (nn= number of bytes transferred before **
;** transfer aborted) **
;** XX08 = Arbitration loss (note 1) **
;** XX10 = Bus wait timeout **
;** XX20 = Illegal address **
;** XX40 = Illegal message count **
;** note 1: Arbitration loss requires that the **
;** I2C unit switch to slave receive **
;** mode. This is not implemented. **
;*****************************************************************

code segment public
 assume cs:code
i2c_xmit proc far
 mov bp, sp
 push ds
 push es
 test word ptr [bp].address,01H ; Check for illegal
 ; address (a READ).
 jz addr_ok

 mov ax, CODE_ILLEGAL_ADDR ; Illegal addr
 pop es
 pop ds
 ret 8 ; Tear down stack frame
addr_ok:
 mov cx, [bp].count ; Get message length.
 cmp cx, MAXIMUM_MESSAGE_LEN
 jle message_len_ok ; Message is 255 or less
 ; characters.
 mov ax, CODE_MSG_LEN ; Bad length return code.
 pop es
 pop ds
 ret 8
message_len_ok:
 mov si, [bp].buffer_offset ; Get message offset.
 mov ax, [bp].buffer_segment ; Get message segment
 mov ds, ax ; and put in DS.
 ; Test for I2C bus free condition.
 ; SCL and SDA must be high at least 4.7uS
 mov cx, BUS_FREE_MIN ; initialize free time counter.

 ; The following loop takes 48 clocks while cx>1 and 33 clocks
 ; on the last iteration. To ensure that the bus is free, samples
 ; of the bus must span at least 4.7uS. At 16MHz: 48*(62.5ns)=3uS.
 ; The first sample is at 0us, the second at 3us, and the
 ; third will be at 6us. Although this exceeds the 4.7us
 ; spec, it is better safe than sorry.
bus_free_wait:
 %Check_For_Bus_Free
 jz i2c_bus_free
 ; At this point the bus is not available.
 mov ax, 01H ; 01= return code for
 pop es ; a busy bus.
 pop ds
 ret 8 ; return and tear down
 ; stack frame.
i2c_bus_free: loop bus_free_wait ; bus may be free but wait
 ; the 4.7uS required!
 ; I2C bus is available, generate a START condition
 %Drive_SDA_Low
 %Wait_4_7_uS
 mov ax, [bp].address
 xchg ah, al ; ah = address
next_byte: mov di, 8 ; set up bit counter
next_bit: %Drive_SCL_Low
 %Wait_Half_Bit_Time
 mov bl, ah ; get current data
 and bl, 080H ; strip MSB
 mov dx, P2LTCH
 in al, dx
 and al, 7fh
 or al, bl ; set bit 7 to reflect
 ; data bit
 out dx, al ; xmit data bit
 %Wait_Half_Bit_Time
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High

 ; At this point SCL is high so if there is another master

 ; attempting to gain the bus, its data would be valid here.
 ; We need only check when our data is "1"...

 test bl, 80H ; Is data a "1"?
 jz won_arbitration ; If not -> don't check arbitration.

 mov dx, P2PIN
 in al, dx
 test al, 80H ; Is SDA high?
 jnz won_arbitration
 jmp lost_arbitration ; If SDA != 1 then we lost
 ; arbitration....
won_arbitration:
 %Wait_SCL_High_Time
 shl ah, 1 ; shift current byte
 dec di ; tick down bit counter
 jne next_bit ; continue bits
; a byte has been completed. Time to get an ACKNOWLEDGE.
 %Drive_SCL_Low
 %Wait_Half_Bit_Time
 %Release_SDA_High
 %Wait_Half_Bit_Time
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High
 ; SCL is now high. We must loop while checking SDA for 4.7us.
 ; With a count of 3 we have a delay of 89 clocks (5.5uS). This
 ; could be fine-tuned with NOPs when performance is critical.
 mov cx, 3
check_4_ack:
 %Get_SDA_Bit ; Is SDA high (no ACK)?
 jnz abort_no_ack ; if so -> abort
 loop check_4_ack
; if we've gotten to here, then an acknowledge was received.
 mov ah, byte ptr [si]
 inc si ; point to next byte
 dec word ptr [bp].count ; dec string counter
 js xfer_done
 jmp next_byte
; END OF MESSAGE: Issue a STOP condition
xfer_done:
 mov di, 0 ; Normal completion code.
 jmp i2c_bus_stop
abort_no_ack:
 cmp si, [bp].buffer_offset ; Check if this is the
 je slave_did_not_respond ; first byte (the address ).
 mov di, 4H ; Abort during xfer code.
 jmp i2c_bus_stop
slave_did_not_respond:
 mov di, 02H ;
i2c_bus_stop:
 %Drive_SCL_Low
 %Wait_Half_Bit_Time
 %Drive_SDA_Low
 %Wait_4_7_uS
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High
 %Wait_4_7_uS
 %Release_SDA_High
 %Wait_For_SDA_To_Go_High


 mov ax, di
 pop es
 pop ds
 ret 8 ; Return and tear
 ; down stack frame.
lost_arbitration:
 mov dx, P2LTCH
 in al, dx ; Release SDA and SCL
 or al, 0C0H
 out dx, al
 mov ax, 08H ; Lost arbitration code.
 pop es
 pop ds
 ret 8
i2c_xmit endp
code ends
end






[LISTING TWO]

$pagelength (30)
$mod186
$debug
$xref

NAME i2c_receive;

$include (/include/pcp_io.inc)

PUBLIC i2c_recv

;****** EQUates ******
BUS_FREE_MIN EQU 1H ; Loop counter for free bus delay.
MAXLEN EQU 255

;****** STACK FRAME STRUCTURE ******
stack_frame STRUC
ret_ip DW ?
ret_cs DW ?
buffer_offset DW ?
buffer_segment DW ?
count DW ?
address DW ?
stack_frame ENDS

%*DEFINE(Drive_SCL_Low)(
 mov dx, P2LTCH
 in al, dx
 and al, 10111111B ; SCL is bit 6
 out dx, al
 )
%*DEFINE(Release_SCL_High)(
 mov dx, P2LTCH

 in al, dx
 or al, 01000000B
 out dx, al
 )
%*DEFINE(Drive_SDA_Low)(
 mov dx, P2LTCH
 in al, dx
 and al, 01111111B ; SDA is bit 7
 out dx, al
 )
%*DEFINE(Release_SDA_High)(
 mov dx, P2LTCH
 in al, dx
 or al, 10000000B
 out dx, al
 )
%*DEFINE(Wait_4_7_uS)(
 mov cx, 5
 loop $
 nop
 nop
 )
%*DEFINE(Wait_Half_Bit_Time)(
 mov cx, 3
 loop $
 )
%*DEFINE(Wait_SCL_Low_Time)(
 mov cx, 5
 loop $
 nop
 nop
 )
%*DEFINE(Wait_SCL_High_Time)(
 mov cx, 5
 loop $
 nop
 nop
 )
%*DEFINE(Wait_For_SCL_To_Go_Low)LOCAL wait(
 mov dx, P2PIN
%wait: in al, dx
 test al, 01000000B
 jne %wait
 )
%*DEFINE(Wait_For_SCL_To_Go_High)LOCAL wait(
 mov dx, P2PIN
%wait: in al, dx
 test al, 01000000B
 je %wait
 )
%*DEFINE(Wait_For_SDA_To_Go_High)LOCAL wait(
 mov dx, P2PIN
%wait: in al, dx
 test al, 10000000B
 je %wait
 )
%*DEFINE(Get_SDA_Bit)(
 mov dx, P2PIN
 in al, dx

 and al, 0080H
 )
%*DEFINE(Check_For_Bus_Free)(
 mov dx, P2PIN
 in al, dx
 mov bl, 0C0H ; Mask for SCL and SDA.
 and al, bl ; If SCL and SDA are high
 xor al, bl ; this sequence will leave
 ) ; a zero in AX.
code segment public
 assume cs:code

i2c_recv proc far

 ; The address for a READ always has a "1" in the LSB.
 ; The first step is to check for a legal address....
 mov bp, sp
 push ds
 push es
 test word ptr [bp].address,01H ; Check for illegal
 ; address (an XMIT).
 jnz addr_ok
 ; The address passed was for a transmit (WRITE). This is
 ; illegal in this procedure....
 mov ax, 20H ; Illegal addr
 pop es
 pop ds
 ret 8 ; Tear down stack frame
addr_ok:
 cmp word ptr [bp].count, MAXLEN
 jg message_wrong_len
 cmp word ptr [bp].count, 1 ; check message length
 jge len_ok
message_wrong_len:
 mov ax, 40H ; error code
 pop es
 pop ds
 ret 8 ; tear down frame
len_ok:
 ; Test for I2C bus free condition.
 ; SCL and SDA must be high at least 4.7uS
 mov cx, BUS_FREE_MIN ; initialize free time counter.
; Following loop takes 48 clocks while cx>1 and 33 clocks on last iteration.
; To ensure that bus is free, samples of bus must span at least 4.7uS. At
; 16MHz: 48*(62.5ns)= 3uS. First sample is at 0us, second at 3us, and third
; will be at 6us. Although this exceeds 4.7us spec, it is better safe than
; sorry.
bus_free_wait:
 %Check_For_Bus_Free
 jz i2c_bus_free
 ; At this point the bus is not available.
 mov ax, 01H ; 01= return code for
 pop es ; a busy bus.
 pop ds
 ret 8 ; return and tear down stack frame.
i2c_bus_free: loop bus_free_wait ; bus may be free but wait 4.7uS required
 ; I2C bus is available, generate a START condition
 %Drive_SDA_Low
 %Wait_4_7_uS
 ; A receive begins with transmission of the ADDRESS

 mov di, 8 ; set up bit counter
next_bit:
 %Drive_SCL_Low
 %Wait_Half_Bit_Time
 mov bx, [bp].address
 and bl, 080H ; strip MSB
 mov dx, P2LTCH
 in al, dx
 and al, 7fh
 or al, bl ; set bit 7 to reflect data bit
 out dx, al ; xmit data bit
 sal [bp].address,1 ; shift current byte
 %Wait_Half_Bit_Time
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High
 ; At this point SCL is high so if there is another master
 ; attempting to gain the bus, its data would be valid here.
 ; We need only check when our data is a "1"...
 test bl, 10000000B ; Is data a "1"?
 je won_arbitration ; If not -> don't check arbitration.
 mov dx, P2PIN
 in al, dx
 test al, 10000000B ; Is SDA high?
 jnz won_arbitration
 jmp lost_arbitration
won_arbitration:
 %Wait_4_7_uS ; count off high time.
 dec di ; tick down bit counter
 jne next_bit ; continue bits
; The address has been completed. Time to get an ACKNOWLEDGE.
 %Drive_SCL_Low
 %Wait_Half_Bit_Time
 %Release_SDA_High
 %Wait_Half_Bit_Time
 %Release_SCL_High
; Here we are expecting to see an acknowledge from addressed slave receiver:
 %Wait_For_SCL_To_Go_High ; a wait state
 mov cx, 3
check_4_ack:
 mov dx, P2PIN
 in al, dx ; get SDA value
 and al, 10000000B ; is it high?
 jnz abort_no_ack ; if so -> abort
 nop
 nop
 nop ; NOPs for timing at 16Mhz
 loop check_4_ack
; if we've gotten to here, then an acknowledge was received.
; At this point in the code, slave receiver has acknowledged
; receipt of its address. SCL has just been driven low, SDA is floating.
 jmp start_recv
abort_no_ack:
 %Drive_SCL_Low
 mov di, 02H ; Code for unresponsive slave.
 jmp i2c_bus_stop
; Now the master transmitter switches to master receiver....
start_recv:
 mov di, [bp].buffer_offset
 mov ax, [bp].buffer_segment

 mov es, ax
next_byte_r: mov bx, 0
 mov si, 8
next_bit_r:
 %Drive_SCL_Low
 %Wait_4_7_uS
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High
 %Get_SDA_Bit
 shr al, 7 ; move SDA value to LSB
 or bl, al ; drop in lsb of bl
 %Wait_4_7_uS
 dec si ; tick down bit counter
 je byte_Recv_comp ; continue bits
 shl bl, 1 ; shift bl for next bit
 jmp next_bit_r
; The word has been completed. Time to send an ACKNOWLEDGE.
byte_Recv_comp:
 mov al, bl
 stosb
 %Drive_SCL_Low
 %Wait_Half_Bit_Time
; Here we need to decide whether or not to transmit an acknowledge. If this is
; last byte required from slave, we do not send an ack; otherwise we do....
 dec [bp].count ; decrement the message count
 cmp [bp].count, 0
 jne send_ack
 %Release_SDA_High
 jmp do_ack
send_ack: %Drive_SDA_Low

do_ack:
 %Wait_Half_Bit_Time
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High
 %Wait_4_7_uS
 %Drive_SCL_Low
 %Wait_Half_Bit_Time
 %Release_SDA_High
 cmp [bp].count, 0
 je recv_done
 jmp next_byte_r
recv_done: mov di, 00

i2c_bus_stop:
 %Wait_Half_Bit_Time
 %Drive_SDA_Low
 %Wait_4_7_uS
 %Release_SCL_High
 %Wait_For_SCL_To_Go_High
 %Wait_4_7_uS
 %Release_SDA_High
 %Wait_For_SDA_To_Go_High
 mov ax, di
 pop es
 pop ds
 ret 8 ; Return and tear down stack frame.
lost_arbitration:
 mov dx, P2LTCH

 in al, dx ; Release SDA and SCL
 or al, 0C0H
 out dx, al
 pop es
 pop ds
 ret 8
i2c_recv endp
code ends
end






[LISTING THREE]

$mod186
$debug
$xref

$include (\include\pcp_io.inc) ; a file of EQUates for 186EB register names

NAME i2c_example
EXTRN i2c_recv:far, i2c_xmit:far

%*DEFINE(XMIT(ADDR,COUNT,MESSAGE))(
 push %ADDR
 push %COUNT
 push seg %MESSAGE
 push offset %MESSAGE
 call i2c_xmit
 )
%*DEFINE(RECV(ADDR,COUNT,BUFFER))(
 push %ADDR
 push %COUNT
 push seg %BUFFER
 push offset %BUFFER
 call i2c_recv
 )
stack segment stack
 DW 20 DUP (?)
 t_o_s DW 0
stack ends
data segment para public 'RAM'
bus_msg db 00h,77h,01h,02h,04h,08h ; the LED I2C message
recv_buff db 255 dup(?)
data ends
usr_code segment para 'RAM'
 assume cs:usr_code
start: mov ax, data ; data segment init
 mov ds, ax
 cli
 assume ds:data
 mov ax, stack ; set up stack
 mov ss, ax
 assume ss:stack
 mov sp, offset t_o_s
 mov dx, P2DIR ; set up open-drain

 in ax, dx ; port pins on 186EB
 and ax, 3FH
 out dx, ax
 mov dx, P2CON
 in ax, dx
 and ax, 03FH
 out dx, ax
 ; The I2C address of the LED driver is 70H for a transmit.
 %XMIT(70H,6,bus_msg) ; send "bus" message
 ; The address for the clock is 0xA3 for a receive.
 %RECV(0A3H,15,recv_buff) ; read first 15 bytes in clock chip.
usr_code ends
 end start





Example 1: (a) 80C186 implementation of 4.7uS wait macro; (b) 80960CA
implementation of 4.7uS wait macro.


(a)


%*DEFINE(Wait_4_7_uS)(
 mov cx, 5 ; 4 clocks
 loop $ ; 4*15+5 = 65 clocks
 nop ; 3 clocks
 nop ; 3 clocks
 ; total = 75 clocks
 ; 75 * 62.5ns = 4.69uS (close enough)
 )


(b)


define(Wait_4_7_uS,'

 lda 0x17, r4 # instruction may be issued in parallel
 # so assume no clocks.
0b: cmpdeco 0, r4 # compare and decrement counter in r4
 bne.t 0b # if !=0 branch back (predict taken
 # branch)
 #
 # The cmpdeco and bne.t together take 3
 # clocks in parallel minimum.
 #
 # 0x17 (25 decimal) * 3 = 75 clocks
 # at 16MHz this is 4.69uS.
')











June, 1992
ACCESSING LARGE DATA ARRAYS WITH X-ARRAY


No DOS extender required




Barr E. Bauer


Barr uses high-performance computers to design pharmaceuticals for
Schering-Plough Research Institute. He can be reached at 60 Orange St.,
B-1-3-85, Bloomfield, NJ 07003.


The primary barrier to using contemporary 386-based PCs for tackling
large-data scientific and engineering problems is the artificial limitation
imposed by conventional memory. All other factors considered, they are far
faster, pack more memory and disk, and are substantially cheaper than standard
platforms like the early Sun workstations and the MicroVAX-II of just a few
years ago. Except for the lack of multitasking and virtual-memory support,
even DOS is not a major limitation. Yet, the anachronistic 640-Kbyte
conventional memory limit, a holdover from the original IBM-PC design,
effectively blocks their use for all but the smallest of problems. You can
access extended memory using a DOS extender, a PC version of UNIX, or Windows
as a DOS extender for Microsoft Fortran 5.0--or you can use libraries like
X-arRAY.
This article examines X-arRAY routines for handling megabyte-sized data
arrays. The X-arRAY package is a tiny (84 Kbytes) Microsoft Fortran
5.0-compatible library of subroutines that manage access to extended memory
and perform mathematical operations on data stored in arrays located within
either extended or conventional memory. As such, X-arRAY is actually a
combination of an extended-memory manager and a general-purpose
array-manipulation package, which sets it apart from DOS extenders.


X-arRAY Memory Management


The first call to the X-arRAY memory-management routines places the program in
protected mode; the details of protected-mode operation are handled entirely
by X-arRAY. Extended-memory access is either through XMS via the Microsoft
HIMEM.SYS driver (preferred) or the modified LIM control block. HIMEM.SYS is
standard with DOS 5.0 or Windows 3.0, making it a convenient choice. X-arRAY
can use whichever memory manager is available or be forced to use a specific
manager.
The extended-memory management routines (see Table 1) operate much like C's
memory-management functions: Memory blocks are
requested by size, referenced through a key that serves as a pointer to the
allocated memory, and freed. In contrast to memory management in C, getxtd
returns both an integer*4 handle and a modified integer*4 key associated with
the successfully allocated extended-memory block. The handle is used by relxtd
and endxtd to free the memory allocation. The key is the absolute address of
the first byte of the allocated memory block, with bits 30 and 31 set to mark
it as a legitimate key referencing extended memory. All of the library
routines use this key to access and manipulate extended memory. The key itself
behaves like a pointer and can be conveniently manipulated by address
arithmetic. The maximum allocation is 1 Gbyte, which should be enough for most
applications. (If you really need huge amounts of memory, you ought to
seriously consider relocating your application to a more appropriate
computer.)
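The key convention described above can be modeled in C. The tag mask and helper names are assumptions for illustration, based only on the article's statement that bits 30 and 31 mark a legitimate extended-memory key:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a getxtd key: the absolute address of the
   block with bits 30 and 31 set as an extended-memory tag. */
#define XTD_TAG 0xC0000000UL   /* bits 30 and 31 */

static uint32_t make_key(uint32_t abs_addr) { return abs_addr | XTD_TAG; }
static int      is_xtd_key(uint32_t key)    { return (key & XTD_TAG) == XTD_TAG; }
static uint32_t key_addr(uint32_t key)      { return key & ~(uint32_t)XTD_TAG; }
```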
Table 1: X-arRAY extended-memory access routines.

 Routine Description
 ------------------------------------------------------------

 getxtd Allocate blocks of available extended memory
 bufxtd Allocate memory in memory-mapped hardware
 inqxtd Report status of extended memory allocations
 relxtd Free a single allocation
 endxtd Free all allocations
 rzmxtd Restore linkage to existing allocation(s)
 a2axtd Array-to-array copy
 a2fxtd Extended-memory allocation to file copy
 f2axtd File to extended-memory allocation copy
 sgtrnm Get a real*4 from extended memory
 sgtcnm Get a complex*8 from extended memory
 igt[1/2/4]im Get an integer*[1/2/4] from extended memory
 sptrnm Put a real*4 into extended memory
 sptcnm Put a complex*8 into extended memory
 ipt[1/2/4]im Put an integer*[1/2/4] into extended memory
 flashr Flash extended-memory access on console

 *[1/2/4] means either 1,2, or 4 at that position in the name
 corresponding to the variable type employed.

Allocation size can be specified by indicating the array dimensionality, width
of each dimension (passed as an array), and the size of the variable in bytes.
Alternatively, you can simply specify the total number of bytes desired. For
example, the two getxtd calls in Figure 1 are equivalent. Both allocate enough
extended memory for a 512x512 array of real*4 variables. The actual allocated
memory is structureless--that is, not associated with any array dimensionality
or variable type. Structure and variable types are imposed by the manipulation
routines that themselves can use either mode to address specific array
elements or subarrays in the allocated block. This turns out to be very handy
(and makes accessing extended memory straightforward) when retention of array
addressing is important. Also, the array can be manipulated in portions using
address arithmetic.
Figure 1: Equivalent calls using getxtd.

 call getxtd(0,0,1048576,0,ihandle,key,kbytes,iret,ier)

 and

 iwidth(1) = 512
 iwidth(2) = 512

 call getxtd(2,iwidth,4,0,ihandle,key,kbytes,iret,ier)
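Because a key is simply the address of the first byte of a structureless block, addressing an individual element is plain address arithmetic. A sketch, assuming Fortran's column-major layout for the 512x512 real*4 array above (the helper is an illustration, not an X-arRAY routine):

```c
#include <assert.h>
#include <stdint.h>

/* Key for element (row, col) of a column-major array stored in an
   extended-memory block whose first byte the key points at. */
static uint32_t element_key(uint32_t key, uint32_t row, uint32_t col,
                            uint32_t rows, uint32_t elem_size)
{
    return key + (col * rows + row) * elem_size;
}
```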

Unlike C's heap, extended-memory blocks are not automatically deallocated at
program termination. In fact, allocated memory blocks persist intact,
including their data, until freed by another program or a machine reboot.
Memory allocations are under the control of the XMS or LIM memory manager,
which is external to the program. endxtd provides convenient end-of-program
allocation cleanup and ensures that all blocks are freed; see Listing Five and
Listing Six (page 114).
The persistence of extended-memory allocations beyond program termination can
be used to advantage. rzmxtd reestablishes the linkage to extended memory
previously allocated by an earlier program. rzmxtd uses a snapshot of the
active handles and keys (provided by inqxtd) passed between the programs in a
binary file. inqxtd also determines free and allocated memory, memory
management in use, and other useful data. Although I do not have a specific
example of this, I can envision a large-data/large-code Fortran program broken
into smaller modules that each operate on the data passed between the modules
in extended memory.
Routines are provided to shuttle data between extended memory and conventional
memory, either as blocks or as individual variables (Table 1). The block-copy
routine a2axtd determines the data type by its size in bytes, while the
individual element routines are specific to the variable types. Routines are
also provided to copy data between extended memory and binary files.
a2axtd uses extended-memory keys and/or conventional memory array names to
specify source and destination, thus requiring the MS-Fortran "interface to"
directive in order to pass keys by value and to properly declare real and
complex arrays. The multiple contexts for a2axtd in a program that shuttles
blocks of data between extended and conventional memory created a problem that
was solved by interfacing a2axtd twice. The first version of a2axtd was
interfaced at the top of the example for copying data from extended memory
(via a key) into a real*4 array in conventional memory. The second version of
a2axtd was aliased by the subroutine putback in a separate source file (see
Listing Four, page 113) and interfaced to copy from a real*4 array in
conventional memory to extended memory pointed to by the key. Yes, Fortran has
no alias (but should), so putback merely passes its arguments through to the
different versions of a2axtd. When you see putback in the examples, think
a2axtd.
Finally, data stored in extended memory can be manipulated in extended memory
using a number of unary and binary routines (Table 2). The routines ssmrnm
(array scaling) and smprnm (element-by-element product of two arrays) are used
in Listing One (page 112). Note that the binary array product is not the
normal array product. Each routine operates on specific variable types
currently limited to integer*1, integer*2, integer*4, real*4 and complex*8. Of
interest to those who do fast Fourier transforms, for which X-arRAY is
finely tuned, are access routines to handle floating-point numbers in a
decimated form and to manipulate the bits of array elements. As with a2axtd,
the keys must be passed by value, necessitating the use of the "interface to"
directive.
Table 2: Extended-memory data-manipulation array routines.

 Routine Description
 ----------------------------------------------------------------------

 sabcnm Absolute value of a complex*8 array
 scjcnm Conjugate a complex*8 array
 szicnm Zero the imaginary part of a complex*8 array
 szrcnm Zero the real part of a complex*8 array
 sngrnm Negate a real*4 array
 sngcnm Negate a complex*8 array
 ssmrnm Scalar multiply a real*4 array
 ssmcnm Scalar multiply a complex*8 array
 ism[1/2/4]sm Scalar multiply a signed integer*[1/2/4] array
 ism[1/2/4]um Scalar multiply an unsigned integer*[1/2/4] array
 imn[1/2/4]sm Location and value of min element of signed
 integer*[1/2/4] array
 imn[1/2/4]um Location and value of min element of unsigned
 integer*[1/2/4] array
 imx[1/2/4]sm Location and value of max element of signed
 integer*[1/2/4] array
 imx[1/2/4]um Location and value of max element of unsigned
 integer*[1/2/4] array
 sadrnm Element-by-element sum of real*4 arrays
 sadcnm Element-by-element sum of complex*8 arrays
 iad[1/2/4]im Element-by-element sum of integer*[1/2/4] arrays
 smprnm Element-by-element product of real*4 arrays
 smpcnm Element-by-element product of complex*8 arrays
 imp[1/2/4]sm Element-by-element product of signed
 integer*[1/2/4] arrays
 imp[1/2/4]um Element-by-element product of unsigned
 integer*[1/2/4] arrays
 ssbrnm Element-by-element difference of real*4 arrays
 ssbcnm Element-by-element difference of complex*8 arrays
 isb[1/2/4]im Element-by-element difference of integer*[1/2/4]
 arrays
 sflcnmp Product of dissimilar complex*8 arrays
 iln[1/2/4]sm Arbitrary linear combination of signed integer*[1/2/4]
 arrays
 iln[1/2/4]um Arbitrary linear combination of unsigned integer*[1/2/4]
 arrays

 *[1/2/4] means either 1, 2, or 4 at that position in the name,
 corresponding to the variable type employed.



Extended-memory Strategy


From a conventional Fortran point of view, X-arRAY arrays located in extended
memory are not arrays at all. The elements are stored in extended memory,
structured like an array, but they cannot be manipulated except through the
supplied access routines. One approach might be to replace every array-element
reference in your algorithm with an sgtrnm or sptrnm call that shuttles the
element value into or out of conventional memory for processing. Although this
preserves the algorithm's structure, data stored in multidimensional arrays is
generally accessed by nested loops, with array-element access in the innermost
loop, and large arrays (the reason for using extended memory) mean many
iterations. The overhead of the repeated sgtrnm or sptrnm calls accumulates,
and its effect on performance is lethal.
The strategy shifts to moving blocks of array elements between extended and
conventional memory. This dramatically diminishes the overhead, even though
the block move done with a2axtd itself takes longer to complete. Because
Fortran stores data in column-major order, the ideal unit of movement is a
column vector. A 512x512 array in extended memory is read into conventional
memory with 512 calls to a2axtd, each moving the nth column vector (,n) of 512
elements, rather than 262,144 calls to sgtrnm. The temporary array receiving
the column vector is small enough to not tax the available conventional
memory, but the use of a temporary array and pieces of the total array will
force an algorithm change that might have to be made anyway for data arrays
exceeding the size of conventional memory. Vector supercomputers use this same
scheme to boost performance, the difference being that column-vector movement
is from conventional memory into an array of special CPU registers. The
savings, however, still accrue from moving groups rather than individual
elements.

The block-move strategy implements smoothly using the X-arRAY primitives. The
2-D summation in Figure 2(a) becomes that shown in Figure 2(b). The extended
memory can be conveniently and temporarily redimensioned from the viewpoint of
a2axtd to access 1-D arrays of 512 real*4 elements. The address arithmetic is
analogous to that routinely done in C: key1 points to the start of the next
column vector to be accessed by the loop. This is perfectly legal as long as
key1 points to a legitimate extended-memory allocation and the requested block
resides within that allocation; otherwise, a2axtd reports an error.
Figure 2: (a) Summing a two-dimensional array; (b) using the block-move
strategy to sum a two-dimensional array.

 (a)

 sum = 0.0
 do i = 1,512
 do j = 1,512
 sum = sum + arr(i,j)
 enddo
 enddo

 (b)

 iwidth(1) = 512
 iwidth(2) = 512 ! declared as a 2D array
 call getxtd(2, iwidth,4,ihandle,key,kbret,iret,ier)
 :
 sum = 0.0
 key1 = key ! used for address arithmetic
 ichunk = 4 * 512 ! size of 512 real*4 elements
 do i = 1,512 ! loop over column vectors
 call a2axtd (1,512,4,key1,temp,iret,ier) ! bring in as 1D
 do j = 1,512 ! loop down temp array doing sum
 sum = sum + temp (j)
 enddo
 key1 = key1 + ichunk ! advance to the next column vec
 enddo

Listing Two (page 112) tests this by performing the same summation twice,
first by column-vector moves and then by individual-element accesses. The
results are dramatic. The column-vector pass processes the 1-Mbyte array in
3.16 seconds and produces sum = 3.436025E+10. The individual-element pass,
done in row order so that the second index is associated with the inner loop
and accesses fall on noncontiguous array elements, requires 126.4 seconds and
produces sum = 3.434290E+10. These results are from a 16-MHz 386/387SX
computer. Clearly, the column-vector approach works well with only a small
restructuring of the algorithm.
The different sums produced are normal for floating-point calculations, but
are also a concern. The difference is due to different cumulative round-off
errors that are the result of elements being summed in a different order.
Reverse the indexes in Listing Two into column order for the
individual-element summation and it gives an answer identical to the
column-vector version. Note that we are not talking about a correct or pure
answer; the reality of floating-point calculations is that they have an
unavoidable round-off error that manifests differently, depending on the order
of calculations. If you need the same answer independent of method, be sure to
process the array elements in column order to produce the same round-off
error. Column ordering in arrays is, in my opinion, a flaw in Fortran (or the
teaching of Fortran) because most programmers write multidimensional arrays
with the index order following loop nesting; see Figure 3.
Figure 3: A multidimensional array with the index order following loop
nesting.

 do outer = 1,n
 do inner = 1,n
 sum = sum + a(outer,inner)
 enddo
 enddo

The discussion so far has not addressed whether array elements are accessed
contiguously in memory. For maximum performance, the array should be indexed
a(inner,outer): the inner loop then references contiguous array elements in
memory, and the outer loop steps through column vectors. This facilitates easy
conversion to the column-vector transfer strategy discussed here. It also
makes vectorization and parallelization possible, but that is a story for
another day.
A triply nested lower triangular array (see Listing Three, page 113) in which
the inner-loop bounds depend on the current value of an outer-loop index
presents a challenge. Although only one array is used, two column vectors are
manipulated, and the number of elements used in the column vector varies. The
strategy is similar to that in Listing Two. Two column vectors (,k) and (,j)
must be moved into their corresponding temporary arrays and processed. Then
the (,j) column vector is put back into its original place in the array in
extended memory. This is shown schematically in Figure 4 and completely in
Listing Three.
Figure 4: Two column vectors (,k) and (,j) must be moved into their
corresponding temporary arrays and processed. Then the (,j) column vector is
put back into its original place in the array in extended memory.

 keyj = key
 keyk = key ! both temporary pointers point to the same array
 do j = 1,512
 ! get column vector (,j) pointed to by keyj into arrj ()
 do k = 1,j-1
 ! get column vector (,k) pointed to by keyk into arrk()
 do i = k+1, 512
 arrj(i) = arrj(i) + arrk(i) *arrj(k)
 enddo
 ! increment keyk to next column vector
 enddo
 ! put arrj() back into extended memory pointed to by keyj
 ! increment keyj to next column vector
 enddo

The address arithmetic is kept simple by copying entire column vectors, even
though only part of a vector may be used in any given iteration. Improved
performance might be eked out by moving only the required portion of the
column vector, but at the price of more overhead from the additional address
arithmetic. Listing Three runs as expected, steadily slowing as the simulation
proceeds, but still completing within 13 minutes. Note that the basic
algorithm structure was not mangled beyond recognition.

The shuttling of array blocks into conventional memory for processing breaks
down when the algorithm is fatally row oriented, as in the case of an array
inversion using Gaussian elimination. As an example, I was interested in a
megabyte-sized array-inversion routine for reconstructing 2-D stereo graphics
projections into 3-D. The inverter I created was sadly too slow because of the
large amount of single-element shuttling to and from extended memory, and the
basic algorithm became unrecognizable. When this happens, the best bet is to
use a DOS extender (in my case, the Windows version of Microsoft Fortran 5.1);
X-arRAY manipulation of extended memory should be targeted at contiguous
array elements for the best performance, as demonstrated in Listing Two.


Manipulating Data in Extended Memory


Clearly, shuttling portions of a megabyte-sized array in and out of
conventional memory for processing is feasible, even efficient. It is far more
desirable to manipulate the data directly in extended memory wherever
possible. Consider a case in which a megabyte-sized array is duplicated in
extended memory, all members of the duplicate array are multiplied by a scale
factor, and then the two arrays are multiplied element-by-element with the
results placed into the third array. This was done in Listing One with the
added wrinkle that the array copy was done by copying the source array from
extended memory directly to a binary file, and then reading the file directly
into the newly allocated destination in extended memory. I also used inqxtd to
assess available extended memory and determine which extended-memory manager
was active at the start of the example. All phases of the resulting program
were quick: one to three seconds, even on my relatively slow 386SX.


Conclusion


Frankly, the ability to access extended memory from within a DOS program free
of DOS extenders was refreshing. Compared to the Windows extensions to
Microsoft Fortran, X-arRAY addresses more extended memory, memory can be
managed in a manner familiar to C programmers, and the resulting programs run
faster and are independent of Windows. I liked the performance delivered by
X-arRAY even though effort was required to incorporate the extended-memory
routines into programs. That effort will often lead to optimizations that
might otherwise be overlooked. What I would like to see in future versions of
the X-arRAY library is an expanded list of array primitives, such as a true
(matrix) array product, determinant, array inverter, element or column swap,
and fill with value, all of course supporting all Fortran data types. I would even
like to see this functionality in a C-language library.
Incorporation of X-arRAY into applications will depend on the application. I
have found that programs ultimately intended for UNIX computers can be
successfully developed and tested with their full-sized (multimegabyte) arrays
using the Windows version of Microsoft Fortran. Performance is not great, but
that is not the point of cross-platform program development. On the other
hand, a large-memory, array-based Fortran application undergoing a one-way
port onto a DOS-based PC will benefit from incorporation of X-arRAY routines.


Products Mentioned


X-arRAY 1.0 Release 2 Davis Associates Inc. 43 Holden Road West Newton, MA
02165 617-244-1450 $99.00 Minimum requirements: 80386 with 387 math
coprocessor; MS-DOS 2.0 or higher; Microsoft Fortran 5.0

_ACCESSING LARGE ARRAYS WITH X-ARRAY_
by Barr E. Bauer



[LISTING ONE]

* Extended memory manipulation using X-arRAY Fortran Library.
* Does the following: 1. allocates a 1 Mbyte real*4 array a(512,512); 2. loads
* array a with real*4 values; 3. saves the data in array a to disk;
* 4. allocates two 1 Mbyte real*4 arrays b and c; 5. loads data from file
* (step 3) into array b; 6. scales all members of array b by 5.0; 7. does an
* element-by-element array multiplication of arrays a and b, results into
* array c; 8. sums all members of array c, reports results.
* Compile with Microsoft Fortran 5.1 using:
* fl /FPi87 /G2 example1.for putback.for bagit.for /link xarray
* B. E. Bauer 3/20/92

 interface to subroutine a2axtd(i1,i2,i3,i4[VALUE],r1,i5,i6)
 integer*4 i1,i2,i3,i4,i5
 integer*2 i6
 real*4 r1
 end

 interface to subroutine sgtrnm(i1,i2,i3[VALUE],i4,r1,i5)
 integer*4 i1,i2,i3,i4
 integer*2 i5
 real*4 r1
 end

 interface to subroutine sptrnm(i1,i2,i3[VALUE],i4,r1,i5)
 integer*4 i1,i2,i3,i4
 integer*2 i5
 real*4 r1
 end

 interface to subroutine smprnm(i1,i2,i3[VALUE],i4[VALUE],
 + i5[VALUE],i6)

 integer*4 i1,i2,i3,i4,i5
 integer*2 i6
 end

 interface to subroutine ssmrnm(i1,i2,i3[VALUE],r1,i4)
 integer*4 i1,i2,i3
 real*4 r1
 integer*2 i4
 end

 include 'bagit.inc' ! error codes and other symbols
 integer*4 kb_total, kb_unallocated, number_allocations
 integer*4 memory_manager, required_memory, shortage
 integer*4 handle_array(1), key_array(1)
 integer*4 ARRAY_SIZE(ARRAY_DIM), allocated_array(1)

 integer*4 handle, key, key1, kb_allocated
 integer*4 bytes_moved, increment
 integer*4 keyb, keyc, handleb, handlec
 real*4 temp, a(SIZE)
 integer*2 return_status, eflag
 character*13 tempfile
 data tempfile /'tempfile.dat'C/ ! C string format
 data ARRAY_SIZE / SIZE, SIZE /

* enable extended memory routine flashing
 call flashr(ON,LOWER_RIGHT,eflag)
 if (eflag .ne. 0) call bagit(FLASHR_ERROR)
 required_memory = 3*SIZE*SIZE*REAL4/1024 ! need 3 Mbytes
* determine status of extended memory
 call inqxtd(kb_total, kb_unallocated, number_allocations,
 + memory_manager, handle_array, key_array,
 + allocated_array, return_status, eflag)
 if (eflag .ne. 0) call bagit(INQXTD_ERROR)
 if ((memory_manager .eq. 0) .or.
 + (memory_manager .gt. 2)) then
 call bagit(WRONG_MMANAGER)
 else if (memory_manager .eq. 1) then
 print *,'XMS in use'
 else
 print *,'Modified LIM in use'
 endif
 print *,'Extended memory available ',kb_unallocated,' kb'
 if (kb_unallocated .lt. required_memory) then
 shortage = required_memory - kb_unallocated
 print *,'insufficient memory, need',shortage,'kb'
 call bagit(STOPPING)
 endif
* enough memory present, allocate memory for 1st array
 print *,'just ahead of memory allocation'
 ! allocate a 2D array of real*4 dimensioned 512 by 512
 call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handle,key,
 1 kb_allocated,return_status, eflag)
 if (eflag .ne. 0) call bagit(GETXTD_ERROR)
* load extended memory array (X,Y) with 1.0 using column vector approach
 print *,'at loading stage'
 key1 = key
 temp = 0.0
 increment = SIZE*REAL4

 do j = 1,SIZE
 do k = 1,SIZE
 a(k) = 1.0 ! fills the 1D array with values
 enddo
 ! move the 1D into extended memory by columns
 ! putback is a2axtd interfaced for
 ! conventional -> extended memory transfers
 call putback(1,SIZE,REAL4,a,key1,bytes_moved,eflag)
 if (eflag .ne. 0) call bagit(PUTBACK_ERROR)
 if (bytes_moved .ne. increment) then
 call bagit(PUTBACK_BADCNT)
 endif
 key1 = key1 + increment
 enddo
* save a copy of this array to disk
 print *,'saving array to file'
 call a2fxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,tempfile,key,
 + ibytes_moved,eflag)
 if (ibytes_moved.ne.SIZE*SIZE*REAL4) then
 call bagit(A2FXTD_BADCNT)
 endif
 if (eflag.ne.0) call bagit(A2FXTD_ERROR)
* allocate extended memory for arrays b and c
 call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handleb,keyb,
 + kb_allocated,return_status, eflag)
 if (eflag .ne. 0) call bagit(GETXTD_ERROR)
 call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handlec,keyc,
 + kb_allocated,return_status, eflag)
 if (eflag .ne. 0) call bagit(GETXTD_ERROR)
* read file into extended memory for array b
 print *,'reading tempfile'
 call f2axtd(ARRAY_DIM,ARRAY_SIZE,REAL4,tempfile,keyb,
 1 ibytes_moved,eflag)
 if (eflag.ne.0) call bagit(F2AXTD_ERROR)
 if (ibytes_moved.ne.SIZE*SIZE*REAL4) then
 call bagit(F2AXTD_BADCNT)
 endif
* scale array b by 5.0
 print *,'scaling array b elements by 5.0'
 call ssmrnm(ARRAY_DIM,ARRAY_SIZE,keyb,5.0,eflag)
 if (eflag.ne.0) call bagit(SSMRNM_ERROR)
* element-by-element mult of a and b, results to c
 print *,'ahead of array multiplication'
 call smprnm(2,ARRAY_SIZE,key,keyb,keyc,eflag)
 if (eflag .ne. 0) call bagit(SMPRNM_ERROR)
* sum all elements of array c to check results by using column vectors to
* bring data from extended into conventional memory, where sum is performed.
 key1 = keyc
 temp = 0.0
 increment = SIZE*REAL4
 do j = 1,SIZE
 call a2axtd(1,SIZE,REAL4,key1,a,bytes_moved,eflag)
 if (eflag.ne.0) call bagit(A2AXTD_ERROR)
 if (bytes_moved.ne.increment) call bagit(A2AXTD_BADCNT)
 do i=1,SIZE
 temp = temp + a(i)
 enddo
 key1 = key1 + increment ! advance to next column vector
 enddo

 print *,'done, sum = ',temp,' (correct = 1310720.000000)'
* done, remove all allocations through ENDXTD in bagit
 call bagit(DONE)
 stop
 end





[LISTING TWO]

* Performs a sum reduction first using column vector moves then individual
* element accesses
* Compile with Microsoft Fortran 5.1
* fl /FPi87 /G2 example1.for putback.for bagit.for /link xarray
* B. E. Bauer 3/20/92
*
 interface to subroutine a2axtd(i1,i2,i3,i4[VALUE],r1,i5,i6)
 integer*4 i1,i2,i3,i4,i5
 integer*2 i6
 real*4 r1
 end

 interface to subroutine sgtrnm(i1,i2,i3[VALUE],i4,r1,i5)
 integer*4 i1,i2,i3,i4
 integer*2 i5
 real*4 r1
 end

 interface to subroutine sptrnm(i1,i2,i3[VALUE],i4,r1,i5)
 integer*4 i1,i2,i3,i4
 integer*2 i5
 real*4 r1
 end

 include 'bagit.inc'

 integer*4 kb_total, kb_unallocated, number_allocations
 integer*4 memory_manager, required_memory, shortage
 integer*4 handle_array(1), key_array(1), allocated_array(1)
 integer*4 ARRAY_SIZE(2)

 integer*4 handle, key, key1, kb_allocated, increment
 integer*4 bytes_moved, index(2), keyj

 real*4 temp, a(SIZE), arrj(SIZE)
 integer*2 return_status, eflag

 data ARRAY_SIZE / SIZE, SIZE / ! 2D 512x512 array used
* enable console flashing when extended memory is accessed
 call flashr(1,3,eflag)
 if (eflag .ne. 0) call bagit(FLASHR_ERROR)
 required_memory = SIZE*SIZE*REAL4/1024
* check for adequate XMS memory, quit if inadequate
 call inqxtd(kb_total, kb_unallocated, number_allocations,
 + memory_manager, handle_array, key_array,
 + allocated_array, return_status, eflag)
 if (eflag.ne.0) call bagit(INQXTD_ERROR)

 if (required_memory .gt. kb_unallocated) call bagit(NOT_ENOUGH)
* allocate a 512 by 512 array of real*4
 print *,'just ahead of memory allocation'
 call getxtd(2,ARRAY_SIZE,REAL4,XMS,handle,key,
 1 kb_allocated,return_status, eflag)
 if (eflag .ne. 0) call bagit(GETXTD_ERROR)
* load extended memory array (X,Y) using column vectors
 print *,'at loading stage'
 key1 = key
 temp = 0.0
 increment = SIZE*REAL4
 do j = 1,SIZE
 do k = 1,SIZE
 a(k) = float(k) + float(SIZE*(j-1))
 enddo
 call putback(1,SIZE,REAL4,a,key1,bytes_moved,eflag)
 if (eflag .ne. 0) call bagit(PUTBACK_ERROR)
 if (bytes_moved .ne. increment) then
 call bagit(PUTBACK_BADCNT)
 endif
 key1 = key1 + increment
 enddo
* column vector summation
 print *,'start column vector sum reduction'
 sum_col = 0.0
 chunk = SIZE*REAL4
 do j=1,SIZE
 keyj = key + chunk*(j-1) ! address arithmetic
 ! put (,j) into arrj
 call a2axtd(1,SIZE,REAL4,keyj,arrj,bytes_moved,eflag)
 if (eflag.ne.0) call bagit(A2AXTD_ERROR)
 if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
 do k=1,SIZE ! process the column vector
 sum_col = sum_col +arrj(k)
 enddo
 enddo
 print *,'done with column vector sum reduction'
* individual element access
 print *,'start individual access sum reduction'
 sum_ind = 0.0
 do i=1,SIZE
 do j=1,SIZE
 index(1)=i ! row of element
 index(2)=j ! column of element
 ! get the element into retval
 call sgtrnm(2,ARRAY_SIZE,key,index,retval,eflag)
 if (eflag.ne.0) call bagit(SGTRNM_ERROR)
 sum_ind = sum_ind + retval
 enddo
 enddo
 print *,'done with individual access sum reduction'
 print *,'column sum =',sum_col,', individual sum =',sum_ind
 call bagit(DONE)
 stop
 end








[LISTING THREE]

* Triangular array manipulation of a single 1 Mbyte real*4 array arr(512,512)
* using X-arRAY routines
* Does the following:
* do j=1,512
* do k = 1, j-1
* do i = k+1, 512
* arr(i,j) = arr(i,j) + arr(i,k) * arr(k,j)
* enddo
* enddo
* enddo
* Compile in Microsoft Fortran 5.1 using:
* fl /FPi87 /G2 example2.for putback.for bagit.for /link xarray
* B. E. Bauer 3/20/92
*
 interface to subroutine a2axtd(i1,i2,i3,i4[VALUE],r1,i5,i6)
 integer*4 i1,i2,i3,i4,i5
 integer*2 i6
 real*4 r1
 end

 interface to subroutine sgtrnm(i1,i2,i3[VALUE],i4,r1,i5)
 integer*4 i1,i2,i3,i4
 integer*2 i5
 real*4 r1
 end

 interface to subroutine sptrnm(i1,i2,i3[VALUE],i4,r1,i5)
 integer*4 i1,i2,i3,i4
 integer*2 i5
 real*4 r1
 end

 include 'bagit.inc'

 integer*4 kb_total, kb_unallocated, number_allocations
 integer*4 memory_manager, required_memory
 integer*4 handle_array(1), key_array(1), allocated_array(1)
 integer*4 ARRAY_SIZE(ARRAY_DIM)

 integer*4 handle, key, key1, kb_allocated, increment
 integer*4 bytes_moved, index(2), keyj, keyk

 real*4 temp, a(SIZE), arrj(SIZE), arrk(SIZE)
 integer*2 return_status, eflag

 data ARRAY_SIZE / SIZE, SIZE /
 call flashr(ON,LOWER_RIGHT,eflag)
 required_memory = SIZE*SIZE*REAL4/1024
 call inqxtd(kb_total, kb_unallocated, number_allocations,
 + memory_manager, handle_array, key_array,
 + allocated_array, return_status, eflag)
 if (eflag.ne.0) call bagit(INQXTD_ERROR)
 if (kb_unallocated .lt. required_memory) then
 call bagit(NOT_ENOUGH)

 endif
* allocate 1 Mbyte of extended memory
 print *,'just ahead of memory allocation'
 call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handle,key,
 + kb_allocated,return_status, eflag)
 if (eflag .ne. 0) call bagit(GETXTD_ERROR)
 print *,'loading extended memory'
 key1 = key
 temp = 0.0
 increment = SIZE*REAL4
 do j = 1,SIZE
 do k = 1,SIZE
 a(k) = 0.00025
 enddo
 call putback(1,SIZE,REAL4,a,key1,bytes_moved,eflag)
 if (eflag .ne. 0) call bagit(PUTBACK_ERROR)
 if (bytes_moved .ne. increment) call bagit(PUTBACK_BADCNT)
 key1 = key1 + increment
 enddo
* process triangular array
 print *,'processing triangular array'
 keyj = key
 keyk = key
 chunk = SIZE*REAL4
 do j=1,SIZE
 print *,'outer loop j = ',j
 ! get arr(x,j) from extended into arrj(x)
 call a2axtd(1,SIZE,REAL4,keyj,arrj,bytes_moved,eflag)
 if (eflag.ne.0) call bagit(A2AXTD_ERROR)
 if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
 do k=1,j-1
 keyk = key + (k-1)*chunk
 ! get arr(x,k) from extended into arrk(x)
 call a2axtd(1,SIZE,REAL4,keyk,arrk,bytes_moved,eflag)
 if (eflag.ne.0) call bagit(A2AXTD_ERROR)
 if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
 ! do the manipulation
 do i=k+1,SIZE
 arrj(i) = arrj(i) + arrk(i)*arrj(k)
 enddo
 enddo
 ! put arrj(x) back to extended memory
 call putback(1,SIZE,REAL4,arrj,keyj,bytes_moved,eflag)
 if (eflag.ne.0) call bagit(A2AXTD_ERROR)
 if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
 keyj = keyj + chunk
 enddo
* sample selected members of the array in extended memory
 do i=1,SIZE,125
 do j=1,SIZE,125
 index(1)=i
 index(2)=j
 call sgtrnm(ARRAY_DIM,ARRAY_SIZE,key,index,retval,eflag)
 if (eflag.ne.0) call bagit(SGTRNM_ERROR)
 print *,i,j,retval
 enddo
 enddo
 call bagit(DONE)
 stop

 end






[LISTING FOUR]

* putback.for--interface a2axtd for conventional to extended memory block moves
* B. E. Bauer 3/20/92
*
 interface to subroutine a2axtd(i1,i2,i3,r1,i4[VALUE],i5,i6)
 integer*4 i1,i2,i3,i4,i5
 integer*2 i6
 real*4 r1
 end

 subroutine putback(i1,i2,i3,r1,i4,i5,i6)
 integer*4 i1, i2, i3, i4, i5
 real*4 r1(*)
 integer*2 i6
 call a2axtd(i1,i2,i3,r1,i4,i5,i6)
 return
 end





[LISTING FIVE]

* bagit.inc--symbols and declarations used for error handling and the examples.
* B. E. Bauer 3/20/92
*
 integer*4 INQXTD_ERROR,WRONG_MMANAGER,STOPPING,GETXTD_ERROR
 integer*4 PUTBACK_ERROR,PUTBACK_BADCNT,A2AXTD_BADCNT
 integer*4 A2AXTD_ERROR,A2FXTD_BADCNT,A2FXTD_ERROR
 integer*4 F2AXTD_ERROR,F2AXTD_BADCNT,SSMRNM_ERROR
 integer*4 SMPRNM_ERROR,NOT_ENOUGH,SGTRNM_ERROR
 integer*4 FLASHR_ERROR,DONE

 integer*4 ARRAY_DIM,REAL4,XMS,SIZE,ON,LOWER_RIGHT

 parameter (INQXTD_ERROR=1)
 parameter (WRONG_MMANAGER=2)
 parameter (STOPPING=3)
 parameter (GETXTD_ERROR=4)
 parameter (PUTBACK_ERROR=5)
 parameter (PUTBACK_BADCNT=6)
 parameter (A2AXTD_BADCNT=7)
 parameter (A2AXTD_ERROR=8)
 parameter (A2FXTD_BADCNT=9)
 parameter (A2FXTD_ERROR=17) ! distinct value; select case forbids duplicates
 parameter (F2AXTD_ERROR=10)
 parameter (F2AXTD_BADCNT=11)
 parameter (SSMRNM_ERROR=12)
 parameter (SMPRNM_ERROR=13)
 parameter (NOT_ENOUGH=14)

 parameter (SGTRNM_ERROR=15)
 parameter (FLASHR_ERROR=16)
 parameter (DONE=99)

 parameter (ARRAY_DIM = 2) ! 2D array
 parameter (REAL4 = 4) ! size of real*4
 parameter (XMS = -1) ! use available mmanager
 parameter (SIZE = 512) ! size of array
 parameter (ON = 1) ! convenient symbol
 parameter (LOWER_RIGHT = 3) ! where flashr flashes





[LISTING SIX]

* bagit.for--error handler. Prints an appropriate message then calls endxtd
* to ensure allocations are freed.
* B. E. Bauer 3/20/92
*
 subroutine bagit(iflag)
 integer*4 iflag
 integer*2 return_status, eflag

 include 'bagit.inc'

 select case (iflag)
 case (INQXTD_ERROR)
 print *,'error reported by inqxtd'
 case (WRONG_MMANAGER)
 print *,'XMS or Modified LIM memory manager not found'
 case (STOPPING)
 print *,'stopping...'
 case (GETXTD_ERROR)
 print *,'error reported by getxtd'
 case (PUTBACK_ERROR)
 print *,'error in putback(a2axtd)'
 case (PUTBACK_BADCNT)
 print *,'wrong number of bytes moved by putback(a2axtd)'
 case (A2AXTD_BADCNT)
 print *,'wrong number of bytes moved by a2axtd'
 case (A2AXTD_ERROR)
 print *,'error in a2axtd'
 case (A2FXTD_BADCNT)
 print *,'wrong number of bytes moved by a2fxtd'
 case (A2FXTD_ERROR)
 print *,'error in a2fxtd'
 case (F2AXTD_ERROR)
 print *,'error in f2axtd'
 case (F2AXTD_BADCNT)
 print *,'wrong number of bytes moved by f2axtd'
 case (SSMRNM_ERROR)
 print *,'error in ssmrnm (scalar multiply)'
 case (SMPRNM_ERROR)
 print *,'error in smprnm (el-by-el multiply)'
 case (NOT_ENOUGH)
 print *,'inadequate extended memory available'
 case (SGTRNM_ERROR)

 print *,'error in sgtrnm (real*4 get)'
 case (FLASHR_ERROR)
 print *,'error in flashr'
 case (DONE)
 print *,'freeing extended memory'
 end select
 call endxtd(return_status, eflag)
 stop 'done, exiting...'
 end





June, 1992
FORTEX, A FORTRAN RUNTIME EXECUTIVE


A Fortran I/O-enhancement tool




Harold R. Justice


Harold is the manager of a simulation-development and software-engineering
group for Dynetics. He can be contacted at P.O. Drawer B, Huntsville, AL
35814.


Fortran is the language of choice for many scientists and engineers,
especially for simulation programs. Although languages such as C, Pascal, Ada,
and special-purpose simulation languages like the Advanced Continuous
Simulation Language (ACSL) and Simulation Language for Alternative Modeling
(SLAM) have made inroads into scientific programs, Fortran's popularity for
scientific and engineering applications has remained unabated.
Fortran provides I/O facilities in the form of read/write (or print)
statements used as primitives to develop the user interface (UI). The
application developer can easily spend a large amount of development time
programming the UI with hardcoded read/write and format statements. Most
Fortran environments make available a symbolic I/O method called NAMELIST.
Although NAMELIST provides a symbolic means to set the values of the variables
specified in the NAMELIST block, its use is generally restricted to reading
the values from a file rather than giving you the flexibility to enter the
values interactively. NAMELIST output is likewise all or nothing: every
variable in the NAMELIST block is written, because NAMELIST lacks facilities
to selectively output a sublist of variables.
Since many Fortran users are not interested in the programming aspects, per
se, but in developing a program that will solve a particular problem, the
tedium of programming the UI is not appealing. On the other hand, the UI must
be seriously considered for those programs that will be used by anyone other
than the programmer.
This article discusses a set of I/O options provided by an application tool
called the Fortran Runtime Executive (FORTEX). I developed FORTEX to put a
friendly UI on a simulation program that had a decidedly unfriendly interface
provided by hardcoded I/O statements. The simulation program designer had
previously developed several simulations in ACSL and had become accustomed to
its rich set of I/O capabilities. He was proud of the algorithmic code but
unhappy with the rigid method of communication with the program. Thus, FORTEX
was conceived.
FORTEX is compliant with ANSI Fortran 77 and has been ported to several
platforms from PCs to mainframes with very few differences in the code.


FORTEX Features


FORTEX, a runtime environment for Fortran, features a Command-Line Interface
(CLI) that uses symbolic commands to control I/O and the order of program
operation. It provides facilities for setting and displaying values of the
program variables, selectively listing variable values during a run, and the
selective recording of these values for post-processing. FORTEX provides a
powerful macro feature that allows you to name a block of frequently used
commands and invoke these commands by merely entering the macro name. FORTEX
will read application command files that can be used to initialize data,
define macros, and specify FORTEX system parameters such as the frequency at
which data are recorded. FORTEX can also be used to control a purely batch
application.
FORTEX provides access to any program variable specified in a labeled common
block. Local subroutine and function variables, however, cannot be accessed by
FORTEX. A preprocessor parses one or more of the application's common blocks
and associated specification statements and builds a subroutine that contains
data structures encapsulating all necessary information about the common-block
members. This subroutine is used to symbolically access the individual
common-block members during the execution of the application program.


Interfacing to the Application Program


FORTEX can be added to an existing program with only a few additions to the
code in the form of subroutine calls. Existing I/O statements can be left in
the program or removed. Local variables must be placed in common in order to
be accessed by FORTEX. You are strongly urged to place each common block and
the associated specification statements in a separate include file, but this
is not a requirement. The common and specification statements are placed in an
input file for the preprocessor in the form of include statements, source
statements, or a combination of include and source statements. Include files
may contain embedded include statements.
After additions are made to the application program and the preprocessor is
executed to build the symbolic access routine, the application code and
symbolic access routine are compiled and linked with the FORTEX library
routines to build the application program. A makefile is normally used to
automate these steps.
The application program is controlled during execution with FORTEX commands;
see Table 1. Each command can be abbreviated to three characters, and the
frequently used commands SET and DISPLAY can be specified with their first
letter. FORTEX considers lowercase to be equivalent to uppercase.
Table 1: FORTEX commands.

 Command Description
 ------------------------------------------------------------------------

 action Sets a constant or variable during execution.
 close Closes a previously opened file.
 contin Continues execution of the program after an interrupt.
 display Displays the value of an interface constant or variable.
 dump Displays the values of all interface constants and variables.
 dynetx Invokes the graphical user interface (Dynet-X);
 not active for stand-alone FORTEX.
 echo Prints a message to the display.
 end Defines the end of a runtime macro.
 exit Terminates execution of the program (same as "stop").
 help Invokes the FORTEX help facility.
 history Lists previous commands.
 include Loads the contents of a command file (same as "read").
 inquire Checks whether a Fortran unit number is in use.
 macro Defines user commands by combining other FORTEX commands.
 open Opens an alternate, user-supplied command file or data
 file.
 output Writes specified variables to the screen during execution.

 plot Transfers control to a plotting program.
 prepare Records specified variables for printing or plotting.
 print Prints recorded variables in tabular form (post-processing).
 quit Terminates execution of the program (same as "stop").
 range Finds minimum and maximum values of prepared variables.
 read Loads the contents of a command file (same as "include").
 reinit Transfers control to the application program to reinitialize
 state variables to current values.
 restore Restores values previously saved with the save command.
 return Returns control to the calling routine.
 run Begins execution of the program (same as "start").
 save Saves values of the interface constant and variables to a
 binary file.
 set Sets the value of an interface constant or variable.
 show Shows a variable's type, dimension, and location within its
 common block.
 spare Calls a user-supplied subroutine.
 start Begins execution of the program (run may also be used).
 stop Terminates execution of the program (quit or exit may also
 be used).
 xdisplay Displays the value of an integer constant or variable in
 hexadecimal.
 xset Sets the value of an integer constant or variable in
 hexadecimal.
 !! Repeats the previous command.
 !an Repeats the most recent command that begins with the string
 "an."
 !n Repeats the nth command.
 "comment" Any text enclosed in quotes is a comment.
 $ Command separator.
 ... Continuation characters for continuing the present command to
 the next line.
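The abbreviation rules described above (three-character minimum, one-letter shortcuts for SET and DISPLAY, case-insensitive) can be sketched as follows. This is a simplified illustration, not FORTEX's actual parser, and the command list shown is a subset:

```python
# Subset of the FORTEX command names from Table 1.
COMMANDS = ["set", "display", "start", "stop", "show", "save", "spare"]
ONE_LETTER = {"s": "set", "d": "display"}  # the two one-letter shortcuts

def resolve(word):
    """Resolve an abbreviated, case-insensitive command name."""
    w = word.lower()                      # lowercase equals uppercase
    if w in ONE_LETTER:
        return ONE_LETTER[w]
    if len(w) < 3:
        raise ValueError("abbreviation needs at least three characters")
    matches = [c for c in COMMANDS if c.startswith(w)]
    if len(matches) != 1:
        raise ValueError("ambiguous or unknown command: " + word)
    return matches[0]
```

Note that the three-character minimum is what keeps abbreviations like "sta" and "sto" unambiguous among START, STOP, SHOW, SAVE, and SPARE.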

The FORTEX preprocessor builds a subroutine that contains data structures
collectively referred to as the FORTEX dictionary. The dictionary contains the
variable name, common-block name, location in the common, dimension, and the
type for each variable presented as input to the preprocessor. This subroutine
allows FORTEX to symbolically access the individual common-block members. The
preprocessor also generates an output file containing a special form of each
common block input to the preprocessor.
The input file, named clicom.def, contains common-block statements and
associated specification statements or include statements that name a file
containing common and specification statements. The file clicom.def should
contain all common blocks and specification statements required for FORTEX
interface access, but it need not contain all commons used in the Fortran
application.
The preprocessor builds two output files: clidict.f and clicom.out. clidict.f
contains the dictionary subroutine which provides access to the information
needed to symbolically address the variables in common blocks. clicom.out
contains a special form of each common block parsed by the preprocessor. The
common blocks in clicom.out contain only one member for each common block. For
noncharacter commons, the single member is a real array with a single
dimension equal to the number of single-precision memory words occupied by the
common. For character common blocks, the single member is a character
variable, the length of which is equal to the total length of all the
character members of that common.
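The sizing of those single replacement members is a simple word count. On a machine where double precision occupies two single-precision words (an assumption for illustration; the real word counts are machine-dependent), the array dimension for a noncharacter common can be computed like this:

```python
# Words occupied per element, assuming 32-bit single-precision words;
# these sizes are illustrative and machine-dependent.
WORDS = {"integer": 1, "real": 1, "logical": 1, "double": 2}

def common_size_in_words(members):
    """members: list of (type, element count) for one noncharacter common.
    Returns the dimension of the single real array that replaces it."""
    return sum(WORDS[typ] * n for typ, n in members)

# e.g. part of the /monted/ block from monte.h (mxdat=1000, mxrun=100):
size = common_size_in_words([("double", 1000 * 100),   # work
                             ("double", 1),             # avg
                             ("double", 1000)])         # avgvec
```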
Once the interface between the Fortran application and FORTEX is defined by
the preprocessor, you can add calls to the FORTEX routines to your code and
link with the FORTEX library.


Adding FORTEX to the Application


Adding a FORTEX interface to a Fortran program involves the following steps:
Identify UI variables and place them into common blocks; place the common
blocks (or include files) in clicom.def for the FORTEX dictionary builder; add
FORTEX executive and data-recording calls to the program; execute the
preprocessor; compile the application routines and clidict.f; and link the
application to the FORTEX library.
The core of the FORTEX interface is a program structure that calls the
command-line executive (clexec) and data-recording subroutine (clrecd) at
appropriate points. Separating the structure from the application code can be
achieved with a simple main program that calls the top-level application
subroutine. Calls can be issued from the main program to perform each of the
following functions: initialize variables used in the application; run the
interface application; continue execution of the interface application
following an intermediate termination; and calculate terminal conditions (for
example, final values at the end of a FORTEX application run).
The FORTEX preprocessor (clibdict) must be executed before any FORTEX
application can be built. clibdict determines (from external files, as
previously explained) information about the number, type, size, and order of
variables declared in the Fortran application. clibdict then creates a
dictionary of storage locations in memory corresponding to each variable.
This dictionary is used by FORTEX to symbolically access the program
variables.
All subroutines used with the FORTEX interface can be compiled with a standard
Fortran compiler. FORTEX provides a library designed to be linked with
application-specific subroutines. This linking process generates an executable
file for the interface application.


A Trajectory-analysis Program


Integrating FORTEX with the application is independent of the
application-program size. We've used FORTEX with programs ranging from a few
variables (fewer than 25) to very large simulation programs with more than
1000. The integration time depends heavily on the program's use of common
blocks. If all variables you wish to access are already in common, and the
commons are stored in include files, the integration time may take only a few
minutes.
To illustrate, I'll use a trajectory-analysis program called MONTE to show how
FORTEX is added to an application and how the application is controlled by the
FORTEX commands. The analysis program reads a data file generated by a
missile-trajectory program that was repeatedly executed in a Monte Carlo
fashion. Monte Carlo refers to a simulation method whereby selected simulation
variables are varied for each run. The resulting deviations in the time
histories of the missile trajectories indicate the performance boundaries of
the missile. The simulation analyst is interested in the expected or average
trajectory and in the predicted variations in the trajectories, usually in the
form of the 3-standard deviation (3-sigma) trajectory. MONTE computes the
average and n-sigma trajectories, where n is user-selected through FORTEX. A
trajectory is composed of the missile downrange, cross-range, and altitude
components. The user selects which component to analyze; after analysis of
that component is complete, another component may be selected.
The main program of an application should normally be a short, executive-type
routine that controls the program flow. Listing One (page 116) is the main
program. Listing Two (page 116) is the application code, which consists of
subroutines dynxrun, mcsig, mcavg, and block-data routine mondat. Listing
Three (page 116) is the common-block and associated-specification statements
contained in the include file monte.h. The uppercase executable statements in
Listings One and Two are subroutine calls that are necessary to invoke the
FORTEX CLI and the data-recording routine. This example requires the addition
of only two subroutine calls to invoke the FORTEX interface. More
sophisticated applications may need additional calls to the data recording
routine and to the ACTION routine if runtime actions are desired.
The main program (climain) provides a loop that will repeatedly call the CLI
executive (clexec) and the application routines until the user enters the STOP
command. The call to clexec invokes the command-line interface which executes
FORTEX commands. After a START command is issued, control returns to climain
and dynxrun is called to execute the analysis routines. If a STOP command is
issued, climain causes the program to terminate.
The call to clrecd is somewhat arbitrary. Generally, it should be placed where
data can be recorded for each update of the program with appropriate logic to
limit calls as desired by the user at run time. For data that will be plotted,
you generally want a higher recording rate than for printed output. FORTEX
provides two system parameters to control the amount of output: nciout
controls the rate of output as the run executes, and nciprn controls the rate
at which post-processing results are printed. The call to clrecd in MONTE is
placed so that every computed point is also recorded.
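A recording rate like nciout can be implemented by keying output off the counter passed to the recording call; a minimal sketch, assuming a zero-based counter like the idat-1 argument in MONTE's CLRECD call:

```python
def should_output(counter, nciout):
    """Emit a line for every nciout-th recorded point, counting from zero."""
    return counter % nciout == 0

# With nciout=10, points 0, 10, and 20 of the first 25 would be printed.
printed = [i for i in range(25) if should_output(i, 10)]
```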


Runtime Session



FORTEX accepts commands from the keyboard during an interactive session, and
from a command file during batch execution. Command files are frequently used
in interactive sessions as well; they are very useful for setting up standard
scenarios and for defining customized commands as macros. Included
with the electronic version of the listings is the command (or setup) file
used with MONTE. The command file uses two macros called LATDAT and VERDAT.
LATDAT sets up the parameters to analyze the lateral or cross-range component,
and VERDAT sets up the parameters so that the vertical component will be
analyzed.
Also available electronically is a partial example runtime session of MONTE.
I've placed comments in the command stream to help explain the commands.
After the run is completed, control is passed to the plot program. The
resulting trajectory plot for the vertical component is shown in Figure 1.


Conclusion


FORTEX is a Fortran enhancement tool that provides a flexible and easily
mastered UI that can be quickly added to an existing program or incorporated
into a new application by following a simple set of programming rules. The
primary benefit of FORTEX is that the user is put completely in charge of the
application program. Any or all variables can be monitored during a run
without having to change the code. Traditional "what-if" runs can be made with
simple command sequences. Furthermore, FORTEX is useful during the debugging
stage since any variable can be monitored without adding print statements.


_FORTEX: A Fortran Runtime Executive_
by Harold R. Justice


[LISTING ONE]

c- climain.f -- top-level calling system utilized by FORTEX CLI.
c- The main program calls user routine "dynxrun". For CLI-only use, the
c- routine name is arbitrary. When linked to the GUI library, the "start"
c- button calls subroutine "dynxrun", a reserved name that MUST be used
c- with the GUI.
c- Dynetics, Inc. 1000 Explorer Blvd. Huntsville, AL 35806 (205) 922-9230
 program climain
 integer icntl
100 continue
c-------CLEXEC sends the program to the CLI> prompt
 CALL CLEXEC(icntl)
c-------check the return code to see whether the user typed 'stop'
 if( icntl.lt. 0 ) then
 write(*,*)'Normal exit from program'
 stop
 endif
c-------call the user program to do computations, etc.
 call dynxrun
c-----return to the user CLI> prompt so that we may run again
 goto 100
 end








[LISTING TWO]

c- monte.f -- bob graves -- (c) 1992, dynetics, inc., huntsville, al
c- name type description
c- ---- ---- -----------
c- dbldat dp array to read line of data from file
c- idat int data counter
c- irun int run counter
c- nmnvec int number of points in shortest run
c- nmxvec int number of points in longest run
c- numrun int number of runs read from file
c- there log boolean file exists (t) or not (f)
c- clrecd sub FORTEX data recording routine

c- dynxrun sub main number cruncher for monte program
c- mcavg sub compute average for all points, all runs
c- mcsig sub compute sigma (std dev) for single point
 subroutine dynxrun
c- include file listed in clicom.def for FORTEX dictionary
 include 'monte.h'
 double precision dbldat(100)
 logical there
c- error checking
 if( nsigma .lt. 0.0 ) then
 write(*,*)'err: nsigma negative: defaults to 3.0'
 nsigma = 3.0
 endif
c- user may access the file name through FORTEX set/display
 inquire( file = fname, exist = there )
 if( .not. there ) then
 write(*, *) 'file not found : ', fname
 return
 endif
c- file exists, so open.
 open( lfdat, file=fname, form='unformatted', status='old')
c- initialize counters.
 idat = 1
 irun = 1
c- data read loop reads from a FORTEX "clirrr" type file and
c- presumes that time (independent variable) has been prepared
c- first on the list. other parameters are data (nparm of them)
100 continue
 read(lfdat,err=110,end=110) t(idat,irun),(dbldat(i),i=1,nparm)
c- pick out the column of data for analysis
 work(idat,irun) = dbldat(ivar)
 if( idat .gt. 1 ) then
c- test whether we have entered into another run.
 if( t(idat, irun ) .lt. t(idat-1,irun) ) then
c- set the number of data points in the run.
 num(irun) = idat - 1
 irun = irun + 1
 if( irun .gt. 1 ) then
c- already into next run, so copy it to first element of next run
 t(1, irun ) = t( idat, irun - 1)
 work(1, irun ) = work( idat, irun - 1)
 endif
 idat = 2
 goto 100
 endif
 endif
 idat = idat + 1
 goto 100
110 continue
c- close the file in case we run again
 close(lfdat)
c- assign the number of data points in the last run.
 num(irun) = idat - 1
 numrun = irun
c- find the length of the shortest and longest runs.
 nmxvec = -1
 nmnvec = mxdat
 do 300 irun = 1, numrun
 nmxvec = max ( num(irun), nmxvec )

 nmnvec = min ( num(irun), nmnvec )
300 continue
c- determine data extend or truncate mode: trunc=.t., truncate long runs;
c- trunc=.f., extend all shorter runs to be identical to longest run
 if ( trunc) then
 numvec = nmnvec
 else
 numvec = nmxvec
 imxrun = 1
c- load shorter runs with longest run data. mechanism drives std dev to 0.0
 do 350 irun = 1, numrun
 if( nmxvec .eq. num(irun) ) then
 imxrun = irun
 endif
350 continue
c- assign longest run data to shorter run data.
 do 410 idat = nmnvec, nmxvec
 do 400 irun = 1, numrun
 work(idat, irun ) = work(idat, imxrun )
400 continue
410 continue
 endif
c- analysis section. looping for FORTEX output.
c- do average and +/- n sigma computations.
 call mcavg( avgvec, numvec, numrun, mxdat, work )
 do 450 idat = 1, numvec
 avg = avgvec(idat)
 time = t(idat,1)
 call mcsig( avg, idat, numrun, mxdat, work,
 . nsigma, sigpos, signeg, sig )
c- FORTEX data recording in clrecd
 CALL CLRECD(idat-1)
 450 continue
 return
 end
 subroutine mcsig( avg, idat, nrun, mxdat, work,
 . nsigma, ps3, ns3, sig )
c- compute +/- n sigma for data passed in.
 save
 double precision avg, ps3, ns3, sig, work(mxdat,*), sum
c- initialize.
 sum = 0.0
c- loop through points.
 do 100 irun = 1, nrun
 sum = sum + ( work(idat, irun ) - avg )**2
100 continue
 if(nrun .gt. 0) then
 sig = sqrt(sum / dble( nrun - 1 ))
 else
 sig = 0.0
 endif
c- compute + and - N sigma vectors.
 ps3 = avg + nsigma * sig
 ns3 = avg - nsigma * sig
 return
 end
 subroutine mcavg( avg, ndat, nrun, mxdat, work )
c- compute the average of nruns of data.
 save

 double precision avg(*), work(mxdat,*), sum
c- initialize.
 do 110 idat = 1, ndat
 sum = 0.0
c- loop the nrun loop.
 do 100 irun = 1, nrun
 sum = sum + work( idat, irun )
100 continue
c- compute the average for this parameter.
 avg( idat ) = sum / dble( nrun )
110 continue
c- return.
 return
 end
 block data mondat
 include 'monte.h'
c- initialize through data statements, may also use the monte.setup file
 data nsigma /3/
 data fname /'monte.dat'/
 data nparm /2/
 data ivar /2/
 end







[LISTING THREE]

c- monte.h / dynetics, inc., huntsville, al
c- name type description
c- ---- ---- -----------
c- mxdat parm maximum number of data elements per run
c- mxrun parm maximum number of runs
c- work dp memory copy of raw data for use in statistics
c- avg dp average for a given time point
c- avgvec dp average vector (keep the averages for std dev calc)
c- signeg dp minus n-sigma value for a given time point
c- sigpos dp plus n-sigma value for a given time point
c- sig dp sigma (std dev) for a given time point
c- nsigma int 1, 2, or 3 for the +/- n sigma computation
c- t dp time array
c- time dp time for a given point (independent variable)
c- nparm int number of data columns (not counting time) in file
c- lfdat int logical file number of the data
c- fname char name of the input data file (binary)
c- num int array of number of data points per run
c- ivar int which data column to analyze
c- trunc log whether to truncate to shortest run (T) or extrapolate
 parameter( mxdat = 1000 )
 parameter( mxrun = 100 )
 double precision work( mxdat, mxrun )
 double precision avg, avgvec(mxdat), signeg, sigpos, sig
 double precision t(mxdat, mxrun ), time
 common /monted/ work, avg, avgvec, signeg, sigpos, sig, time, t
 integer numvar, nsigma, lfdat, nparm, ivar, num(mxrun)
 logical trunc

 common /monte/ numvar, lfdat, nsigma, nparm, ivar, num, trunc
 character*64 fname
 common /montec/ fname



[Example 1: MONTE command file]

echo Setup for MONTE program
set fname='traj.rrr'
set ivar = 2
set lfdat = 30
set nparm = 2
prepar time,avg,sigpos,signeg,sig
display 'prepare'
macro facts
 echo File name with trajectory data
 display fname
 echo Number of data columns
 display nparm
 echo Column of data to analyze
 display ivar
end
macro sig3
 echo 3 Sigma (nsigma=3)
 set nsigma = 3
end
sig3
macro latdat
 echo Analyze the crossrange data
 set ivar=2
 set trunc=.t.
 set fname='../inmonte/clirrr'
 set nparm=3 $ "number of data columns in missile simulation clirrr file"
 facts
 output 'clear'
 output time,avg,sigpos,signeg
 start
 plot
end
macro verdat
 echo Analyze the altitude data
 set ivar=3
 set trunc=.t.
 set fname='../inmonte/clirrr'
 set nparm=3 $ "number of data columns in missile simulation clirrr file"
 facts
 output 'clear'
 output time,avg,sigpos,signeg
 start
 plot
end
set nciout=10
s cmd=5



[Example 2: Partial output from example run-time session]


riddler77% DYNET-X (C) 1991 FORTRAN RUN-TIME EXECUTIVE V3.06 DYNETICS, INC.
 CLI> read 'monte.setup'
 Setup for MONTE program
 Prepare List
 TIME AVG SIGPOS
 SIGNEG SIG
 End Prepare List
 3 Sigma (nsigma=3)
 CLI> "Run program to analyze vertical data"
 CLI> display nciout
 NCIOUT 10
 CLI> "Every 10-th data point will be printed to the screen"
 CLI> verdat
 Analyze the altitude data
 File name with trajectory data
 FNAME '../inmonte/clirrr '
 Number of data columns
 NPARM 3
 Column of data to analyze
 IVAR 3
 TIME 0.00000000 AVG 0.00000000 SIGPOS 0.00000000
 SIGNEG 0.00000000
 TIME 0.09999999 AVG 18.96214371 SIGPOS 20.76645258
 SIGNEG 17.15783483
 TIME 0.20000002 AVG 32.36762810 SIGPOS 36.44571089
 SIGNEG 28.28954530
 .
 .
 .
 TIME 4.80001497 AVG 719.5072083 SIGPOS 784.4273663
 SIGNEG 654.5870502
 TIME 4.90001726 AVG 738.4712952 SIGPOS 805.1422782
 SIGNEG 671.8003122
 CLI> "Only output every 100th point"
 CLI> set nciout=100
 CLI> "Run program to analyze the lateral data"
 CLI> latdat
 Analyze the crossrange data
 File name with trajectory data
 FNAME '../inmonte/clirrr '
 Number of data columns
 NPARM 3
 Column of data to analyze
 IVAR 2
 TIME 0.00000000 AVG 0.00000000 SIGPOS 0.00000000
 SIGNEG 0.00000000
 TIME 0.99999934 AVG 13.08728914 SIGPOS 15.78440254
 SIGNEG 10.39017575
 TIME 1.99999845 AVG 25.58025684 SIGPOS 28.22318408
 SIGNEG 22.93732961
 TIME 2.99999762 AVG 37.22265167 SIGPOS 40.50114335
 SIGNEG 33.94416000
 TIME 3.99999666 AVG 50.43453903 SIGPOS 54.86466342
 SIGNEG 46.00441465
 CLI> display nsigma
 NSIGMA 3
 CLI> "Change sigma level to 2"
 CLI> set nsigma = 2
 CLI> "Run lateral case again to obtain 2-sigma trajectories"

 CLI> "Normally, you would type 'start' to run the program"
 CLI> "but we would like to illustrate the repeat command"
 CLI> !20
 latdat
 Analyze the crossrange data
 File name with trajectory data
 FNAME '../inmonte/clirrr '
 Number of data columns
 NPARM 3
 Column of data to analyze
 IVAR 2
 TIME 0.00000000 AVG 0.00000000 SIGPOS 0.00000000
 SIGNEG 0.00000000
 TIME 0.99999934 AVG 13.08728914 SIGPOS 14.88536474
 SIGNEG 11.28921355
 TIME 1.99999845 AVG 25.58025684 SIGPOS 27.34220833
 SIGNEG 23.81830535
 TIME 2.99999762 AVG 37.22265167 SIGPOS 39.40831279
 SIGNEG 35.03699056
 TIME 3.99999666 AVG 50.43453903 SIGPOS 53.38795529
 SIGNEG 47.48112278
 CLI> stop






June, 1992
PROGRAMMING PARADIGMS


Multimedia Paradigms




Michael Swaine


Here's the premise: The way in which we interact with computers is in the
early stages of a paradigm shift. I don't know what the new paradigm will be,
although next month I'll report on an interesting theory about that. This
month, I'll present some arguments for why we need one and report on the
HomeMedia Expo, where the evidence was strong that a new approach to
user-interface design is needed.
I can tell you generally where to look for the new UI paradigm. It's in the
direction of dynamic media like sound and video, the direction of increased
interactivity and user involvement, the direction of the current explorations
in virtual reality, the direction of the smart television (possibly not the
oxymoron it seems to be), and of home uses for computers. If you say that
sounds like more than one direction, I'll concede it; then the direction I'm
talking about is the sum of these vectors.
I realize that a lot of people haven't accepted that last paradigm shift in
human-computer interaction, the one involving mice and metaphorical desktops;
and probably with good reason. But the next paradigm shift could be as
different from that one as it was from the character-based, command-line
interface.
I think it's entirely possible that the desktop interface invented at Xerox
PARC (with the Apple-Microsoft suit coming to trial any day now, it seems
appropriate to keep that truth alive) will eventually be a footnote in
textbooks on the design of human-computer interaction. I can hear some future
teacher struggling vainly to convince a skeptical student that the so-called
desktop really was an innovation, "considering how little people knew back
then." And it seems possible that the next paradigm of interaction, when it
finally shakes out, will appeal to those who now raise legitimate objections
to the GUI, or PARCface.
That's the premise, and you don't have to accept it; I won't even argue it
seriously until next month. But the phenomena are real: video, sound, MIDI,
virtual reality, simulations, scientific visualizations, digital television,
cable, pay-per-view, two-way-TV, ISDN, optical media. Whether or not they lead
to a paradigm shift in human interaction, and if so what the new paradigm will
be, remain to be seen.


Arguments for a New User-interface Model


Here are several arguments why these phenomena should lead to a new way of
dealing with the computer, paraphrased or distilled from the comments of some
astute observers:
The hardware needs the exercise. The keys to survival in the computer-hardware
industry are: smaller, faster, and more powerful. (Cheaper is also nice, but
it has a rather brutal effect on the bottom line.) Fortunately for the
hardware manufacturers, those are also the keys to survival in the
semiconductor industry, so the hardware manufacturers keep getting the
components they need to build the progressively hotter machines. So we can
count on computers continuing to get smaller, faster, and more powerful. Safe
bet.
So what does that mean for the software developer? That we need new categories
of products; products that really exercise the new hardware, gobbling up the
increased bandwidth and processing power. Multimedia products that require
gigabytes of storage, megabytes of RAM, and the power to put full-motion video
on the screen, running at movie speeds. What else are people going to do with
a 586?
I heard Bill Gates put forth this pragmatic view, speaking to developers and
retailers at COMDEX two years ago. That was his famous "Information at Your
Fingertips" speech, and although he had a lot more to say about emerging
technologies in that speech, one theme was this "use it or lose it" idea.
So we will get multimedia because the hardware needs it, and multimedia
obviously needs a better interface than the PARC model, better than a GUI. A
perverse argument, but a plausible one.
We're running out of suckers. That first computer or first spreadsheet package
or first word processor that the customer buys is an incredible bargain. The
difference in efficiency between doing it by hand or by typewriter and doing
it by computer is enormous. The main challenge early computer manufacturers
and retailers faced in selling personal computers was in overcoming ignorance
and prejudice and fear. That's plenty of challenge, I suppose, but my point is
that the benefits didn't need a whole lot of selling once they were pointed
out.
Selling the second or third computer is another matter. How do you get a
company with hundreds or thousands of perfectly good machines to throw them
out and replace them with this year's model? Well, you offer them smaller,
faster, and more powerful. But the pitch is harder now. Most word processing
is letter writing, most spreadsheets are not huge, and in short, most of the
work that people are using computers for most of the time doesn't require any
more speed or power than a PC AT. Given that customers have been sold a GUI,
they now need at least a 386-class machine, but for most users that's the
limit.
Faced with diminishing returns from selling upgrades and a shrinking stream of
first-time customers, the industry needs to open the computer up to a broader
group of people. With new users come all the new-user problems: ignorance and
prejudice and fear and an unreasonable unwillingness to put up with the
accepted nonsense. For these people, the installation procedure had better be
something like: Stick the cartridge in the slot. This next big group of
computer users, if you can find them, will also have less money to spend than
current users.
The trade-off is that if you guess right about those potential users, there
are a lot of them: more than the whole market for computers today. John
Sculley seems more committed to this idea than any other computer company
president. He is creating whole new profit centers within Apple to develop
products for (and to market products to) the consumer market. For the consumer
market, you need to do better than a GUI.
They want their MTV. Communications technology is going digital: television,
telephone; this brings them into computer territory, and questions about how
these digital media will relate to one another and to computers have to be
addressed. Nick Negroponte of MIT's Media Lab has more to say about this than
anybody else. He says that a lot of what we think and do with these media is
going to get turned upside down. For example, regarding the physical channels
he says that everything that currently goes through the air will go through
the ground and everything that currently goes through the ground will go
through the air. It's hard to believe that the human interface won't have to
change at least as much. Consider what you'd get if you just grafted the
human-interface design principles implied by a VCR controller onto the Mac
guidelines. Despite the fact that Apple and Microsoft and Macromind/Paracomp
and everybody else doing multimedia controllers is doing just that, it's a
pretty awful idea. Something's gotta change.
It's time to throw the rascals out. Maybe a lot has got to change. There is at
least one authority on the subject who is saying that the current GUI
human-computer interface design is hopelessly out of date, that it needs to be
jettisoned in toto, and that a completely new start has to be made--but that's
next month's argument.


Four Maps of Multimedia


The list I rattled off at the beginning of this column--video, sound, MIDI,
virtual reality, simulations, scientific visualizations, digital television,
cable, pay-per-view, two-way-TV, ISDN, optical media--probably sounded random.
It certainly isn't a single technology or even a single industry. But it is a
confluence of technologies, a collision of industries, and as such, a single
phenomenon, and a significant, if complicated, one. Getting a grip on it is
not easy.
There are, though, some newsletters, magazines, and conferences that try with
varying success to bring some order to the chaos. They generally have media in
their titles. Here is my short list.
Two newsletters do an excellent job of covering the field. Denise Caruso's
Digital Media costs $395 a year ($401 Canada, $413 foreign) and is published
by Seybold Publications, P.O. Box 644, Media (no kidding), PA 19063. Denise
has been writing about computer technology for a long time and knows her
stuff. The newsletter is not a solo effort, though; she has other writers. The
emphasis is on emerging technologies, the strategies of the big companies,
analysis, and informed opinion. The content can occasionally get technical, as
in a recent piece on compression strategies.
Tony Bove and Cheryl Rhodes have been at it longer than Denise. Their
newsletter, Bove & Rhodes Inside Report, ostensibly covers multimedia and
desktop publishing, but they occasionally slip in pieces on operating systems,
scripting tools, and other topics that interest them. I find that my interests
coincide with theirs. Tony & Cheryl founded (then sold) Publish magazine; as
their interests evolved toward multimedia, Inside Report followed. I find it
better informed than most other industry newsletters. Inside Report is monthly
and costs $245 ($260 in CA including tax, $270 outside the U.S.); Bove &
Rhodes, P.O. Box 1289, Gualala, CA 95445.
And the disclaimers: 1. Tony, Cheryl, and Denise are friends of mine. But
then, most of my friends are writers, and I haven't promoted all their
publications here. 2. I get my issues of Digital Media and Inside Report
through subscription exchange with my newsletter, HyperPub. But review copies
of software and books are usually free, too.
There are other publications, but I frankly don't see that the area is solid
enough to merit a magazine yet; newsletters are the best way to stay on top of
a developing industry.
Another way is by going to conferences. The granddaddy of conferences in this
realm is probably the Microsoft CD-ROM Conference. It's a good place to hear
Microsoft's strategy and to get current on standards and CD-ROM technology.
Another conference that is proving very informative is the annual HomeMedia
Expo in Beverly Hills, which covers this whole field. Well, it covers all the
technologies I've mentioned as well as the legal and licensing issues. Its
perspective is strictly home and entertainment, and I think that's
appropriate. If there is a big new group of potential computer users, they
aren't in offices; they must be in homes. And computer products for the home
market will need a more friendly interface than the ubiquitous GUI.
I attended the first HomeMedia Expo last year and the second this spring. It's
put on by American Expositions Inc., 10 Greene Street #703, New York, NY
10012, and costs in the neighborhood of $500 plus airfare and hotel, although
if you're actually in the neighborhood (Beverly Hills) and only want to see
the exhibits, you can get in for $10.


UI Begins in the Home


The collection of companies present or represented at the HomeMedia Expo was
interesting: Apple, Sega, Philips Interactive Media, NEC, Motorola, Adobe,
Dolby, LucasArts Entertainment, Sony, Capitol-EMI, Playboy, Kane & Company
Investment Bankers, Billboard magazine, cable companies, and television
stations and networks. The list of people present was also interesting:
Timothy Leary, Michael Nesmith (the Monkee), the voice of Roger Rabbit, the
creators of Lawnmower Man, and various game designers, professional musicians,
animators, producers, agents, and lawyers. The issues dealt with included home
video and sound production and playing, MIDI, virtual reality, simulations,
scientific visualizations, digital television, cable, pay-per-view,
two-way TV, ISDN, optical media, and licensing issues. It's from this
conference that I got the idea that all this stuff fit together. Last year I
didn't get it; but this year, attending the sessions, listening to
conversations in the hallways, seeing common themes in the exhibits, I started
to pick up the gestalt.
Apple is very serious about home media. It has set up new divisions of the
company to produce and sell products for this market. Apple's Philip Schiller
spoke about some of the plans. Some of what he had to say has been reported
elsewhere: Apple will produce personal digital assistants, or PDAs, small
limited-purpose devices that will respond to speech and/or pen gestures. But
it is safe to guess that the company is also going to be involved in two-way
video, electronic books, multimedia players, and other consumer products.
Today's consumer market, though, is not interesting to Apple, which will act
as a "technology provider" for a new kind of consumer product, strictly
digital and the result of the convergence of industries. Late this year there
will be consumer Macs, CD-ROM-equipped Macs that are not for the consumer
market, and the first of the (consumer) PDAs. Apple is working with both Sony
and Sharp to produce the consumer products, and will work with other partners
as well. Unfortunately, that new user-interaction paradigm doesn't exist yet,
so the consumer machines will be running System 7.
The first PDA won't have a new interface, either; it will be HyperCardish. But
then, according to panelists in the press roundtable session, the first PDAs
won't be consumer products in a strong sense: They'll be organizers, executive
toys. There was also some discussion on the press panel of the idea that sex
will sell the new consumer products, an argument that I made in a column a
year ago. Currently, the most popular products are electronic books or
interactive CD-ROM presentation of annotated literature, art, or music, but
products like Virtual Valerie and MacPlaymate are reminiscent of the early
days of the VCR market. The panelists also cited the early days of France's
Minitel system, when sexually oriented material was prevalent, and pointed out
that this tapered off quickly. The argument was that even if sexual material
becomes common in these new media, it will soon fade away.
Possibly. I report for your edification that the Expo did include a session on
R-rated content, and that I covered it on your behalf. It was pretty boring.
Licensing, I learned at another session, is a major problem for anyone trying
to put together a multimedia product that uses any published material. So bad,
in fact, that it's generally cheaper to produce all original material. To use
a clip of a song in a game, for example, you not only have to negotiate three
different types of rights, with the record company, the writer, and the
performer; you may also have to front a significant chunk of money as a
guarantee of earnings. The trickiest part is that you may have to
negotiate for rights in the dark: Say you want to use a clip from a movie. The
movie company will give (well, not give, but license) you certain rights, but
it won't tell you what other rights you need to negotiate for with the actors,
the writers of the music playing in the background, and so on. The only
indication of who owns what in the movie is in various contracts, and you
shouldn't expect the movie company to open up its files to you. So you could
find yourself buying rights from someone who already sold all his rights,
simply because you don't know what rights he has retained. It's a mess.
An innovative way to get around the rights problem is visible on television
every week; it's also a solution to the problems that come up in re-purposing
of movie material, where material that wasn't originally intended to be
interactive is warped into an interactive format. George Lucas's "Young
Indiana Jones" is written and produced explicitly with interactive game
spinoffs in mind. That may be a solution only for George Lucas, some of whose
employees were present at the HomeMedia Expo.

Finally, over lunch, I learned the secret of this year's Best Actor Oscar.
After Anthony Hopkins made the rounds of the talk shows explaining that the
essence of his character Hannibal Lecter in The Silence of the Lambs was that
he never blinked, it turns out that he did blink. All his blink frames were
just edited out of the film. In today's media, even a stare can be a special
effect.



June, 1992
C PROGRAMMING


Software Development '92 and D-Flat Dialog Boxes


 This article contains the following executables: DFLT12.ARC D12TXT.ARC


Al Stevens


I spent a week in February in Santa Clara attending Software Development '92.
This programmer's trade show gets bigger every year, and the breadth of its
coverage provides a measure of where we are going and how we are going to get
there. The emphasis this year was on Windows--not by design of the show's
promoters but by virtue of the overwhelming number of Windows tools on
display. Everywhere you looked there was another new programmer's tool for
writing Windows code. Microsoft announced their C/C++ compiler with the
Microsoft Foundation Class Library, which puts the Windows SDK in an
object-oriented C++ wrapper. Instantiate an object, and a window opens. Send
it a message, and it does something. Go out of scope, and the window closes.
This is Microsoft's answer to Borland's OWL. Those guys keep answering each
other. More about both later on.
These shows are where you meet the folks who wear the faces that you can
attach to the names that you've been reading in magazines and on CompuServe.
As you walk around the show and talk to people, they always stare at your
chest first. This is to read your attendee's badge to see if they've heard of
you, but now I know how Dolly Parton must feel.
The show included several exhibitors demonstrating some new "virtual reality"
devices. I'm not sure what that had to do with programming, but it looked like
it could be fun. People were sticking their heads into Darth Vader helmets and
jerking joy sticks and their heads around. They were apparently simulating
movement through space in three dimensions and seeing it all in the viewers of
those helmets. It looked like it could have serious application in flight
simulators and such. I tried one on. I was not particularly impressed with the
simulation, but this stuff is new. Then Maggie Dunphy of Innovative Data
Concepts expressed concern about head lice. "You never know who's been wearing
that thing in a public show like this!" She was looking at some passing souls
whose grooming, coiffure, and accoutrements recall the early days of personal
computing--or is it just the recession? My kind of programmer, anyway. I
hastily took the helmet off and didn't try on any of the others. We all went
to the bar and had some California wheat beer. I scratched my scalp from time
to time. Couldn't help it.
P.J. Plauger presented two sessions at SD '92. One was on how not to develop
"shelfware"--software destined for the shelf because the user will not use it.
The other presentation was about a C++ function library that Dr. Plauger is
designing to follow his book, The Standard C Library (Prentice Hall, 1992). He
is, as usual, taking the pragmatist's view. He won't be trying to force-feed
us some pure, all-encompassing, object-oriented class library that defines the
world and all its entrails and appendages. Instead, he promises to address
those areas where the features of C++ can be used to provide improvements to
the functions of standard C. For example, the language wouldn't need strcpy,
strncpy, and every other distinct string-moving function. By overloading a
single function, the library would give us just one to remember. Its behavior
would depend on which data types you wanted to move. And, for purposes of
compatibility with the past, the C++ library would include all the standard C
library functions. Something to look forward to. Here's something else. Dr.
Plauger told me that he and Brian Kernighan are working on a C version of
their book, Elements of Programming Style.
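The overloading idea can be sketched in a few lines of hypothetical C++ (the
name `copy` and these signatures are my illustration, not Dr. Plauger's actual
library): one name replaces the family of distinct string movers, and the
compiler picks the behavior from the argument types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// One overloaded name; the argument types select the behavior.
char *copy(char *dst, const char *src)               // strcpy flavor
{
    return std::strcpy(dst, src);
}
char *copy(char *dst, const char *src, std::size_t n) // strncpy flavor
{
    return std::strncpy(dst, src, n);
}
wchar_t *copy(wchar_t *dst, const wchar_t *src)      // wide-character flavor
{
    return std::wcscpy(dst, src);
}
```

The caller remembers one function; the compiler remembers the rest.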
The Free Software Foundation had a booth, and Richard Stallman gave me a
button that said "Keep your lawyers out of my computer." He said I could have
it only if I would wear it and if I knew what it meant. I said that I would
and I did, and then he asked for a donation to help defray his expenses. For a
second there I had to look around to make sure I wasn't at the airport. I gave
him a couple of bucks and wore the button because I agree with its sentiment.
Its logo is a snake with an apple for a head.
Finally, Liz Oakley came out of the closet. That doesn't mean what it usually
means, but I'm not telling.


D-Flat Dialog Boxes


A dialog box is a window that has one or more control windows with which the
user enters commands and data. In the September 1991 column I described how
you define your dialog boxes with macros. Last month we discussed the
APPLICATION window class, and you saw how the program uses dialog boxes. This
month we will look at the code that implements the dialog box itself. Next
month we will look at the code that implements the control windows.
Listing One, page 148, is dialbox.c. It includes the DialogBox function that
an application calls to open a dialog box and the code to build and manage the
dialog box. If a dialog box is "modal," it keeps the focus contained within
itself while it is open. You cannot get out to a menu item or a document
window without closing the dialog box. If a dialog box is "modeless," it works
like any other document window. You open them both by calling the same
function. The DialogBox function for a modeless dialog box returns as soon as
it has created and displayed the dialog box. The function for a modal dialog
box does not return until the user closes the dialog box.
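The modal/modeless distinction boils down to who runs the dispatch loop. Here
is a toy model of the pattern (not the real D-Flat API; every name here is
invented): a modal open pumps messages until a close arrives, while a modeless
open returns to the caller at once.

```cpp
#include <cassert>
#include <queue>

struct Dialog { bool open = true; };

std::queue<int> messages;   // pending messages; 0 stands for CLOSE_WINDOW
int dispatched = 0;

// Process one message; return FALSE when the dialog closes or none remain.
bool dispatch_message(Dialog &d)
{
    if (messages.empty())
        return false;
    int m = messages.front();
    messages.pop();
    ++dispatched;
    if (m == 0)
        d.open = false;     // the close message
    return d.open;
}

bool OpenDialog(Dialog &d, bool modal)
{
    if (!modal)
        return true;        // modeless: return immediately after creation
    while (dispatch_message(d))
        ;                   // modal: captive dispatch loop until closed
    return !d.open;         // TRUE once the user has closed the box
}
```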
The first function in dialbox.c is the ClearDialogBoxes function. The
text-control windows in a dialog box retain the text values that the user
enters even after the window closes. This allows the text entries to persist
for subsequent uses of the dialog box. These text values are on the far heap.
Usually DOS frees any heap allocations when the program terminates, but I
added this function so that I could use D-Flat in TSR applications. D-Flat
calls the ClearDialogBoxes function when the application is done, and the
function frees all heap allocations that are still open.
The DialogBox function is the one that the application calls to open a dialog
box. The function creates the dialog-box window, displays it, sets the focus
to its first control window, and sends the dialog-box window the
INITIATE_DIALOG message. If the dialog box is modeless, the function returns.
If the dialog box is modal, the function captures the mouse and keyboard into
the dialog-box window and enters the D-Flat message dispatching loop. The loop
will not return until the dialog box gets closed, which will happen as the
result of the actions taken in the dialog box's default window-processing
module.


Dialog-box Messages


The dialog box can have a custom window-processing module, which the
application provides. That is where its commands will be intercepted and
processed. The default window-processing module in dialbox.c is DialogProc. It
intercepts the messages and processes them.
The CREATE_WINDOW message builds the table of dialog boxes that the
ClearDialogBoxes function will use. Then it chains to the window-processing
module of the base window class to create the window. A dialog box has a
number of control windows. This message then creates each of the control
windows and initializes the text-based controls with the text that the
dialog-box definition specifies.
The SETFOCUS message bypasses the usual D-Flat logic when setting the focus to
a modal dialog box. It sends a message to clear the focus from whatever window
currently has the focus and sets the inFocus variable.
Modal dialog boxes ignore the SHIFT_CHANGED message to prevent D-Flat from
trying to give the focus to the menu bar when the user presses and releases
the Alt key.
The dialog box processes the intercepted mouse LEFT_BUTTON message for
combo-box and spin-button controls. If the mouse hits the scroll buttons for
those controls, the program sends the message to the control windows
themselves.
If the user presses F1, the KEYBOARD message displays the help window
associated with whichever control has the focus. The Shift+Tab, Backspace, and
Up keys give the focus to the control that precedes the one with the current
focus, while the Alt+F6, Tab, right-arrow, and down-arrow keys give the focus
to the control that follows the current one. The Ctrl+F4 and Esc keys send the
ID_CANCEL command to the dialog box. All other keystrokes are matched against
the shortcut keys for the dialog box to see if the user is trying to set the
focus to one of them.
If the user chooses the OK or Cancel commands, the COMMAND message sets a
global variable to indicate which one, and posts the END_DIALOG message to a
modal dialog box or sends the CLOSE_WINDOW message to a modeless dialog box.
The CLOSE_WINDOW message sends the ID_CANCEL command to the dialog box before
closing the window.


Other Dialog-box Functions


Several other dialog-box functions support the control windows or the
application. The FindCommand function returns the control structure for the
control that contains the specified command code. The ControlWindow function
returns the window handle of a specified control. Some controls, such as check
boxes, are either on or off. The ControlSetting function sets the state of
those controls. There are several functions that get and set the text values
for dialog-box controls.


Control Window Messages


The ControlProc function is a generic window-processing module for all control
windows. Each class of control window will have its own window-processing
module, but the ControlProc function gets first shot at the messages for all
controls.
The CREATE_WINDOW message moves the address of the control structure from the
extension field to its own field in the window structure.
If the user presses F1, the KEYBOARD message displays the help window
associated with the control. Alt+F6, Ctrl +F4, and Alt+F4 are posted to the
dialog-box window. The arrow keys are converted to the keys that change the
focus to adjacent controls, unless the control is an edit box or list box. The
Enter key is converted to the ID_OK command except for multiline edit boxes
and buttons. The PAINT message adds scroll bars to text windows if the width
or length of the text exceeds the dimensions of the windows. The BORDER
message prevents edit boxes from displaying the double-line border when they
get the focus. The SETFOCUS message sends the ENTERFOCUS or LEAVEFOCUS
commands to the dialog box. The CLOSE_WINDOW command updates the initializing
text values of text-based control windows if the user chose the OK command.


How to Get D-Flat Now



The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of Dr.
Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you the latest
version of D-Flat. The software is free, but if you'd care to, stuff a dollar
bill in the mailer for the Brevard County Food Bank. They help the homeless
and hungry. We call it DDJ's "careware" program. If you want to discuss D-Flat
with me, use CompuServe. My ID is 71101,1262, and I monitor the DDJ Forum
daily.


Some D-Flat++ Design Issues


I am brainstorming the architecture of what I will call D-Flat++, or DF++,
until I come up with a better name. There are a number of design
considerations to ponder. First among them are my objectives for DF++, which
are:
A C++ class library that implements the D-Flat user interface
Code that compiles with Borland, Microsoft, TopSpeed, and Zortech C++
compilers
An object-oriented class library for such a system will have many things in
common with an event-driven, message-based architecture, and it will have many
things that are different, too. I have a reasonable early idea of how the API
will look. A program will declare a window variable, and the window will come
into view. The program will modify the window's appearance by sending it
messages and its behavior by adding new messages. How will the program do
that?


Processing Messages


In the D-Flat API, a program provides a message-processing function when it
creates a window. The message-processing function intercepts the messages that
it wants to process and passes the others to the next message-processing
function up the class tree. This procedure resembles polymorphism, but the
language is not implementing it; the programmer is.
In a C++ class hierarchy, a base class can declare a virtual member function.
When a derived class declares a member function with the same name and
parameter list, the derived class function executes when the program calls the
function in the name of an object of the derived class. Let's keep this
terminology straight. Calling member functions is how a C++ program sends a
message to the object. It means the same thing.
So, it would follow that the way to implement a window class that has its own
messages is to derive a class and add some message functions to it. Remember
that the class at the bottom of the hierarchy gets first whack at every
message, so every message function must be virtual to support subsequent
derived classes.
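In miniature, that looks like this (the classes are invented for
illustration): the message is a virtual member function, the most derived
class gets first whack, and it chains to its base explicitly when done.

```cpp
#include <cassert>
#include <string>

class Window {
public:
    virtual ~Window() {}
    virtual std::string Paint() { return "window"; }   // base message handler
};

class Dialog : public Window {
public:
    std::string Paint()                      // intercepts the PAINT message
    {
        return "dialog+" + Window::Paint();  // then chains up the tree
    }
};
```

Calling `Paint` through a `Window *` still lands in `Dialog::Paint` first,
which is the virtual-function mechanism doing what the D-Flat dispatch code
does by hand.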
There is, however, a wrinkle. In the message-based architecture of D-Flat, a
particular window class does not need to know about all the messages--only
those it intercepts. It passes all others up the hierarchy. It can do that
because a message is a generic data construct, and its meaning is interpreted
by the class functions that process it. A message in C++ is a function. Assume
a class hierarchy of A->B->C, with A as the topmost base class. If A and C
both process a particular message, about which B is ignorant, then C must know
to pass the message to A when C is done with it. That means that when you
design a derived class, you must know a lot about the classes up the
organization, which flies at Mach 1 straight into the face of the traditional
concept of the abstract black box. The bright side is that the object-oriented
approach eliminates a severe level of overhead. Every D-Flat message goes
through the message-processing function of every class up the tree until one
of them intercepts it and processes it. The DF++ message approach will send
the message to the lowest function in the tree that is interested in it.
Uninterested class functions will never see it. If that function needs to pass the
message up, the message will go directly to the next interested class. Whether
or not the virtual-table mechanism of C++ imposes a more severe overhead than
the one we will eliminate remains to be seen. One thing is for sure, though.
You'll spend less time in your debugger stepping through layers of
message-processing functions that drop out the bottom of a big switch
statement, only to pass the message on to the next class.


Identifying Messages


In D-Flat, a message has an enumerated value. In C++, a message has a function
name. One of the D-Flat messages is COMMAND, which menus and dialog boxes send
to say that the user has chosen a menu item, pressed a command button, and so
on. Each discrete command has an enumerated value, too. When you design a menu
or dialog box, you associate the command value with the user's action. The
application or dialog window has a unique message-processing function that
intercepts the COMMAND message, figures out what command action occurred, and
executes some code to respond to the command.
Ideally, a window class would have a member function for each command, and the
design of a menu or dialog box would associate that member function with the
user's action. The D-Flat API is similar to the Windows API in many ways, so
it would be helpful to see how some of the C++ class libraries for Windows do
it.
The Foundation. The Microsoft Foundation Class Library uses what they call a
"message map." They retain the Windows messages and command codes, and they
associate them in the message map with macro statements such as this:
ON_COMMAND(IDM_NEW, OnNew)
The OnNew parameter in that macro names the member function that represents
the COMMAND message for the IDM_NEW command.
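The mechanism behind such a macro can be sketched with a static table of
pointers to member functions (all names here are mine, not MFC's): the table
associates command codes with handlers, and a dispatcher looks the handler up
at run time.

```cpp
#include <cassert>
#include <cstddef>

enum { IDM_NEW = 100, IDM_OPEN = 101 };

class Frame {
public:
    int newCount, openCount;
    Frame() : newCount(0), openCount(0) {}
    void OnNew()  { ++newCount; }
    void OnOpen() { ++openCount; }
    int Dispatch(int cmd);
};

struct MapEntry { int cmd; void (Frame::*handler)(); };

// The "message map": command code -> member function.
static const MapEntry messageMap[] = {
    { IDM_NEW,  &Frame::OnNew  },
    { IDM_OPEN, &Frame::OnOpen },
};

int Frame::Dispatch(int cmd)
{
    for (std::size_t i = 0; i < sizeof messageMap / sizeof *messageMap; i++)
        if (messageMap[i].cmd == cmd) {
            (this->*messageMap[i].handler)();  // call through member pointer
            return 1;
        }
    return 0;   // unhandled command
}
```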
OWL. The Borland ObjectWindows class library takes a different approach.
Borland has extended the syntax of the C++ language to specify the declaration
of a "message-response member function." The declaration is a public member
function of the class that looks like this:
 virtual void CMFileNew(RtMessage Msg) [CM_FIRST+CM_FILENEW];
C++/Views. The C++/Views class library from CNS uses yet another approach. An
application program instantiates a pop-down menu and sends it a message with a
pointer to an array of structures--one for each item on the menu. The
structure includes a reference to the application window and a function
pointer to call when the user chooses the menu item.
Remember, however, that all these class libraries are C++ wrappers around the
event-driven, message-based Windows API, and so they necessarily retain some
of the characteristics of that platform. The DF++ objective is to rewrite the
underlying user-interface software as well as its API in C++, so we are not
bound to any existing methods and architecture. We'll see how it goes.


Other Classes


There are no standards for C++ classes. Stroustrup ignored the issue. The only
de facto standard is the IOSTREAM package that AT&T distributes and that most
compilers include. Yet there are some obvious needs. Many compilers include
classes for strings, containers, and such, but there is no consensus on format
and implementation. Some will disagree, but, in my opinion, the single most
valuable improvement that C++ brings to C is the ability for a programmer to
extend the language by adding data types. There are other improvements, of
course, but if you never used any of them and if you never wrote an
object-oriented program, this one feature would make C++ worth using. But
where are the classes? The DF++ project begins with no standard library except
the C standard library, and, therefore, no standard classes, and I must decide
what to do about that. One option would be to follow Bill Plauger's progress
in that area. Your thoughts would be appreciated.
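To see the language-extension point concretely, here is a deliberately tiny
string type (an illustration only, not a proposed standard class): once the
type exists, concatenation reads like a built-in operation.

```cpp
#include <cassert>
#include <cstring>

class String {
    char buf[64];   // fixed buffer keeps the illustration short
public:
    String(const char *s = "")
    {
        std::strncpy(buf, s, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
    }
    String operator+(const String &rhs) const
    {
        String result(*this);
        std::strncat(result.buf, rhs.buf,
                     sizeof result.buf - std::strlen(result.buf) - 1);
        return result;
    }
    bool operator==(const char *s) const
    {
        return std::strcmp(buf, s) == 0;
    }
};
```

A programmer who writes `a + b` with two of these is using a data type the
language never heard of, which is the improvement argued for above.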


_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]

/* ----------------- dialbox.c -------------- */

#include "dflat.h"

static int inFocusCommand(DBOX *);
static void dbShortcutKeys(DBOX *, int);
static int ControlProc(WINDOW, MESSAGE, PARAM, PARAM);
static void ChangeFocus(WINDOW, int);
static CTLWINDOW *AssociatedControl(DBOX *, enum commands);


static BOOL SysMenuOpen;

static DBOX **dbs = NULL;
static int dbct = 0;

/* --- clear all heap allocations to control text fields --- */
void ClearDialogBoxes(void)
{
 int i;
 for (i = 0; i < dbct; i++) {
 CTLWINDOW *ct = (*(dbs+i))->ctl;
 while (ct->class) {
 if ((ct->class == EDITBOX ||
 ct->class == COMBOBOX) &&
 ct->itext != NULL)
 free(ct->itext);
 ct++;
 }
 }
 if (dbs != NULL) {
 free(dbs);
 dbs = NULL;
 }
 dbct = 0;
}

/* -------- CREATE_WINDOW Message --------- */
static int CreateWindowMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 DBOX *db = wnd->extension;
 CTLWINDOW *ct = db->ctl;
 WINDOW cwnd;
 int rtn, i;
 /* ---- build a table of processed dialog boxes ---- */
 for (i = 0; i < dbct; i++)
 if (db == dbs[i])
 break;
 if (i == dbct) {
 dbs = realloc(dbs, sizeof(DBOX *) * (dbct+1));
 if (dbs != NULL)
 *(dbs + dbct++) = db;
 }
 rtn = BaseWndProc(DIALOG, wnd, CREATE_WINDOW, p1, p2);
 ct = db->ctl;
 while (ct->class) {
 int attrib = 0;
 if (TestAttribute(wnd, NOCLIP))
 attrib = NOCLIP;
 if (wnd->Modal)
 attrib = SAVESELF;
 ct->setting = ct->isetting;
 if (ct->class == EDITBOX && ct->dwnd.h > 1)
 attrib = (MULTILINE | HASBORDER);
 else if (ct->class == LISTBOX || ct->class == TEXTBOX)
 attrib = HASBORDER;
 cwnd = CreateWindow(ct->class,
 ct->dwnd.title,
 ct->dwnd.x+GetClientLeft(wnd),
 ct->dwnd.y+GetClientTop(wnd),

 ct->dwnd.h,
 ct->dwnd.w,
 ct,
 wnd,
 ControlProc,
 attrib);
 if ((ct->class == EDITBOX ||
 ct->class == COMBOBOX) &&
 ct->itext != NULL)
 SendMessage(cwnd, SETTEXT, (PARAM) ct->itext, 0);
 if (ct->class != BOX &&
 ct->class != TEXT &&
 wnd->dFocus == NULL)
 wnd->dFocus = ct;
 ct++;
 }
 return rtn;
}

/* -------- LEFT_BUTTON Message --------- */
static BOOL LeftButtonMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 DBOX *db = wnd->extension;
 CTLWINDOW *ct = db->ctl;
 if (WindowSizing || WindowMoving)
 return TRUE;
 if (HitControlBox(wnd, p1-GetLeft(wnd), p2-GetTop(wnd))) {
 PostMessage(wnd, KEYBOARD, ' ', ALTKEY);
 return TRUE;
 }
 while (ct->class) {
 WINDOW cwnd = ct->wnd;
 if (ct->class == COMBOBOX) {
 if (p2 == GetTop(cwnd)) {
 if (p1 == GetRight(cwnd)+1) {
 SendMessage(cwnd, LEFT_BUTTON, p1, p2);
 return TRUE;
 }
 }
 if (GetClass(inFocus) == LISTBOX)
 SendMessage(wnd, SETFOCUS, TRUE, 0);
 }
 else if (ct->class == SPINBUTTON) {
 if (p2 == GetTop(cwnd)) {
 if (p1 == GetRight(cwnd)+1 ||
 p1 == GetRight(cwnd)+2) {
 SendMessage(cwnd, LEFT_BUTTON, p1, p2);
 return TRUE;
 }
 }
 }
 ct++;
 }
 return FALSE;
}

/* -------- KEYBOARD Message --------- */
static BOOL KeyboardMsg(WINDOW wnd, PARAM p1, PARAM p2)
{

 DBOX *db = wnd->extension;
 CTLWINDOW *ct = db->ctl;

 if (WindowMoving || WindowSizing)
 return FALSE;
 switch ((int)p1) {
 case F1:
 ct = wnd->dFocus;
 if (ct != NULL)
 if (DisplayHelp(wnd, ct->help))
 return TRUE;
 break;
 case SHIFT_HT:
 case BS:
 case UP:
 ChangeFocus(wnd, FALSE);
 break;
 case ALT_F6:
 case '\t':
 case FWD:
 case DN:
 ChangeFocus(wnd, TRUE);
 break;
 case ' ':
 if (((int)p2 & ALTKEY) &&
 TestAttribute(wnd, CONTROLBOX)) {
 SysMenuOpen = TRUE;
 BuildSystemMenu(wnd);
 }
 break;
 case CTRL_F4:
 case ESC:
 SendMessage(wnd, COMMAND, ID_CANCEL, 0);
 break;
 default:
 /* ------ search all the shortcut keys ----- */
 dbShortcutKeys(db, (int) p1);
 break;
 }
 return wnd->Modal;
}

/* -------- COMMAND Message --------- */
static BOOL CommandMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 DBOX *db = wnd->extension;
 switch ((int) p1) {
 case ID_OK:
 case ID_CANCEL:
 if ((int)p2 != 0)
 return TRUE;
 wnd->ReturnCode = (int) p1;
 if (wnd->Modal)
 PostMessage(wnd, ENDDIALOG, 0, 0);
 else
 SendMessage(wnd, CLOSE_WINDOW, TRUE, 0);
 return TRUE;
 case ID_HELP:
 if ((int)p2 != 0)

 return TRUE;
 return DisplayHelp(wnd, db->HelpName);
 default:
 break;
 }
 return FALSE;
}

/* ----- window-processing module, DIALOG window class ----- */
int DialogProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 DBOX *db = wnd->extension;

 switch (msg) {
 case CREATE_WINDOW:
 return CreateWindowMsg(wnd, p1, p2);
 case SETFOCUS:
 if (wnd->Modal) {
 if (p1)
 SendMessage(inFocus, SETFOCUS, FALSE, 0);
 inFocus = p1 ? wnd : NULL;
 return TRUE;
 }
 break;
 case SHIFT_CHANGED:
 if (wnd->Modal)
 return TRUE;
 break;
 case LEFT_BUTTON:
 if (LeftButtonMsg(wnd, p1, p2))
 return TRUE;
 break;
 case KEYBOARD:
 if (KeyboardMsg(wnd, p1, p2))
 return TRUE;
 break;
 case CLOSE_POPDOWN:
 SysMenuOpen = FALSE;
 break;
 case LB_SELECTION:
 case LB_CHOOSE:
 if (SysMenuOpen)
 return TRUE;
 SendMessage(wnd, COMMAND, inFocusCommand(db), msg);
 break;
 case COMMAND:
 if (CommandMsg(wnd, p1, p2))
 return TRUE;
 break;
 case PAINT:
 p2 = TRUE;
 break;
 case CLOSE_WINDOW:
 if (!p1) {
 SendMessage(wnd, COMMAND, ID_CANCEL, 0);
 return TRUE;
 }
 break;
 default:

 break;
 }
 return BaseWndProc(DIALOG, wnd, msg, p1, p2);
}

/* ------- create and execute a dialog box ---------- */
BOOL DialogBox(WINDOW wnd, DBOX *db, BOOL Modal,
 int (*wndproc)(struct window *, enum messages, PARAM, PARAM))
{
 BOOL rtn;
 int x = db->dwnd.x, y = db->dwnd.y;
 CTLWINDOW *ct;
 WINDOW oldFocus = inFocus;
 WINDOW DialogWnd;

 if (!Modal && wnd != NULL) {
 x += GetLeft(wnd);
 y += GetTop(wnd);
 }
 DialogWnd = CreateWindow(DIALOG,
 db->dwnd.title,
 x, y,
 db->dwnd.h,
 db->dwnd.w,
 db,
 wnd,
 wndproc,
 Modal ? SAVESELF : 0);
 DialogWnd->Modal = Modal;
 SendMessage(inFocus, SETFOCUS, FALSE, 0);
 SendMessage(DialogWnd, SHOW_WINDOW, 0, 0);
 SendMessage(((CTLWINDOW *)(DialogWnd->dFocus))->wnd,
 SETFOCUS, TRUE, 0);
 SendMessage(DialogWnd, INITIATE_DIALOG, 0, 0);
 if (Modal) {
 SendMessage(DialogWnd, CAPTURE_MOUSE, 0, 0);
 SendMessage(DialogWnd, CAPTURE_KEYBOARD, 0, 0);
 while (dispatch_message())
 ;
 rtn = DialogWnd->ReturnCode == ID_OK;
 SendMessage(DialogWnd, RELEASE_MOUSE, 0, 0);
 SendMessage(DialogWnd, RELEASE_KEYBOARD, 0, 0);
 SendMessage(inFocus, SETFOCUS, FALSE, 0);
 SendMessage(DialogWnd, CLOSE_WINDOW, TRUE, 0);
 SendMessage(oldFocus, SETFOCUS, TRUE, 0);
 if (rtn) {
 ct = db->ctl;
 while (ct->class) {
 ct->wnd = NULL;
 if (ct->class == RADIOBUTTON ||
 ct->class == CHECKBOX)
 ct->isetting = ct->setting;
 ct++;
 }
 }
 return rtn;
 }
 return FALSE;
}


/* ----- return command code of in-focus control window ---- */
static int inFocusCommand(DBOX *db)
{
 CTLWINDOW *ct = db->ctl;
 while (ct->class) {
 if (ct->wnd == inFocus)
 return ct->command;
 ct++;
 }
 return -1;
}

/* -------- find a specified control structure ------- */
CTLWINDOW *FindCommand(DBOX *db, enum commands cmd, int class)
{
 CTLWINDOW *ct = db->ctl;
 while (ct->class) {
 if (ct->class == class)
 if (cmd == ct->command)
 return ct;
 ct++;
 }
 return NULL;
}

/* ---- return the window handle of a specified command ---- */
WINDOW ControlWindow(DBOX *db, enum commands cmd)
{
 CTLWINDOW *ct = db->ctl;
 while (ct->class) {
 if (ct->class != TEXT && cmd == ct->command)
 return ct->wnd;
 ct++;
 }
 return NULL;
}

/* ---- set a control ON or OFF ----- */
void ControlSetting(DBOX *db, enum commands cmd,
 int class, int setting)
{
 CTLWINDOW *ct = FindCommand(db, cmd, class);
 if (ct != NULL)
 ct->isetting = setting;
}

/* ---- return pointer to the text of a control window ---- */
char *GetDlgTextString(DBOX *db,enum commands cmd,CLASS class)
{
 CTLWINDOW *ct = FindCommand(db, cmd, class);
 if (ct != NULL)
 return ct->itext;
 else
 return NULL;
}

/* ------- set the text of a control specification ------ */
void SetDlgTextString(DBOX *db, enum commands cmd,

 char *text, CLASS class)
{
 CTLWINDOW *ct = FindCommand(db, cmd, class);
 if (ct != NULL) {
 ct->itext = realloc(ct->itext, strlen(text)+1);
 if (ct->itext != NULL)
 strcpy(ct->itext, text);
 }
}

/* ------- set the text of a control window ------ */
void PutItemText(WINDOW wnd, enum commands cmd, char *text)
{
 CTLWINDOW *ct = FindCommand(wnd->extension, cmd, EDITBOX);

 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, TEXTBOX);
 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, COMBOBOX);
 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, LISTBOX);
 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, SPINBUTTON);
 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, TEXT);
 if (ct != NULL) {
 WINDOW cwnd = (WINDOW) (ct->wnd);
 switch (ct->class) {
 case COMBOBOX:
 case EDITBOX:
 SendMessage(cwnd, CLEARTEXT, 0, 0);
 SendMessage(cwnd, ADDTEXT, (PARAM) text, 0);
 if (!isMultiLine(cwnd))
 SendMessage(cwnd, PAINT, 0, 0);
 break;
 case LISTBOX:
 case TEXTBOX:
 case SPINBUTTON:
 SendMessage(cwnd, ADDTEXT, (PARAM) text, 0);
 break;
 case TEXT: {
 SendMessage(cwnd, CLEARTEXT, 0, 0);
 SendMessage(cwnd, ADDTEXT, (PARAM) text, 0);
 SendMessage(cwnd, PAINT, 0, 0);
 break;
 }
 default:
 break;
 }
 }
}

/* ------- get the text of a control window ------ */
void GetItemText(WINDOW wnd, enum commands cmd,
 char *text, int len)
{
 CTLWINDOW *ct = FindCommand(wnd->extension, cmd, EDITBOX);
 unsigned char *cp;


 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, COMBOBOX);
 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, TEXTBOX);
 if (ct == NULL)
 ct = FindCommand(wnd->extension, cmd, TEXT);
 if (ct != NULL) {
 WINDOW cwnd = (WINDOW) (ct->wnd);
 if (cwnd != NULL) {
 switch (ct->class) {
 case TEXT:
 if (GetText(cwnd) != NULL) {
 cp = strchr(GetText(cwnd), '\n');
 if (cp != NULL)
 len = (int) (cp - GetText(cwnd));
 strncpy(text, GetText(cwnd), len);
 *(text+len) = '\0';
 }
 break;
 case TEXTBOX:
 if (GetText(cwnd) != NULL)
 strncpy(text, GetText(cwnd), len);
 break;
 case COMBOBOX:
 case EDITBOX:
 SendMessage(cwnd,GETTEXT,(PARAM)text,len);
 break;
 default:
 break;
 }
 }
 }
}

/* ------- get the selected text of a listbox control window ------ */
void GetDlgListText(WINDOW wnd, char *text, enum commands cmd)
{
 CTLWINDOW *ct = FindCommand(wnd->extension, cmd, LISTBOX);
 if (ct != NULL) {
 int sel = SendMessage(ct->wnd, LB_CURRENTSELECTION, 0, 0);
 SendMessage(ct->wnd, LB_GETTEXT, (PARAM) text, sel);
 }
}

/* -- find control structure associated with text control -- */
static CTLWINDOW *AssociatedControl(DBOX *db,enum commands Tcmd)
{
 CTLWINDOW *ct = db->ctl;
 while (ct->class) {
 if (ct->class != TEXT)
 if (ct->command == Tcmd)
 break;
 ct++;
 }
 return ct;
}

/* --- process dialog box shortcut keys --- */
static void dbShortcutKeys(DBOX *db, int ky)
{
 CTLWINDOW *ct;

 int ch = AltConvert(ky);

 if (ch != 0) {
 ct = db->ctl;
 while (ct->class) {
 char *cp = ct->itext;
 while (cp && *cp) {
 if (*cp == SHORTCUTCHAR &&
 tolower(*(cp+1)) == ch) {
 if (ct->class == TEXT)
 ct = AssociatedControl(db, ct->command);
 if (ct->class == RADIOBUTTON)
 SetRadioButton(db, ct);
 else if (ct->class == CHECKBOX) {
 ct->setting ^= ON;
 SendMessage(ct->wnd, PAINT, 0, 0);
 }
 else if (ct->class) {
 SendMessage(ct->wnd, SETFOCUS, TRUE, 0);
 if (ct->class == BUTTON)
 SendMessage(ct->wnd,KEYBOARD,'\r',0);
 }
 return;
 }
 cp++;
 }
 ct++;
 }
 }
}

/* --- dynamically add or remove scroll bars
 from a control window ---- */
void SetScrollBars(WINDOW wnd)
{
 int oldattr = GetAttribute(wnd);
 if (wnd->wlines > ClientHeight(wnd))
 AddAttribute(wnd, VSCROLLBAR);
 else
 ClearAttribute(wnd, VSCROLLBAR);
 if (wnd->textwidth > ClientWidth(wnd))
 AddAttribute(wnd, HSCROLLBAR);
 else
 ClearAttribute(wnd, HSCROLLBAR);
 if (GetAttribute(wnd) != oldattr)
 SendMessage(wnd, BORDER, 0, 0);
}

/* ------- CREATE_WINDOW Message (Control) ----- */
static void CtlCreateWindowMsg(WINDOW wnd)
{
 CTLWINDOW *ct;
 ct = wnd->ct = wnd->extension;
 wnd->extension = NULL;
 if (ct != NULL)
 ct->wnd = wnd;
}

/* ------- KEYBOARD Message (Control) ----- */

static BOOL CtlKeyboardMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 CTLWINDOW *ct = GetControl(wnd);
 switch ((int) p1) {
 case F1:
 if (WindowMoving || WindowSizing)
 break;
 if (!DisplayHelp(wnd, ct->help))
 SendMessage(GetParent(wnd),COMMAND,ID_HELP,0);
 return TRUE;
 case ' ':
 if (!((int)p2 & ALTKEY))
 break;
 case ALT_F6:
 case CTRL_F4:
 case ALT_F4:
 PostMessage(GetParent(wnd), KEYBOARD, p1, p2);
 return TRUE;
 default:
 break;
 }
 if (GetClass(wnd) == EDITBOX)
 if (isMultiLine(wnd))
 return FALSE;
 switch ((int) p1) {
 case UP:
 if (!isDerivedFrom(wnd, LISTBOX)) {
 p1 = CTRL_FIVE;
 p2 = LEFTSHIFT;
 }
 break;
 case BS:
 if (!isDerivedFrom(wnd, EDITBOX)) {
 p1 = CTRL_FIVE;
 p2 = LEFTSHIFT;
 }
 break;
 case DN:
 if (!isDerivedFrom(wnd, LISTBOX) &&
 !isDerivedFrom(wnd, COMBOBOX))
 p1 = '\t';
 break;
 case FWD:
 if (!isDerivedFrom(wnd, EDITBOX))
 p1 = '\t';
 break;
 case '\r':
 if (isDerivedFrom(wnd, EDITBOX))
 if (isMultiLine(wnd))
 break;
 if (isDerivedFrom(wnd, BUTTON))
 break;
 SendMessage(GetParent(wnd), COMMAND, ID_OK, 0);
 return TRUE;
 default:
 break;
 }
 return FALSE;
}


/* ------- CLOSE_WINDOW Message (Control) ----- */
static void CtlCloseWindowMsg(WINDOW wnd)
{
 CTLWINDOW *ct = GetControl(wnd);
 if (ct != NULL) {
 if (GetParent(wnd)->ReturnCode == ID_OK &&
 (ct->class == EDITBOX ||
 ct->class == COMBOBOX)) {
 if (wnd->TextChanged) {
 ct->itext=realloc(ct->itext,strlen(wnd->text)+1);
 strcpy(ct->itext, wnd->text);
 if (!isMultiLine(wnd)) {
 char *cp = ct->itext+strlen(ct->itext)-1;
 if (*cp == '\n')
 *cp = '\0';
 }
 }
 }
 }
}

/* -- generic window processor used by dialog box controls -- */
static int ControlProc(WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2)
{
 DBOX *db;
 CTLWINDOW *ct;

 if (wnd == NULL)
 return FALSE;
 db = GetParent(wnd) ? GetParent(wnd)->extension : NULL;
 ct = GetControl(wnd);

 switch (msg) {
 case CREATE_WINDOW:
 CtlCreateWindowMsg(wnd);
 break;
 case KEYBOARD:
 if (CtlKeyboardMsg(wnd, p1, p2))
 return TRUE;
 break;
 case PAINT:
 if (GetClass(wnd) == EDITBOX ||
 GetClass(wnd) == LISTBOX ||
 GetClass(wnd) == TEXTBOX)
 SetScrollBars(wnd);
 break;
 case BORDER:
 if (GetClass(wnd) == EDITBOX) {
 WINDOW oldFocus = inFocus;
 inFocus = NULL;
 DefaultWndProc(wnd, msg, p1, p2);
 inFocus = oldFocus;
 return TRUE;
 }
 break;
 case SETFOCUS:
 if (p1) {
 DefaultWndProc(wnd, msg, p1, p2);

 GetParent(wnd)->dFocus = ct;
 SendMessage(GetParent(wnd), COMMAND,
 inFocusCommand(db), ENTERFOCUS);
 return TRUE;
 }
 else
 SendMessage(GetParent(wnd), COMMAND,
 inFocusCommand(db), LEAVEFOCUS);
 break;
 case CLOSE_WINDOW:
 CtlCloseWindowMsg(wnd);
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}

/* ---- change the focus to the next or previous control --- */
static void ChangeFocus(WINDOW wnd, int direc)
{
 DBOX *db = wnd->extension;
 CTLWINDOW *ct = db->ctl;
 CTLWINDOW *ctt;

 /* --- find the control that has the focus --- */
 while (ct->class) {
 if (ct == wnd->dFocus)
 break;
 ct++;
 }
 if (ct->class) {
 ctt = ct;
 do {
 /* ----- point to next or previous control ----- */
 if (direc) {
 ct++;
 if (ct->class == 0)
 ct = db->ctl;
 }
 else {
 if (ct == db->ctl)
 while (ct->class)
 ct++;
 --ct;
 }

 if (ct->class != BOX && ct->class != TEXT) {
 SendMessage(ct->wnd, SETFOCUS, TRUE, 0);
 SendMessage(ctt->wnd, PAINT, 0, 0);
 SendMessage(ct->wnd, PAINT, 0, 0);
 break;
 }
 } while (ct != ctt);
 }
}

void SetFocusCursor(WINDOW wnd)
{

 if (wnd == inFocus) {
 SendMessage(NULL, SHOW_CURSOR, 0, 0);
 SendMessage(wnd, KEYBOARD_CURSOR, 1, 0);
 }
}


June, 1992
STRUCTURED PROGRAMMING


Roots Grow Too




Jeff Duntemann, KG7JF


I slid into the bench-style driver's seat and took the hard thin wheel with
both hands for a moment. Then, one twist of the key, and from under the hood
came a roar so familiar it close to brought tears to my eyes.
Shakespeare has returned.
Welcome back, old friend.
It's been 18 years since I've driven a Chevelle, but there's one in my garage
again. White '69 Malibu 307 V8, with a sharp turquoise interior and lots more
chrome than my original bottom-of-the-line '68. While the new Shakespeare has
95,000 miles on him, beneath his dents, a pinched rear bumper, and 23 years of
Arizona dirt lies not a particle of rust.
And the old familiar spirit is there. I feel it on the curves taken under
power--the smooth assurance of GM's most beautiful Chevy. It was there before,
and it's there again.
Shakespeare and I went a lot of places together--to college, to Washington in
April of '71 to protest the madness in Vietnam, to the mouth of the St.
Lawrence to photograph a solar eclipse, to see a ghost (I'll tell that story
some day), to work, and finally to the junkyard when there was practically
nothing left.
If you want to know all about how a boy became a man, ask his first car.
Fortunately for me and a lot of other ex-boys, cars keep their secrets well.


Back to Basic


I expect to take a lot of ribbing about Shakespeare--sure, yeah, another
balding Boomer searching for his lost youth in a fast car. Go ahead and
laugh--I find pieces of my lost youth all over the place, and I enjoy them
unapologetically, from the position of power that comes of making it to 39
intact, self-employed, and debt-free. I take what works, and leave behind what
doesn't. You can't go home again, but you can sometimes pick up your roots
cheap on the classic roots market.
And hey, roots grow too.
I gave up Basic for Pascal in 1981, after a brief fling with Forth that left
me trying to put my socks on over my shoes. Basic was pretty crude back then,
but it was appropriate technology: It worked, and worked without undue
complication on the memory-poor machines we could build 15 years ago.
But with the explosion of memory sizes in the mid-eighties came an explosion
of interest in true compilers. We seized on Pascal and C because we could--the
machines were abruptly able to support the "monstrous" 50K or 60K code files
that compilers produced. Basic became a compiler in time, and took its place
with them, but always as the poor-cousin "kiddie language" that true hackers
tried to wipe from their memories like bugs from their bumpers.


Crawling Through the Same Maze


Until very recently, straight-line native code compilers have ruled. This is
now changing, once again because the machines we use are changing. Instead of
a mostly empty 640K bottle with DOS lying at the bottom like your last slug of
Coke, the machine is now an 8-Mbyte web with Windows lurking in the middle
somewhere, putting tendrils down into anything and everything and controlling
it all.
For ten years computing has been bound by the speed of the CPU and the amount
of memory we had. Now, with 16-Mbyte, 33-MHz machines commonplace (and 486s
on the desks of the avant garde) we're bound more and more by the
computational overhead of the platform than by any other single factor.
Windows demands that things be done just its way--and all code, however
efficient, is forced to crawl through the same identical maze. I would argue
that any application that spends the majority of its time making Windows API
calls is interpreted, regardless of how the responsible code generator
operates. So for Windows work, does it matter whether code is compiled or
interpreted?
Not much. Really.
This is why, after an ice age lasting close to ten years, interpreters are
finding their place in the sun once again. They're not the same crude text or
token interpretation systems we saw in the original Basics. There's generally
some level of compilation to intermediate code going on, and if you really
need raw to-the-metal speed, there's almost always a provision to drop into
straight-line compiled C code or (better yet) pure assembly code.
I haven't provided much coverage of Windows development in this column, in
part because I've been watching the Windows tools market to see what trends
emerge, and in part because I'm still doing the research--anyone who tells you
that understanding the Windows platform is easy is probably intending to run
for Congress. This new trend toward interpreters under Windows is the first
clear trend I've seen, and it's worth a closer look.


Events and Interpreters


In broad terms, the last several columns have been about event-driven
programming. I've tested a lot of event-driven programming systems lately, and
against their many benefits I have to remind you that event-driven programming
is a serious eater of cycles. A broadcast event under Turbo Vision, for
example, sets off an explosion of procedure calls that ultimately reaches
every single object in the application with an event loop. Even a focused
event must travel from the application to the focused object, which is a
longer path more often than a shorter one.
It's absolutely valid to think of an event-driven application as an
interpreter of events, where the internal structure of the code resembles that
of our primordial Basic interpreter far more than it resembles the sleek
optimized straight line executables produced by products such as TopSpeed
Modula-2.
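
To make the point concrete, here's a minimal sketch in C--not Turbo Vision's actual machinery, just an illustration of the two routing styles it uses: a broadcast event fans out to every object in a chain of views, while a focused event travels only until some object claims it. The names and the two demo handlers are mine, invented for the sketch.

```c
#include <stddef.h>

typedef struct Event { int what; int handled; } Event;

typedef struct View {
    struct View *next;
    void (*handle)(struct View *self, Event *ev);
} View;

/* Broadcast: every view in the chain sees the event, so one
   event fans out into a procedure call per object. */
void broadcast(View *views, Event *ev)
{
    for (View *v = views; v != NULL; v = v->next)
        v->handle(v, ev);
}

/* Focused dispatch: walk the chain only until a view claims it. */
void dispatch_focused(View *views, Event *ev)
{
    for (View *v = views; v != NULL && !ev->handled; v = v->next)
        v->handle(v, ev);
}

/* Two demo views that simply count how often they are called;
   handle_a also claims whatever event it sees. */
int seen_a, seen_b;
void handle_a(View *self, Event *ev) { (void)self; seen_a++; ev->handled = 1; }
void handle_b(View *self, Event *ev) { (void)self; seen_b++; }
```

Even the focused case walks the chain from the front, which is why a focused event tends to travel the longer path rather than the shorter one.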
Furthermore, Windows virtually requires that an application be event-driven,
because Windows itself is event-driven. This forces the internal structure of
Windows applications into pretty much the same mold, and makes the
code-performance issue utterly different than what it is for traditional
programming under DOS.
I'm saying all this to make you understand that you don't automatically create
fast Windows applications by using C or C++. Windows application performance
depends overwhelmingly on difficult issues such as segment tuning and memory
management that are language-independent and depend far more on the
implementation of the language rather than on the language itself. In other
words, with Windows in control, you can write faster applications with a good
Basic than with an indifferent C.
So if you're contemplating moving to Windows, don't assume that you have to
use C or C++. There's a wonderful smorgasbord of structured languages
available for Windows, probably including the language you're currently using
under DOS. Porting same-language code to Windows from DOS isn't easy--but it's
certainly easier than trying to learn a new language and then port the code,
say, from Pascal under DOS to C under Windows.


Visual Basic


One of my early favorites among the new Windows languages is Visual Basic.
Microsoft has done a good job hiding most of the internal Windows machinery
that SDK-based development forces you to look in the eye, and the language's
few lapses are quickly being filled by enthusiastic third-party developers.
In its logistics, Visual Basic has a lot in common with Borland's Object
Vision, which I discussed in detail last month. Both products provide
sophisticated Integrated Development Environments (IDEs) essential for
application development--essential in that you no longer have the option of
sneering at the IDE and doing all your work from the command line.
Both products are form-based. A form is simply a window in which something
happens, and it escapes me why we don't just call them windows and be done
with it. The term "forms" implies that there is something special about them,
which there isn't.
An application may have any reasonable number of forms, which may invoke one
another as required. Both products allow you to design your forms with
on-screen drawing tools, using drag-and-drop strategies to choose controls
from a tool-selection menu of some sort and position those controls on your
form.
Both products allow you to define actions that components within the
application will take in response to certain events such as mouse clicks, or
getting or losing the focus. In Object Vision this is done in a very
innovative fashion by creating a graphical "event tree" that plots a logic
path to be taken when an event kicks off some action on the part of some
application component.

Event trees are sharp, and the concept has a rich future, but as they exist
today, OV's event trees are incomplete and somewhat limiting. For example,
there's no way at all to iterate within an event tree. You can branch based on
logical tests, but you can't loop, and this makes it agony to build certain
kinds of action into your OV programs.
Visual Basic has no such limitations. You create responses to VB events in
good old-fashioned Basic code, with all the control structures you're used to
having. You can define and use variables freely, and unlike Object Vision
(where you have only visible fields and not true variables, and where
everything is irritatingly global), the forms metaphor provides a scoping
mechanism that allows you to decide what should be local and what should be
global.
Visual Basic lacks the sophisticated, built-in database features of Object
Vision, but the third-party market is filling in that gap quickly, through
database engine products in the form of Windows DLLs. You can, in fact, call
Borland's Paradox Engine from Visual Basic if you choose, and I've been
playing with more than one SQL database engine to good effect.
Visual Basic lacks some necessary system-access features. It's impossible, for
example, to make DOS calls or sniff around in the DOS data areas in low
memory. Most of the time we do this to find things out about the system (such
as the number and types of drives installed, or how much free space is left on
them) and if VB is going to be hard-nosed about system access, it really ought
to provide a way to query the system for any conceivable information a program
might need to know--dangerous or not.
Alas, PEEK and POKE are gone, but Visual Basic remains Basic, with some of
Basic's old bad habits. Reference a variable name that hasn't been declared
(which I do frequently by misspelling the name of a declared variable) and VB
simply creates a new variable with that name. Mercifully, these unwanted
variables are local to the form in which they are referenced. Would that they
were local to the procedure instead!


Scope and Program Organization


Which brings us to an area in which Basic's roots have definitely grown:
Locality and scope. VB has it all over Object Vision in this regard, which is
especially near and dear to my heart. Scope is the soul of structured
programming, after all; the very first thing I investigate in a new language
is how it implements scoping and locality. Scoping in VB is closely tied to
the physical organization of VB programs, which is much more complex than
Basic programs of yore.
VB organizes applications under development as projects, and provides hidden
machinery to manage the files that comprise a project. The project has a name,
and this name is eventually applied to the .EXE file when you're finished
developing interactively and want to bundle the project's components together
into a single executable.
Each project consists of one or more forms: windows that accept or display
information. You attach objects to forms to interact with the user. These are
usually controls such as buttons, text boxes, combo boxes, and so on, but also
include static text and some innovative things like timers; forms themselves
are considered objects as well. (You can't embed a form within a form,
however.)
Forms may contain data definitions and procedure definitions. Nearly all
objects generate events under some circumstances, and the event handlers for
these events must be written within the form containing the objects that
generate the events. The form already contains do-nothing stub event handler
procedures for every possible event the form or its objects may generate, and
you can add Basic statements to any or all of these event handlers at will.
A project may also incorporate modules, files of data definitions and
procedures that are not directly called by events, nor associated explicitly
with any form or object.
You can create new procedures inside a form, even if those procedures have
nothing to do with handling events. This isn't generally a good idea, although
it works. The rule is simple: Place code you intend to reuse inside modules;
code inside forms should consist of event handlers only.
I've sketched it out in Figure 1. Forms contain objects, event handlers, and
variable definitions; modules contain only Basic procedures and variable
definitions. Event handlers can and should call procedures in modules to do
most of their work. In other words, if a button-click event is supposed to
recalculate a mortgage table, the recalculation should be written as a
procedure in a mortgage module, not as part of a button event handler!
Procedures in modules can call procedures in other modules. It makes good
structured sense to keep this (or any) intermodule coupling to a minimum.


Locality


I haven't shown anything at a truly project-global level in Figure 1. Visual
Basic allows as an option a single global module in any given project, but the
global module may contain only variable, constant, or type definitions. (Yes,
Visual Basic allows programmer-defined types. Roots grow too!) I think it's
wise to use the global module for nothing but constant or type definitions.
Forms and modules may both contain local variables visible from any procedures
defined within those forms or modules. A procedure itself may define a local
variable visible only within that procedure. Procedures may not, however,
contain local procedures.
It's enough locality to be useful, and not so much (as in Pascal's unlimited
nestability) as to be confusing. A variable may be defined only where it needs
to be known: Within a procedure, to be used by that procedure only; within a
form or module, to be shared by that form or module's procedures, or (in dire
need) within the global module, where anything in the entire project can use
or abuse it.


The Visual Basic Development Process


This is a lot more structure than I'm used to having in Basic, but it's a fine
feeling indeed. The VB design process is straightforward: Sketch out your user
interface. I still use paper for my first pass or two, because on paper I can
sketch out connections between UI elements and modules. When you've decided on
the number of forms and the distribution of visible objects across them, start
drawing your forms.
It sounds silly, but you should do the interface first because it feels good,
and gives you a sense of getting somewhere. It may also indicate that your
forms are ugly or inconvenient before you spend a lot of time connecting the
UI objects to fleshed-out event handlers. But once I have my UI reasonably
defined, I take a piece of paper and sketch out a simple cause-effect diagram
for events. I show, for example, a button marked "Recalculate" with an effect
of calling the routine that amortizes a mortgage into a mortgage table. The
cause-effect diagram may help you get a handle on what should be in which
modules.
You may not identify every event that needs servicing until you're well into
development. Events can be like that. The nice part is that Visual Basic has
stubbed out all possible events harmlessly, so in most cases all that happens
is...nothing. You figure out what event you missed, and you flesh it out.


The Bug Problem


Visual Basic shares a knotty problem with every single event-driven
development system I've examined: Debug support is more suited to old-style,
non-event driven code. VB has most of the debugging features of Turbo Pascal,
although they're not as neatly packaged. You can set breakpoints, and you can
singlestep, either by code line or by procedure call (for instance, Trace Into
or Step Over). You can open an immediate window that accepts output from a
special debug object using a Print method. It's the poor cousin of the Turbo
Pascal watch, in that you have to explicitly place the Debug.Print statement
in your program code.
Debugging works tolerably well, if only because VB is a blacker box than Turbo
Vision, and its internals are (thankfully!) neither alterable nor visible.
Still, I keep feeling that there is some conceptual breakthrough to be made on
the fundamental mechanisms of debugging event-driven code. Some bright person
is gonna make some money when he or she hits on just the right idea. You could
do worse than be that person.


A Blacker Box


The more I work in Visual Basic, the better I like it. People have carped
about its limitations, but I've found on probing that many of them are simply
pining for access to the waydowndeep Windows API calls. I keep reminding
myself that the only thing possibly worse than not getting what you want is
getting it, but some people never seem to learn that particular lesson.
Windows is a handful. If you're interested in Windows internals, by all means
use tools such as QuickC Windows, Turbo Pascal for Windows, Turbo C++ for
Windows, or Borland C++. Keep in mind that in Tom Swan's terrific 800-page
book Turbo Pascal for Windows Programming, he spends not a single page
teaching Pascal! The entire book is essentially about calling the Windows API.
Hey man, but what if I just want to write a mortgage calculator? Then, m'boy,
stand tall and use Visual Basic.


VB=MC²


I've come across an obscure little book that's been very helpful and lots of
fun in my pursuit of Visual Basic. It has its flaws (mostly on the production
end of things) but it's useful and different enough to recommend, especially
because you won't see it in the stores. It's VB=MC², by J.D. Evans, Jr. and
published by ETN Corp.
The book is basically a printed lecture by the author on the use of Visual
Basic in application development. I call it a "lecture" because it reads as
though it were given from the podium before an audience: chatty, complete with
informal diction and an occasional joke. ("Now, thinking is something that
makes my head hurt and is something to be avoided at all costs--this is a
Southern trait and is most definitely inherited.") This joke to the contrary,
it is an extremely thoughtful book that traces the author's logic in working
through the Visual Basic design process and building an SQL query design tool.
Thoughtful--but ugly, uninterrupted by technical figures aside from a small
handful of screen shots. The author italicizes every third word for emphasis,
which of course means all italicized words lose emphasis and simply look
weird.
About half the book is VB source code, which is included on a diskette as part
of the package.
Despite its flaws, the book is valuable because it reflects the author's
real-world experience with VB, and contains a lot of heuristics that I found
useful. Mr. Evans points out that complex forms run more slowly than simple
forms and that extremely complex forms don't always run correctly (something
you're not likely to see in Microsoft's documentation); that you should not
put nonevent code into a form, and many other similar things. The discussion
on what to put in event code vs. module (nonevent) code is worth the price of
admission.
Like a lot of other good things, VB=MC² is quirky, but it contains
reality-centered advice I haven't seen anywhere else, and a lot of solid VB
source code.



A Visual C++


It's really not my job to talk about C stuff in this column, but I've been
playing with another very slick, event-driven product that is essentially
Visual C++. It's called VZ Programmer, and it works in much the same way as
Visual Basic: You draw an interface, and then connect the interface elements
to interpreted C++ functions. It works about as quickly as Visual Basic, and
while the documentation is not terrific, the visual metaphor and toolsets are
somewhat richer and more versatile.
The company is out there and selling product, so evidently an interpreted C++
is not anathema. Let me therefore yell: Hey already, when is somebody going to
do me up a Visual Pascal?


Products Mentioned


Visual Basic Microsoft Corp. One Microsoft Way Redmond, WA 98052-6399
206-882-8080 $199.00
VB=MC² by J.D. Evans ETN Corp. RR 4 Box 659 Montoursville, PA 17754-9433 460
pages, $29.95 (disk $9.95)
VZ Programmer VZ Corp. 175 S. Main St., Ste. 1550 Salt Lake City, UT 84111
801-595-1352 $595.00


June, 1992
GRAPHICS PROGRAMMING


Fast Antialiasing


 This article contains the following executables: XSHARP20.ZIP


Michael Abrash


The thought first popped into my head as I unenthusiastically picked through
the salad bar at a local "family" restaurant, trying to decide whether the
meatballs, the fried clams, or the lasagna was likely to shorten my life the
least. I decided on the chicken in mystery sauce.
The thought recurred when my daughter asked, "Dad, is that fried chicken?"
"I don't think so," I said. "I think it's stewed chicken."
"It looks like fried chicken."
"Maybe it's fried, stewed chicken," my wife volunteered hopefully. I took a
bite. It was, indeed, fried, stewed chicken. I can now, unhesitatingly and
without reservation, recommend that you avoid fried, stewed chicken at all
costs.
The thought I had was as follows: This is not good food. Not a profound
thought, but it raises an interesting question: Why was I eating in this
restaurant? The answer, to borrow a phrase from E.F. Schumacher, is
appropriate technology. For a family on a budget, with a small child, tired of
staring at each other over the kitchen table, this was a perfect place to eat.
It was cheap, it had greasy food and ice cream, no one cared if children
dropped things or talked loudly or walked around, and, most important of all,
it wasn't home. So what if the food was lousy? Good food was a luxury, a
bonus; everything on the above list was necessary. A family restaurant was the
appropriate dining-out technology, given the parameters within which we had to
work.
When I read through SIGGRAPH proceedings and other state-of-the-art
computer-graphics material, all too often I feel like I'm dining at a
four-star restaurant with two-year-old triplets and an empty wallet. We're
talking incredibly inappropriate technology for PC graphics here. Sure, I say
to myself as I read about an antialiasing technique, that sounds wonderful--if
I had 24-bpp color, and dedicated hardware to do the processing, and all day
to wait to generate one image. Yes, I think, that is a good way to do hidden
surface removal--in a system with hardware z-buffering. Most of the stuff in
Computer Graphics is riveting, but, alas, pretty much useless on PCs. When an
80x86 has to do all the work, speed becomes the overriding parameter,
especially for real-time graphics.
Literature that's applicable to fast PC graphics is hard enough to find, but
what we'd really like is above-average image quality combined with terrific
speed, and there's almost no literature of that sort around. There is some,
though, and you folks are right on top of it. For example, alert reader
Michael Chaplin, of San Diego, wrote to suggest that I might enjoy the
line-antialiasing algorithm presented in Xiaolin Wu's article, "An Efficient
Antialiasing Technique," in the July 1991 issue of Computer Graphics. Michael
was dead-on right. This is a great algorithm, combining excellent antialiased
line quality with speed that's close to that of non-antialiased Bresenham's
line drawing. This is the sort of algorithm that makes you want to go out and
write a wire-frame animation program, just so you can see how good those
smooth lines look in motion. Wu antialiasing is a wonderful example of what
can be accomplished on inexpensive, mass-market hardware with the proper
programming perspective. In short, it's a splendid example of appropriate
technology for PCs.


Wu Antialiasing


To recap briefly, antialiasing is the process of smoothing lines and edges so
that they appear less jagged. Antialiasing is partly an aesthetic issue,
because it makes images more attractive. It's partly an accuracy issue,
because it makes it possible to position and draw images with effectively more
precision than the resolution of the display. Finally, it's partly a flat-out
necessity, to avoid the horrible, crawling, jagged edges of temporal aliasing
when performing animation.
The basic premise of Wu antialiasing is almost ridiculously simple: As the
algorithm steps one pixel unit at a time along the major (longer) axis of a
line, it draws the two pixels bracketing the line along the minor axis at each
point. Each of the two bracketing pixels is drawn with a weighted fraction of
the full intensity of the drawing color, with the weighting for each pixel
equal to one minus the pixel's distance along the minor axis from the ideal
line. Figure 1 illustrates this concept.
The intensities of the two pixels that bracket the line are selected so that
they always sum to exactly 1; that is, to the intensity of one fully
illuminated pixel of the drawing color. The presence of aggregate full-pixel
intensity means that at each step, the line has the same brightness it would
have if a single pixel were drawn at precisely the correct location. Moreover,
thanks to the distribution of the intensity weighting, that brightness is
centered at the ideal line. Not coincidentally, a line drawn with pixel pairs
of aggregate single-pixel intensity, centered on the ideal line, is perceived
by the eye not as a jagged collection of pixel pairs, but as a smooth line
centered on the ideal line. Thus, by weighting the bracketing pixels properly
at each step, we can readily produce what looks like a smooth line at
precisely the right location, rather than the jagged pattern of line segments
that non-antialiased line-drawing algorithms such as Bresenham's trace out.
You might expect that the implementation of Wu antialiasing would fall into
two distinct areas: tracing out the line (that is, finding the appropriate
pixel pairs to draw) and calculating the appropriate weightings for each pixel
pair. Not so, however. The weighting calculations involve only a few shifts,
XORs, and adds; for all practical purposes, tracing and weighting are rolled
into one step--and a very fast step it is. How fast is it? On a 33-MHz 486
with a fast VGA, a good but not maxed-out assembler implementation of Wu
antialiasing draws a more than respectable 5000 150-pixel-long vectors per
second. That's especially impressive considering that about 1,500,000 actual
pixels are drawn per second, meaning that Wu antialiasing is drawing at over
50 percent of the maximum memory bandwidth--and hence more than half the
fastest theoretically possible drawing speed--of an AT-bus VGA. In short, Wu
antialiasing is about as fast an antialiased line approach as you could ever
hope to find for the VGA.


Tracing and Intensity in One


Horizontal, vertical, and diagonal lines do not require Wu antialiasing
because they pass through the center of every pixel they meet; such lines can
be drawn with fast, special-case code. For all other cases, Wu lines are
traced out one step at a time along the major axis by means of a simple,
fixed-point algorithm. The move along the minor axis with respect to a
one-pixel move along the major axis (the line slope for lines with slopes less
than 1, 1/slope for lines with slopes greater than 1) is calculated with a
single integer divide. This value, called the "error adjust," is stored as a
fixed-point fraction, in 0.16 format (that is, all bits are fractional, and
the binary point is just to the left of bit 15). An error accumulator, also
in 0.16 format, is initialized to 0. Then the first pixel is drawn; no
weighting is needed, because the line intersects its endpoints exactly.
Now the error adjust is added to the error accumulator. The error accumulator
indicates how far between pixels the line has progressed along the minor axis
at any given step; when the error accumulator turns over, it's time to advance
one pixel along the minor axis. At each step along the line, the major-axis
coordinate advances by one pixel. The two bracketing pixels to draw are simply
the two pixels nearest the line along the minor axis. For instance, if X is
the current major-axis coordinate and Y is the current minor-axis coordinate,
then the two pixels to be drawn are (X,Y) and (X,Y+1). In short, the
derivation of the pixels at which to draw involves nothing more complicated
than advancing one pixel along the major axis, adding the error adjust to the
error accumulator, and advancing one pixel along the minor axis when the error
accumulator turns over.
So far, nothing special; but now we come to the true wonder of Wu
antialiasing. We know which pair of pixels to draw at each step along the
line, but we also need to generate the two proper intensities, which must be
inversely proportional to distance from the ideal line and sum to 1, and
that's a potentially time-consuming operation. Let's assume, however, that the
number of possible intensity levels to be used for weighting is the value
NumLevels = 2^n for some integer n, with the minimum weighting (0 percent
intensity) being the value (2^n)-1, and the maximum weighting (100 percent
intensity) being the value 0. Given that, lo and behold, the most significant
n bits of the error accumulator select the proper intensity value for one
element of the pixel pair, as shown in Figure 2. Better yet, (2^n)-1 minus the
intensity of the first pixel selects the intensity of the other pixel in the
pair, because the intensities of the two pixels must sum to 1; as it happens,
this result can be obtained simply by flipping the n least-significant bits of
the first pixel's value. All this works because what the error accumulator
accumulates is precisely the ideal line's current distance between the two
bracketing pixels.
The intensity calculations take longer to describe than they do to perform.
All that's involved is a shift of the error accumulator to right-justify the
desired intensity weighting bits, and then an XOR to flip the
least-significant n bits of the first pixel's value in order to generate the
second pixel's value. Listing One (page 154) illustrates just how efficient Wu
antialiasing is; the intensity calculations take only three statements, and
the entire Wu line-drawing loop is only nine statements long. Of course, a
single C statement can hide a great deal of complexity, but Listing Six (page
156), the inner loop of an assembler implementation, shows that only 15
instructions are required per step along the major axis--and the number of
instructions could be reduced to ten by special-casing and loop unrolling.
Make no mistake about it, Wu antialiasing is fast.


Sample Wu Antialiasing


The true test of any antialiasing technique is how good it looks, so let's
have a look at Wu antialiasing in action. Listing One is a C implementation of
Wu antialiasing. Listing Two (page 155) is a sample program that draws a
variety of Wu-antialiased lines, followed by non-antialiased lines, for
comparison. Listing Three (page 156) contains DrawPixel() and SetMode()
functions for mode 13h, the VGA's 320x200 256-color mode. Finally, Listing
Four (page 156) is a simple, non-antialiased line-drawing routine. Link these
four listings together and run the resulting program to see both
Wu-antialiased and non-antialiased lines.
Listing One isn't particularly fast, because it calls DrawPixel() for each
pixel. On the other hand, DrawPixel() makes it easy to try out Wu antialiasing
in a variety of modes; just adapt the code in Listing Three for the 256-color
mode you want to support. For example, Listing Five (page 156) shows code to
draw Wu-antialiased lines in 640x480 256-color mode on a Super-VGA built
around the Tseng Labs ET4000 chip with at least 512K of display memory
installed. It's well worth checking out Wu antialiasing at 640x480. Although
antialiased lines look much smoother than normal lines at 320x200 resolution,
they're far from perfect, because the pixels are so big that the eye can't
blend them properly. At 640x480, however, Wu-antialiased lines look fabulous;
from a couple of feet away, they look as straight and smooth as if they were
drawn with a ruler.
Listing One requires that the DAC palette be set up so that a NumLevels-long
block of palette entries contains linearly decreasing intensities of the
drawing color. The size of the block is programmable, but must be a power of
two. The more intensity levels, the better. Wu says that 32 intensities is
enough; on my system, eight and even four levels looked pretty good. I found
that gamma correction, which gives linearly spaced intensity steps, improved
antialiasing quality significantly. Fortunately, we can program the palette
with gamma-corrected values, so our drawing code doesn't have to do any extra
work.
Listing One isn't very fast, so I implemented Wu antialiasing in assembler,
hard-coded for mode 13h. The inner loop of the assembler code is shown in
Listing Six; the full assembler routine will be available as part of the code
archive from this issue of DDJ. High-speed graphics code and fast VGAs go
together like peanut butter and jelly, which is to say very well indeed; the
assembler implementation ran more than twice as fast on my 486 after I ran the
SETBUS utility from last month to put the VGA into 16-bit mode. Enough said!


Notes on Wu Antialiasing


Wu antialiasing can be applied to any curve for which it's possible to
calculate at each step the positions and intensities of two bracketing pixels,
although the implementation will generally be nowhere near as efficient as it
is for lines. However, Wu's article in Computer Graphics does describe an
efficient algorithm for drawing antialiased circles. Wu also describes a
technique for antialiasing solids, such as filled circles and polygons. Wu's
approach biases the edges of filled objects outward. Although this is no good
for adjacent polygons of the sort used in rendering, it's certainly possible
to design a more accurate polygon-antialiasing approach around Wu's basic
weighting technique. The results would not be quite so good as more
sophisticated antialiasing techniques, but they would be much faster.
In general, in fact, the results obtained by Wu antialiasing are only so-so,
by theoretical measures. Wu antialiasing amounts to a simple box filter placed
over a step approximation of a line, and that process introduces a good deal
of deviation from the ideal. On the other hand, Wu notes that even a 10
percent error in intensity doesn't lead to noticeable loss of image quality,
and for Wu-antialiased lines up to 1K pixels in length, the error is under 10
percent. If it looks good, it is good--and it looks good. With a 16-bit error
accumulator, fixed-point inaccuracy becomes a problem for Wu-antialiased lines
longer than 1K. For such lines, you should switch to using 32-bit error
values, which would let you handle lines of any practical length.
In the listings, I have chosen to truncate, rather than round, the
error-adjust value. This increases the intensity error of the line but
guarantees that fixed-point inaccuracy won't cause the minor axis to advance
past the endpoint. Over-running the endpoint would result in the drawing of
pixels outside the line's bounding box, and potentially even in an attempt to
access pixels off the edge of the bitmap.
Finally, I should mention that, as published, Wu's algorithm draws lines
symmetrically, from both ends at once. I haven't done this for a number of
reasons, not least of which is that symmetric drawing is an inefficient way to
draw lines that span banks on banked Super-VGAs. Banking aside, however,
symmetric drawing is potentially faster, because it eliminates half of all
calculations; in so doing, it cuts intensity error in half, as well.



Matrix Orthogonalization Made Easy


Peter Brooks of MicroMind passed along the following nifty idea. Fixed-point
rotation matrices tend to drift from the proper values over the course of
repeated concatenations, due to fixed-point error, with the columns gradually
ceasing to be mutually perpendicular (orthogonal). It is essential that
rotation matrices remain orthogonal, however, in order to perform
nondistorting rotations. One way to deal with this is to periodically
recalculate one of the columns as the cross-product of the other two, which
works because the cross-product of two vectors is an orthogonal vector.
Peter's insight was: Why not recalculate one column as the cross-product of
the other two every time you recalculate a rotation matrix? In other words, do
the matrix multiplication to generate two columns of the result matrix, then
just calculate the third column as the cross-product of the other two. The row
or column calculated as the cross-product should be rotated regularly. Not
only does this guarantee an orthogonal matrix at all times, but it also turns
out to be faster, because it takes fewer operations to calculate a
cross-product than to recalculate a matrix column. (All of the above applies
equally well if you substitute "row" for "column.")
This sounds like a great idea to me. My only question is whether one should
alternate between rows and columns, in order to distribute error more evenly.
Any thoughts, readers?


Next Time


We didn't quite make it back to X-Sharp this month, although the topics were
certainly related. Next month, for sure. In the meantime, keep those excellent
suggestions coming. And stay the heck away from fried, stewed chicken.


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash



[LISTING ONE]

/* Function to draw an antialiased line from (X0,Y0) to (X1,Y1), using an
 * antialiasing approach published by Xiaolin Wu in the July 1991 issue of
 * Computer Graphics. Requires that the palette be set up so that there
 * are NumLevels intensity levels of the desired drawing color, starting at
 * color BaseColor (100% intensity) and followed by (NumLevels-1) levels of
 * evenly decreasing intensity, with color (BaseColor+NumLevels-1) being 0%
 * intensity of the desired drawing color (black). This code is suitable for
 * use at screen resolutions, with lines typically no more than 1K long; for
 * longer lines, 32-bit error arithmetic must be used to avoid problems with
 * fixed-point inaccuracy. No clipping is performed in DrawWuLine; it must be
 * performed either at a higher level or in the DrawPixel function.
 * Tested with Borland C++ 3.0 in C compilation mode and the small model.
 */
extern void DrawPixel(int, int, int);

/* Wu antialiased line drawer.
 * (X0,Y0),(X1,Y1) = line to draw
 * BaseColor = color # of first color in block used for antialiasing, the
 * 100% intensity version of the drawing color
 * NumLevels = size of color block, with BaseColor+NumLevels-1 being the
 * 0% intensity version of the drawing color
 * IntensityBits = log base 2 of NumLevels; the # of bits used to describe
 * the intensity of the drawing color. 2**IntensityBits==NumLevels
 */
void DrawWuLine(int X0, int Y0, int X1, int Y1, int BaseColor, int NumLevels,
 unsigned int IntensityBits)
{
 unsigned int IntensityShift, ErrorAdj, ErrorAcc;
 unsigned int ErrorAccTemp, Weighting, WeightingComplementMask;
 int DeltaX, DeltaY, Temp, XDir;

 /* Make sure the line runs top to bottom */
 if (Y0 > Y1) {
 Temp = Y0; Y0 = Y1; Y1 = Temp;
 Temp = X0; X0 = X1; X1 = Temp;
 }
 /* Draw the initial pixel, which is always exactly intersected by
 the line and so needs no weighting */
 DrawPixel(X0, Y0, BaseColor);

 if ((DeltaX = X1 - X0) >= 0) {
 XDir = 1;
 } else {
 XDir = -1;
 DeltaX = -DeltaX; /* make DeltaX positive */
 }
 /* Special-case horizontal, vertical, and diagonal lines, which
 require no weighting because they go right through the center of
 every pixel */
 if ((DeltaY = Y1 - Y0) == 0) {
 /* Horizontal line */
 while (DeltaX-- != 0) {
 X0 += XDir;
 DrawPixel(X0, Y0, BaseColor);
 }
 return;
 }
 if (DeltaX == 0) {
 /* Vertical line */
 do {
 Y0++;
 DrawPixel(X0, Y0, BaseColor);
 } while (--DeltaY != 0);
 return;
 }
 if (DeltaX == DeltaY) {
 /* Diagonal line */
 do {
 X0 += XDir;
 Y0++;
 DrawPixel(X0, Y0, BaseColor);
 } while (--DeltaY != 0);
 return;
 }
 /* Line is not horizontal, diagonal, or vertical */
 ErrorAcc = 0; /* initialize the line error accumulator to 0 */
 /* # of bits by which to shift ErrorAcc to get intensity level */
 IntensityShift = 16 - IntensityBits;
 /* Mask used to flip all bits in an intensity weighting, producing the
 result (1 - intensity weighting) */
 WeightingComplementMask = NumLevels - 1;
 /* Is this an X-major or Y-major line? */
 if (DeltaY > DeltaX) {
 /* Y-major line; calculate 16-bit fixed-point fractional part of a
 pixel that X advances each time Y advances 1 pixel, truncating the
 result so that we won't overrun the endpoint along the X axis */
 ErrorAdj = ((unsigned long) DeltaX << 16) / (unsigned long) DeltaY;
 /* Draw all pixels other than the first and last */
 while (--DeltaY) {
 ErrorAccTemp = ErrorAcc; /* remember current accumulated error */
 ErrorAcc += ErrorAdj; /* calculate error for next pixel */
 if (ErrorAcc <= ErrorAccTemp) {
 /* The error accumulator turned over, so advance the X coord */
 X0 += XDir;
 }
 Y0++; /* Y-major, so always advance Y */

 /* The IntensityBits most significant bits of ErrorAcc give us the
 intensity weighting for this pixel, and the complement of the
 weighting for the paired pixel */
 Weighting = ErrorAcc >> IntensityShift;
 DrawPixel(X0, Y0, BaseColor + Weighting);
 DrawPixel(X0 + XDir, Y0,
 BaseColor + (Weighting ^ WeightingComplementMask));
 }
 /* Draw the final pixel, which is always exactly intersected by the line
 and so needs no weighting */
 DrawPixel(X1, Y1, BaseColor);
 return;
 }
 /* It's an X-major line; calculate 16-bit fixed-point fractional part of a
 pixel that Y advances each time X advances 1 pixel, truncating the
 result to avoid overrunning the endpoint along the X axis */
 ErrorAdj = ((unsigned long) DeltaY << 16) / (unsigned long) DeltaX;
 /* Draw all pixels other than the first and last */
 while (--DeltaX) {
 ErrorAccTemp = ErrorAcc; /* remember current accumulated error */
 ErrorAcc += ErrorAdj; /* calculate error for next pixel */
 if (ErrorAcc <= ErrorAccTemp) {
 /* The error accumulator turned over, so advance the Y coord */
 Y0++;
 }
 X0 += XDir; /* X-major, so always advance X */
 /* The IntensityBits most significant bits of ErrorAcc give us the
 intensity weighting for this pixel, and the complement of the
 weighting for the paired pixel */
 Weighting = ErrorAcc >> IntensityShift;
 DrawPixel(X0, Y0, BaseColor + Weighting);
 DrawPixel(X0, Y0 + 1,
 BaseColor + (Weighting ^ WeightingComplementMask));
 }
 /* Draw the final pixel, which is always exactly intersected by the line
 and so needs no weighting */
 DrawPixel(X1, Y1, BaseColor);
}






[LISTING TWO]
/* Sample line-drawing program to demonstrate Wu antialiasing. Also draws
 * non-antialiased lines for comparison.
 * Tested with Borland C++ 3.0 in C compilation mode and the small model.
 */
#include <dos.h>
#include <conio.h>

struct WuColor; /* forward declaration; struct is defined below */
void SetPalette(struct WuColor *);
extern void DrawWuLine(int, int, int, int, int, int, unsigned int);
extern void DrawLine(int, int, int, int, int);
extern void SetMode(void);
extern int ScreenWidthInPixels; /* screen dimension globals */
extern int ScreenHeightInPixels;


#define NUM_WU_COLORS 2 /* # of colors we'll do antialiased drawing with */
struct WuColor { /* describes one color used for antialiasing */
 int BaseColor; /* # of start of palette intensity block in DAC */
 int NumLevels; /* # of intensity levels */
 int IntensityBits; /* IntensityBits == log2 NumLevels */
 int MaxRed; /* red component of color at full intensity */
 int MaxGreen; /* green component of color at full intensity */
 int MaxBlue; /* blue component of color at full intensity */
};
enum {WU_BLUE=0, WU_WHITE=1}; /* drawing colors */
struct WuColor WuColors[NUM_WU_COLORS] = /* blue and white */
 {{192, 32, 5, 0, 0, 0x3F}, {224, 32, 5, 0x3F, 0x3F, 0x3F}};

void main()
{
 int CurrentColor, i;
 union REGS regset;

 /* Draw Wu-antialiased lines in all directions */
 SetMode();
 SetPalette(WuColors);
 for (i=5; i<ScreenWidthInPixels; i += 10) {
 DrawWuLine(ScreenWidthInPixels/2-ScreenWidthInPixels/10+i/5,
 ScreenHeightInPixels/5, i, ScreenHeightInPixels-1,
 WuColors[WU_BLUE].BaseColor, WuColors[WU_BLUE].NumLevels,
 WuColors[WU_BLUE].IntensityBits);
 }
 for (i=0; i<ScreenHeightInPixels; i += 10) {
 DrawWuLine(ScreenWidthInPixels/2-ScreenWidthInPixels/10, i/5, 0, i,
 WuColors[WU_BLUE].BaseColor, WuColors[WU_BLUE].NumLevels,
 WuColors[WU_BLUE].IntensityBits);
 }
 for (i=0; i<ScreenHeightInPixels; i += 10) {
 DrawWuLine(ScreenWidthInPixels/2+ScreenWidthInPixels/10, i/5,
 ScreenWidthInPixels-1, i, WuColors[WU_BLUE].BaseColor,
 WuColors[WU_BLUE].NumLevels, WuColors[WU_BLUE].IntensityBits);
 }
 for (i=0; i<ScreenWidthInPixels; i += 10) {
 DrawWuLine(ScreenWidthInPixels/2-ScreenWidthInPixels/10+i/5,
 ScreenHeightInPixels, i, 0, WuColors[WU_WHITE].BaseColor,
 WuColors[WU_WHITE].NumLevels,
 WuColors[WU_WHITE].IntensityBits);
 }
 getch(); /* wait for a key press */

 /* Now clear the screen and draw non-antialiased lines */
 SetMode();
 SetPalette(WuColors);
 for (i=0; i<ScreenWidthInPixels; i += 10) {
 DrawLine(ScreenWidthInPixels/2-ScreenWidthInPixels/10+i/5,
 ScreenHeightInPixels/5, i, ScreenHeightInPixels-1,
 WuColors[WU_BLUE].BaseColor);
 }
 for (i=0; i<ScreenHeightInPixels; i += 10) {
 DrawLine(ScreenWidthInPixels/2-ScreenWidthInPixels/10, i/5, 0, i,
 WuColors[WU_BLUE].BaseColor);
 }
 for (i=0; i<ScreenHeightInPixels; i += 10) {
 DrawLine(ScreenWidthInPixels/2+ScreenWidthInPixels/10, i/5,

 ScreenWidthInPixels-1, i, WuColors[WU_BLUE].BaseColor);
 }
 for (i=0; i<ScreenWidthInPixels; i += 10) {
 DrawLine(ScreenWidthInPixels/2-ScreenWidthInPixels/10+i/5,
 ScreenHeightInPixels, i, 0, WuColors[WU_WHITE].BaseColor);
 }
 getch(); /* wait for a key press */

 regset.x.ax = 0x0003; /* AL = 3 selects 80x25 text mode */
 int86(0x10, &regset, &regset); /* return to text mode */
}

/* Sets up the palette for antialiasing with the specified colors.
 * Intensity steps for each color are scaled from the full desired intensity
 * of the red, green, and blue components for that color down to 0%
 * intensity; each step is rounded to the nearest integer. Colors are
 * corrected for a gamma of 2.3. The values that the palette is programmed
 * with are hardwired for the VGA's 6 bit per color DAC.
 */
void SetPalette(struct WuColor * WColors)
{
 int i, j;
 union REGS regset;
 struct SREGS sregset;
 static unsigned char PaletteBlock[256][3]; /* 256 RGB entries */
 /* Gamma-corrected DAC color components for 64 linear levels from 0% to
 100% intensity */
 static unsigned char GammaTable[] = {
 0, 10, 14, 17, 19, 21, 23, 24, 26, 27, 28, 29, 31, 32, 33, 34,
 35, 36, 37, 37, 38, 39, 40, 41, 41, 42, 43, 44, 44, 45, 46, 46,
 47, 48, 48, 49, 49, 50, 51, 51, 52, 52, 53, 53, 54, 54, 55, 55,
 56, 56, 57, 57, 58, 58, 59, 59, 60, 60, 61, 61, 62, 62, 63, 63};

 for (i=0; i<NUM_WU_COLORS; i++) {
 for (j=0; j<WColors[i].NumLevels; j++) {
 PaletteBlock[j][0] = GammaTable[(int)(((double)WColors[i].MaxRed * (1.0 -
 (double)j / (double)(WColors[i].NumLevels - 1))) + 0.5)];
 PaletteBlock[j][1] = GammaTable[(int)(((double)WColors[i].MaxGreen * (1.0 -
 (double)j / (double)(WColors[i].NumLevels - 1))) + 0.5)];
 PaletteBlock[j][2] = GammaTable[(int)(((double)WColors[i].MaxBlue * (1.0 -
 (double)j / (double)(WColors[i].NumLevels - 1))) + 0.5)];
 }
 /* Now set up the palette to do Wu antialiasing for this color */
 regset.x.ax = 0x1012; /* set block of DAC registers function */
 regset.x.bx = WColors[i].BaseColor; /* first DAC location to load */
 regset.x.cx = WColors[i].NumLevels; /* # of DAC locations to load */
 regset.x.dx = (unsigned int)PaletteBlock; /* offset of array from which
 to load RGB settings */
 sregset.es = _DS; /* segment of array from which to load settings */
 int86x(0x10, &regset, &regset, &sregset); /* load the palette block */
 }
}






[LISTING THREE]



/* VGA mode 13h pixel-drawing and mode set functions.
 * Tested with Borland C++ 3.0 in C compilation mode and the small model.
 */
#include <dos.h>

/* Screen dimension globals, used in main program to scale. */
int ScreenWidthInPixels = 320;
int ScreenHeightInPixels = 200;

/* Mode 13h draw pixel function. */
void DrawPixel(int X, int Y, int Color)
{
#define SCREEN_SEGMENT 0xA000
 unsigned char far *ScreenPtr;

 FP_SEG(ScreenPtr) = SCREEN_SEGMENT;
 FP_OFF(ScreenPtr) = (unsigned int) Y * ScreenWidthInPixels + X;
 *ScreenPtr = Color;
}

/* Mode 13h mode-set function. */
void SetMode()
{
 union REGS regset;

 /* Set to 320x200 256-color graphics mode */
 regset.x.ax = 0x0013;
 int86(0x10, &regset, &regset);
}






[LISTING FOUR]

/* Function to draw a non-antialiased line from (X0,Y0) to (X1,Y1), using a
 * simple fixed-point error accumulation approach.
 * Tested with Borland C++ 3.0 in C compilation mode and the small model.
 */
extern void DrawPixel(int, int, int);

/* Non-antialiased line drawer.
 * (X0,Y0),(X1,Y1) = line to draw, Color = color in which to draw
 */
void DrawLine(int X0, int Y0, int X1, int Y1, int Color)
{
 unsigned long ErrorAcc, ErrorAdj;
 int DeltaX, DeltaY, XDir, Temp;

 /* Make sure the line runs top to bottom */
 if (Y0 > Y1) {
 Temp = Y0; Y0 = Y1; Y1 = Temp;
 Temp = X0; X0 = X1; X1 = Temp;
 }
 DrawPixel(X0, Y0, Color); /* draw the initial pixel */

 if ((DeltaX = X1 - X0) >= 0) {
 XDir = 1;
 } else {
 XDir = -1;
 DeltaX = -DeltaX; /* make DeltaX positive */
 }
 if ((DeltaY = Y1 - Y0) == 0) /* done if only one point in the line */
 if (DeltaX == 0) return;

 ErrorAcc = 0x8000; /* initialize line error accumulator to .5, so we can
 advance when we get halfway to the next pixel */
 /* Is this an X-major or Y-major line? */
 if (DeltaY > DeltaX) {
 /* Y-major line; calculate 16-bit fixed-point fractional part of a
 pixel that X advances each time Y advances 1 pixel */
 ErrorAdj = ((((unsigned long)DeltaX << 17) / (unsigned long)DeltaY) +
 1) >> 1;
 /* Draw all pixels between the first and last */
 do {
 ErrorAcc += ErrorAdj; /* calculate error for this pixel */
 if (ErrorAcc & ~0xFFFFL) {
 /* The error accumulator turned over, so advance the X coord */
 X0 += XDir;
 ErrorAcc &= 0xFFFFL; /* clear integer part of result */
 }
 Y0++; /* Y-major, so always advance Y */
 DrawPixel(X0, Y0, Color);
 } while (--DeltaY);
 return;
 }
 /* It's an X-major line; calculate 16-bit fixed-point fractional part of a
 pixel that Y advances each time X advances 1 pixel */
 ErrorAdj = ((((unsigned long)DeltaY << 17) / (unsigned long)DeltaX) +
 1) >> 1;
 /* Draw all remaining pixels */
 do {
 ErrorAcc += ErrorAdj; /* calculate error for this pixel */
 if (ErrorAcc & ~0xFFFFL) {
 /* The error accumulator turned over, so advance the Y coord */
 Y0++;
 ErrorAcc &= 0xFFFFL; /* clear integer part of result */
 }
 X0 += XDir; /* X-major, so always advance X */
 DrawPixel(X0, Y0, Color);
 } while (--DeltaX);
}






[LISTING FIVE]

/* Mode set and pixel-drawing functions for the 640x480 256-color mode of
 * Tseng Labs ET4000-based SuperVGAs.
 * Tested with Borland C++ 3.0 in C compilation mode and the small model.
 */
#include <dos.h>


/* Screen dimension globals, used in main program to scale */
int ScreenWidthInPixels = 640;
int ScreenHeightInPixels = 480;

/* ET4000 640x480 256-color draw pixel function. */
void DrawPixel(int X, int Y, int Color)
{
#define SCREEN_SEGMENT 0xA000
#define GC_SEGMENT_SELECT 0x3CD /* ET4000 segment (bank) select reg */
 unsigned char far *ScreenPtr;
 unsigned int Bank;
 unsigned long BitmapAddress;

 /* Full bitmap address of pixel, as measured from address 0 to 0xFFFFF */
 BitmapAddress = (unsigned long) Y * ScreenWidthInPixels + X;
 /* Bank # is upper word of bitmap addr */
 Bank = BitmapAddress >> 16;
 /* Upper nibble is read bank #, lower nibble is write bank # */
 outp(GC_SEGMENT_SELECT, (Bank << 4) | Bank);
 /* Draw into the bank */
 FP_SEG(ScreenPtr) = SCREEN_SEGMENT;
 FP_OFF(ScreenPtr) = (unsigned int) BitmapAddress;
 *ScreenPtr = Color;
}

/* ET4000 640x480 256-color mode-set function. */
void SetMode()
{
 union REGS regset;

 /* Set to 640x480 256-color graphics mode */
 regset.x.ax = 0x002E;
 int86(0x10, &regset, &regset);
}






[LISTING SIX]

; Inner loop for drawing Y-major lines from Wu-antialiased line drawer.
YMajorLoop:
 add dx,bp ;calculate error for next pixel
 jnc NoXAdvance ;not time to step in X yet
 ;the error accumulator turned over,
 ; so advance the X coord
 add si,bx ;add XDir to the pixel pointer
NoXAdvance:
 add si,SCREEN_WIDTH_IN_BYTES ;Y-major, so always advance Y
; The IntensityBits most significant bits of ErrorAcc give us the intensity
; weighting for this pixel, and the complement of the weighting for the
; paired pixel.
 mov ah,dh ;msb of ErrorAcc
 shr ah,cl ;Weighting = ErrorAcc >> IntensityShift;
 add ah,al ;BaseColor + Weighting
 mov [si],ah ;DrawPixel(X, Y, BaseColor + Weighting);

 mov ah,dh ;msb of ErrorAcc
 shr ah,cl ;Weighting = ErrorAcc >> IntensityShift;
 xor ah,ch ;Weighting ^ WeightingComplementMask
 add ah,al ;BaseColor + (Weighting ^ WeightingComplementMask)
 mov [si+bx],ah ;DrawPixel(X+XDir, Y,
 ; BaseColor + (Weighting ^ WeightingComplementMask));
 dec di ;--DeltaY
 jnz YMajorLoop


[WU ANTIALIASING]

; C near-callable function to draw an antialiased line from
; (X0,Y0) to (X1,Y1), in mode 13h, the VGA's standard 320x200 256-color
; mode. Uses an antialiasing approach published by Xiaolin Wu in the July
; 1991 issue of Computer Graphics. Requires that the palette be set up so
; that there are NumLevels intensity levels of the desired drawing color,
; starting at color BaseColor (100% intensity) and followed by (NumLevels-1)
; levels of evenly decreasing intensity, with color (BaseColor+NumLevels-1)
; being 0% intensity of the desired drawing color (black). No clipping is
; performed in DrawWuLine. Handles a maximum of 256 intensity levels per
; antialiased color. This code is suitable for use at screen resolutions,
; with lines typically no more than 1K long; for longer lines, 32-bit error
; arithmetic must be used to avoid problems with fixed-point inaccuracy.
; Tested with TASM 3.0.
;
; C near-callable as:
; void DrawWuLine(int X0, int Y0, int X1, int Y1, int BaseColor,
; int NumLevels, unsigned int IntensityBits);

SCREEN_WIDTH_IN_BYTES equ 320 ;# of bytes from the start of one scan line
 ; to the start of the next
SCREEN_SEGMENT equ 0a000h ;segment in which screen memory resides

; Parameters passed in stack frame.
parms struc
 dw 2 dup (?) ;pushed BP and return address
X0 dw ? ;X coordinate of line start point
Y0 dw ? ;Y coordinate of line start point
X1 dw ? ;X coordinate of line end point
Y1 dw ? ;Y coordinate of line end point
BaseColor dw ? ;color # of first color in block used for
 ; antialiasing, the 100% intensity version of the
 ; drawing color
NumLevels dw ? ;size of color block, with BaseColor+NumLevels-1
 ; being the 0% intensity version of the drawing color
 ; (maximum NumLevels = 256)
IntensityBits dw ? ;log base 2 of NumLevels; the # of bits used to
 ; describe the intensity of the drawing color.
 ; 2**IntensityBits==NumLevels
 ; (maximum IntensityBits = 8)
parms ends

 .model small
 .code
; Screen dimension globals, used in main program to scale.
_ScreenWidthInPixels dw 320
_ScreenHeightInPixels dw 200


 .code
 public _DrawWuLine
_DrawWuLine proc near
 push bp ;preserve caller's stack frame
 mov bp,sp ;point to local stack frame
 push si ;preserve C's register variables
 push di
 push ds ;preserve C's default data segment
 cld ;make string instructions increment their pointers

; Make sure the line runs top to bottom.
 mov si,[bp].X0
 mov ax,[bp].Y0
 cmp ax,[bp].Y1 ;swap endpoints if necessary to ensure that
 jna NoSwap ; Y0 <= Y1
 xchg [bp].Y1,ax
 mov [bp].Y0,ax
 xchg [bp].X1,si
 mov [bp].X0,si
NoSwap:

; Draw the initial pixel, which is always exactly intersected by the line
; and so needs no weighting.
 mov dx,SCREEN_SEGMENT
 mov ds,dx ;point DS to the screen segment
 mov dx,SCREEN_WIDTH_IN_BYTES
 mul dx ;Y0 * SCREEN_WIDTH_IN_BYTES yields the offset
 ; of the start of the row that the initial
 ; pixel is on
 add si,ax ;point DS:SI to the initial pixel
 mov al,byte ptr [bp].BaseColor ;color with which to draw
 mov [si],al ;draw the initial pixel

 mov bx,1 ;XDir = 1; assume DeltaX >= 0
 mov cx,[bp].X1
 sub cx,[bp].X0 ;DeltaX; is it >= 1?
 jns DeltaXSet ;yes, move left->right, all set
 ;no, move right->left
 neg cx ;make DeltaX positive
 neg bx ;XDir = -1
DeltaXSet:

; Special-case horizontal, vertical, and diagonal lines, which require no
; weighting because they go right through the center of every pixel.
 mov dx,[bp].Y1
 sub dx,[bp].Y0 ;DeltaY; is it 0?
 jnz NotHorz ;no, not horizontal
 ;yes, is horizontal, special case
 and bx,bx ;draw from left->right?
 jns DoHorz ;yes, all set
 std ;no, draw right->left
DoHorz:
 lea di,[bx+si] ;point DI to next pixel to draw
 mov ax,ds
 mov es,ax ;point ES:DI to next pixel to draw
 mov al,byte ptr [bp].BaseColor ;color with which to draw
 ;CX = DeltaX at this point
 rep stosb ;draw the rest of the horizontal line
 cld ;restore default direction flag

 jmp Done ;and we're done

 align 2
NotHorz:
 and cx,cx ;is DeltaX 0?
 jnz NotVert ;no, not a vertical line
 ;yes, is vertical, special case
 mov al,byte ptr [bp].BaseColor ;color with which to draw
VertLoop:
 add si,SCREEN_WIDTH_IN_BYTES ;point to next pixel to draw
 mov [si],al ;draw the next pixel
 dec dx ;--DeltaY
 jnz VertLoop
 jmp Done ;and we're done

 align 2
NotVert:
 cmp cx,dx ;DeltaX == DeltaY?
 jnz NotDiag ;no, not diagonal
 ;yes, is diagonal, special case
 mov al,byte ptr [bp].BaseColor ;color with which to draw
DiagLoop:
 lea si,[si+SCREEN_WIDTH_IN_BYTES+bx]
 ;advance to next pixel to draw by
 ; incrementing Y and adding XDir to X
 mov [si],al ;draw the next pixel
 dec dx ;--DeltaY
 jnz DiagLoop
 jmp Done ;and we're done

; Line is not horizontal, diagonal, or vertical.
 align 2
NotDiag:
; Is this an X-major or Y-major line?
 cmp dx,cx
 jb XMajor ;it's X-major

; It's a Y-major line. Calculate the 16-bit fixed-point fractional part of a
; pixel that X advances each time Y advances 1 pixel, truncating the result
; to avoid overrunning the endpoint along the X axis.
 xchg dx,cx ;DX = DeltaX, CX = DeltaY
 sub ax,ax ;make DeltaX 16.16 fixed-point value in DX:AX
 div cx ;AX = (DeltaX << 16) / DeltaY. Won't overflow
 ; because DeltaX < DeltaY
 mov di,cx ;DI = DeltaY (loop count)
 sub si,bx ;back up the start X by 1, as explained below
 mov dx,-1 ;initialize the line error accumulator to -1,
 ; so that it will turn over immediately and
 ; advance X to the start X. This is necessary
 ; properly to bias error sums of 0 to mean
 ; "advance next time" rather than "advance
 ; this time," so that the final error sum can
 ; never cause drawing to overrun the final X
 ; coordinate (works in conjunction with
 ; truncating ErrorAdj, to make sure X can't
 ; overrun)
 mov cx,8 ;CL = # of bits by which to shift
 sub cx,[bp].IntensityBits ; ErrorAcc to get intensity level (8
 ; instead of 16 because we work only
 ; with the high byte of ErrorAcc)
 mov ch,byte ptr [bp].NumLevels ;mask used to flip all bits in an
 dec ch ; intensity weighting, producing
 ; result (1 - intensity weighting)
 mov bp,BaseColor[bp] ;***stack frame not available***
 ;***from now on ***
 xchg bp,ax ;BP = ErrorAdj, AL = BaseColor,
 ; AH = scratch register

; Draw all remaining pixels.
YMajorLoop:
 add dx,bp ;calculate error for next pixel
 jnc NoXAdvance ;not time to step in X yet
 ;the error accumulator turned over,
 ; so advance the X coord
 add si,bx ;add XDir to the pixel pointer
NoXAdvance:
 add si,SCREEN_WIDTH_IN_BYTES ;Y-major, so always advance Y

; The IntensityBits most significant bits of ErrorAcc give us the intensity
; weighting for this pixel, and the complement of the weighting for the
; paired pixel.
 mov ah,dh ;msb of ErrorAcc
 shr ah,cl ;Weighting = ErrorAcc >> IntensityShift;
 add ah,al ;BaseColor + Weighting
 mov [si],ah ;DrawPixel(X, Y, BaseColor + Weighting);
 mov ah,dh ;msb of ErrorAcc
 shr ah,cl ;Weighting = ErrorAcc >> IntensityShift;
 xor ah,ch ;Weighting ^ WeightingComplementMask
 add ah,al ;BaseColor + (Weighting ^ WeightingComplementMask)
 mov [si+bx],ah ;DrawPixel(X+XDir, Y,
 ; BaseColor + (Weighting ^ WeightingComplementMask));
 dec di ;--DeltaY
 jnz YMajorLoop
 jmp Done ;we're done with this line

; It's an X-major line.
 align 2
XMajor:
; Calculate the 16-bit fixed-point fractional part of a pixel that Y advances
; each time X advances 1 pixel, truncating the result to avoid overrunning
; the endpoint along the Y axis.
 sub ax,ax ;make DeltaY 16.16 fixed-point value in DX:AX
 div cx ;AX = (DeltaY << 16) / DeltaX. Won't overflow
 ; because DeltaY < DeltaX
 mov di,cx ;DI = DeltaX (loop count)
 sub si,SCREEN_WIDTH_IN_BYTES ;back up the start Y by 1, as
 ; explained below
 mov dx,-1 ;initialize the line error accumulator to -1,
 ; so that it will turn over immediately and
 ; advance Y to the start Y. This is necessary
 ; properly to bias error sums of 0 to mean
 ; "advance next time" rather than "advance
 ; this time," so that the final error sum can
 ; never cause drawing to overrun the final Y
 ; coordinate (works in conjunction with
 ; truncating ErrorAdj, to make sure Y can't
 ; overrun)
 mov cx,8 ;CL = # of bits by which to shift

 sub cx,[bp].IntensityBits ; ErrorAcc to get intensity level (8
 ; instead of 16 because we work only
 ; with the high byte of ErrorAcc)
 mov ch,byte ptr [bp].NumLevels ;mask used to flip all bits in an
 dec ch ; intensity weighting, producing
 ; result (1 - intensity weighting)
 mov bp,BaseColor[bp] ;***stack frame not available***
 ;***from now on ***
 xchg bp,ax ;BP = ErrorAdj, AL = BaseColor,
 ; AH = scratch register
; Draw all remaining pixels.
XMajorLoop:
 add dx,bp ;calculate error for next pixel
 jnc NoYAdvance ;not time to step in Y yet
 ;the error accumulator turned over,
 ; so advance the Y coord
 add si,SCREEN_WIDTH_IN_BYTES ;advance Y
NoYAdvance:
 add si,bx ;X-major, so add XDir to the pixel pointer

; The IntensityBits most significant bits of ErrorAcc give us the intensity
; weighting for this pixel, and the complement of the weighting for the
; paired pixel.
 mov ah,dh ;msb of ErrorAcc
 shr ah,cl ;Weighting = ErrorAcc >> IntensityShift;
 add ah,al ;BaseColor + Weighting
 mov [si],ah ;DrawPixel(X, Y, BaseColor + Weighting);
 mov ah,dh ;msb of ErrorAcc
 shr ah,cl ;Weighting = ErrorAcc >> IntensityShift;
 xor ah,ch ;Weighting ^ WeightingComplementMask
 add ah,al ;BaseColor + (Weighting ^ WeightingComplementMask)
 mov [si+SCREEN_WIDTH_IN_BYTES],ah
 ;DrawPixel(X, Y+SCREEN_WIDTH_IN_BYTES,
 ; BaseColor + (Weighting ^ WeightingComplementMask));
 dec di ;--DeltaX
 jnz XMajorLoop

Done: ;we're done with this line
 pop ds ;restore C's default data segment
 pop di ;restore C's register variables
 pop si
 pop bp ;restore caller's stack frame
 ret ;done
_DrawWuLine endp
 end
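For readers who would rather follow the algorithm in C than in 16-bit assembly, here is a hedged sketch of the Y-major case above. This is not the published companion listing; the function and parameter names are illustrative. It shows the core idea: a 16-bit error accumulator whose high IntensityBits bits become the weighting for the pixel nearer the ideal line, while the complemented weighting goes to the paired pixel one step over in X.

```c
#include <stdint.h>

typedef void (*PixelFn)(int x, int y, unsigned color);

/* Sketch of the Y-major weighting math, mirroring the listing's
 * conventions: BaseColor is the 100% intensity shade, and
 * BaseColor + NumLevels - 1 (NumLevels == 2**intensity_bits) is the
 * 0% shade. Assumes y1 > y0 and |x1 - x0| < y1 - y0 (Y-major). */
void wu_line_ymajor(int x0, int y0, int x1, int y1,
                    unsigned base_color, unsigned intensity_bits,
                    PixelFn draw)
{
    int xdir = (x1 >= x0) ? 1 : -1;
    uint16_t delta_x = (uint16_t)((x1 - x0) * xdir);
    uint16_t delta_y = (uint16_t)(y1 - y0);
    /* 16-bit fixed-point fraction of a pixel that X advances per Y
     * step, truncated so X can never overrun the final X coordinate */
    uint16_t error_adj = (uint16_t)(((uint32_t)delta_x << 16) / delta_y);
    uint16_t error_acc = 0;
    unsigned shift = 16 - intensity_bits;
    unsigned mask  = (1u << intensity_bits) - 1;   /* NumLevels - 1 */
    int x = x0, y = y0;

    draw(x, y, base_color);        /* first pixel is exactly on the line */
    while (--delta_y) {
        uint16_t prev = error_acc;
        error_acc += error_adj;    /* accumulator wraps mod 65536 */
        if (error_acc < prev)      /* wrapped: time to step in X */
            x += xdir;
        y++;
        unsigned weight = error_acc >> shift;  /* high IntensityBits bits */
        draw(x, y, base_color + weight);                 /* nearer pixel */
        draw(x + xdir, y, base_color + (weight ^ mask)); /* paired pixel */
    }
    draw(x1, y1, base_color);      /* last pixel is exactly on the line */
}
```

The assembly gains speed by folding the step test into the carry flag and biasing ErrorAcc to -1; this sketch uses an explicit wrap comparison instead, which amounts to the same thing.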

















June, 1992
PROGRAMMER'S BOOKSHELF


Operating Systems: The Nightmare Continues




Andrew Schulman


Andrew Tanenbaum has a new book out. For Tanenbaum's many fans, this is a
major piece of news, somewhat like hearing that Volume 4 of Knuth's Art of
Computer Programming has come out (it hasn't) or that Volume 18 of Inside
Macintosh has just appeared. I know that Ray Duncan reviewed Tanenbaum's new
book, Modern Operating Systems, in last month's "Programmer's Bookshelf," but
I really couldn't resist saying some more about it in this month's column.
Modern Operating Systems is in some ways an update of Tanenbaum's earlier book
on his UNIX-compatible operating system, MINIX (Operating Systems: Design and
Implementation, 1987), except that the new book doesn't present the source
code for an operating system, and is much less about UNIX. Instead, the book
presents in-depth case studies of four operating systems: two "traditional"
operating systems (MS-DOS and UNIX) and two distributed operating systems
(Mach and Tanenbaum's Amoeba).
The coverage of MS-DOS is surprising and welcome. Once again, Tanenbaum has
come through with an introductory textbook that deals with the real world. In
addition to presenting a fairly accurate picture of the most widely used
operating system of all time, it also provides some valuable lessons: "If
anyone had realized that within 10 years this tiny system that was picked up
almost by accident was going to be controlling 50 million computers,
considerably more thought might have gone into it."
Besides the case studies, there are very detailed chapters on processes,
memory management, file systems, input/output, and deadlocks. This provides
excellent background to read along with the ongoing DDJ series by the
Jolitzes, on porting BSD UNIX to the 386.
The entire second half of Modern Operating Systems is on distributed operating
systems: the client-server model and remote procedure call (RPC); threads;
processes and processor allocation; and distributed file systems. The 30-page
section on RPC deserves to be read by anyone even remotely (sorry) interested
in the problems of getting two programs to talk to each other. And if you have
been wondering about the thinking behind so-called "microkernel" operating
systems like Microsoft's NT, this is the place to start. (There's no mention
of NT, of course, but it's all applicable.)
The idea behind a microkernel is that most of the services traditionally
provided by an operating-system kernel can be moved into user-level processes.
As Tanenbaum explains, the microkernel does basically nothing; it simply
provides the essence of operating-system-hood, a framework or substrate on
which services such as file systems, system API calls, and process management
can be built. This means that multiple operating-system interfaces can be
plugged in on top of the same microkernel. That is exactly what Microsoft
wants to do in NT. I guess we'll find out in a few years if anyone wants it.
This brings up an interesting point about MS-DOS. A frequent complaint about
DOS is that it barely merits being called an operating system, since it
provides so little. But that's exactly what a microkernel is! Okay, so the
services provided by a modern microkernel are better thought-out than the ones
DOS provides, but DOS is still essentially a place for folks to plug in their
own extensions. You don't like the memory-management system in DOS? (Who
does?) Then you can get a memory manager or a DOS extender. You don't like the
file system? There are tons of network file systems available. To get a true
feel for DOS, you have to look not at what Microsoft has provided in IO.SYS
and MSDOS.SYS, but at what third-party vendors have built on top of it
(networks, protected-mode DOS extenders, environments such as Windows, TSRs,
drivers, you name it). And it's a lot more "micro" than those microkernels,
too.
Interestingly, Tanenbaum's own MINIX is modeled much like a distributed
operating system. Both the memory manager and the file system reside outside
the kernel; they are separate processes that communicate with the rest of the
system via messages.
Now, this mention of messages should make a light bulb go off in the head of
anyone working with a message-based, event-driven environment such as
Microsoft Windows. Much of the material from Tanenbaum on distributed
operating systems is readily applicable to Windows and other message-based
environments. In one particularly useful section, Tanenbaum demonstrates the
equivalence of all interprocess-communications (IPC) primitives, showing, for
example, how semaphores and monitors can be built in terms of messages.
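Tanenbaum's equivalence argument is easy to demonstrate. A counting semaphore, for instance, falls out of nothing but message passing. The sketch below is my own illustration, not code from the book: the message channel is modeled with a POSIX pipe, and each byte queued in the pipe is one token message.

```c
#include <unistd.h>

/* A counting semaphore built purely from message passing. The
 * "channel" is a POSIX pipe; every byte in it is one token message.
 * Names and construction are illustrative only. */
typedef struct { int rd, wr; } msg_sem;

/* Create the channel and preload one token message per initial count. */
int msg_sem_init(msg_sem *s, int count) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    s->rd = fd[0]; s->wr = fd[1];
    while (count-- > 0)
        if (write(s->wr, "t", 1) != 1) return -1;
    return 0;
}

/* up/signal: send one token message into the channel */
int msg_sem_post(msg_sem *s) {
    return write(s->wr, "t", 1) == 1 ? 0 : -1;
}

/* down/wait: block until a token message arrives, then consume it */
int msg_sem_wait(msg_sem *s) {
    char t;
    return read(s->rd, &t, 1) == 1 ? 0 : -1;
}
```

Monitors can be layered the same way: a mutex is just a semaphore initialized to one, with condition variables as further channels on which waiters park.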
In contrast to microkernels, there is the traditional "monolithic" kernel.
According to Tanenbaum,
...the only potential advantage of the monolithic kernel is performance.
Trapping to the kernel and doing everything there may well be faster than
sending messages to remote servers. However...other factors tend to dominate,
and the small amount of time required to send a message and get a reply
(typically about 1 msec) is usually negligible.
This makes sense; if you're really worried about performance, you should be
working to minimize the number of operating-system calls you make in the first
place.
In his preface, Tanenbaum says he plans to update his MINIX book in the
foreseeable future, to produce an up-to-date, hands-on book. I am looking
forward to this new MINIX book. Mind you, I have little use for UNIX itself
(except that I'm totally dependent on it for e-mail, the editor I use is
UNIX-inspired, and all the utilities I depend on are, too). I figure that if
you make an operating system that graduate students will want to use, then
only graduate students will want to use it. But, almost everything in these
books is applicable to systems programming in general, anywhere.
More important, UNIX is the only operating system that's been subjected to
rigorous dissection and explanation of its design and implementation. In
addition to Tanenbaum's books, there is Comer's Operating System Design: The
Xinu Approach, Bach's The Design of the Unix Operating System, Leffler et
al.'s Design and Implementation of the 4.3BSD Unix Operating System and, of
course, the Jolitz series running in DDJ (which will, I certainly hope, be
turned into a book). There is also the original Lions book, Source Code and
Commentary on UNIX Level 6, but this is (ahem) no longer available. (Somewhat
off the subject, I am surprised that there is no book like this on the inner
workings of X-Windows.)
I know of no such book for MS-DOS. There is only one book on DOS that even
attempts an inside look, but it doesn't provide source code for a DOS clone,
is written largely (though not entirely) by outsiders, and doesn't attempt to
explain how all the pieces tie together. (For one thing, the book has not a
single diagram!) My point is simply that, if you're interested in how
operating systems work, then you really ought to pick up a few of these
textbooks on UNIX internals, even if you have no interest in UNIX itself.


Inside OS/2 Again


I said a moment ago that UNIX is the only operating system that's been
subjected to rigorous explanation of its design and implementation. Actually,
that's not quite true: OS/2 has also produced this sort of literature.
Readers may recall that a number of years ago a good book titled Inside OS/2
came out under Gordon Letwin's name. It seems a shame that no such book exists
for DOS or Windows, which matter far more than UNIX or OS/2 (though OS/2 2.0
may still surprise us all). Perhaps it's because there's no design (other than
what Tanenbaum calls "The Big Mess") to describe in the first place? Perhaps
there's even a direct correlation between this ad hoc (ad hack?) approach and
success in the marketplace?
In any case, there is a new book on the design of 32-bit OS/2 2.0. One of the
authors of The Design of OS/2 is Michael Kogan from IBM, and the other is
Harvey Deitel, a writer of operating-systems textbooks. Whatever you think of
OS/2 itself, this book really is worth reading. Nothing in here shines like
Tanenbaum's book, but I found the discussions of OS/2 kernel architecture, 1.x
and 2.x memory management, and compatibility useful. The "Kernel Architecture"
section describes a number of OS/2 internal data structures; the
memory-management sections have good discussions of arenas.
The book contains an entire chapter called "Compatibility." This in many ways
is the most important chapter in the book, just as compatibility is in many
ways the most important part of an operating system (and one not discussed by
Tanenbaum, by the way). OS/2 2.0 will live or die by its ability to run "old"
applications from DOS and Windows. This chapter contains excellent discussions
of some of the tricks and hackery needed to run old software on new operating
systems.
Actually, one of the things that becomes clear while reading this book is that
OS/2 2.0 isn't all that new. While it runs 32-bit applications and has a
32-bit programming interface, much of the system is still 16-bit. OS/2 2.0
device drivers are still 16-bit, for example, which is a good thing, not a bad
thing, because it means that this time around there will be some printer
drivers available!
Consequently, the need to mix 16-bit and 32-bit code is something that seems
to be much better thought-out in OS/2 2.0 than in Microsoft's NT. This mixing
strikes me as a positive, rather than a negative, about OS/2 2.0. Except: IBM
is going around claiming that OS/2 represents a clean slate, instead of a
"thing on a thing" like Windows. This is a stupid claim, for two reasons.
First, OS/2 is as much a "thing on a thing" as Windows or a DOS extender. (The
authors call it "a hybrid of 16-bit and 32-bit code internally.") Second, the
fact that OS/2 2.0 is such a hybrid may well give it an edge over NT, which is
so pure that 32-bit Windows applications won't be able to call 16-bit code.
(On the other hand, if you've read Tanenbaum, you know the solution to that
problem: If you can't make a function call, use messages instead, and package
them up as function calls. In other words, write a 16-bit Windows application
to call the 16-bit DLL, and send messages from the 32-bit Windows application
to the 16-bit application.)
There is also plenty that is annoying about this book. It oversells what seem like
minor innovations by the IBM development team, such as LDT tiling. It also has
the same padded lists of terminology in every chapter that Deitel uses in his
other books, and perfectly idiotic exercises. ("6.19: What are infosegs? What
kinds of information appear in the global infoseg? What kinds of information
appear in the local infosegs?") The authors should look at Tanenbaum's book to
see the right way to do end-of-chapter exercises.






















June, 1992
DESIGN FOR VISUALIZATION


Is it a communications tool or a planning tool?




Peter D. Varhol


Peter is an assistant professor of computer science and mathematics at Rivier
College in New Hampshire. He can be contacted through the DDJ offices.


Most CASE tools are meant just for software developers. They're designed to
provide a level and scope of information that only these professionals have an
interest in and a need for. However, developers must share their concepts and
designs with a larger community that has a legitimate need to understand the
application design, but may not have the technical background or patience to
follow a library of detailed software-design diagrams.
In a team project I recently became involved with, the users of the software
package we were developing were scientists, not programmers, and we faced the
problem of how to communicate ideas and progress, but without using the
traditional software-design tools. This meant choosing one or more design
tools that could be useful to the developers, yet provide important
information to the people who were paying for the effort.
Certainly it would be possible to use any good drawing package to create
high-level diagrams, flow charts, illustrative displays, and presentation
charts, but that effort would contribute nothing to the work of creating the
software. What we needed was a toolset that would communicate our ideas both
inside our team and to outsiders.


Data Visualization, Expert Systems, and More


The project was to design and develop a complex software package for setting
up and running several atmospheric simulation programs and for visually
analyzing their results. The application was to have several modules--an expert
system for choosing the appropriate program and setting up the input
parameters, a data visualization package for examining the simulation output,
a database of pre-computed results for fast estimates, and the simulation
programs themselves.
The package would be implemented initially in Microsoft Windows, and each
module would run as a separate task, communicating via Dynamic Data Exchange
or similar mechanism. Since many of these simulation programs require long
execution times, the separate-task approach would enable the user to start a
simulation run, visually examine the results of a previous run, and send
output to the printer.
These characteristics made the application much different from those that most
professionals have to develop, in that it was actually several tightly
integrated applications. These applications, or modules, had to communicate
both with the user and with one another. The expert system passed data to the
atmospheric-simulation programs as input parameters, while the programs passed
their results into the visualization module.
Another requirement was that the application would eventually be ported to the
Sun SPARCstation running UNIX and Open Look, and possibly to other UNIX/X
platforms. This meant that those parts of the application that would be system
specific had to be carefully separated from the code that could be reused
between these platforms.
The design tools used for this system had to be flexible enough to accommodate
the existence of and communication between the application modules, help
delineate the code that was system specific, and successfully communicate the
design concepts to our customers and end-user community.


How Meta Design Fit In


Meta Design was a good tool to assist with the design of this software, for
several reasons. Because it runs under Microsoft Windows, it enabled me to pursue
the design and development under the same operating environment. More
importantly, several of the features available in Design directly supported
the unique needs of this project. For example, the ability to layer the
design, create more detailed diagrams on separate pages, link different layers
of the design, import graphics, and customize the design environment all
contributed to the initial design process.
Many of these capabilities are available in a number of CASE tools. However,
Meta Design is both more and less than a typical CASE package. Less, in that
the user does not carry around the baggage of one or more built-in design
methodologies that may be inappropriate for the problem at hand. One of the
biggest limitations of many CASE packages is that they are almost always built
around a particular methodology. If you buy the package, you buy into that
methodology.
Meta Design offers more in that it can be easily customized to support
multiple design methodologies. It comes with support for flow charts and
entity-relation diagrams, but the user can create new libraries for any number
of methodologies.


Setting Up and Using Meta Design


Meta Design supports multiple methodologies by using the concept of the
palette. A palette page can contain virtually any library of symbols,
diagrams, or graphics. A palette can be created in several different
ways--from the drawing tools within Meta Design, in an external drawing
package (even a traditional CASE tool), or even from a scanned image.
Design supported an important feature not found on most CASE design tools--the
ability to completely customize the palette of shapes and objects that could
be used in a document. A palette can be composed of one type of graphic, and
be used anywhere within a Design document. This allowed me to, in effect,
create my own design methodology, or to combine methodologies to suit the
needs of the different modules. Design has no built-in or predefined
methodology, although Meta Software itself has devoted considerable effort to
the Petri-net design approach.
Further, a single document could make use of multiple palettes. I didn't have
to worry about creating a new Design document every time I wanted to use
elements of a different design methodology. Combining different documents
would have proved unwieldy for a design effort of this magnitude, and instead
I was easily able to keep track of the overall system design in the same
document.


Design at Work


My effort on the project described above was devoted mostly to the design of
the knowledge-based system portion of the application. Novice users of the
application would be assisted through a rather lengthy dialog to determine what
atmospheric program or programs they should run, and how to set up their
parameters. More experienced users could choose the appropriate program
themselves and enter the parameters into a template.
I constructed the general design in a hierarchical fashion, much as a novice
user would traverse the knowledge-base menu structure. While not showing every
decision point (leaving that to the detailed design), I outlined the choices a
user would have to make in order to enter either the expert-system dialog or
the template for a particular program.
One feature of Meta Design that assisted during this process was the ability
to move blocks around on the screen and not worry about reconnecting the
blocks--the connectors follow the blocks anywhere they are moved. This enabled
me to quickly construct new architectures, save them for later examination,
and move on to new concepts.
An important part of the design concept was to have a one-to-one relationship,
inasmuch as possible, between major functions as seen from the user's point of
view and software components as seen from the programmer's point of view. This
was the most important part of my strategy for interacting with the customer.
The resulting diagram, then, is appropriate for the software designers to
begin a more detailed design, as well as for the customer, who wants to ensure
that all of the required functionality is included.
The primary Design diagram (see Figure 1) had multiple levels, each
representing a level of the actual software design. The top level was both the
top level of the overall design and the user's first view upon launching the
application. The next level (see Figure 2) defined the top-level
functionality--the expert system for setting up the problem, the data
visualization module, the database of precomputed results, and the system
maintenance facility. Subsequent diagrams go on to define lower levels of
functionality, including templates for data input, unit transformations, and
different graphical views of the output.


Putting Screen Designs in Place



As a last touch, I constructed several mock atmospheric scenes, such as those
that would be generated by the simulation programs, using the bitmap drawing
tools provided with Microsoft Windows' Paintbrush accessory. I then imported
these into Meta Design and installed them on pages linked to the design of the
visualization modules. As the user navigates through the design of the
software, these images will appear in those places where the actual visual
image will appear in the completed software. This gives the user an idea not
only of the high-level design, but also of the resulting outputs.
For the remaining parts of the diagram, I have created illustrative screen
designs that are linked in the same way. These, however, were created mostly
within Meta Design itself, using its drawing tools. The end result was not
merely a software design, but also a part of a functional specification. The
user can see the proposed displays.
Each display and atmospheric scene was simply copied from Paintbrush or other
graphics package and pasted into a palette page. Once in a palette, the
graphics could be pasted from the palette into the document. The result is a
multilayered, multimedia document that serves the creative juices of the
software designer and the curiosity of the customer.


Project Status to Date


The design of the simulation support and visualization application is well
along, and coding should begin shortly. The expert-system development
environments have been chosen, an overall system architecture has been
developed, and we've begun a number of knowledge-acquisition tasks. The
project team has also been prototyping the user interface.
During this process, the Meta Design documents have been useful to me in
keeping the overall design objectives in mind, and in communicating various
design features to others, both within and outside of the project. As
knowledge engineer on the project, I have found my Design diagrams invaluable
in showing often-skeptical experts how the final product will work.


Advantages and Limitations


The one aspect of the Windows version of Design that I found lacking was the
ability to run a simulation on a Petri net diagram. This option is found in
the Macintosh implementation, but has not yet been ported to the PC.
Therefore, I made much less use of Petri nets in my design than I might have
otherwise. While I used a palette that contained some Petri-net symbols, there
was no way of actually running a Petri-net simulation using them.
Design is also not an appropriate tool for full-blown prototypes. My diagrams
were of high-level concepts rather than detailed operations, and there is no
convenient way of letting the end user navigate through a Design document as
though it were an application. We developed high-level user-interface
prototypes using Microsoft's Visual Basic, and knowledge-base prototypes with
Neuron Data's Nexpert Object.
One unexpected advantage I obtained in investing the time to create new object
palettes was that I can now use these palettes in future software and
system-design efforts. One project I am assisting with is an Actor
application. Using Coad and Yourdon's book, Object-Oriented Analysis (Yourdon
Press, 1990), I am in the process of extending one of my old palettes with
symbols appropriate to an object-oriented design effort.


Where the Developer and End User Meet


Granted, the designs for this project could not have been accomplished with
Meta Design alone. There was still a need for a traditional CASE tool to
incorporate the design into a more structured format, and to manage the data
dictionary and other important constructs. While it is possible to do these
things in Meta Design, the amount of effort involved in customizing the
environment would have been prohibitive. Our traditional CASE tool was
EasyCase, which like Meta Design provided a highly functional design
environment at a modest price.
Meta Design, however, bridged the gap between the software professionals and
the customers or end users. Our design concepts could be codified for those
who had to have an overall understanding of the application as it came to
life, while the end user could appreciate navigating through the design
structure to reach displays and atmospheric scenes.


Products Mentioned


Meta Design Meta Software Inc. 125 Cambridge Park Drive Cambridge, MA 02140
617-576-6920 $350.00 System requirements: MS Windows 3.0, 2 Mbytes RAM,
1-Mbyte hard disk, EGA or better






























June, 1992
OF INTEREST





Desqview/X, a graphical, multitasking, and windowing DOS environment, is
shipping from Quarterdeck Office Systems. Desqview/X brings workstation
capabilities to stand-alone and networked 386 machines. It features a
graphical desktop, scalable fonts, keystroke macros and customizable menus,
data transfer, and remote computing. It can run both DOS text and Windows
graphics programs in small windows or remotely on other Desqview/X PCs or X
workstations.
Desqview/X is the first DOS client/server implementation of the X Window
system incorporating X Window-system graphics and network protocols and
Rational Systems' 16- and 32-bit shared DOS-extender technology. Desqview/X
comes with a graphical desktop and program organizer; a file manager for
either local DOS or remote DOS and non-DOS files; a graphics tool for creating
and editing pixmap and bitmap icons; and an outline font manager. The QEMM-386
memory manager and Manifest system and memory reporting tool are bundled with
Desqview/X.
The price for Desqview/X is $275.00, $100.00 for registered Quarterdeck users.
Reader service no. 21.
Quarterdeck Office Systems Inc. 150 Pico Boulevard Santa Monica, CA 90405
310-392-9851
Autodesk has released HyperChem, molecular-modeling software that allows you
to build, analyze, and manipulate three-dimensional molecular structures on
the desktop, running under Windows. Besides building and displaying molecular
structures, HyperChem can be used to investigate the reactivity of molecules,
evaluate chemical pathways and mechanisms, study the dynamic behavior of
molecules, and construct proteins and nucleic acids. It offers a selection of
classical and semi-empirical quantum-mechanical computational methods.
Additionally, HyperChem features an open architecture to facilitate extending
and customizing the software.
The suggested retail price is $3500; educational pricing is $595.00. Reader
service no. 22.
Autodesk Inc. 2320 Marinship Way Sausalito, CA 94965 800-424-9737
TCXL 6.0, a CUA-style user-interface toolkit, has been announced by
Innovative Data Concepts. TCXL has an event-driven architecture combined with
virtual windows, virtual memory (supporting EMS, XMS, VCPI, and DPMI), dialog
controls, mouse support, and more. TCXL 6.0 lets you build Windows
applications, adds a C++ front end, and contains more than 500 multipurpose
functions.
TCXL supports C and C++ compilers from Borland, Microsoft, TopSpeed, Zortech,
Watcom, Intel, and MetaWare. It is available for DOS, 286 and 386 protected
mode, Windows, OS/2 Presentation Manager, and various UNIX platforms.
The retail price of TCXL for DOS is $99.00. Reader service no. 23.
Innovative Data Concepts Inc. 122 North York Road, Suite 5 Hatboro, PA 19040
215-443-9705
A new version of Dolphin Encrypt, a private-key encryption system that uses an
8-60 character encryption key, is available from Dolphin Software. Dolphin
Encrypt can encrypt multiple files in a single operation without limit on
size, type, or number of files. Files are compressed during encryption.
Dolphin Encrypt can be run from a batch file to encrypt or decrypt multiple
files in multiple subdirectories with a single command. You can encrypt to
binary, text, or a script language which allows complex and conditional
encryption operations. A recoverability option is included to allow the
encryption key to be recovered if it is lost or becomes unavailable.
The encryption algorithm encrypts files at about 8 Kbytes per second and
produces encrypted data indistinguishable from random bytes. The algorithm is
available separately as a C-function library.
Dolphin Encrypt sells for $195.00. A decryption-only version that can be sent
to recipients of encrypted material is sold separately. Reader service no. 24.
Dolphin Software 48 Shattuck Square #147 Berkeley, CA 94704 510-464-3009
Advanced Technology for Developers is a new monthly newsletter for developers
edited by Jane Klimasauskas and published by High-Tech Communications. The
newsletter will feature, among other topics, articles on time-series
forecasting; selling neural computing technology; building neural networks to
handle unusual events; methods for transforming data that improve performance;
neural networks applied to database retrieval; genetic algorithms; consequence
theory; fuzzy logic; neurogenetic algorithms; database design for advanced
technologies; and statistical process control.
Domestic subscriptions cost $99.00 per year; international subscriptions are
$179.00. Reader service no. 25.
High-Tech Communications 103 Buckskin Court Sewickley, PA 15143 412-741-7699
Otter Research has released AUTODIF, an array-language extension to C++.
AUTODIF incorporates the reverse mode of automatic differentiation to
calculate the derivatives of a function of one or many variables. Derivative
calculations take no more than five times the time needed to evaluate the
function itself, even with hundreds of independent variables, and derivatives
are calculated to the same precision as the function itself.
Using AUTODIF's vector and matrix classes, you can produce tight, efficient
code for rapid prototyping of nonlinear models. There are no restrictions on
allowable code constructions; AUTODIF features backward compatibility with C
code so that existing numerical routines written in C can be incorporated in
minutes. Included are methods and examples for implementing robust nonlinear
regression techniques; sample code for defining and training a feed-forward
neural network with hidden layers and nodes; and Quasi-Newton and
conjugate-gradient function-minimization routines.
AUTODIF for Borland and Zortech C++ costs $149.00; for the cfront C++
translator for SPARCstations the price is $299.00. Reader service no. 26.
Otter Research Ltd. P.O. Box 265, Station A Nanaimo, BC V9R 5K9 Canada
604-756-0956
Now shipping from The Periscope Company is Periscope/32 for Windows, a
source-level debugger for Windows virtual device drivers, Windows device
drivers, DOS device drivers, programs running in the DOS box, and other
system-level software running in Windows enhanced mode. Periscope/32 runs on a
host DOS system connected to a target Windows system, providing the stability
necessary for system-level debugging.
Periscope/32 operates at the systems level and is compatible with
applications-level debuggers. It aids in determining the cause of
Unrecoverable Application Errors and offers easy access to any memory location
in the system.
Periscope/32 is priced at $445.00. Reader service no. 27.
The Periscope Company Inc. 1197 Peachtree Street Atlanta, GA 30361
800-722-7006 or 404-875-8080
Lahey Computer Systems is shipping F77L-EM/32 5.0, a 32-bit Fortran compiler
that incorporates the Phar Lap 386/DOS-Extender (which supports VCPI, XMS, and
DPMI). The DOS-Extender enables the compiler to operate entirely in protected
mode so that you can build multi-megabyte Fortran applications that can access
all the memory available on a given machine.
The new version has an updated debugger with an intuitive user interface,
faster performance, and features that allow you to break and trace execution
at specified Fortran statement labels. Arrays can be as large as 2 Gbytes, 255
files can be opened, and substring bounds can be checked. A 486 optimization
switch has also been included.
The compiler (which has an interface to MetaWare High C) sells for $1195;
upgrades are $250.00. Reader service no. 31.
Lahey Computer Systems Inc. 865 Tahoe Boulevard Incline Village, NV 89450
702-831-2500
New from The MathWorks is MATLAB 4.0, an integrated environment that merges
numeric-computation software and a family of application-specific toolboxes
with new graphics capabilities. MATLAB allows you to analyze and visualize
data; prototype, analyze, and optimize engineering system designs and
algorithms; explore new concepts in scientific research; create mathematical
models and solve systems of equations; and perform general engineering and
scientific computations. Sparse matrix support, flexible file I/O, additional
debugging tools, and sound output are new to the version as well.
Also available from The MathWorks is SIMULAB, model-building and simulation
software for Windows. SIMULAB is an interactive, intuitive, computer-aided
engineering tool for dynamic system simulation targeted at electrical,
mechanical, chemical, aerospace, and automotive engineering applications.
SIMULAB comprises: a set of tools for modeling and analyzing dynamic systems,
including linear, nonlinear, continuous-time, discrete-time, hybrid, and
multirate models; an environment for describing systems graphically in
block-diagram form or mathematically by differential and difference equations;
hierarchical model structures; on-screen simulation that lets you modify
system parameters and solver methods interactively during a simulation;
transfer and exchange of SIMULAB models between platforms; and an
expandable built-in block library of functions and the ability to include
existing models using C, Fortran, or MATLAB code.
MATLAB costs $2995; SIMULAB is $3995. Reader service no. 29.
The MathWorks Inc. Cochituate Place 24 Prime Park Way Natick, MA 01760
508-653-1415


The DDJ Handprinting Recognition Contest


Last month we announced that DDJ is conducting a first-ever contest of
handwriting-recognition software. Here's an update with some truly exciting
news: Apple Computer is providing a PowerBook 100 as first prize. Many
observers agree that the PowerBook series represents the state of the art in
mobile graphical computing, so it's only fitting that a contest involving
emerging technology begin with the current best of the breed.
Recapping some of the contest details: the contest officially begins on June
15th, when source code, test data, and a contest entry blank will be available
electronically and by mail. Some components are available now, so contact us
now for more information. The deadline for submissions is September 15th.
We'll announce a winner in our December 1992 issue.
Remember that you don't need a pen computer or pen operating system to
participate in the contest. We've built a platform-independent test harness
that, in the most general case, allows you to plug in your C function and
check the result.
The core code of the test harness consists of 200 lines of C, and was written
by Ron Avitzur to run as a batch process using the standard I/O library
functions. This code has been tested on the Macintosh, SPARC, and DOS
platforms. On the PC, the code compiles with both Borland and Microsoft
compilers.
The harness processes a data file containing stored "ink" (stylus data
points). The data in this file represents characters that were captured with
an interactive program, from at least a dozen different individuals. The test
alphabet consists of alphanumeric characters and some punctuation marks. There
are multiple instances of each character, some of which are used for training
the recognizer (if your engine requires this), and the rest for testing.
During the training phase, the test harness reads multiple instances of a
character and, for each instance, calls your recognizer's training routine,
Train(). Then, during the testing phase, the harness calls your recognizer's
recognition routine, Guess(), with a different set of instances of each
character. Both Train() and Guess() are passed pointers to an in-memory data
structure representing the strokes that compose a character. Your routines
must know how to parse this data structure. To serve as a guide, we are
publishing both the test harness and a sample recognizer that works with that
harness. The harness, sample recognizer, and a small data file are all
available now.
Of course, it's more fun to work with the system interactively and to visually
inspect the data that the system is munching on, but this is not essential to
the process. Ron has implemented an interactive ink-capture program that
currently runs on the Macintosh and assumes the Wacom digitizer. A version of
this program appeared with Ron's article in the April issue of DDJ. That
version stored ink data in processed form rather than as raw-data points; now
available is an updated version that reads and writes raw ink data files. We
are in the process of writing an equivalent program for Microsoft Windows, to
be ready by June 15th.
--Ray Valdes

































































June, 1992
SWAINE'S FLAMES


The First Annual Swaine's Flames Brain Games




Michael Swaine


Welcome to the First Annual Swaine's Flames Brain Games. The Games consist of
a number of Events, each with its own rules: the Acronym Event, the Anagram
Event, the Palindrome Event, and a Bonus Question thrown in as a last hope for
those who can't figure out any of the other Events. Read the rules, tackle the
Events, and good luck!


Acronym Event


Decipher the acronyms, determining the word that each letter stands for. A
letter may stand for more than one word, as in ASCII or SCSI. The first
acronym is a familiar one and the others are all derived from it. Each acronym
has something to do with the software industry, and each is followed by a
clue. Two more clues: "S/W" stands for "software" and "F" may stand for
"flaming."
RTFM (universally ignored advice)
RTFS (same advice, but for engineers)
RTFB (advice for novice users of Macintosh System 7)
RWFM? TFRM, TFUM, OTFG-SM? (choices, choices)
IAEF, RTFM (last-resort advice)
YWTFS/WRSN&IWTFMASAP (said the documenter to the developer)
IRTFM&ISCUTFS/W (why the advice is universally ignored)
ICRTFM; T1WWTFS/WWTFM2 (why programmers should program)
FTFM&FU2 (inflammatory response)
RWTFM? TJOS, TKR, OTKS? (maybe "F" doesn't always stand for "flaming")


Anagram Event


Unscramble each anagram to form a word or phrase related to the software
industry, being sure to use all the letters.
Real friend or nuts?
Magic land of lost tribes.
Comment in jest by MIS; grad gets poorer.


Palindrome Event


Here the goal is to generate new palindromes, strings of characters that make
some kind of sense, have something to do with the software industry, and read
the same backward and forward. As indicated by the examples provided, they
don't have to make a lot of sense. Spaces, capitalization, and punctuation can
be ignored. Some notes on the example palindromes: XINU and GNU are species of
UNIX; LOGLAN is a logical language and ALGOL an archaeological one; and DRI
doesn't really have a BASIC.
Ada, sides reversed, is Ada.
XINU = GNU hung = UNIX.
LOGLAN is in ALGOL.
C is a BASIC.
C is a BASIC is a BASIC is a BASIC is a BASIC is a BASIC.
C is a basis, as is a BASIC.
C is a Bastille. Hell, it's a BASIC.
C is a bird. (No put-on, miss: I'm not up on DRI BASIC.)


Bonus Question


Why is Christmas the same as Halloween?


Stay Tuned for the Results


Solutions to the Acronym and Anagram Events and the Bonus Question will be
published in a random future issue of DDJ. All entries must be postmarked by
midnight. No prizes will be awarded, but bad answers may be ridiculed in print
and good answers stolen for next year's Games. The decisions of the judge are
infallible.















July, 1992
EDITORIAL


Lights, Camera, Legal Action!




Jonathan Erickson


It's hard not to find someone jumping on the multimedia bandwagon: end-users
wanting flashy new applications; hardware vendors hoping to pump up flat
sales; and programmers developing tools and applications that put full-bore,
full-motion video and full-fidelity sound onto a PC.
But as promising as it is, the road to multimedia is littered with landmines.
By definition, multimedia involves a variety of information forms--audio,
motion video, still photography, text, graphics, and more--which can be
integrated into an application using electronic cut-and-paste. Technically,
this isn't a problem. Legally and ethically, it's a mess.
Try this scenario: Let's say you have a multimedia authoring system that
enables you to quickly build a presentation like that on the cover of this
issue describing the 80486. The application includes full-motion video taped
from an Intel television commercial, still-frame illustrations scanned in from
a book, and text descriptions. Further assume that the commercial you
videotaped includes a popular song performed by a renowned recording artist,
and an equally famous actor pitching the chip.
While today's authoring tools let you quickly put together a sophisticated
presentation like this, the problem remains that it still might take you
months to get permission to use the information. For starters, you'd go to
Intel for the okay to use the commercial. Assuming the company gives you its
blessing, you'd likely next have to contact the recording company, composer,
and singer for permission to use the music. Luckily, music-related
organizations like ASCAP, BMI, or others might help you here. After that,
you'd track down and negotiate with the actor, the book publisher (and
author), photographers, and so on. What you've learned in the process is that
virtually all of the information forms--literature, music, drama, pictorial,
video, and sound--used in even the most minimal multimedia project are
protected as intellectual property, and you legally and ethically need
permission from the copyright holder to use them. And once the copyright
issues are attended to, you might have to then deal with trademarks, patents,
and trade secrets.
After going through all this, you may consider alternatives: works in the
public domain (beware, however, of movies in the public domain that contain
songs covered by underlying copyrights), going to stock houses for photos,
film, music, and sound effects, or contacting a rights and permissions agent
to do the legwork for you. Or you may decide that it's faster and less
expensive to produce the entire application--including sound and video--from
scratch.
Jumping through the permissions/cost hoops scares the big guys, too. When
Microsoft was looking to team up with a book publisher for its multimedia
application development, it chose Dorling-Kindersley (publisher of David
Macauley's wonderful book The Way Things Work) because D-K owns full rights to
its illustrations, photographs, and text. And IBM, which is heavily committed
to multimedia, is reportedly ready to buy a piece of the Time-Warner rock,
providing ready access to movies, TV shows, and music. Ditto Sony's purchase
of Columbia Pictures and CBS Records.
Developers aren't the only ones who might get stung. For instance, I and every
other editor in the country receive stacks of press releases that include
text, photos, and illustrations every day. There's never any problem with
using this material for editorial purposes--that's why it's sent out. In fact,
it's generally assumed that press release info is released into the public
domain. Within the past couple of years, however, the latest rage is to send
out "video" press releases--ten minute VHS tapes instead of printouts and
photos. The question in my mind concerns the rights being conveyed with this
tape. Is it purely for information purposes, or is it, like its paper
counterparts, being released into the public domain? Does the company who sent
out the video own all rights to its contents, or are there underlying
intellectual property rights pertaining to actors and sound? Can I take clips
from one of those tapes and use it in a commercial multimedia application? Can
all of these video press releases be compiled into a public-domain video
library?
There are changes on the horizon. One suggestion is support for public and
private media-specific archiving, distribution, and licensing organizations.
The music industry has BMI and ASCAP, magazine publishers have the Copyright
Clearance Center, there's Valentino (and others) for sound effects, the
Picture Network International for electronic images, and The Image Bank for
still photos and film footage. This whole area is what they call a "growth
industry."
Furthermore, proposed legislation before both the U.S. House and Senate would
automatically extend the copyright of pre-1978 works for another 28 years.
(Works copyrighted prior to 1978 currently expire unless authors renew the
copyright.) The good news is that this eliminates the question of copyright
(you assume that everything is still copyrighted), thereby saving you time;
the bad news is that the flow of works moving into the public domain is
severely choked off.
In short, the per-project royalty and rights costs associated with commercial
multimedia development can make it an expensive proposition. How we deal with
these issues will make or break multimedia.






































July, 1992
LETTERS







DOS Device Drivers


Dear DDJ,
I read with interest Jim Kyle's article, "Loading Device Drivers from the DOS
Command Line" (DDJ, November 1991), as I had written such a utility in the
past. What I found during my experience may prove to be useful to others
planning to make use of such a program.
When loading device drivers which are to control system files/devices, and
which are also likely to notify DOS of critical error occurrences (PRN and
AUX, for example), supplemental logic is required to install such character
devices safely. When a driver meeting the above criteria returns an error to
DOS, the user is presented with the Abort, Retry, Ignore/Fail message. If the
retry option is chosen, control returns to the driver that is pointed to by
the driverptr field in the SFT (system file table) entry for the corresponding
file/device, which may or may not be the driver which notified DOS of the
error. When DOS loads a driver via a config.sys device= entry, it sets the
driverptr field to the segment:offset of the driver's address and the
starting-cluster field to the offset of this address also. Listing One, page
149, demonstrates the method of overcoming this problem. A call to this
routine should be inserted in the GetOut routine for character device
installations.
I hope some of you find this a helpful addition to this handy utility.
Dan Winter
Stratford, Ontario


Sound Suggestions and Fourier Facts


Dear DDJ,
Listing Two, page 149, shows my quick and dirty approach to the speech-to-PC
speaker discussion that began in the January 1992 "Letters" column. It was
assembled using the Microsoft Assembler, and was tested in Turbo C++. Its C++
function prototype is:
 extern "C" void far VoiceTrk(unsigned char huge *strt, int leng);
where strt contains the start location of the sample bytes and leng is the
number of sample bytes in the track. The audio track was digitized using a
Sound Blaster Pro sound board.
The function establishes a sample period which is however long it takes my
386SX to count down from 32 in the outer loop. With the processor running at
16 MHz, this results in a sample rate of approximately 6300 Hz. Pulse-width
modulation is accomplished by turning the speaker on at the beginning of the
sample period, and turning it off when the value of the sample, truncated to 5
bits, has been counted out. The speaker then remains off for the remainder of
the 32 counts in the sample period.
Note that interrupts are suppressed with the CLI instruction before playing
the audio track and set afterward with STI. This temporarily inhibits the
18.2-Hz time-of-day interrupt which otherwise produces a warble when music is
played. I have used this function to play speech, as well as Bach, with
tolerable success considering the relatively poor quality of the speaker
system.
I would also like to comment on Mac Cody's April 1992 article, "The Fast
Wavelet Transform." Mac makes several erroneous statements about the Fourier
transform, perhaps confusing it with the Fourier series.
The Fourier transform makes no assumptions about the periodicity of the
function being transformed. This transform was originally developed for
probability theory. The Fourier transform applied to a correlation function
produces the spectral density function for a given random process. Correlation
functions are not, in general, periodic. It therefore has no difficulty with
transient functions. One of the most often seen transforms is that of the
rectangular pulse centered at the origin, which gives the well-known (sin x)/x
spectral function.
Also, sharp transitions do not prevent the integral from converging if the
function being transformed satisfies the Dirichlet Conditions, which require
that the function have a finite number of finite discontinuities, and a finite
number of finite maxima and minima, in any finite interval of the independent
variable.
Similarly, the Fourier transform takes into account any translation of the
function in time. If a time function f(t) has a transform F(w), then the time
function translated in time is f(t-a), and its transform is exp(-jwa)F(w).
Digital algorithms designed to implement the Fourier transform may suffer from
the problems Mr. Cody describes, but the transform itself does not.
Harold M. Martin
Houston, Texas
Mac responds: Mr. Martin brings up some valid points with respect to the
Fourier transform vs. the Fourier series (and for that matter, the discrete
Fourier transform, or DFT). In digital signal processing applications, the
term "Fourier transform " is often synonymous with the fast Fourier transform
(FFT) algorithm. Please excuse my slip in the use of terminology. The
assumption of periodicity of a function indeed lies with the Fourier series.
The Fourier series is closely related to the DFT. Both share the same
periodicity assumption. The FFT is simply an efficient algorithm for the
calculation of the DFT. Therefore, the issue of implied periodicity is real to
those who use the FFT.
The problem that the Fourier transform (as well as the Fourier series, DFT,
and FFT) has with the treatment of transient (especially discontinuous)
functions is that a transient event in the original function results in
nonzero coefficients throughout the Fourier transform as a distribution of the
function's energy throughout the Fourier spectrum. This tends to "mask" the
existence of the transient elements of the original function.
In the original draft of my article, I illustrated the differences between the
Fourier transform of a continuous sinewave vs. a sinewave pulse. The Fourier
transform of the sinewave is a single nonzero coefficient at the frequency of
the sinewave. On the other hand, the Fourier transform of a sinewave pulse is
the (sin x)/x function centered on the frequency of the sinewave. The Fourier
transform does not intuitively convey the nature of the sinewave pulse, which
is a single frequency existing for a finite length of time. I am not saying
that the transform is incorrect; according to Fourier theory, it is correct.
What I am saying is that the process of mapping transient events into spectral
distributions hides or embeds them along with the rest of the spectral content
of the transform. The transient components cannot then be easily discerned
within the spectrum.
Mr. Martin's reference to the shift theorem of the Fourier transform
[translation of a function, that is, f(t-a), yields the transform
exp(-jwa)F(w)] is also valid. Most applications of the FFT, though, use the
magnitude of the transform, which is invariant with time shift. Use of the
unmodified real and imaginary components (or the phase) of the transform to
maintain the time relationships is possible but inconvenient to use. In
addition, the information related to specific transients is still buried
within the body of the transform.
While it is true that the Fourier transform itself may not suffer from some of
the problems described in my article, and addressed in Mr. Martin's letter,
the digital algorithms which implement the Fourier transform do. These
algorithms are used in real-world digital signal processing applications
(which is, ultimately, what we're concerned with) and the user must recognize
and contend with their limitations. The ability to overcome these limitations
is what makes the wavelet transform and its fast algorithms so attractive.


GATT Real


Dear DDJ,
The April 1992 editorial, "Born in the USA," misstates the truth about GATT.
Apparently Mr. Erickson thinks that when the revised GATT is presented to
Congress there will be no opportunity to debate the merits of software
patents.
Most legal experts agree that software patents were made possible by the 1981
Supreme Court decision Diamond vs. Diehr. The Uruguay Round of GATT
negotiations began over eight years ago. Thus, the software industry had ample
time to lobby the President and the U.S. Trade Representative for changes in
U.S. law or GATT. Congress had plenty of time to revise U.S. patent law.
Neither the industry nor Congress has acted.
This inaction indicates that intellectual property owners, and the majority of
the software industry, support patents in the U.S. and abroad as a way to
preserve U.S. market share and protect America's intellectual assets. GATT
will not "force sweeping changes" on the U.S.; it will harmonize intellectual
property law, enabling continuing protection of America's creative,
economically valuable software engineering community.
Christopher J. Palermo
Patent Attorney
Pasadena, California
Jonathan responds: Thanks for the clarification, Christopher. I've stated
several times that the problem lies less with the issue of intellectual
property rights than with the way the patent system works. It seems that the only
parties satisfied with the current state of affairs are lawyers who appear to
be profiting the most from it.


Japatent Challenge



Dear DDJ,
This letter is directed to those Master Programmers who either believe they
have successfully met every conceivable challenge, or that they can. I have
one for you. I'll bet a steak dinner that no one in the world has or can do
this one within the next six months. First one to successfully respond to my
satisfaction gets the steak dinner--there'll only be one of those! But there
may be alternate prizes for subsequent responses.
The challenge is Japanese patents; more specifically, putting a U.S. patent in
a form such that it can be filed in the Japanese patent office in Tokyo with a
reasonable hope of acceptance. It seems they still insist on the use of
Japanese language characters, even though many technical terms are simply
direct phonetic translations into Katakana or Hiragana characters. Maybe
someday they will get around to permitting filings in English; until then,
non-Japanese inventors are stuck with some very unpleasant facts: Either learn
Japanese or use an expensive law firm if you want to have coverage in Japan.
Those facts tend to shut out everyone but large corporations.
Proposed solution: A computer program that will translate from a U.S. patent
on disk to produce a Japanese patent application on disk. I understand that
Tokyo laws allow patent applications to be on floppy disk, as long as they are
in proper format and in the Japanese language. The format itself does not
appear to be much different from ours.
There are some formidable problems, such as accommodating the double-byte
Kanji characters. Another is language syntax. Hey, if it requires a
supercomputer, just say so; but I may ask for proof!
Partial or alternate solutions will be considered. Any takers?
Homer B. Tilton
Tucson, Arizona


Swap Scene


Dear DDJ,
Regarding Greg Renzelman's SWAP macro in the April 1992 "Letters" column,
don't use it in an If, For, Do, or While clause, or strange bugs may occur. I
do admire the clever use of the XOR operator, but the macro needs a bit of
modification to work properly. To see what the problem is, consider Example
1(a) (the larger number goes in b), which expands to Example 1(b).
Only the a^=b; binds with the If; the b^=a; a^=b; is always executed. The
solution is to rewrite the macro definition as in Example 1(c). And by the
way, I wouldn't have noticed the bug either if I hadn't read about a similar
case in Alan Holub's superb Compiler Design in C (Prentice Hall, 1990); it's
on page 786, in case anyone wants to look it up.
Martin Bohme
Bensheim, Germany
Example 1

 (a)

 if (a>b)
 SWAP (a,b)

 (b)

 if (a>b)
 a^=b; b^=a; a^=b;

 (c)

 #define SWAP(a,b) (a^=b, b^=a, a^=b)

Dear DDJ,
Regarding C Language Q&A #36 (see "Letters," April and June 1992), it seems
that most of the difficulty lies in the temporary variable; there is no
obvious data type to use, and it is awkward to introduce a new scope
arbitrarily in a program. If we could eliminate the need for a temporary
variable, we should be able to do it.
Thinking back to a year or so ago, I recall having read an article which
mentioned an old mainframe (IBM/360?) that provided a machine instruction that
swapped two regions of memory using XOR. After reading the article, I began
experimenting with the technique in C. I first came up with the following
series of instructions for swapping two values:
 a^=b; b^=a;
 a^=b;
Noticing that C allows me to express this more compactly, I came up with the
following C macro: #define SWAP(a, b) (a^=b^=a^=b).
The beauty in using this technique is that the macro swaps two values without
using a temporary variable! Of course, it is subject to the usual cautions
about macros (e.g., don't say SWAP (++x,y)), and it only makes sense when
sizeof(a)==sizeof(b).
For the skeptics, here is a proof that the macro actually works. The variable
b is effectively set to b=b^a^b. Since XOR is associative and commutative, we
can rewrite this as b=a^(b^b). Simplifying using the identities (x^x)=0 and
(x^0)=x, we get b=a^0=a. Similarly for a, we can write a=(a^b)^b^(a^b)=b^
(a^a)^(b^b)=b^0^0=b. (Note that we did not write a=a^b^(a^b) here--this would
have been incorrect since the value of a changes early in the evaluation of
the first macro at a^=b.)
Bill Wilder
Needham, Massachusetts


[LISTING ONE]

void PatchSFT ( void )
{ char *sft,
 *next_sft;
 unsigned num_files;
 int found;

 // get first SFT block
 _AH = 0x52;

 geninterrupt ( 0x21 );
 sft = * (char ** ) MK_FP ( _ES, _BX + 4 );

 // search thru till end of block chain
 while ( sft != ( char * ) 0x0000ffff ) {
 next_sft = * ( char ** ) sft;
 // each block indicates the
 // number of elements within
 num_files = * ( int * ) ( sft + 4 );
 sft += 6;
 // search thru this block's
 // elements looking for match
 do {
 found = strncmp ( drvptr + 10, sft + 32, 8 );
 sft += _osmajor < 4 ? 53 : 59 ;
 } while ( --num_files && found != 0 );
 sft = next_sft;
 }

 if ( found == 0 ) {
 * ( char ** ) ( sft + 7 ) = drvptr;
 * ( unsigned * ) ( sft + 11 ) = FP_OFF ( drvptr );
 }
}





[LISTING TWO]

; Plays a voice track, as generated by SoundBlaster, as a pulse width
; modulated signal on the internal IBM speaker.

VoiceTrkParms struc
 dw 4 dup (?) ; pushed BP, ES and return address
StartOffset dw ? ; offset of address of voice track buffer
StartSegment dw ? ; segment address of voice track buffer
;StartOffset dw ? ; offset of address of voice track buffer
TrackLength dw ? ; Length of voice track
VoiceTrkParms ends
 .model large
 .code
 public _VoiceTrk

_VoiceTrk proc far
 push bp
 push es
 mov bp,sp
 mov si,StartOffset[bp]
 mov es,StartSegment[bp]
 mov cx,TrackLength[bp]
 in al,61h
 push ax
 mov ah,1Fh
 and al,0FCh
 out 61h,al
 mov bp,0
 cli

OuterLoop:
 mov dh,0
 mov dl,es:[si+bp]
 test dl,ah
 jz InnerLoop
 or al,2
 out 61h,al
InnerLoop:
 cmp dl,dh
 jne NoShutDown
 and al,0FCh
 out 61h,al
NoShutDown:
 inc dh
 cmp dh,32
 jne InnerLoop
 inc bp
 loop OuterLoop
 sti
 pop ax
 out 61h,al
 pop es
 pop bp
 retf
_VoiceTrk endp
 end





July, 1992
CAPTURING DIGITAL VIDEO USING DVI


Multimedia and the i750 video processor


 This article contains the following executables: AVKCAPT.ZIP


James L. Green


James is a senior software engineer at Intel's multimedia and supercomputing
components group in Princeton, New Jersey. He is one of the principal
architects of the audio video kernel and is a member of the Interactive
Multimedia Association's technical working group on multimedia software
architectures. You can reach him through the DDJ offices.


The DVI multimedia tools, developed by Intel and IBM, provide application
developers and users with a highly integrated set of multimedia capabilities.
The ActionMedia II delivery board available for ISA and Micro Channel bus PCs
can be used to play digital audio and video data on desktop PCs running DOS,
Windows, or OS/2. ActionMedia II cards utilize the i750 video processor to
perform real-time encoding and decoding of digital video images and come
configured with two megabytes of video memory (VRAM). The system software used
to enable these capabilities under Windows and OS/2 is called the "audio video
kernel" (AVK). AVK provides control over digital multimedia elements such as
audio, video, and still images.
By attaching the optional capture module to the ActionMedia II delivery card,
applications can capture and compress audio and video data. All of the analog
signals (both audio and video) enter the system via an 8-pin mini-DIN
connector located on the delivery board. A variety of video signals (Y-C, RGB,
and Composite) are supported, as well as stereo audio. The capture subsystem
performs analog-to-digital conversion of the source signal and deposits the
data into VRAM. Digitizing, compressing, and displaying are independent events
under the control of the software. This enables a variety of data-flow
scenarios. For example, the data can be digitized and displayed without
compression (laser-disc emulation), or the data can be digitized, compressed,
and stored (or transmitted) without displaying (video mail/teleconferencing).


Touring the AVK


AVK is a set of OS-independent, dynamically linked libraries that provide
applications with a collection of components similar to those found in a
recording studio. These objects can be configured in various ways for
manipulating multimedia data. All AVK function calls take the form
AvkObject-Method(Params). The objects defined in the AVK programming interface
are: groups, buffers, streams, views, images, and connectors.
An AVK group is the unit of control synchronization and is analogous to the
tape-transport functions of a tape deck. Group calls include starting,
pausing, and recording. A group buffer is the digital representation of a
tape; an area in VRAM used as a temporary repository of compressed audio and
video data. Since the audio and video data is often interleaved, a group
buffer can contain multiple streams of data as long as they all play at the
same rate, just as all the tracks on an analog tape must pass the tape heads
at the same rate.
A stream is analogous to a track of audio or video data. While a motion-video
sequence is physically delivered as a series of consecutive frames, it can be
viewed as a logical stream of data. A video stream is implemented as a
circular array of bitmaps. While capturing, the digitizer on the capture
module places each frame into one of the bitmaps, while the encode task
running on the i750 video processor compresses each frame and places it into
the group buffer. The audio data is handed to an audio DSP for encoding before
it is placed into its own group buffer.
Another AVK object, called a "view," implements the notion of a video monitor.
A view is a special kind of bitmap that can genlock to the host display
system, allowing DVI video and standard VGA/XGA graphics to be mixed on a
pixel-by-pixel basis. Views also include a collection of rectangular visual
regions called "boxes" which are mapped into windows by the application. Video
streams and still images are typically the sources of these visual regions.
The concept of the view is analogous to a visual "mix." Applications can
create and maintain multiple views and select the view to be monitored on the
display.
If the group is a tape deck, and the view is a monitoring system, then there
needs to be a way to connect them. This is handled by an AVK object called a
"connector," analogous to a channel on an audio/video mixing board. It has an
input ("source"), an output ("destination"), and parameters for altering the
data in real time. Video streams, images, views, and the digitizer can all be
connected in various configurations depending on the application's
requirements. At its simplest, the connector is a higher-level abstraction of
a bitmap copy operation. Connectors allow boxes to be defined for the source
and destination bitmaps. The size of the boxes can be modified in real time to
allow resizing and relocating of images to support windowing. Connectors
behave differently, depending on which objects are used as their source and
destination. For example, if the source is a video stream and the destination
is a view, the connector will copy each frame automatically based on the frame
rate. If the source is an image and the destination is a view, the connector
will perform a single copy. Connectors also provide control for scaling,
cropping, and adjusting the tint, contrast, saturation, and brightness of the
image.


Capturing Digital Video


AvkCapt is a Windows program that captures video and audio from an analog
source using Intel's ActionMedia II board set. AvkCapt allows you to monitor
the analog source and capture the audio/video data to a file. You can enter a
filename, turn monitoring on and off, and turn capturing on and off by making
selections from pull down menus. When you begin monitoring, AvkCapt digitizes
the data, sending the audio out to the speakers (attached to the delivery
board) and the video to the computer's display screen. When capture is toggled
on, AvkCapt begins compressing the incoming data and writing it out to a file.
The audio and video data is compressed using different algorithms (see the
sidebar "Data Compression and the AVK"). The video is compressed using the
real-time video (RTV 2.0) algorithm at a resolution of 128x240 pixels at
30 frames per second (NTSC) or 128x288 pixels at 25 frames per second (PAL).
The AVK function AvkDeviceVideoIn() allows the application to determine the
type of source video. RTV doubles the number of horizontal pixels on playback,
resulting in a 256x240 (NTSC) or 256x288 (PAL) video. Using NTSC as the
example, if this video is displayed using a 512x480 view, the result will
appear in a quarter-screen video window. If a 256x240 view is used, the video
will appear full screen. However, since the horizontal resolution of the
capture stream is only 128 pixels, we can't show the monitored video at
full-screen size on the 256x240 view. This is because the current version of
AVK can't scale the video up in real time using a connector. In AVK, if the
resolution of the destination is larger than the source, the video will be
displayed in the upper-left corner of the destination box. Therefore, AvkCapt
uses a fixed window of 128x120 pixels to display the monitored video.
The complete source for the AvkCapt program is too long to reproduce here. I
can, however, describe the three main aspects of the program apart from the
GUI interaction: configuring the AVK objects for recording audio and video
data, controlling the flow of data through the system, and writing the
compressed data to disk.


Building the Recorder


As described above, AVK provides a collection of components that can be
configured in a variety of ways. For our purposes, we need to build an
audio/video recorder. Figure 1 illustrates the configuration used by the
AvkCapt program. In any AVK program, the first steps include initiating an AVK
session and opening the ActionMedia II board. The function InitAvk() in
Listing Two, page 90, begins an AVK session with a call to AvkBeginMsg(). This
function takes the application's window handle as one of its parameters. AVK
will send messages to this window to notify it of various events. Next the
board is opened for the exclusive use of the application with a call to
AvkDeviceOpen(), and finally, a request is sent to the device to identify the
capture sync of source video connected to the digitizer. AVK will respond to
this request by sending an AVK_IDENTIFY message to the application window. The
capture sync will be returned as part of the 32-bit parameter to the message.
The application's main window procedure intercepts this message, and the
capture sync is passed as a parameter to the CreateAvkResources() function.
This function builds the recorder by creating and formatting the appropriate
AVK objects.
CreateAvkResources() calls a number of other functions that do the real work.
The GetDevCaps() function uses AvkGetDevCaps() to retrieve the device
capabilities from the AVK.INI file. One of the attributes retrieved by this
call is the DviMonitorSync, which is used to decide on the type of the AVK
view and the x and y resolutions. Once this value is known, the attributes of
the view-control structure can be defined. Since more than one monitor choice
can be bitmapped in DviMonitorSync, we default in whatever order most suits
the specific application's needs. In this case, we let VGA take precedence
over XGA if both are indicated, and either VGA or XGA over either PAL or NTSC.
We then calculate the screen-to-AVK coordinate-conversion deltas. These deltas
will be used to convert from the native-screen resolution to the AVK-view
resolution. For example, given a VGA-screen resolution of 640x480 and an AVK
View of 256x240, we convert an x coordinate with the formula: Xavk =
(int)((double) Xscreen* (256.0/640.0)). This is necessary because the video
pixels have a 5/4 aspect ratio. (There are five video pixels for every four
VGA pixels.)
Once the view parameters have been defined, the CreateView() function is used
to create and display an AVK view by calling AvkViewCreate() and
AvkViewDisplay(), respectively. RTV uses a YUV9 bitmap format (YUV color space
with 4-1-1 subsampling). The view is initially displayed as a black rectangle.
Finally, a call to SetDstBox() sets the destination-box coordinates for the
stream-to-view connector according to the coordinates of the main window's
client rectangle.
The LoadVshFile() function loads data used by RTV during the compression
process. (A discussion of the VSH data is beyond the scope of this article.)
Now that we have identified and configured the source for the data (the
digitizer) and the destination (the view), we need to create a capture group.
As shown in CreateCaptureGroup(), two group buffers are created--one for audio
and one for video. AVK requires that separate buffers be used when capturing
data, although this data is typically interleaved together when it is written
to disk. On playback, the group buffers can be configured to contain multiple
streams of data. This allows the interleaved files to be played as is, without
the application having to parse the data back into separate streams. In
addition to the group buffers, which use video RAM on the ActionMedia II
board, buffers in host RAM are also created for holding video and audio frames
while they are being written to disk.
CreateVideoStream() creates and formats a video stream for the video-capture
buffer. The RTV encoding parameters Rtv20Args and the x and y resolutions for
the specific capture sync are passed along with the VSH data read in by
LoadVshFile(). Once the video stream has been formatted, the area of memory
used to store the VSH data can be discarded, since AvkVidStrmFormat() makes
its own copy.
The last step in building the recorder is to create the connectors from the
digitizer to the video stream and from the video stream to the view. These
connectors act like the channels in a mixing console, allowing us to control
the flow of data from one place to another. When the connector from the
digitizer to the video stream is enabled, capture data will begin to flow from
the board to the stream. When the connector between the video stream and the
view is enabled, the captured data will begin to flow from the video stream to
the view and will appear on the screen in the rectangle defined in
View.DstBox. Creating the audio stream is more straightforward: Simply create
the stream and format it with the frame rate, sample rate, and algorithm name
(in this case ADPCM4).
Closing the AVK session is a simple matter of calling AvkEnd() and freeing up
the memory used for the host I/O buffers. While there are calls in AVK to
explicitly deallocate AVK objects, AvkEnd() will implicitly destroy all
created objects.


Controlling the Recorder


There are two types of control objects in the AVK library--connectors and
groups. Groups control the flow of data through the compression/decompression
process, and connectors control the flow of visual data through the monitoring
system. These data flows are shown in Figure 2. AvkCapt defines four states
that determine its behavior: uninitialized, initialized, monitoring, and
capturing. AvkCapt uses the three functions ToState(), IsState(), and
GetState(), shown in Listing Three, page 94, to alter and query the current
state. These states are used by the application to control the menu options
available to the user.
The ToggleMonitor() and ToggleCapture() functions (also shown in Listing
Three) illustrate how AvkCapt controls the connectors and groups by changing
state. In ToggleMonitor(), if the current state is initialized and monitoring
is off, we turn it on. If monitoring is on and we are not capturing, we turn
it off. The functions MonitorOn() and MonitorOff() are used to do the real
work. MonitorOn() uses the AvkConnEnable() function to enable the connectors
from the digitizer to the video stream and from the video stream to the view,
causing video to be displayed, and it turns the audio on by calling the
AvkDeviceAudioIn() function. MonitorOff() turns off the flow of video by
hiding the connectors. AvkConnHide() paints the key color (black) into the
connector's destination and then disables the connector. Another call to
AvkDeviceAudioIn() turns off audio monitoring.
ToggleCapture() toggles the capture state on or off (assuming a file has been
opened to receive the captured data). If we are monitoring, we turn on capture
by starting the group. If we are already capturing, we turn it off by pausing
the group. For any other state, we simply return without doing anything.



Writing Compressed Data to Disk


The most difficult part of capturing the incoming digitized data is keeping up
with it. If AvkCapt does not read the frames from the VRAM buffers fast
enough, the frames will be lost, and a series of blank frames will have to be
inserted to take their place (in order to keep the frame rate constant). This
will cause skipping effects on playback. On the other hand, if too much time
is spent retrieving data, the message loop may not respond promptly and mouse
action may be degraded.
AvkCapt illustrates two different approaches to retrieving data in a timely
manner. The first involves calling a read routine each time AvkCapt receives
an AVK_CAPTURE_DATA_AVAILABLE message from AVK informing it that a designated
amount of data (called the "hungry granularity") has been captured into a VRAM
group buffer. The application sets this level when creating the group buffer.
The read routine then retrieves as much data from the VRAM buffer as it can,
parses it into frames, and writes it out to the AVSS file. The routine then returns to
process the message loop and awaits the next AVK_CAPTURE_DATA_AVAILABLE
message.
The second method (enabled by selecting the Timer option from the File pop-up
menu) involves setting up a Windows timer and calling the same read routine on
each timer tick. (We use a timer interval of 500 milliseconds in AvkCapt.) This will
result in a maximum of two calls per second, so the capture function has to
write about 15 frames per call to keep up. CaptureAvioData() writes out more
than that if more data is present, so the timer messages may back up. Since
Windows discards these if another set of timer ticks is already waiting in the
queue, this is not a problem.
AvkCapt's capturing performance can be tuned by varying the TIMER_INTERVAL,
the HOST_BUF_SIZE, or the value for CAPTURE_LOOPS (which dictates how many
iterations of the read/write loop will be executed in CaptureAvioData() before
it is forced back to the main message loop). These values are defined in
avkcapt.h; see Listing One, page 90.
Listing Four, page 94, shows the CaptureAvioData() and ReadGrpBuf() functions.
These functions retrieve frames from the group buffers in VRAM and write them
out to an AVSS file on disk. AvkCapt uses the AVKIO file I/O subsystem to
create an AVSS file. Video frames and one frame's worth of audio samples are
retrieved separately from their respective buffers. The AVKIO function
AvioFileFrmWrite() interleaves the video and audio into the AVSS file. Each
iteration of the main loop starts by checking to see whether the application's
video or audio host RAM buffer is empty, and, if so, it reads one buffer's
worth of frames from the VRAM group buffers. ReadGrpBuf() is used to read
newly captured frames from an AVK group buffer into one of the application's
host RAM buffers. The count of bytes read is put in the CAPT structure's
BufDataCnt element. If any data is read, the caller's flag, pbDataRead, is set
to True. Next we loop through the video and audio host RAM buffers writing out
matched video and audio frames to the file. When we run out of either video or
audio frames, we loop back to the top to retrieve more frames. This loop
continues until all frames currently captured in VRAM have been retrieved, or
until the loop has executed CAPTURE_LOOPS times. We use this countdown value
to prevent the loop from executing for too long without giving the message
loop time to run. If frames are being captured as fast as we are reading them,
we might otherwise never exit this loop.
It is rather unlikely, but possible, that we will have a reentrancy problem
here. Since the function creates a message box in the case of an error, it can
allow the message loop to process new messages before we exit it. This might
result in a new timer tick or an AVK message causing reentry before we have
finished displaying the error message and killing the process. To prevent this
contingency, we use an ownership semaphore. If the semaphore is set when we
enter, something has gone wrong in the currently executing code. So, instead
of just blocking on the semaphore, the new occurrence exits. The semaphore is
set to indicate that the code is executing and is cleared as the last
operation before a successful exit. Note that we do not clear the semaphore
before exiting on an error condition, since we will be terminating the
application on any error here and do not want to begin executing this code
again between this exit and the application's termination.


Conclusion


AVK's function calls are identical between the Windows and OS/2 versions of
the library. So the bulk of the code I have discussed will remain the same for
an OS/2 implementation of AvkCapt. The main difference will be in the code
that writes the data to disk. With OS/2, this can be accomplished using a
couple of threads and sharing a common-host memory buffer, eliminating the
need for the timer-tick mechanism.
It is also possible to create other configurations using the AVK objects. For
example, applications can build recorders that capture only video or only
audio data, players that play audio and video data from different sources, or
players that combine motion-video and still-image data to a common view.
This kind of flexibility has its costs in terms of code complexity, however.
Both the i750 video processor and AVK were architected to be platform
independent. AVK can support a variety of higher-level APIs which encapsulate
OS-specific file-I/O functions.
As one example, Intel and IBM are working on a higher-level library that
implements the digital-video media-control interface (DV MCI) command set for
multimedia extensions to Windows and OS/2. QuickTime will be implemented on
top of AVK by New Video (Venice, California) for its DVI Macintosh products.
(See "The QuickTime/AVK Connection" on page 28.) Both DV MCI products and
QuickTime provide "preconfigured" player/recorder objects for developers who
do not need or want to roll their own.


Acknowledgments


The author would like to express his appreciation to John Novack, who
developed the AvkCapt program described in this article.


Data Compression and the AVK


The Audio Video Kernel (AVK) is a multilayered architecture that isolates
hardware-specific features from the application programmer while enabling
porting of audio and video data to other platforms. The AVK is itself
sandwiched between an environment-specific API (in this case, the Windows API)
and the Action Media II hardware. The ActionMedia II board includes the i750
video processor, an optional capture module, and typically two megabytes of
local video memory (VRAM). The i750 consists of two processors: the 82750PB
pixel processor and the 82750DB display processor.
Closest to the hardware is the microcode engine, a collection of routines
loaded into instruction memory aboard the pixel processor. These routines
manage real-time requirements such as task scheduling, data compression and
decompression, and image scaling and copying. The big win here is that by
loading these routines into instruction memory, there is no hard-wiring of,
for example, compression and decompression algorithms. The microcode routines
can be modified to add or change functionality without updating the hardware.
The next layer in the AVK, the audio/video driver (AVD), provides a C
interface to the ActionMedia II hardware, thus providing access to each
component of the board. Included are functions to access VRAM, load microcode
functions into instruction memory aboard the pixel processor, set display
formats for the display processor, and access the audio and capture
subsystems. Intel has also created a conceptual model of a "digital production
studio" which contains individual subsystems that correspond to real-world
systems such as tape decks, effects processors, mixing boards, and so on. The
audio/video library (AVL) adds a set of multimedia functions that are
independent of the host environment to implement these concepts.


Compressing Data


The AVK currently supports two forms of compression for video images. The
first is real-time video (RTV), which is implemented in microcode and
processed on the pixel processor. RTV takes multiple passes over the video
data using several techniques, including frame differencing and Huffman coding
to reduce the bit rate. Therefore, RTV compression is lossy. RTV 2.0 improves
over the original algorithm with better image quality and adjustable data
rates of up to 300 Kbytes/second, or twice that of CD-ROM rates. Image
quality, which is directly affected by the amount of data lost, can be
specified by the application as good, better, or best. Good is typically used
at lower data rates such as CD-ROM. Better quality is recommended when
playback will occur from the hard drive and best is used when compressing to a
RAM disk.
Production-level video (PLV) uses essentially the same compression as RTV, but
takes advantage of offline compression services to gain the highest quality.
Thus, PLV data including audio can be decompressed and displayed at rates
similar to best quality RTV, which is closer to 150 Kbytes/second.
AVK uses JPEG for capture and lossy compression of still images, using 4:1:1
YUV color sampling. Though the compression technique is different, still
images are treated as a special case of motion video that contains just a
single frame. From the AVK perspective, programmers can open, play, and close
a still image using the same calls as motion video. Developers can also adjust
frequency for still images.
Audio data is compressed using a 4-bit adaptive-compression algorithm
(ADPCM4). This is a straightforward technique that predicts the next audio
sample based on the previous sample. As with video quality, audio quality can
be specified as good, better, or best. But since this algorithm was originally
intended for voice samples, it doesn't achieve the high quality one might
expect. The encoding of audio data is an area that MPEG greatly improves upon
over ADPCM4. Intel promises to support the MPEG standard when it is finalized,
so look for big gains here.


Data Streams


A stream is composed of a set of audio or video frames. A video stream
consists of a starting reference frame whose entire image is encoded.
Subsequent frames, called "dependent" frames, are encoded as changes to
previously decompressed images. (Only pixels that change between frames are
stored.) Occasionally, when the image significantly changes or image quality
begins to deteriorate, a new reference frame is inserted. A benefit of
reference frames is that they can be decompressed independent of other frames.
Note, however, that you currently cannot seek to a dependent frame--only to a
reference frame. Note also that audio frames are independent of video. As
previously mentioned, they are compressed using a 4-bit ADPCM algorithm. Once
decompressed, audio and video streams can be interleaved on a frame-by-frame
basis.
Finally, it's interesting to note that Intel, at the request of the
Interactive Multimedia Association (IMA), is making available details of RTV's
compressed video bit-stream format. Documentation is available to developers
on a licensing basis, thus opening the door for software-only decompression of
AVSS files. Fluent Machines (Framingham, Massachusetts) is expected to be the
first to offer a software-only solution. For more information, contact the
IMA, 3 Church Circle, Suite 800, Annapolis, MD 21401; 410-626-1380.
--Michael Floyd



_CAPTURING DIGITAL VIDEO USING DVI_
by James L. Green


[LISTING ONE]


//--- AvkCapt.h Copyright Intel Corp. 1991, 1992, All Rights Reserved ---

#include "avkapi.h"

// File name for the RTV 2.0 VSH data file (from avkalg.h)
#define VSHFILE_NAME AVK_RTV_20_ENCODE_DATA_NAME

// A couple of shorthand AVK #defines for convenience
#define OK AVK_ERR_OK
#define NOW AVK_TIME_IMMEDIATE
#define HNULL ((HAVK)0)

// Values for capturing
#define AUD_SAMPLE_RATE (U32)33075
#define FRAME_RATE (U32)33367

// Size of Capture Data Buffers
#define VID_BUF_SIZE (256L * 1024L)
#define VID_BUF_GRAN ( 64L * 1024L)
#define AUD_BUF_SIZE (128L * 1024L)
#define AUD_BUF_GRAN ( 16L * 1024L)
#define HOST_BUF_SIZE 32768U

// Maximum number of iterations of the capture loop
// before we are forced back to the main message loop
#define CAPTURE_LOOPS 10

// ID value for the capture Windows timer
#define TIMER_ID 1

// Number of milliseconds between timer ticks
#define TIMER_INTERVAL 500

// States for the capture engine
#define ST_UNINITIALIZED 0
#define ST_INITIALIZED 1
#define ST_MONITORING 2
#define ST_CAPTURING 3

// Control structure for the current view
typedef struct tagVIEW
{
 HAVK hView; // AVK View handle
 HAVK hConnDigi2Strm; // Digitizer to Video Stream connector
 HAVK hConnStrm2View; // Video Stream to View connector
 BOOL bConnEnabled; // TRUE if the connector is enabled
 WORD DviMonitorSync; // DviMonitorSync value from AVK.INI
 I16 cxView; // View's x resolution
 I16 cyView; // View's y resolution
 double xDelta; // used to convert screen
 double yDelta; // coords to view
 I16 cxScreen; // physical screen's x resolution
 I16 cyScreen; // physical screen's y resolution
 U16 VidType; // View's video type
 U16 BmFmt; // View's bitmap format
 BOOL bIsKeyed; // TRUE if the View is keyed
 BOX SrcBox; // connector's source rectangle
 BOX DstBox; // connector's destination rectangle

} VIEW;

// Control structure for capture buffers
typedef struct tagCAPT
{
 HAVK hGrpBuf; // group buffer handle
 HAVK hStrm; // stream handle
 char far *pBufHead; // host RAM I/O buffer
 char far *pBufCurr; // current position in host I/O buffer
 U32 BufDataCnt; // amount of data in host I/O buffer
} CAPT;

// Structure for storing sync resolutions. The sync table
// will be an array of VIDEO_SYNC structures called Syncs[].
typedef struct tagVIDEO_SYNC
{
 WORD xResRTV; // RTV capture x resolution
 WORD xResVid; // Video stream premonitor x resolution
 WORD yResVid; // Video stream premonitor y resolution
 WORD FrameRate;
 WORD PixelAspect;
} VIDEO_SYNC;

// These sync values are subscripts into a table of VIDEO_SYNC structures
#define SYNC_NTSC 0
#define SYNC_PAL 1






[LISTING TWO]

// ---- Windows AVK Capture Program - Create Recorder ----------------
// ---- Copyright Intel Corp. 1991, 1992, All Rights Reserved ---------

extern HWND hwndMain;
extern VIDEO_SYNC Syncs[];

// Local variables
static WORD State = ST_UNINITIALIZED; // current state of capture engine
WORD CaptureSync = SYNC_NTSC; // default to NTSC
char far *pVshBuf = NULL; // buffer for reading VSH data.
U32 VshSize; // size of the VSH data
VIEW View; // view control structure
AVIO_SUM_HDR Avio; // master control struct for AVSS file I/O
CAPT Vid; // video capture control structure
CAPT Aud; // audio capture control structure
I16 AvkRet; // general AVK return code variable

// RTV 2.0 encoding arguments
AVK_RTV_20_ENCODE_ARGS Rtv20Args =
{
 12, // argument count
 AVK_RTV_2_0, // algorithm size
 0,0, // x,y coords of origin
 128, 240, // xLength, yLength
 3, // still period

 0, 0, // bytes,lines
 AVK_RTV_20_PREFILTER | AVK_RTV_20_ASPECT_25, // flags
 0, 0 // quantization values
};

// AVK handles
HAVK hAvk = (HAVK)0;
HAVK hDev = (HAVK)0;
HAVK hGrp = (HAVK)0;

// Create AVK session and initialize the device
BOOL InitAvk()
{
 if (!IsState(ST_UNINITIALIZED))
 return TRUE;

 // Start an AVK session with messaging
 if ((AvkRet = AvkBeginMsg(hwndMain, &hAvk,
 AVK_SESSION_DEFAULT)) != OK)
 return DispAvkErr(AvkRet, "AvkBeginMsg");

 // Open the ActionMedia(R) device
 if ((AvkRet = AvkDeviceOpen(hAvk, 0,
 AVK_DEV_OPEN_EXCLUSIVE, &hDev)) != OK)
 return DispAvkErr(AvkRet, "AvkDeviceOpen");

 // Get the capture sync by calling AvkDeviceVideoIn()
 if ((AvkRet = AvkDeviceVideoIn(hDev, AVK_CONN_DIGITIZER)) != OK)
 return DispAvkErr(AvkRet, "AvkDeviceVideoIn");

 return TRUE;
}
// Check device capabilities and build the recorder
BOOL CreateAvkResources(WORD NewCaptureSync)
{
 switch (NewCaptureSync)
 {
 case AVK_SYNC_NTSC: CaptureSync = SYNC_NTSC; break;
 case AVK_SYNC_PAL: CaptureSync = SYNC_PAL; break;
 }

 // Get the AVK device capabilities from AVK.INI
 if (!GetDevCaps(&View))
 return FALSE;

 if (!CreateView(&View))
 return FALSE;

 // The Vsh file contains data used in compressing
 // the incoming motion video into an RTV 2.0 file.
 if (!LoadVshFile())
 return FALSE;

 if (!CreateCaptureGroup())
 return FALSE;

 ToState(ST_INITIALIZED);

 return TRUE;

}
// Get the device capabilities from AVK
BOOL GetDevCaps(VIEW *pView)
{
 DVICAPS DevCaps;

 // Get the physical screen resolution from the system
 pView->cxScreen = GetSystemMetrics(SM_CXSCREEN);
 pView->cyScreen = GetSystemMetrics(SM_CYSCREEN);

 // Get the AVK device capabilities which were set in AVK.INI
 if ((AvkRet = AvkGetDevCaps(0, sizeof(DevCaps), &DevCaps)) != OK)
 return DispAvkErr(AvkRet, "AvkGetDevCaps");

 if (DevCaps.DigitizerRevLevel == 0)
 return DispErr("GetDevCaps",
 "Digitizer needed for capturing - check AVK.INI");

 if (DevCaps.DviMonitorSync & 0x10) // VGA
 {
 pView->cxView = 256;
 pView->cyView = 240;
 pView->VidType = AVK_VID_VGA_KEYED;
 pView->bIsKeyed = TRUE;
 }
 else if (DevCaps.DviMonitorSync & 0x100) // XGA
 {
 pView->cxView = 256;
 pView->cyView = 192;
 pView->VidType = AVK_VID_XGA_KEYED;
 pView->bIsKeyed = TRUE;
 }
 else if (DevCaps.DviMonitorSync & 0x02) // PAL
 {
 pView->cxView = 306;
 pView->cyView = 288;
 pView->VidType = AVK_VID_PAL;
 }
 else if (DevCaps.DviMonitorSync & 0x01) // NTSC
 {
 pView->cxView = 256;
 pView->cyView = 240;
 pView->VidType = AVK_VID_NTSC;
 }
 else
 return DispErr("GetDevCaps", "Invalid monitor sync");

 // Calculate Screen-To-AVK coordinate conversion deltas.
 pView->xDelta = (double)pView->cxView / (double)pView->cxScreen;
 pView->yDelta = (double)pView->cyView / (double)pView->cyScreen;

 return TRUE;
}
// Create and display an AVK View
static BOOL CreateView(VIEW *pView)
{
 if ((AvkRet = AvkViewCreate(hDev, pView->cxView, pView->cyView,
 AVK_YUV9, pView->VidType, &pView->hView)) != OK)
 return DispAvkErr(AvkRet, "AvkViewCreate");


 // Display the View
 if ((AvkRet = AvkViewDisplay(hDev, pView->hView, NOW,
 AVK_VIEW_DISPLAY_DEFAULT)) != OK)
 return DispAvkErr(AvkRet, "AvkViewDisplay");

 // Set the destination box for the stream-to-view connector
 if (!SetDstBox(hwndMain))
 return FALSE;

 return TRUE;
}
// Set the destination box for the stream-to-view connector
BOOL SetDstBox(HWND hwndMain)
{
 RECT WinRect;
 BOX NewDstBox;

 GetClientRect(hwndMain, (LPRECT)&WinRect);
 ClientToScreen(hwndMain, (LPPOINT)&WinRect);
 WinRect.right = WinRect.left + (View.cxScreen >> 1) - 1;
 WinRect.bottom = WinRect.top + (View.cyScreen >> 1) -1;
 WinRect2AvkBox(&WinRect, &NewDstBox, &View);

 if (View.hConnStrm2View)
 {
 if ((AvkRet = AvkConnHide(View.hConnStrm2View, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkConnHide");

 if ((AvkRet = AvkViewCleanRect(View.hView,
 &View.DstBox)) != OK)
 return DispAvkErr(AvkRet, "AvkViewCleanRect");

 // Reset the destination of the connector to our new box
 if ((AvkRet = AvkConnModSrcDst(View.hConnStrm2View, NULL,
 &NewDstBox, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkConnModSrcDst");

 if ((AvkRet = AvkConnEnable(View.hConnStrm2View, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkConnEnable");

 }

 // Copy new destination coords into the view's destination box
 COPYBOX(&View.DstBox, &NewDstBox);

 return TRUE;
}
// Get the standard VSH file that comes with AVK
static BOOL LoadVshFile()
{
 int fhVsh;
 OFSTRUCT Of;

 // Open the VSH file
 if ((fhVsh = OpenFile(VSHFILE_NAME, &Of, OF_READ)) == -1)
 return DispErr("LoadVshFile",
 "Unable to find the file KE080200.VSH");


 VshSize = filelength(fhVsh);

 // Range check - Reject if VshSize == 0 or VshSize > 65535L
 if (!VshSize || (VshSize & 0xffff0000))
 return DispErr("LoadVshFile", "VSH file too large to load");

 // Allocate a buffer to stash the VSH file.
 if ((pVshBuf = MemAlloc((WORD)VshSize)) == NULL)
 return DispErr("LoadVshFile",
 "Unable to allocate VSH file buffer");

 // Read the VSH data from the file
 if (_lread(fhVsh, pVshBuf, (WORD)VshSize) != (WORD)VshSize)
 return DispErr("LoadVshFile", "Unable to read VSH file");

 return TRUE;
}
// Create Capture Group and resources needed for premonitoring
static BOOL CreateCaptureGroup()
{
 if ((AvkRet = AvkGrpCreate(hDev, &hGrp)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpCreate");

 if ((AvkRet = AvkGrpBufCreate(hGrp, AVK_BUF_CAPTURE, VID_BUF_SIZE,
 VID_BUF_GRAN, 1, &Vid.hGrpBuf)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpBufCreate");

 if ((AvkRet = AvkGrpBufCreate(hGrp, AVK_BUF_CAPTURE, AUD_BUF_SIZE,
 AUD_BUF_GRAN, 1, &Aud.hGrpBuf)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpBufCreate");

 // Create host RAM I/O buffers for retrieving
 // video and audio frames and initialize them.
 if ((Vid.pBufHead = MemAlloc(HOST_BUF_SIZE)) == NULL ||
 (Aud.pBufHead = MemAlloc(HOST_BUF_SIZE)) == NULL)
 return DispErr("CreateCaptureGroup",
 "Unable to allocate host RAM I/O buffer");
 Vid.BufDataCnt = (U32)0;
 Aud.BufDataCnt = (U32)0;

 if (!CreateVideoStream())
 return FALSE;

 if (!CreateAudioStream())
 return FALSE;

 if ((AvkRet = AvkGrpFlush(hGrp)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpFlush");

 return TRUE;
}
// Create and format a video stream for the video capture buffer
static BOOL CreateVideoStream()
{
 if ((AvkRet = AvkVidStrmCreate(Vid.hGrpBuf, 0, &Vid.hStrm)) != OK)
 return DispAvkErr(AvkRet, "AvkVidStrmCreate");

 // Format the video stream
 Rtv20Args.xLen = Syncs[CaptureSync].xResRTV;
 Rtv20Args.yLen = Syncs[CaptureSync].yResVid;
 if ((AvkRet = AvkVidStrmFormat(Vid.hStrm,
 6,
 Syncs[CaptureSync].xResVid,
 Syncs[CaptureSync].yResVid,
 AVK_YUV9,
 Syncs[CaptureSync].FrameRate,
 AVK_RTV_2_0,
 &Rtv20Args, sizeof(Rtv20Args), sizeof(Rtv20Args),
 pVshBuf, VshSize, 64L * 1024L)) != OK)
 return DispAvkErr(AvkRet, "AvkVidStrmFormat");

 // Free the VSH buffer
 MemFree(pVshBuf);

 // Create a connector from the digitizer to the video stream
 if ((AvkRet = AvkConnCreate(AVK_CONN_DIGITIZER, NULL, Vid.hStrm,
 NULL, 0, &View.hConnDigi2Strm)) != OK)
 return DispAvkErr(AvkRet,
 "AvkConnCreate(Digitizer to Stream)");

 // Create the connector from the video stream to the view
 if ((AvkRet = AvkConnCreate(Vid.hStrm, NULL, View.hView,
 &View.DstBox, AVK_PRE_MONITOR, &View.hConnStrm2View)) != OK)
 return DispAvkErr(AvkRet, "AvkConnCreate (Stream to View)");

 return TRUE;
}
// Create and format an audio stream for the audio capture buffer
static BOOL CreateAudioStream()
{
 if ((AvkRet = AvkAudStrmCreate(Aud.hGrpBuf, 0, &Aud.hStrm)) != OK)
 return DispAvkErr(AvkRet, "AvkAudStrmCreate");

 // Format the audio stream
 if ((AvkRet = AvkAudStrmFormat(Aud.hStrm, FRAME_RATE,
 AUD_SAMPLE_RATE, AVK_ADPCM4, AVK_AUD_MIX, NULL, 0, 0)) != OK)
 return DispAvkErr(AvkRet, "AvkAudStrmFormat");

 return TRUE;
}
// Close the AVK session
BOOL EndAvk()
{
 BOOL Ret = TRUE;

 if (hAvk != HNULL)
 {
 if ((AvkRet = AvkEnd(hAvk)) != OK)
 {
 DispAvkErr(AvkRet, "AvkEnd");
 Ret = FALSE;
 }
 }
 if (Vid.pBufHead)
 {
 MemFree(Vid.pBufHead);
 Vid.pBufHead = NULL;
 }

 if (Aud.pBufHead)
 {
 MemFree(Aud.pBufHead);
 Aud.pBufHead = NULL;
 }

 // Null out all of the AVK handles
 hAvk = hDev = HNULL;
 hGrp = HNULL;
 Vid.hGrpBuf = Vid.hStrm = HNULL;
 Aud.hGrpBuf = Aud.hStrm = HNULL;
 View.hView = HNULL;
 View.hConnDigi2Strm = View.hConnStrm2View = HNULL;

 ToState(ST_UNINITIALIZED);

 return Ret;
}






[LISTING THREE]

//---- Windows AVK Capture Program - Recorder Control ------------
//---- Copyright Intel Corp. 1991, 1992, All Rights Reserved -----

// Sets a new state and enables/disables the applicable menu options
WORD ToState(WORD NewState)
{
 WORD OldState;

 if (NewState == ST_CAPTURING ||
 NewState == ST_MONITORING ||
 NewState == ST_INITIALIZED ||
 NewState == ST_UNINITIALIZED)
 {
 if (State != NewState)
 {
 OldState = State;
 State = NewState;
 UpdateMenus(State);
 return OldState;
 }
 else
 return NewState;
 }
 return 0xffff;
}
// Checks whether the current state equals the caller's query state
BOOL IsState(WORD QueryState)
{
 return State == QueryState;
}
// Returns the current state to the caller
WORD GetState()
{

 return State;
}
// Toggle monitoring on and off based on user input
BOOL ToggleMonitor(VOID)
{
 BOOL bRet;

 switch (GetState())
 {
 case ST_INITIALIZED: bRet = MonitorOn(); break;
 case ST_MONITORING: bRet = MonitorOff(); break;
 default: bRet = TRUE; break;
 }
 return bRet;
}
// Turn on premonitoring
static BOOL MonitorOn()
{
 if ((AvkRet = AvkConnEnable(View.hConnDigi2Strm, NOW)) != OK ||
 (AvkRet = AvkConnEnable(View.hConnStrm2View, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkConnEnable");

 if ((AvkRet = AvkDeviceAudioIn(hDev, AVK_AUD_CAPT_LINE_INPUT,
 AVK_MONITOR_ON)) != AVK_ERR_OK)
 return DispAvkErr(AvkRet, "AvkDeviceAudioIn");

 ToState(ST_MONITORING);

 SetClipTimer();

 return TRUE;
}
// Turn off premonitoring
static BOOL MonitorOff()
{
 KillClipTimer();

 if ((AvkRet = AvkConnHide(View.hConnStrm2View, NOW)) != OK ||
 (AvkRet = AvkConnHide(View.hConnDigi2Strm, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkConnHide");

 if ((AvkRet = AvkDeviceAudioIn(hDev, AVK_AUD_CAPT_LINE_INPUT,
 AVK_MONITOR_OFF)) != AVK_ERR_OK)
 return DispAvkErr(AvkRet, "AvkDeviceAudioIn");

 ToState(ST_INITIALIZED);

 return TRUE;
}
// Toggles the capture on or off
BOOL ToggleCapture()
{
 // If no file has been opened, return
 if (!bAvioFileExists)
 {
 DispMsg("You must open a file before you can capture");
 return TRUE;
 }


 switch(GetState())
 {
 case ST_MONITORING:
 // If we are monitoring, turn on
 // capture by starting the group
 if ((AvkRet = AvkGrpStart(hGrp, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpStart");
 ToState(ST_CAPTURING);
 break;

 case ST_CAPTURING:
 // If we are already capturing, turn
 // it off by pausing the group
 if ((AvkRet = AvkGrpPause(hGrp, NOW)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpPause");
 break;

 default:
 // Any other state, just do nothing - no error
 break;
 }
 return TRUE;
}






[LISTING FOUR]

// ---- Windows AVK Capture Program - Write Captured Data to Disk ------
// ---- Copyright Intel Corp. 1991, 1992, All Rights Reserved ----------

extern CAPT Aud;
extern CAPT Vid;
extern I16 AvkRet;
extern HAVK hGrp;
extern WORD CaptureSync;
AVIO_SUM_HDR Avio;
BOOL bAvioFileExists = FALSE;
I16 AvioRet;
static BOOL ReadGrpBuf(CAPT *, BOOL *);
I16 DispAvioErr(char *pMsg);

VIDEO_SYNC Syncs[2] =
{
 { 128, 128, 240, AVK_NTSC_FULL_RATE, AVK_PA_NTSC },
 { 128, 153, 288, AVK_PAL_FULL_RATE, AVK_PA_PAL }
};

// Initialize the AVIO summary header and use it to create an AVSS file.
BOOL OpenAvioFile(char *pFileSpec)
{
 AVIO_VID_SUM FAR *pVid;
 AVIO_AUD_SUM FAR *pAud;
 VIDEO_SYNC *pSync;

 if (!*pFileSpec)
 return DispErr("OpenAvioFile", "No file spec");

 // Clear out the Avio structure.

 _fmemset((char FAR *)&Avio, 0, sizeof(Avio));

 // Initialize the structure.
 Avio.SumHdrSize = sizeof(AVIO_SUM_HDR);
 Avio.VidSumSize = sizeof(AVIO_VID_SUM);
 Avio.AudSumSize = sizeof(AVIO_AUD_SUM);

 Avio.StrmCnt = 2;
 Avio.VidCnt = 1;
 Avio.AudCnt = 1;

 if ((AvioRet = AvioFileAlloc((AVIO_SUM_HDR FAR *)&Avio)) < 0)
 return DispAvioErr("AvioFileAlloc");

 // Fill out the video stream substructure.

 pSync = &Syncs[CaptureSync]; // sync data (NTSC or PAL)

 pVid = Avio.VidStrms;

 pVid->StrmNum = 0; // video stream number
 pVid->Type = AVL_T_CIM; // compressed data
 pVid->SubType = AVL_ST_YVU; // packed data
 pVid->StillPeriod = AVL_CIM_RANDOM_STILL; // freq of still frames
 pVid->xRes = pSync->xResVid << 1; // x resolution
 pVid->yRes = pSync->yResVid; // y resolution
 pVid->BitmapFormat = AVK_BM_9; // bitmap format
 pVid->FrameRate = pSync->FrameRate; // frame rate
 pVid->PixelAspect = pSync->PixelAspect; // NTSC aspect ratio
 pVid->AlgCnt = 1; // only one algorithm
 pVid->AlgName[0] = AVK_RTV_2_0; // RTV 2.0 compression alg

 // Fill out the audio stream substructure.

 pAud = Avio.AudStrms;

 pAud->StrmNum = 1; // audio stream number
 pAud->LeftVol = 100; // left channel volume = 100%
 pAud->RightVol = 100; // right channel volume = 100%
 pAud->FrameRate = pSync->FrameRate; // frame rate
 pAud->SamplesPerSecond = AUD_SAMPLE_RATE; // audio samples-per-second
 pAud->AudChannel = AVK_AUD_MIX; // both speakers
 pAud->AlgCnt = 1; // number of algorithms
 pAud->AlgName[0] = AVK_ADPCM4; // audio ADPCM4 algorithm

 // Now create the file with all standard AVSS headers.

 if ((AvioRet = AvioFileCreate((char far *)pFileSpec,
 (AVIO_SUM_HDR FAR *)&Avio, OF_CREATE)) < 0)
 return DispAvioErr("AvioFileCreate");

 bAvioFileExists = TRUE;

 return TRUE;
}

// This function retrieves frames from the Group Buffers
// in VRAM and writes them out to an AVSS file on disk.
BOOL CaptureAvioData()
{
 static BOOL bInUse = FALSE;
 AVIO_FRM_HDR FAR *pFrmHdr[2]; // frame header pointers
 // for video & audio
 BOOL bDataRead;
 int Ret;
 U32 VidFrmSize, AudFrmSize;
 WORD Count;

 if (bInUse)
 return TRUE;

 bInUse = TRUE;

 // Error if no buffers have been allocated.
 if (!Vid.pBufHead || !Aud.pBufHead)
 return DispErr("CaptureAvioData",
 "NULL host RAM buffer pointer");

 Count = CAPTURE_LOOPS;

 do {
 // Init the data-read flag
 bDataRead = FALSE;

 if (!Vid.BufDataCnt)
 {
 if (!ReadGrpBuf(&Vid, &bDataRead))
 return FALSE;
 }
 if (!Aud.BufDataCnt)
 {
 if (!ReadGrpBuf(&Aud, &bDataRead))
 return FALSE;
 }

 while (Vid.BufDataCnt && Aud.BufDataCnt)
 {
 pFrmHdr[0] = (AVIO_FRM_HDR FAR *)Vid.pBufCurr;
 pFrmHdr[1] = (AVIO_FRM_HDR FAR *)Aud.pBufCurr;

 if ((Ret = AvioFileFrmWrite((AVIO_SUM_HDR FAR *)&Avio,
 pFrmHdr)) < 0)
 return DispAvioErr("AvioFileFrmWrite");

 VidFrmSize = (U32)sizeof(AVIO_FRM_HDR)
 + pFrmHdr[0]->StrmSize[0];
 Vid.pBufCurr += (WORD)VidFrmSize;
 Vid.BufDataCnt -= VidFrmSize;

 AudFrmSize = (U32)sizeof(AVIO_FRM_HDR)
 + pFrmHdr[1]->StrmSize[0];
 Aud.pBufCurr += (WORD)AudFrmSize;
 Aud.BufDataCnt -= AudFrmSize;
 }
 } while (bDataRead && Count--);


 bInUse = FALSE;
 return TRUE;
}
// Read newly captured frames from an AVK Group Buffer
// into one of the application's host RAM buffers
static BOOL ReadGrpBuf(CAPT *pCapt, BOOL *pbDataRead)
{
 // Only refill the buffer if it is empty
 if (!pCapt->BufDataCnt)
 {
 // Retrieve a buffer of frames.

 if ((AvkRet = AvkGrpBufRead(pCapt->hGrpBuf, HOST_BUF_SIZE,
 pCapt->pBufHead, &pCapt->BufDataCnt, AVK_ENABLE)) != OK)
 return DispAvkErr(AvkRet, "AvkGrpBufRead");

 // Set data-read flag if we read any data.

 *pbDataRead = pCapt->BufDataCnt == (U32)0 ? FALSE : TRUE;

 // Point back to start of buffer.

 pCapt->pBufCurr = pCapt->pBufHead;
 }
 return TRUE;
}
// Update and close an AVSS file using AVKIO.
BOOL CloseAvioFile()
{
 if (bAvioFileExists == TRUE)
 {
 // Update the file's header with current information that
 // AVKIO keeps in the Avio summary header.

 if ((AvioRet = AvioFileUpdate((AVIO_SUM_HDR FAR *)&Avio, 0)) < 0)
 return DispAvioErr("AvioFileUpdate");

 // Close the file.

 if ((AvioRet = AvioFileClose((AVIO_SUM_HDR FAR *)&Avio)) < 0)
 return DispAvioErr("AvioFileClose");

 bAvioFileExists = FALSE;
 }
 return TRUE;
}















July, 1992
THE QUICKTIME/AVK CONNECTION


Building a beautiful relationship




William Fulco


William is the founder and the chief scientist of New Video Corp., designers
of DVI hardware and software for the Macintosh. He can be contacted at 220
Main Street, Suite C, Venice, CA 90291.


QuickTime for the Macintosh and the Audio Video Kernel (AVK) for the PC both
provide time-oriented, data-handling operating-system extensions.
Consequently, there's considerable overlap in their functionality. And because
both are modular and multilayered, there are ways to craft a marriage between
them (even though there are some integration problems in the initial releases
of both).
QuickTime, Apple's "media-integration architecture" extension to the Macintosh
operating system, provides a uniform API so that Mac programs can operate on
time-varying data like audio, video, and animation data. QuickTime is composed
of over 500 calls and traps in three extensions: the Component Manager, which
allows runtime binding of code modules into applications programs (similar to
Windows DLLs); the Image Compression Manager, which handles image-data
compression/decompression (similar to AVK's MVD); and the Movie Toolbox, the
primary API which contains all the calls needed to record and play dynamic
media (similar to the Windows MCI).
The primary QuickTime data type is the "Movie" (equivalent to the AVK's stream
group), which is composed of zero or more "tracks" (substreams), each of which
is associated (via pointers) with "media" (the AVSS file) that references some
digital data source of a single type (that is, a file that contains raw
time-varying data). Each media is associated with just one track, and each
track with one and only one media.
In QuickTime 1.0, each media is either an audio, video, animation, or still
image, and has associated with it a built-in media handler that understands
how to operate on this media when asked. A media also contains all the
necessary housekeeping information to support these operations--media type,
duration, time scale, quality, file references, and so on.
While there is a one-to-one mapping of track (pointers) and media, several
media may be stored in or share a single raw-data file. It is these indirect
references from tracks to media to raw data that allow the integration of AVSS
files into the QuickTime environment as native "MooV" (pronounced movie)
files.


General Structure of QuickTime/AVK Marriages


The first way to integrate QuickTime and AVK is to port AVK to the Macintosh
as a separate set of QuickTime components, loosely supplanting the Image
Compression Manager, and to integrate the standard Macintosh APIs by
(re)writing some of the most commonly used interface components (for instance,
StandardMovieController or SequenceGrabThing) to use these new AVK
image-compression components. The advantage of this approach is that not only
are AVSS files fully and transparently supported, but all AVK facilities are
supportable within the QuickTime framework, including the ability to schedule
many simultaneous audio, video, and graphics streams and to multitask
arbitrary microcode functions.
The disadvantages are that many of the higher-level QuickTime components need
to be rewritten and maintained--and this is not a trivial task. You must also
preserve full compatibility between these new components and their standard
QuickTime siblings. There are also the problems associated with using
non-standard APIs for things that aren't rewritten to support the AVK
components.
Another, and by far the best, way to handle QuickTime/AVK integration is by
writing an AVSS file media-handler component for the Movie Toolbox calls to
use. This media handler would digest native AVSS files and decide what calls
to what components (that is, CODECs) get made. In this form of integration,
AVD driver routines can be integrated within the CODEC Component structure, as
well as providing possible extensions to the current CODEC architecture. While
this disposes of much of AVK's high-level functionality, it has the advantage
of being totally transparent to all current and future QuickTime applications
and handles the sticky QuickTime 1.0 problem with non-Macintosh sound streams.
The disadvantage of this approach is that QuickTime 1.0 doesn't support
interchangeable, component-sized media handlers. The only media-handlers it
supports are the built-in MooV file audio and video modules.
This notwithstanding, the next best approach is the way we did it at my
company (New Video Corp., Venice, California)--port the AVD drivers and
interfaces for New Video's EyeQ board and gain access to this functionality
via the QuickTime CODEC API. This enables QuickTime to perform the stream
synchronization and uses the lower levels of AVK (the MVD/DoMotion microcode
engine), to perform the dirty work of decompressing, displaying, and playing
audio from AVSS files. Again, the catch is that the only way to get data out
of a native AVSS file is to figure out how to make the QuickTime built-in
media handlers believe that the AVSS file is a MooV file.


AVSS Goes to the MooVs


Unlike AVK, QuickTime allows movie files to contain references (called
"aliases") to media in other files. This facilitates very high-speed editing
and manipulation of movies, because there's no copying of large amounts of
data. It also allows media to reside on read-only devices like CD-ROM and
still permit editing of movie tracks.
The key to our AVK/QuickTime integration is the use of these aliased MooV
resources to convince QuickTime that the "raw" audio and video data of an AVSS
file are, in fact, part of a standard QuickTime MooV file. The generation of
these aliases is accomplished with the EyeQ Convert program. This program is
similar to the standard QuickTime ConvertToMovie program used to convert
various other Macintosh file types to QuickTime movies. EyeQ Convert reads an
AVSS file frame directory and does an AddMediaSampleReference() call from the
Movie Toolbox on each frame, using the raw AVSS file pointer, this frame's
offset, and a pointer to the MooV resource file being created. The result of
this program is a small--typically 10K--MooV resource that just points to a
vanilla AVSS file, possibly on a read-only CD-ROM.
The most important ramification of supporting native AVSS files in the
QuickTime environment is especially apparent in mixed computer installations.
With this architecture, a PC or PS/2 application will see an undisturbed AVSS
file and be able to use it in the standard way, while a Mac application
looking at the MooV file associated with this AVSS file will see a vanilla
QuickTime movie. When this file is played with QuickTime, the "raw" video and
audio data from the AVSS file is passed to the appropriate CODEC, which
invokes AVD, which decodes the data and displays it.
The biggest problem with this last approach is that QuickTime 1.0 only knows
about Macintosh audio streams, so the EyeQ Convert process must either: hide
the AVSS audio frames in the video data so the CODEC can sort out the audio
and video on-the-fly (playing the audio on the EyeQ board's DSP while
displaying the video on its Intel i750); or generate a Macintosh 8-bit PCM
audio stream from the 16-bit ADPCM4e AVSS audio stream and store this stream
on a new track associated with the MooV file.
The latter approach has the advantage of providing complete Macintosh control
of audio for this MooV, but the disadvantage of storing lower audio quality
(8-bit mono at a 22-KHz sample rate vs. 16-bit stereo at a 32-KHz sample
rate).


Do You Need an EyeQ Board to Use QuickTime/AVK to Play DVI AVSS Files?


While AVK was designed with a hardware-assisted (i750) virtual-production
studio model in mind, QuickTime was designed as a general-purpose multimedia
extension to the Mac/OS, and did not presuppose the availability of hardware.
Standard QuickTime movie files are typically compressed with either the AVC
(Apple video compressor) CODEC component or the RLE (run length encoding)
animation CODEC. In AVK, the compression algorithms (RTV or PLV), are
implemented in i750 microcode and invoked by the DoMotion microcode scheduler.
With QuickTime/AVK, these microcode functions are invoked from a generic Mac
DVI CODEC so that they fit into the QuickTime environment. In addition to this
MacDVI CODEC, New Video has implemented a software-only RTV decompressor CODEC
that will decompress standard RTV AVSS files at lower resolution (128x120) and
at lower frame rate (10-15 FPS) than the EyeQ-assisted Mac DVI CODEC.















July, 1992
AUDIO COMPRESSION


Digitized sound requires its own compression algorithms


 This article contains the following executables: ACOMP.ARC


John W. Ratcliff


John is president of THE Audio Solution and director of product development
for Milliken Publishing. He is the author of 688 Attack Sub and can be
contacted at 747 Napa Lane, St. Charles, MO 63303.


As we create more powerful applications software, making audio data part of
the user interface becomes more and more desirable. Voice e-mail, spoken
context-sensitive online help, training systems, educational software, and
games are all prime applications of this technology.
However, the amount of disk storage space digitized sound requires is a
barrier to the widespread adoption of sound. Standard compression techniques
using tools such as ARC or ZIP, for instance, will achieve only about a 10
percent compression rate on audio data. Clearly, compressing digitized sound
requires algorithms crafted to match the nature of audio data itself.
Consequently, this article presents a compression algorithm called "ACOMP"
that yields better than 6:1 compression on human voice, and between 1.5:1 and
3:1 on music, while maintaining almost full fidelity.
Digitized sound is produced by a process called "analog-to-digital" (A/D)
conversion, in which sound waves are converted into a digital number that
represents the volume (amplitude) of the sound. On a computer, a single
digital sample is usually recorded in one byte as an 8-bit number between 0
and 255, where 0 is the quietest and 255 the loudest; a good sound sample
should just fit within this range. Many repeated samples must be taken to
record an appreciable amount of sound. Acceptable human voice, for example,
requires between 7000 and 9000 samples per second (7-9 KHz). Note that this is
the same method by which sound is stored on a compact disc. However, a compact
disc stores 44,000 samples per second, each 16 bits in resolution. At this
resolution, one minute of sound takes four Mbytes of storage! Even with the
much lower resolution of 9-KHz, 8-bit sound, one minute of speech takes over
half a megabyte. ACOMP lets you store that same minute of sound in as little
as 100K.


How Audio Compression Works


The human ear doesn't recognize sound by volume. You can take an entire sound
sample, round it off to the closest multiple of 16 (reduce it to 4-bit data),
and having achieved 2:1 compression, still hear it. What the human ear does
hear is the frequency response within that data. A compression algorithm that
closely approximates the data points' original values, but which cannot track
frequencies within that sound, will distort the frequency response, rendering
it virtually unrecognizable to the human ear.
Almost all audio-compression algorithms incorporate two basic techniques:
silence encoding and delta modulation. Silence encoding is extremely important
for human speech. If you examine a waveform of human speech, you will see
long, relatively flat pauses between the spoken words. Silence encoding
represents these pauses in a single byte of data as a pause duration rather
than storing each individual data point over and over. When performing silence
encoding, the user usually specifies an acceptable threshold value, typically
+/-1 or 2. As long as the data samples fluctuate by no more than this amount,
the whole section is considered a pause. This helps achieve extremely high
data-compression rates for human voice. The trade-off is that you lose some of
the highest-frequency sounds. Some of the "s"s, for example, begin to lose
clarity.
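The run-collapsing idea described above can be sketched in a few lines of C. This is a minimal illustration, not ACOMP's actual encoder: the `RUN` type and the `squelch()` helper are ours, and the 63-sample cap simply mirrors the 6-bit repeat-count field the ACOMP format uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* One squelched run: a sample value and how many samples it stands for. */
typedef struct { unsigned char value; unsigned char count; } RUN;

/* Collapse stretches where successive samples stay within +/-threshold
   of the run's first sample into (value, count) pairs. Returns the
   number of RUN pairs written to out[] (out must hold at least n). */
size_t squelch(const unsigned char *in, size_t n, int threshold, RUN *out)
{
    size_t i = 0, runs = 0;
    while (i < n) {
        unsigned char v = in[i];
        size_t len = 1;
        /* extend the run while samples hover near v, up to the
           6-bit repeat-count limit of 63 */
        while (i + len < n && len < 63 &&
               abs((int)in[i + len] - (int)v) <= threshold)
            len++;
        out[runs].value = v;
        out[runs].count = (unsigned char)len;
        runs++;
        i += len;
    }
    return runs;
}
```

A long pause thus costs one pair instead of one byte per sample; the higher the threshold, the more aggressive the squelching, at the cost of high-frequency detail.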
The other method of compressing audio data uses a technique called "delta
modulation." While ordinary compression algorithms seek an efficient method of
recording the input data stream itself, delta modulation tries to find an
efficient method of recording the changes in the data stream. By modeling the
changes from one data sample to the next, we accurately maintain the frequency
response of the input data, while trading off exact amplitude accuracy.
(By modeling the changes in the data, we are taking the first derivative of
the waveform, which contains velocity information.) Any small errors
introduced into the reconstructed data are heard as slight static. The less
accurately we model the data stream, the more static we hear. However, the
frequency response of the input data stream is never lost, because whenever
the original sound sample goes up, we go up, and when it goes down, we go
down.


How Delta Modulation Works


Delta modulation is a simple concept. In an 8-bit input data stream which
yields numbers between 0 and 255, you would need numbers between -255 and +255
to store the values that represent each data-point change. This would require
nine bits--one more than the original sound sample! However, you don't really
need nine bits. Instead, you can use four bits to represent numbers from -8 to
+8 (skipping 0). If one sample were 0 and the next 230, you might store a +7
with a multiplier of 32. This would produce a value of 224. This isn't the 230
value it should be, but the exact value is less important than the frequency
response, which has been maintained. So you have stored an 8-bit value in just
four bits, but this gives you only a 2:1 compression rate. It doesn't model
the waveform well because each data point can be off by as much as +/-31 from
its actual value. The result is a lot of static.
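The 4-bit scheme just described can be sketched directly in C. This is our own minimal illustration of the idea (the function names are not ACOMP's): codes run from -8 to +8 with 0 skipped, and the reconstructed delta is simply the code times the multiplier, which reproduces the 230-stored-as-224 example from the text.

```c
/* Clamp helper for the code range. */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Quantize one sample-to-sample delta to a 4-bit code in -8..+8,
   skipping 0 (a delta-mod step always moves). */
int delta4_encode(int delta, int mult)
{
    int code = delta / mult;                  /* truncate toward zero */
    if (code == 0) code = delta >= 0 ? 1 : -1; /* 0 is not a valid code */
    return clampi(code, -8, 8);
}

/* Reconstruct the delta the decoder will apply. */
int delta4_decode(int code, int mult)
{
    return code * mult;
}
```

With a delta of 230 and a multiplier of 32, `delta4_encode` yields +7 and `delta4_decode` reconstructs 224: the amplitude is off by 6, but the direction and rough size of the step, and hence the frequency content, survive.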
When working with delta modulation, you can control two variables: bit
resolution and the multiplier.
Bit resolution is the number of bits used to represent the delta modulation.
The fewer you use, the better compression rate you achieve. With 1-bit delta
modulation, you achieve 8:1 compression but may do a poor job of modeling the
data.
The multiplier is the constant multiplier that controls the size of the delta
mod. Ideally, the constant multiplier should roughly match the mean value the
input data stream is changing from one sample to the next, divided by the
total bit resolution.
After writing a number of audio-compression routines that use different bit
resolutions and multipliers, I concluded that an audio-compression
routine should use whichever bit size and multiplier value best
match the input data stream at any given time. (In short, rather than
picking one set of rules to apply on the source data, the algorithm should
adapt its modeling technique to match the input data stream itself). When no
bit size or multiplier will model the data as accurately as you wish, then you
store that data out in raw format, causing a resynchronization to occur, and
assuring that the sound quality will not be compromised. When using ACOMP, the
user specifies the mean acceptable error. ACOMP will then try every multiplier
and bit size possible at every point in the input data stream, ensuring that
the mean error never exceeds that value. The result is high compression
rates with almost no static. If your application can live with a little more
static, you can pump the mean error value up to 6 or 10 and get vastly higher
compression rates. (The mean error value indicates that on the average, no
reconstructed data sample is off by more than +/- that amount from the
original data sample.)
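The adaptive search described above can be sketched as a brute-force scan over every bit size and multiplier, keeping the first combination whose mean reconstruction error stays under the caller's limit. This is our own illustration of the strategy, not ACOMP's code; the helper names and the fewest-bits-first preference are assumptions.

```c
#include <stdlib.h>

static int clampb(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Mean absolute reconstruction error for one frame, simulating the
   decoder: 1-bit codes are +/-1, 2-bit +/-2, 4-bit +/-8, scaled by mult. */
static int frame_error(const unsigned char *frm, int n, int prev,
                       int bits, int mult)
{
    int lim = 1 << (bits - 1);
    long err = 0;
    int i, recon = prev;
    for (i = 0; i < n; i++) {
        int code = (frm[i] - recon) / mult;
        if (code == 0) code = frm[i] >= recon ? 1 : -1; /* 0 skipped */
        code = clampb(code, -lim, lim);
        recon = clampb(recon + code * mult, 0, 255);
        err += abs(frm[i] - recon);
    }
    return (int)(err / n);
}

/* Pick a (bit size, multiplier) pair for this frame, preferring the
   fewest bits. Returns 0 if nothing meets maxerr: store the frame raw
   (a resynchronization). */
int pick_model(const unsigned char *frm, int n, int prev,
               int maxerr, int *pBits, int *pMult)
{
    static const int sizes[3] = { 1, 2, 4 };
    int s, m;
    for (s = 0; s < 3; s++)
        for (m = 1; m <= 16; m++)
            if (frame_error(frm, n, prev, sizes[s], m) <= maxerr) {
                *pBits = sizes[s];
                *pMult = m;
                return 1;
            }
    return 0;
}
```

For a frame that ramps up by a steady 3 per sample, the search settles on 1-bit codes with a multiplier of 3, which tracks the ramp exactly at 8:1 compression.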
To accomplish this task, ACOMP needs to represent the following:
Silence Encoding, the duration of a pause in the original sample.
Resynchronization, a raw data sample to avoid introducing error.
Bit Size, the number of bits used to represent the delta mod (1, 2, or 4).
Multiplier, the scale factor applied to each delta step; valid values are 1-16.
Additionally, the ACOMP header needs to indicate the frequency of the sound
sample and the frame size used for delta modulation. Figure 1 details the
ACOMP data format, while Figure 2 describes ACOMP in pseudocode. The program
AC.ASM (Listing One, page 96) is the source to the compression routine itself.
The program UC.ASM (Listing Two, page 99) contains all the code necessary to
decompress an ACOMP-compressed audio sample. This decompression algorithm
involves a lot of bit-twiddling, something at which assembly language excels.
(Available electronically are assembly language macro headers; a
C-prototype header file for UC.ASM; a linkable, object-module version of
UC.ASM; a C-prototype header for DOSCALLS.H; C-callable procedures into DOS
INT 21 functions; a linkable, object-module version of DOSCALLS.ASM;
executable versions of ACOMP and UCOMP; documentation of the ACOMP file format
and extended ACOMP file definitions; demonstration batch files; demonstration
sound files; a C version of the decompression program; and an IBM
internal-speaker sound driver.)
Figure 1: ACOMP file format.


 BYTE 0-1: Length of audio sample, decompressed size, in 8086 low/high
 format
 BYTE 2-3: Recording frequency of audio sample, low/high format
 BYTE 4: Frame size, 8-248
 BYTE 5: Squelch size used at compression time
 BYTE 6-7: Maximum error used at compression time
 BYTE 8: Initial audio sample
 BYTE 9: First header byte

 Bit 7: RESYNC. If bit 7 is on, then the bottom 7 bits of
 this byte represent a resynchronization value,
 rounded to the closest multiple of 2. Take the
 bottom 7 bits, left shift them once, and this 8-bit
 value represents the resynchronized data-stream value.
 Bit 6: SQUELCH. If bit 6 is on and bit 7 is off, then this
 is a silence-squelching command byte. The bottom 6
 bits represent a repeat count of the current data
 value.
 Bit 4-5: Delta-mod bit size. If bit 7 and bit 6 are off,
 then these 2 bits represent the delta-mod size used
 to delta modulate the frame of data. 01 represents
 1-bit modulation, 10 represents 2-bit delta
 modulation, and 11 represents 4-bit delta
 modulation.
 Bit 0-3: Represents delta-mod multiplier value. Equal to the
 value of this nibble plus 1. Valid multipliers are
 1-16.

 If you receive a delta-modulation header byte, record the
 delta-modulation bit size and the multiplier value. Then grab
 bytes of delta-mod data until a full frame of data samples has
 been exhausted. The format for delta modulation values is as
 follows:

 1-bit delta mod: 0 --> -1*multiplier
 1 --> +1*multiplier
 2-bit delta mod: 00 --> -2*multiplier
 01 --> -1*multiplier
 10 --> +1*multiplier
 11 --> +2*multiplier
 4-bit delta mod: 0000 --> -8*multiplier
 0001 --> -7*multiplier
 0010 --> -6*multiplier
 0011 --> -5*multiplier
 0100 --> -4*multiplier
 0101 --> -3*multiplier
 0110 --> -2*multiplier
 0111 --> -1*multiplier
 1000 --> +1*multiplier
 1001 --> +2*multiplier
 1010 --> +3*multiplier
 1011 --> +4*multiplier
 1100 --> +5*multiplier
 1101 --> +6*multiplier
 1110 --> +7*multiplier
 1111 --> +8*multiplier


For each data sample, the delta-mod value is added to its predecessor. You
must make certain that the byte doesn't under- or overflow. Once the frame of
data samples has been exhausted, you go back up to the top, looking for the
next header byte. You do this until the entire audio file has been
decompressed.
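The decoding rules above can be sketched in C. The function and enum names below are invented for illustration (they are not from UC.ASM), but the bit tests follow the Figure 1 layout exactly:

```c
enum hdr_kind { HDR_RESYNC, HDR_SQUELCH, HDR_FRAME };

/* Classify one ACOMP header byte per the Figure 1 layout. On HDR_RESYNC,
 * *value is the resynchronized sample; on HDR_SQUELCH, *value is the
 * repeat count; on HDR_FRAME, *bits and *mult describe the frame. */
enum hdr_kind decode_header(unsigned char h, int *value, int *bits, int *mult)
{
    if (h & 0x80) {                  /* bit 7: resynchronization        */
        *value = (h & 0x7F) << 1;    /* bottom 7 bits, shifted left once */
        return HDR_RESYNC;
    }
    if (h & 0x40) {                  /* bit 6: silence squelch          */
        *value = h & 0x3F;           /* repeat count of current value   */
        return HDR_SQUELCH;
    }
    switch ((h >> 4) & 3) {          /* bits 4-5: delta-mod bit size    */
    case 1: *bits = 1; break;
    case 2: *bits = 2; break;
    case 3: *bits = 4; break;
    }
    *mult = (h & 0x0F) + 1;          /* bits 0-3: multiplier, 1-16      */
    return HDR_FRAME;
}
```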
Figure 2: Pseudocode for the ACOMP algorithm.

 While there are data samples left to process:

 If current data can be silence encoded, then do so.
 else
 (For 1 Frame)
 Try every possible multiplier value 1-16 using 1-bit
 modulation, keeping the best.
 If the best is within the error threshold, store frame
 as 1 bit.
 else
 Try every possible multiplier value 1-16 using 2-bit
 modulation, keeping the best.
 If the best is within the error threshold, store frame
 as 2 bit.
 else
 Try every possible multiplier value 1-16 using 4-bit
 modulation, keeping the best.
 If the best is within the error threshold, store frame
 as 4 bit.
 else
 Store a resynchronization value.
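The fallback order in Figure 2 boils down to a simple decision: given the best error achievable at each bit size for the current frame, pick the cheapest encoding that meets the threshold, with resynchronization as the last resort. A minimal C sketch (hypothetical names, assuming the per-size errors have already been computed):

```c
enum enc { ENC_RESYNC = 0, ENC_1BIT = 1, ENC_2BIT = 2, ENC_4BIT = 4 };

/* Pick the cheapest encoding whose best error meets the threshold, in the
 * same order Figure 2 tries them: 1-bit first, resync as last resort. */
enum enc choose_encoding(long err1, long err2, long err4, long maxerr)
{
    if (err1 <= maxerr) return ENC_1BIT;
    if (err2 <= maxerr) return ENC_2BIT;
    if (err4 <= maxerr) return ENC_4BIT;
    return ENC_RESYNC;
}
```

Favoring smaller bit sizes first is what drives the compression ratio: a 1-bit frame costs one-eighth of a raw frame, so it wins whenever it is good enough.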

Note that ACOMP is far from a real-time compression algorithm. Because it does
an exhaustive search for the best possible bit resolution and delta-modulation
value for every frame of data, it can take a substantial amount of time to
compress a sound file. However, you can decompress the resulting audio data in
near real-time. A faster compression algorithm could be developed by using
some frequency analysis on the input data stream rather than performing an
exhaustive search.
The data definition for ACOMP-compressed audio supports compressed audio in
sizes no greater than 64K in length. Because sound files can exceed this, an
extended ACOMP file format has been defined; see Figure 3. This format is
composed of a collection of individual ACOMP-compressed data frames. The
default buffer size is 64K, but you can select a different buffer size using
the command-line switch /b. If the input data stream exceeds the current buffer
size, then an extended ACOMP audio file (composed of n frames of audio data)
is produced. One advantage to this technique is that the decompression code
need only allocate memory in the size of this buffer to decompress even a very
large audio file.
Figure 3: Extended ACOMP file format: Supports n frames of ACOMP-compressed
data.

 // ABX file format:
 // Bytes 0-1: int TotalFrames; Total number of ACOMP frames in file.
 // Bytes 2-5: long int TotalSize; Total size of original source file.
 // Bytes 6-7: unsigned int bufsize; Frame buffer size used to compress in.
 // Bytes 8-9: unsigned int freq; Playback frequency of audio file.
 // .... ABH HEADERS[TotalFrames] Array of headers indicating all
 // audio frame data.
 typedef struct
 {
 long int fileaddress; // Address in file of this audio section.
 unsigned int fsize; // compressed file size.
 unsigned int usize; // uncompressed file size.
 } ABH;
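Since all multibyte fields are stored low byte first (8086 convention), the fixed ten-byte ABX header can be parsed portably by assembling each field from its bytes. A sketch, assuming the Figure 3 layout (the function name is invented):

```c
/* Parse the fixed 10-byte ABX header; every field is little endian,
 * as stored by the 8086. Field names follow Figure 3. */
struct abx_header {
    unsigned      total_frames;  /* number of ACOMP frames in file   */
    unsigned long total_size;    /* size of original source file     */
    unsigned      bufsize;       /* frame buffer size for compression */
    unsigned      freq;          /* playback frequency               */
};

struct abx_header parse_abx(const unsigned char *p)
{
    struct abx_header h;
    h.total_frames = p[0] | (p[1] << 8);
    h.total_size   = (unsigned long)p[2]        | ((unsigned long)p[3] << 8)
                   | ((unsigned long)p[4] << 16) | ((unsigned long)p[5] << 24);
    h.bufsize      = p[6] | (p[7] << 8);
    h.freq         = p[8] | (p[9] << 8);
    return h;
}
```

The ABH array that follows the header can then be read TotalFrames at a time, and a player need only allocate one bufsize-sized buffer no matter how large the original recording was.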

A standardized file format for compressed digitized sound is a valuable
resource. ACOMP is efficient enough at compressing human speech that it
actually becomes practical to send voice data over a normal modem
communication channel. Applications for this kind of audio compression extend
to anywhere you would want to send voice data across a communications channel,
or incorporate voice data into a computer game, online help system, or
multimedia application.


_AUDIO COMPRESSION_
by John W. Ratcliff


[LISTING ONE]

;; AC.ASM -> ACOMP assembly language compressor. Written by John W. Ratcliff,
;; 1991. Uses Turbo Assembler IDEAL mode and makes HEAVY use of macros. This
;; algorithm performs an exhaustive search for the best delta mod for each
;; section of the waveform. It is very CPU intensive, and the algorithm can
;; be a little difficult to follow in assembly language.

 IDEAL ; Enter Turbo Assembler's IDEAL mode.
 JUMPS ; Allow automatic jump sizing.
 INCLUDE "prologue.mac" ; Include useful assembly language macro file.

SEGMENT _TEXT BYTE PUBLIC 'CODE'
 ASSUME CS:_TEXT
;; This macro computes the amount of error in a frame of data, at this
;; bit resolution and this delta mod location.

Macro DoError BITS
 LOCAL @@NO
 mov bx,[COMP] ; Current delta mod value.
 mov bh,BITS ; Current bit resolution.
 push [MINERR] ; Pass minimum error so far.
 push [PREV] ; Pass previous data point.
 call ComputeError ; Compute error for frame.
 add sp,4 ; Balance stack.
 cmp dx,[MINERR] ; Less than previous minimum?
 jge @@NO ; no, don't update.
 mov [MINERR],dx ; Save as new minimum.
 mov [BESTPREV],ax ; Best previous location.
 xor ah,ah ;
 mov al,bl ; Get best delta modulation.
 mov [BEST1],ax ; save it.
 mov al,bh ; Get best bits.
 mov [BEST2],ax ; Save it.
@@NO:
 endm

SQLCH equ 64 ; Squelch bit.
RESYNC equ 128 ; Resynchronization bit.

DELTAMOD equ 00110000b ; Bit mask for delta mod bits.

ONEBIT equ 00010000b ; Bit pattern for one bit delta mod.
TWOBIT equ 00100000b ; Bit pattern for two bit delta mod.
FOURBIT equ 00110000b ; Bit pattern for four bit delta mod.

;; This macro echos a message to the text screen, so that we can
;; monitor the progress of the compression algorithm.
Macro Note MSG
 push ax
 lea ax,[MSG]
 push ax
 call Notify
 add sp,2
 pop ax
 endm

 public _CompressAudio
;;This is the ACOMP compression procedure.
;;int far CompressAudio(unsigned char far *shand,
;; Address of audio data to compress.
;; unsigned char far *dhand,
;; Destination address of compressed data.
;; unsigned int slen, Length of audio data to compress.
;; int squelch, Squelch value allowed.
;; int freq, Playback frequency of audio data.
;; int frame, Frame size.
;; int maxerr); Maximum error allowed.
Proc _CompressAudio far
 ARG SHAN:DWORD,DHAN:DWORD,SLEN:WORD,SQUELCH:WORD,FREQ:WORD,
 FRAME:WORD,MAXERR:WORD
 LOCAL PREV:WORD,COMP:WORD,MINERR:WORD,BEST1:WORD,BEST2:
 WORD,BESTPREV:WORD = LocalSpace
 PENTER LocalSpace
 PushCREGS


 lds si,[SHAN] ; Get source address.
 les di,[DHAN] ; Get destination address.
 mov cx,[SLEN] ; Get length of audio data.

 mov ax,cx ; Get length of audio sample into AX
 stosw ; Store it.
 mov ax,[FREQ] ; Get frequency of recording.
 stosw ; Store it.
 mov ax,[FRAME] ; Get the frame size.
 stosb ; Store it.
 mov ax,[SQUELCH] ; Get squelch size
 stosb ; Store it.
 mov ax,[MAXERR] ; Get maximum error allowed.
 stosw ; Save it.
 xor ax,ax
 lodsb ; Get first data sample
 mov [PREV],ax
 stosb ; Store first data sample.
 dec cx ; Decrement sample size count.
 jz @@DONE

@@SQU: mov ah,0Bh ; Test keyboard status.
 int 21h
 or al,al
 jz @@NOK
 mov ah,08h ; If a key was pressed get that key
 int 21h ; value, and see if it was the
 cmp al,27 ; escape key.
 jne @@NOK
 xor ax,ax ; If escape, return to caller, with abort.
 jmp @@EXIT
@@NOK:
 xor ax,ax
 mov dx,[SQUELCH] ; Get squelch value.
 push cx ; Save remaining data count.
 push si ; Save si.
@@CK1: lodsb ; Get data byte.
 sub ax,[PREV] ; Difference from previous data sample?
 jns @@CK2 ; if positive leave it alone.
 neg ax ; Make it positive.
@@CK2: cmp ax,dx ; Is it within the squelch range?
 jg @@NOS ; no, end the squelch run.
 loop @@CK1 ; Keep going.
 inc si ; Plus one, this last one counts.
@@NOS: pop ax ; Get back start SI
 mov dx,si ; DX contains current address.
 sub dx,ax ; Compute number of squelch bytes encountered.
 dec dx ; Less, last non squelch byte.
 cmp dx,3 ; At least three?
 jle @@NOSQ ; no, don't squelch it.
@@SQS: cmp dx,63 ; Is it under 63?
 jle @@SEND
 mov ax,(63 + SQLCH) ; Send count.
 sub dx,63 ; Less the 63 count we just sent.
 stosb ; Write squelch byte out.
 jmp short @@SQS ; Continue.
@@SEND: mov ax,dx ; Get remaining count.
 or ax,SQLCH ; Or squelch bit on.
 stosb ; Send squelch count out.

 dec si ; Back up to last data point.
 pop ax ; Pull CX off of stack, use current count.
 Note msg0
 jmp short @@NXT
@@NOSQ: mov si,ax ; Replace where source was.
 pop cx ; Get back remaining data count.

@@NXT: jcxz @@DONE ; Exit if done.
 cmp cx,[FRAME] ; Below current frame size?
 jae @@GO ; no, go ahead.
@@FIN: lodsb ; Get raw sample.
 shr al,1 ; Down to closest approximated value.
 or al,RESYNC ; Add resync bit to it.
 stosb ; Store out.
 loop @@FIN ; Keep sending final bytes.
 jmp @@DONE ; exit, after sending final bytes.

@@GO: mov [MINERR],07FFFh
 push cx
 mov cx,[FRAME] ; Set CX to frame size.

 mov [COMP],1
@@ALL1: DoError 1 ; Try one bit mode, +/-1.
 inc [COMP]
 cmp [COMP],17 ; Try delta comp values clean up to 16!!
 jne @@ALL1

 mov ax,[MINERR]
 cmp ax,[MAXERR]
 jle @@BCMP ; Not good enough...
 mov [COMP],1
@@ALL2: DoError 2 ; Try two bit mode, +/-2.
 inc [COMP]
 cmp [COMP],17 ; Try delta comp values clean up to 16!!
 jne @@ALL2

 mov ax,[MINERR]
 cmp ax,[MAXERR]
 jle @@BCMP
 mov [COMP],1
@@ALL4: DoError 8 ; Try four bit mode, +/-8.
 inc [COMP]
 cmp [COMP],17 ; Try delta comp values clean up to 16!!
 jne @@ALL4

 mov ax,[MINERR] ; Get what the minimum error was.
 cmp ax,[MAXERR] ; Minimum error > maximum error?
 jle @@BCMP ; no, then send frame.
 pop cx ; Get back CX
 lodsb ; Get data sample.
 and al,(NOT 1) ; Strip off bottom bit.
 xor ah,ah
 mov [PREV],ax ; New previous.
 shr al,1 ; /2
 or al,RESYNC ; Or resync bit on.
 stosb ; Store it out into data stream.
 Note msg1
 loop @@SQU ; Go check squelching.
 jmp @@DONE ; Done, if this was last data sample.

@@BCMP: mov bx,[BEST1] ; Get best comp.
 mov ax,[BEST2] ; Get best bit size.
 mov bh,al ; Into BH
 mov ax,32000
 push ax
 push [PREV] ; Pass prev.
 call ComputeError ; Re-compute error term.
 add sp,4
 mov [PREV],ax ; New previous.
;; Now time to store results!
 mov bx,[BEST1] ; Get best comp.
 cmp [BEST2],1 ; 1 bit?
 jne @@NXT1
 call Fold1Bit ; Fold 1 bit data.
 Note msg2
 jmp short @@IN ; Reenter.
@@NXT1: cmp [BEST2],2 ; 2 bit data?
 jne @@NXT2
 call Fold2Bit
 Note msg3
 jmp short @@IN
@@NXT2:
 call Fold4Bit
 Note msg4
@@IN: mov ax,[FRAME]
 pop cx ; Get back CX
 add si,ax ; Advance source
 sub cx,ax ; Decrement data count.
 jnz @@SQU ; Continue, if not at end.

@@DONE:
 mov ax,di ; Size of compressed file.
 les di,[DHAN]
 sub ax,di ; Difference.

@@EXIT:
 PopCREGS
 PLEAVE
 ret
 endp

;; Compute error: Registers on entry are:
;; DS:SI -> source data.
;; CX -> number of bytes to compute error term in.
;; DX -> total error incurred.
;; BL -> delta comp size.
;; BH -> maximum bit size value, positive or negative.
;; Exit: CX,DS:SI stay the same.
;; DX -> total error term.
;; AX -> new previous.
Proc ComputeError near
 ARG PREV:WORD,MINERR:WORD
 LOCAL CUR:WORD = LocalSpace
 PENTER LocalSpace

 push cx
 push si
 push di ; Save destination address.
 xor dx,dx ; Initally no error.


@@CERR: lodsb ; Get a data byte.
 xor ah,ah ; Zero high byte.
 mov [CUR],ax ; Save as current sample.
 sub ax,[PREV]
 cmp bl,1
 je @@ND
 idiv bl ; Divided by delta mod size.
@@ND: or al,al
 js @@DON ; Do negative side.
 jnz @@CNT ; If not zero then continue.
 inc al ; Can't represent a zero, make it one.
@@CNT: cmp al,bh ; > max representative size?
 jle @@OK ; no, it fit as is.
 mov al,bh ; Make it the max representative size.
 jmp short @@OK ;
@@DON: neg al ; Make it positive.
 cmp al,bh ; > max representative size?
 jbe @@K2 ; no, use it.
 mov al,bh ; Make it the max representative size.
@@K2: neg al ; Make it negative again.
@@OK:
 stosb ; Store data value out.
 imul bl ; Times delta comp value.
 add ax,[PREV] ; Add to previous data point.
 js @@CS ; Do signed case.
 cmp ax,255 ; Did it overflow?
 jle @@K3 ; No, then it fit byte sized.
 mov ax,255 ; Make it byte sized.
 jmp short @@K3 ; Re-enter
@@CS: xor ax,ax ; Close as we can get, underflow.
@@K3: mov [PREV],ax ; This is our new approximated value.
 sub ax,[CUR] ; Less actual value.
 jns @@K4 ; if positive then fine.
 neg ax ; Take absolute value.
@@K4: add dx,ax ; Add into total error.
 cmp dx,[MINERR] ; Greater than minimum error allowed?
 jg @@OUT
 loop @@CERR
@@OUT: mov ax,[PREV] ; Current previous data point.
 pop di ; Restore destination address.
 pop si ; Reset SI back to start.
 pop cx ; Reset CX back to start.
 PLEAVE
 ret
 endp
Macro BuildByte
 LOCAL @@HOP1,@@HOP2
 lodsb
 or al,al ; Is it signed?
 jns @@HOP1
 shl ah,1 ; Rotate.
 jmp short @@HOP2
@@HOP1: stc
 rcl ah,1
@@HOP2:
 endm
;; Fold 1 bit data.
;; ES:DI -> points to data ready to fold out.

;; CX-> frame size.
;; BL-> contains delta size.
Proc Fold1Bit near
 push ds
 push si
 push di ; Header byte address.
 push es
 pop ds ; DS=ES
 mov si,di ; Source and dest.
 inc di ; skip past header byte.
@@FOLD: xor ah,ah ; Dest byte to be built, zero it.
 BuildByte
 BuildByte
 BuildByte
 BuildByte
 BuildByte
 BuildByte
 BuildByte
 BuildByte
 mov al,ah
 stosb ; Store it out.
 sub cx,8 ; Less the 8 samples just folded up.
 jnz @@FOLD ; Continue.

 pop si ; Get back header byte address.
 mov al,bl ; Get delta comp size.
 dec al ; Less one.
 or al,ONEBIT ; Or the One Bit mode flag on.
 mov [ds:si],al ; Store header byte.

 pop si
 pop ds
 ret
 endp

;; 2 Bit Format: 00 -> -2
;; 01 -> -1
;; 10 -> +1
;; 11 -> +2
Macro BByte
 LOCAL @@HOP1,@@HOP2
 lodsb
 or al,al ; Is it signed?
 jns @@HOP1
 add al,2 ; Adjust it.
 jmp short @@HOP2
@@HOP1: inc al ; Plus 1 to fit into format size.
@@HOP2: shl ah,1
 shl ah,1
 or ah,al ; Place bits into byte being built.
 endm
;; Fold 2 bit data.
;; ES:DI -> points to data ready to fold out.
;; CX-> frame size.
;; BL-> contains delta size.
Proc Fold2Bit near
 push ds
 push si
@@F2:

 push di ; Header byte address.

 push es
 pop ds ; DS=ES
 mov si,di ; Source and dest.
 inc di ; skip past header byte.
@@FOLD: xor ah,ah ; Dest byte to be built, zero it.
 BByte
 BByte
 BByte
 BByte
 mov al,ah
 stosb ; Store it out.
 sub cx,4 ; Folded up 4 samples.
 jnz @@FOLD ; Continue.

 pop si ; Get back header byte address.
 mov al,bl ; Get delta comp size.
 dec al ; Less one.
 or al,TWOBIT ; Or the two bit mode flag on.
 mov [ds:si],al ; Store header byte.

 pop si
 pop ds
 ret
 endp
;; Four bit format:
;; 0 -> -8
;; 1 -> -7
;; 2 -> -6
;; 3 -> -5
;; 4 -> -4
;; 5 -> -3
;; 6 -> -2
;; 7 -> -1
;; 8 -> +1
;; 9 -> +2
;;10 -> +3
;;11 -> +4
;;12 -> +5
;;13 -> +6
;;14 -> +7
;;15 -> +8
Macro Adjust4bit
 LOCAL @@HOP1,@@HOP2
 lodsb
 or al,al
 jns @@HOP1
 add al,8 ; Adjust it.
 jmp short @@HOP2
@@HOP1: add al,7 ; Adjust it.
@@HOP2:
 endm
;; Fold 4 bit data.
;; ES:DI -> points to data ready to fold out.
;; CX-> frame size.
;; BL-> contains delta size.
Proc Fold4Bit near
 push ds

 push si

 push di ; Header byte address.

 push es
 pop ds ; DS=ES
 mov si,di ; Source and dest the same.
 inc di ; skip past header byte.
@@FOLD: Adjust4bit ; Get first sample.
 ShiftL al,4 ; Into high nibble.
 mov ah,al ; Into AH
 Adjust4bit ; Get next nibble.
 or al,ah ; One whole byte.
 stosb ; Store it out.
 sub cx,2 ; Folded up 2 samples.
 jnz @@FOLD ; Continue.

 pop si ; Get back header byte address.
 mov al,bl ; Get delta comp size.
 dec al ; Less one.
 or al,FOURBIT ; Or the four bit mode flag on.
 mov [ds:si],al ; Store header byte.

 pop si
 pop ds
 ret
 endp
msg0 db "SQUELCH"
msg1 db "RESYNC "
msg2 db "1 BIT "
msg3 db "2 BIT "
msg4 db "4 BIT "

Proc Notify near
 ARG MSG:WORD
 PENTER 0
 PushAll

 push cs
 pop ds
 mov ax,0B800h
 mov es,ax
 mov si,[MSG]
 xor di,di
 mov ah,1Fh
 mov cx,7
@@SND: lodsb
 stosw
 loop @@SND

 PopAll
 PLEAVE
 ret
 endp

 ENDS
 END







[LISTING TWO]

;; UC.ASM -> Uncompress ACOMP compressed audio data.
;; Written by John W. Ratcliff, 1991.
;; Uses Turbo Assembler IDEAL mode.

 IDEAL ; Enter Turbo Assembler IDEAL mode.
 JUMPS ; Allow automatic jump sizing.

 INCLUDE "prologue.mac" ; Include common useful assembly macros.

SMALL_MODEL equ 0 ;; true only if trying to generate near calls

 SETUPSEGMENT ; Setup _TEXT segment.

Macro CPROC name
 public _&name
IF SMALL_MODEL
Proc _&name near
ELSE
Proc _&name far
ENDIF
 endm

SQLCH equ 64 ; Squelch byte flag
RESYNC equ 128 ; Resync byte flag.

DELTAMOD equ 00110000b ; Bit mask for delta mod bits.

ONEBIT equ 00010000b ; Bit pattern for one bit delta mod.
TWOBIT equ 00100000b ; Bit pattern for two bit delta mod.
FOURBIT equ 00110000b ; Bit pattern for four bit delta mod.


base dw ? ; Base address inside translate table.


TRANS db -8,-7,-6,-5,-4,-3,-2,-1,1,2,3,4,5,6,7,8
 db -16,-14,-12,-10,-8,-6,-4,-2,2,4,6,8,10,12,14,16
 db -24,-21,-18,-15,-12,-9,-6,-3,3,6,9,12,15,18,21,24
 db -32,-28,-24,-20,-16,-12,-8,-4,4,8,12,16,20,24,28,32
 db -40,-35,-30,-25,-20,-15,-10,-5,5,10,15,20,25,30,35,40
 db -48,-42,-36,-30,-24,-18,-12,-6,6,12,18,24,30,36,42,48
 db -56,-49,-42,-35,-28,-21,-14,-7,7,14,21,28,35,42,49,56
 db -64,-56,-48,-40,-32,-24,-16,-8,8,16,24,32,40,48,56,64
 db -72,-63,-54,-45,-36,-27,-18,-9,9,18,27,36,45,54,63,72
 db -80,-70,-60,-50,-40,-30,-20,-10,10,20,30,40,50,60,70,80
 db -88,-77,-66,-55,-44,-33,-22,-11,11,22,33,44,55,66,77,88
 db -96,-84,-72,-60,-48,-36,-24,-12,12,24,36,48,60,72,84,96
 db -104,-91,-78,-65,-52,-39,-26,-13,13,26,39,52,65,78,91,104
 db -112,-98,-84,-70,-56,-42,-28,-14,14,28,42,56,70,84,98,112
 db -120,-105,-90,-75,-60,-45,-30,-15,15,30,45,60,75,90,105,120
 db -128,-112,-96,-80,-64,-48,-32,-16,16,32,48,64,80,96,112,127

CPROC GetFreq ; Report playback frequency for an ACOMP file.

 ARG SOURCE:DWORD
 PENTER 0
 push es
 les bx,[SOURCE]
 mov ax,[es:bx+2]
 pop es
 PLEAVE
 ret
 endp

;; DX contains PREVIOUS.
;; AH contains bit mask being rotated out.
;; BX up/down 1 bit value.
Macro Delta1
 LOCAL @@UP,@@STORE
 shl ah,1 ; Rotate bit mask out.
 jc @@UP
 sub dx,bx
 jns @@STORE
 xor dx,dx ; Zero it out.
 jmp short @@STORE
@@UP: add dx,bx
 or dh,dh
 jz @@STORE
 mov dx,255
@@STORE:mov al,dl ; Store result.
 stosb
 endm

;; BX-> base address of translate table.
;; DX-> previous.
;; AL-> index.
Macro DeModulate
 LOCAL @@HIGH,@@OK
 xlat [cs:bx] ; Translate into lookup table.
 cbw ; Make it a signed word.
 add dx,ax ; Do word sized add, into previous.
 jns @@HIGH
 xor dx,dx ; Underflowed.
@@HIGH: or dh,dh ; Did it overflow?
 jz @@OK
 mov dx,255 ; Maxed out.
@@OK: mov al,dl
 stosb
 endm


;;unsigned int far UnCompressAudio(unsigned char far *source,unsigned char far
*dest);
;; UnCompressAudio will decompress data which was compressed using ACOMP
;; into the destination address provided. UnCompressAudio returns the
;; total size, in bytes, of the uncompressed audio data.
CPROC UnCompressAudio
 ARG SHAN:DWORD,DHAN:DWORD
 LOCAL SLEN:WORD,FREQ:WORD,FRAME:WORD,BITS:WORD = LocalSpace
 PENTER LocalSpace
 PushCREGS

 lds si,[SHAN] ; Get source segment
 les di,[DHAN] ; Get destination segment


 lodsw ; Get length.
 mov [SLEN],ax ; Save length.
 mov cx,ax ; Into CX
 lodsw ; Frequency.
 mov [FREQ],ax ; Save frequency
 lodsb ; Get frame size.
 xor ah,ah ; Zero high byte
 mov [FRAME],ax ; Save it.
 lodsb ; Get squelch, and skip it.
 lodsw ; Get maximum error, and skip it.
 lodsb ; Get initial previous data point.
 stosb ; Store it.
 xor ah,ah ; zero high byte.
 mov dx,ax ; Save into previous word.
 dec cx ; Decrement total by one.
 jz @@DONE ; Exit
 mov ah,al ; AH, always the previous.
@@DCMP: lodsb ; Get sample.
 test al,RESYNC ; Resync byte?
 jz @@NOTR ; no, skip.
 shl al,1 ; Times two.
 mov dl,al ; Into previous.
 xor dh,dh ; Zero high word.
 stosb ; Store it.
 loop @@DCMP ; Next one.
 jmp @@DONE

@@NOTR: test al,SQLCH ; Squelch byte?
 jz @@FRAM ; no, then it is a frame.
 and al,00111111b ; Leave just the count.
 push cx ; Save current countdown counter.
 mov cl,al ; get repeat count
 xor ch,ch ; zero high byte of CX
 mov bx,cx ; Repeat count in DX
 mov al,dl ; Repeat of previous.
 rep stosb ; Repeat it.
 pop cx ; Get back remaining count.
 sub cx,bx ; Less.
 jnz @@DCMP ; Keep going.
 jmp @@DONE

@@FRAM:
 mov bx,ax ; command byte into BX
 and bx,0Fh ; Multiplier being used.
 ShiftL bx,4 ; Times 16.
 add bx,offset TRANS ; Plus address of translate table.
 and al,DELTAMOD ; Leave just delta mod.
 push cx
 mov cx,[FRAME] ; Get frame size.
 cmp al,ONEBIT ; In one bit delta mod?
 jne @@NEXT1 ; no, try other.
 ShiftR cx,3 ; /8
 mov bl,[cs:bx+8] ; Get up amount
 xor bh,bh ; Zero high byte.
@@GO: lodsb
 xchg al,ah ; Place prev in AL, Bit mask in AH
 Delta1
 Delta1

 Delta1
 Delta1
 Delta1
 Delta1
 Delta1
 Delta1
 mov ah,al
 loop @@GO
 jmp @@RENTER

@@NEXT1:cmp al,TWOBIT ; In two bit delta mod mode?
 jne @@NEXT2
 add bx,6 ; Point at +/-2 bits in table.
 shr cx,1
 shr cx,1 ; 4 samples per byte.
@@GOGO: lodsb
 ShiftR al,6
 DeModulate
 mov al,[ds:si-1]
 ShiftR al,4
 and al,3
 DeModulate
 mov al,[ds:si-1]
 ShiftR al,2
 and al,3
 DeModulate
 mov al,[ds:si-1]
 and al,3
 DeModulate
 loop @@GOGO
 jmp short @@RENTER
@@NEXT2:shr cx,1 ; Two samples per byte.
@@GO2: lodsb ; Get sample.
 ShiftR al,4
 DeModulate
 mov al,[ds:si-1]
 and al,0Fh
 DeModulate
 loop @@GO2

@@RENTER:
 pop cx
 sub cx,[FRAME]
 jnz @@DCMP ; Continue decompress

@@DONE:
 mov ax,[SLEN] ; Uncompressed length.

 PopCREGS
 PLEAVE
 ret
 endp


 ENDS
 END


































































July, 1992
PERSONAL SUPERCOMPUTING SEAMLESS PORTABILITY


A hardware-independent "virtual computer" is the key




Ian Hirschsohn


Ian holds a BSc in Mechanical Engineering and an MS in Aerospace Engineering.
He is the principal author of DISSPLA and cofounder of ISSCO. He can be
reached at Integral Research, 249 S. Highway 101, Suite 270, Solana Beach, CA
92075.


There's a misperception that if you write in C for UNIX, your code will be
completely portable. But no matter how vanilla-flavored your Fortran or C code
is, there's always something peculiar to each system that requires
custom coding. It may be graphics, I/O, memory limitations, or some other
dependency. Even if the source code is meticulously written to be 99 percent
portable, the remaining 1 percent causes the most grief.
Through the process of porting the massive DISSPLA graphics package between
different platforms, I became painfully aware of the costs and effort of
transferring code. Consequently, this article addresses the concept of
seamless portability, or the ability to transfer programs between different
computers without relinking or recompiling the code. As in last month's
article, I'll use the PORT system as evidence that this can be accomplished.
(Recall that PORT is a software environment somewhat analogous to Desqview
with the Phar Lap DOS-Extender.) While last month I looked at high-performance
RISC systems and described PORT executing on a 386/486 PC with plug-in i860
RISC card(s), this month I'll describe PORT on a 386SX and examine its
potential for other environments.


The Portability Equation


The effort of porting seems to increase exponentially with the number of
platforms you are targeting--just two platforms means two copies of the
source, two copies of the corrections, two copies of the corrections to the
corrections, and so on. Even with rigorous bookkeeping, however, one of these
fixes usually fails to be transferred to the other platform, or old versions
of routines become linked with updated versions of others. The resulting bugs
can take days to find. There are also bugs (like those that depend on
transient memory contents) on one platform that can't be reproduced on others
or those from one developer's code that show up in another programmer's work
in team-developed, multi-megabyte programs.
From my experience, it's the last 5 percent of the bugs that take 95 percent
of the entire conversion time. Murphy's Law is absolute in software porting
and makes a mockery of even the most conservative timetable. Pundits not
accustomed to life in the trenches may claim that rigorous diagnostics
eliminate these bugs. While comprehensive test data and validation programs
are indispensable, it is almost impossible to check every case in a program of
any size. Finally, the most insidious bugs tend to occur at customer
sites--with pathologic data on jobs due yesterday.
These are the tribulations I found with Fortran. C has the potential for even
more interesting bugs: corrupted pointers, mismatched argument types, and
uninitialized heap variables can while away a week or two. To add spice, the
effects are often completely different from one platform to another--sometimes
from one execution to another.
Other all-too-common traps include the following:
The original programmer leaves the company, along with documentation and test
programs.
Programmers tend to use language extensions peculiar to the compiler on which
they develop.
Programmers on a specific platform gravitate to hardware dependencies (address
formats and I/O protocols) that make porting a nightmare.
Large programs make use of third-party subroutine libraries.


Practical Solutions


At the bit level, binary operations carried out on one processor can be
emulated on just about any other. At the other end, applications are almost
totally aloof from the nuances of computer architecture. Computer languages
such as Pascal and Fortran (and, to some degree, C) are designed to be machine
independent. Unfortunately, no program exists in a vacuum, and unless the
system utilities are also identical, interaction with the program will be
different on two platforms. UNIX is the closest candidate to a portable
system, but no two implementations of UNIX (that I know of) are identical,
even to the application software.
Assuming the hardware differences can be resolved, it remains to design a
complete, portable system. But to be commercially viable, the system must
first have acceptable performance, which means being competitive with
native-code compilers and their I/O throughput. Secondly, it must be
nonintrusive. (Compatibility with existing systems is a market reality.)
After years of designing device-independent graphics, we found that all
graphics can be reduced to moves, draws, and fills. Distilling all axes, maps,
curves, fonts, and complex features down to this simple set of primitives
enabled us to support hundreds of diverse graphics devices. Each device had
its own specific "device driver" to translate the primitives into
device-specific commands. This strategy showed no limitation to either
high-level features or use of the devices. We therefore asked ourselves
whether application software could be reduced to "adds, multiplies, and
divides." In other words, could the higher-level software be reduced to a set
of efficient computation primitives that is machine independent, with a
processor-dependent "device driver" for each platform? The answer was "yes,"
as PORT illustrates.
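The graphics analogy above can be made concrete with a few lines of C (names invented for illustration): high-level features are coded once against the primitives, and each device supplies only its own implementations of them.

```c
/* The "device driver" idea: every high-level graphics feature reduces to
 * a handful of primitives, supplied per device through function pointers. */
struct gfx_driver {
    void (*move)(void *dev, int x, int y);  /* pen-up move to (x,y)   */
    void (*draw)(void *dev, int x, int y);  /* pen-down draw to (x,y) */
};

/* One high-level feature, coded once against the primitives. */
void draw_box(const struct gfx_driver *d, void *dev,
              int x0, int y0, int x1, int y1)
{
    d->move(dev, x0, y0);
    d->draw(dev, x1, y0);
    d->draw(dev, x1, y1);
    d->draw(dev, x0, y1);
    d->draw(dev, x0, y0);
}

/* A stand-in "device" that just counts primitive calls; a real driver
 * would emit device-specific commands here instead. */
static void count_prim(void *dev, int x, int y)
{
    (void)x; (void)y;
    ++*(int *)dev;
}
```

Supporting a new device means writing only the primitive implementations; draw_box and everything built on it carry over untouched, which is exactly the separation PORT applies to computation.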


The Virtual Computer


As I pointed out last month, Cray's CDC 6600 architecture was the archetype
for almost all supercomputers. Serendipitously for portability, it isolates
the divergent needs of computation, I/O, and the host system. To capitalize on
Cray's model, PORT views its host as a virtual computer via an architecture
defined by PORT, not any specific hardware. Each target processor has a
machine-specific interface program analogous to the graphics "device drivers"
mentioned above. The virtual computer is divided into two fundamental
processors, the computation processor (CP) and the peripheral processor (PP);
see Figure 1. Like Cray's CDC 6600, the CP does no I/O and the PP does no
significant computation. The CP and the PP communicate with each other through
a memory-mapped mailbox.
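The CP/PP split can be pictured with a minimal mailbox sketch in C. The layout and request codes below are invented (the article does not specify them); the point is only the protocol: the CP posts a request, the PP services it and posts status back, and neither side needs to know how the other is implemented.

```c
#include <stdint.h>

/* Illustrative memory-mapped mailbox between the CP and PP. */
struct mailbox {
    volatile uint32_t command;   /* 0 = idle, nonzero = pending request */
    volatile uint32_t status;    /* result posted by the PP             */
    uint32_t args[4];            /* request parameters                  */
};

/* CP side: hand a request to the PP (arguments first, command last). */
void cp_post(struct mailbox *m, uint32_t cmd, uint32_t arg0)
{
    m->args[0] = arg0;
    m->command = cmd;
}

/* PP side: service one pending request, then mark the box idle. */
void pp_service(struct mailbox *m)
{
    if (m->command == 1)         /* hypothetical "echo" request code */
        m->status = m->args[0];
    m->command = 0;
}
```

Because the CP never touches a device and the PP never computes, either half can be reimplemented (i860 card, protected-mode 386, UNIX process) without the other noticing.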
The physical implementation of the CP and PP are transparent to PORT. In my
previous article, the CP was implemented using the i860 RISC microprocessor
and the PP via a 386/486 PC. In many PORT installations, the CP is implemented
in 32-bit protected mode on the 386/486, and the PP uses 16-bit real mode on
the same processor. The operation of PORT in the multiprocessor and
single-processor environments is identical--the only difference is
performance.
PORT with all options is almost one million lines of extended Fortran
developed by a team of programmers over a ten-year period. It is a
full-featured system with a vast array of utilities, debuggers, libraries, and
services far beyond just a compiler plus environment. By comparison, the CP is
about 6500 lines of assembly for the 386/486 (5000 for the i860), and the core
of the 386/486 PP adds around another 12,000 lines of code. PORT is an open
architecture, and the development of CP/PP versions for other platforms or
even the PC is encouraged. Assembly language is not mandatory; a
quick-and-dirty CP/PP can readily be coded in C (10,000 lines ballpark). The
beauty of the CP/PP separation is that individual modules can later be
optimized into assembly, one by one.
The PORT compiler, editor, linker, file management, virtual-memory system,
libraries, graphics, and so on are all oblivious to the actual CP and PP
implementations. Programs on one platform can be immediately executed on
another without changing a single line of code, recompiling, or even relinking
because the whole PORT system is aloof from the hardware. To transfer PORT to
another platform, it is necessary only to write a CP and PP for it. For
example, an i860 plug-in VME card to the Sun SPARC just needs a PP for Sun's
UNIX. Likewise, a MIPS 4000 plug-in card to a 386 PC only needs a CP version.


The Metacode Approach


The PORT Fortran/C compiler reduces the source code to a machine-independent
"metacode". Although there currently is only one compiler for PORT, nothing
prevents the writing of other compilers (even for other languages). Last month
I pointed out that the metacode is tuned to the needs of Fortran/C, but its
machine-level requirements are generic, and the metacode is extensible. Any
compiler that outputs the PORT metacode can coexist in PORT.
UNIX and PORT differ in one key respect. UNIX compilers output the native
instruction set for each platform. In addition, each UNIX implementation is
internally customized to the architecture of that platform. PORT produces a
machine-independent instruction set and hardware-independent I/O protocols.
The platform is transparent to the whole of PORT, not just to the application
source code. Details of the PORT metacode will be described more fully in a
subsequent article. Here, I'll describe the salient features of the metacode
as they pertain to portability.
Each meta-instruction of the metacode is a 64-bit word specifying A = B op C.
For example, A = B+C, A = B*C, if(B>C) go to A, and call A (Blist, Count). The
indirect addressing modes are specific to higher-level languages rather than
conveniences of the hardware designers. For instance, A(I) = B(J,K)**N(L+50)
is a single meta-instruction, with A(I), B(J,K), and N(L+50) as intrinsic indirect
address modes. PORT local addresses are relative to the start of the current
subroutine instruction block or data block, not a base segment or other
hardware artifice.
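The A = B op C triad can be sketched as a decode loop in C. The field widths and opcode values below are illustrative assumptions, not PORT's actual metacode layout:

```c
#include <stdint.h>

/* Hypothetical 64-bit meta-instruction layout (the real PORT field
   widths are not published): a 16-bit opcode and three 16-bit operand
   addresses, each relative to the current data block. */
typedef uint64_t meta_t;

enum { OP_ADD = 1, OP_MUL = 2 };

static unsigned op_of(meta_t m) { return (unsigned)(m >> 48); }
static unsigned a_of(meta_t m)  { return (unsigned)(m >> 32) & 0xFFFF; }
static unsigned b_of(meta_t m)  { return (unsigned)(m >> 16) & 0xFFFF; }
static unsigned c_of(meta_t m)  { return (unsigned)m & 0xFFFF; }

/* One step of a CP-style decode loop over a 64-bit data block:
   every instruction is the A = B op C triad described in the text. */
void cp_step(meta_t m, int64_t *data)
{
    switch (op_of(m)) {
    case OP_ADD: data[a_of(m)] = data[b_of(m)] + data[c_of(m)]; break;
    case OP_MUL: data[a_of(m)] = data[b_of(m)] * data[c_of(m)]; break;
    }
}
```

Because operand addresses are block-relative rather than tied to a base segment, the same encoded instruction stream runs unchanged on any CP.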
Other aspects of the PORT metacode key to portability are that:
The metacode does not have "registers." Register organization is highly
machine dependent.

PORT addressing is by 64-bit words, and operands are 64 bit. PORT follows wide
memory-bus mainframe and supercomputer conventions rather than old
microcomputer byte conventions. Bytes are treated as fields, with strings
assigned as multiples of 64-bit words and usually manipulated eight bytes at a
pop.
PORT is mainframe 64-bit big endian, not PC little endian. The PC 80x86
numbers bytes right to left within a word, so the test IF (IVAL='ABCD') THEN
usually fails, because if IVAL is transferred from a byte array, it has the
contents DCBA. On most mainframes, the Mac 680x0, and SCSI, bytes are numbered
from left to right, which is more convenient for higher-level languages.
PORT implements software virtual memory that is independent of any hardware
assist. Classical virtual memory as implemented on the IBM 370, VAX, and other
machines is heavily dependent on a hardware-translate look-aside buffer and
other assists. PORT achieves v/m without any of that.
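The byte-numbering pitfall above can be demonstrated in a few lines of C (the constant 0x41424344 is simply 'A','B','C','D' read left to right, as a big-endian machine would store them):

```c
#include <stdint.h>
#include <string.h>

/* Demonstrates the 'ABCD' test from the text: transfer the bytes of
   "ABCD" from a byte array into a 32-bit word and compare against the
   constant a big-endian machine would expect.  On a little-endian
   80x86 the bytes land in DCBA order, so the comparison fails. */
int matches_big_endian_abcd(void)
{
    uint32_t ival;
    memcpy(&ival, "ABCD", 4);     /* byte array -> word transfer */
    return ival == 0x41424344u;   /* 'A','B','C','D' left to right */
}
```

On a 386/486 this function returns 0; on a big-endian machine such as a 680x0 Mac it returns 1, which is why a portable system must fix one convention and emulate it where the hardware disagrees.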
Space does not permit a full discussion of the PORT metacode as it relates to
portability, but the items above should give a feel for the way PORT answers
to the needs of higher-level languages rather than contorting the applications
software to fit the whims of the hardware designers. In some implementations,
this forces the CP through internal gyrations. (For example, a 64-bit integer
has to be emulated with a pair of 32-bit integers on the 386/486.) But because
each instruction is implemented in only one place--the CP--that overhead is
paid in only one place.


Keeping I/O Simple


DOS and UNIX I/O internals are positively Byzantine. Not only does the DOS
file-allocation table (FAT) result in two potential disk references for every
actual reference (one for the FAT section), but corrupting a link in its chain
can cause loss of disk data. You can't really fault DOS or UNIX too much--they
were developed when loaded machines were a PC/XT with two floppy drives or a
PDP 11 with a 20-Mbyte hard disk. Unfortunately, these systems still regard
even gigabyte hard drives as oversized floppies.
I/O is the pacing factor in data-intensive applications. Give the device
interface maximum flexibility, and it will reward you with an
order-of-magnitude performance improvement. PORT I/O is oriented toward large
hard disks and multi-megabyte files. The PP has just one disk-I/O service:
Read or write a 32-Kbyte page. The file-management section of PORT divides the
pages into directories and records. All the PP has to do is move a 32-Kbyte
block. This simplicity extends to screen output, keyboard input,
serial/parallel ports, tape I/O, and others. There is only one PP service to
write a line of text to the screen, one to read a line from the keyboard, and
so on.
In all, there are just 20 PP services covering device I/O, date/time, windows,
graphics, and other requirements. Providing an interface to these PP services
implements a PP on a new platform. The PORT CP presents each PP request as a
5x64-bit word block in common memory. The structure contains the service code
along with any relevant parameters and addresses such as buffer locations.
This simple mechanism is easier to port than interrupt protocols and message
packets. The gyrations used by the PP program to honor a PP request are
transparent to PORT. Whether it uses direct ROM BIOS, Int 21h services,
Windows services, or UNIX APIs is entirely up to the PP implementor.
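The 5x64-bit request block might look like this in C. The field assignments and service codes here are illustrative assumptions, since the actual PORT layout is not given in the text:

```c
#include <stdint.h>

/* Hypothetical layout of the 5 x 64-bit PP request block described in
   the text.  Word 0 carries the service code; the remaining words
   carry parameters such as buffer addresses and page numbers. */
typedef struct {
    uint64_t service;   /* which of the ~20 PP services is requested */
    uint64_t param[4];  /* e.g. buffer address, page number, count   */
} pp_request;

enum { PP_DISK_RW = 1, PP_WRITE_LINE = 2 };  /* illustrative codes */

/* The PP side is a dispatch on the service code.  How each service is
   honored (ROM BIOS, Int 21h, Windows, or UNIX APIs) is entirely up
   to the PP implementor; here we just report the bytes moved. */
uint64_t pp_dispatch(const pp_request *req)
{
    switch (req->service) {
    case PP_DISK_RW:    return 32768;  /* moved one 32-Kbyte page   */
    case PP_WRITE_LINE: return 0;      /* wrote one line of text    */
    default:            return (uint64_t)-1;  /* unknown service    */
    }
}
```

Passing a fixed-size block through common memory avoids interrupt protocols and message packets entirely, which is what makes the interface easy to re-implement on a new host.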


High-Level Operatives


If the metacode simply implements low-level primitives such as add, subtract,
and multiply, it will be demolished by native-code compilers. (This is what
happened to UCSD Pascal.) The overhead to decode each meta-instruction becomes
the pacing factor.
PORT's trick is to implement a rich suite of high-level operatives--SQRT, SIN,
LOG, A**B, EXP, ACOS, and all other intrinsics are direct PORT
meta-instructions. For example, TH=ATAN2(X,Y) is a single instruction. PORT
extends this concept to other frequently used operatives. For instance,
Y=ZZPOLY(COEFFS,X) is a direct PORT instruction that evaluates a polynomial
expansion. Complex-number operations are also direct meta-instructions. Decode
overhead is a small fraction of the execution time for high-level operatives.
Native-code compilers have the advantage on A=B, but they execute most
high-level intrinsics such as A=TAN(B) via procedure calls, which carry a
substantial stack push/pop overhead. Here the metacode has the advantage
because the decode overhead for A=TAN(B) is the same as for A=B. A metacode
enjoys a bonanza on floating-point functions like ZZPOLY, where the CP can
make maximal use of the math coprocessor registers and have the 386 compute in
parallel.
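A ZZPOLY-style operative reduces to a single Horner loop. This C sketch assumes coefficients are passed highest order first, which is an assumption of the example, not PORT's documented convention:

```c
/* A ZZPOLY-style polynomial operative sketched in C: one Horner loop
   that touches each coefficient exactly once.  A CP implementing this
   as a single meta-instruction can keep the running value in a math-
   coprocessor register for the whole evaluation. */
double zzpoly(const double *coeffs, int n, double x)
{
    /* coeffs[0] is assumed to be the highest-order coefficient. */
    double y = 0.0;
    for (int i = 0; i < n; i++)
        y = y * x + coeffs[i];
    return y;
}
```

Decoded once and executed as a loop inside the CP, the per-instruction decode cost is amortized over all n coefficients, which is the heart of the high-level-operative argument.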
The metacode goes on the offensive in block operatives. Consider the statement
CALL ARYMOV(A(I),100000,B(J)), which copies 100,000 64-bit words from A(I) to
B(J) as a single meta-instruction. The CP employs the 386/486 instruction REP
MOVSD, which is an order of magnitude faster than even a native-code Fortran
DO loop or a C for loop. The PORT metacode provides operatives for block copy,
initialize, search, checksum, and others. It also provides direct
meta-instructions for all string operatives (copy, concatenate, search, and so
on). The metacode is currently being extended to fast Fourier transforms,
matrix multiply, vector scale/translate, and others.
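An ARYMOV-style block operative is essentially a typed block copy. In this C sketch the argument order follows the CALL ARYMOV(A(I),100000,B(J)) example above (source, count, destination); a library memcpy typically compiles to the same wide string move (REP MOVSD on the 386/486) that gives the meta-instruction its edge:

```c
#include <stdint.h>
#include <string.h>

/* ARYMOV-style block operative: copy COUNT 64-bit words from SRC to
   DST as one call, rather than one word per loop iteration.  The
   argument order (source, count, destination) mirrors the Fortran
   CALL ARYMOV(A(I),100000,B(J)) example in the text. */
void arymov(const int64_t *src, long count, int64_t *dst)
{
    memcpy(dst, src, (size_t)count * sizeof *src);
}
```

A single decode followed by a hardware block move is why such an operative can beat even a native-code DO loop by an order of magnitude.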


Debugging Metacode


As mentioned, it's the last 5 percent of bugs that typically pace the entire
software timetable. A key factor in the PORT metacode design was to
incorporate the maximum number of checks possible. (I'll detail these checks
in future articles.) Suffice to say that they include bounds checks on all
array references, pointer validation, uninitialized variable checks, invalid
floating-point numbers, incorrect loop limits, and invalid strings. These
checks are active at all times, in all programs (including the PORT system
itself) without exceptions.
I cannot overemphasize how invaluable these checks have been in both software
development and in wringing out versions of CP and PP. Invariably, the CP
metacode decoder for a new platform has obscure bugs. The constant checking by
the subsequent meta-instructions ensures that corrupted results do not
migrate far before a fault occurs. For a native-code compiler to output
these checks on every instruction would make the executable image too
unwieldy. Without these checks, however, nightmare bugs are a certainty. Most
compilers have a debug option, but the worst bugs often occur in release
versions of the code, and all too often they mysteriously disappear with debug
active.
A significant feature of the CP/PP separation is that the PP itself can be an
important debugger. When a serious error occurs under DOS or UNIX, the machine
can hang, leaving only postmortem debugging as an option. If PORT goes off
into the weeds, the PP is still alive on the host system and can probe even
the most intimate level of PORT. This makes checking out PORT on a new
platform much easier.


The Minimal Case


Last month I described the implementation of PORT on a 386/486 PC with plug-in
multiple i860s. The emphasis there was on RISC-processor performance. Now
let's examine PORT in an environment at the opposite end of the spectrum--a
low-cost 386SX PC with four Mbytes of RAM, a math coprocessor, and a 60-Mbyte hard
disk. The CP and PP are both executed by the 386SX. To maximize performance,
the CP executes in 32-bit protected mode and turns the 3-Mbyte extended memory
into the common memory. The CP is simply a 45-Kbyte assembly program that
reads 64-bit numbers from extended memory as 32-bit pairs, and performs the
operation specified by a bit field in each. Basically, the CP program just
rattles pairs of 32-bit numbers around in extended memory. The CP itself does
not have to reside in extended memory; keeping it in the lower 640K simplifies
transfers from the CP to the PP and eliminates the need for a DOS extender.
The PP is just a 16-bit real-mode assembly program that reads a 40-byte block
from extended memory and calls on ROM BIOS and DOS interrupt services to
execute the I/O request. Both the CP and PP are procedures in a PORT.EXE
executable that run in 200 Kbytes of lower memory. PORT takes over the
extended DOS partition on the hard disk. If the primary and extended partitions
are each allocated 30 Mbytes, then DOS occupies the lower half of the disk and
PORT the upper half.
Because PORT has its own file management, it is not tied to the DOS Int 21h
file services. Direct ROM BIOS 13h (or direct SCSI commands) are an order of
magnitude faster. Not surprisingly, PORT's disk I/O is many times faster than
that of DOS. The current PP for DOS even handles its own bad-track
redirection. The PP doesn't just clean up a few sectors; it sweeps up a whole
disk, track by track.
A direct benefit of the no-exception, 32-Kbyte disk block is that the
DOS-based PP can implement highly efficient disk caching. If the PP finds more
than eight Mbytes of extended memory, it turns the rest into a cache pool as a
simple multiple of 32-Kbyte pages. Pages can be transferred from cache via the
blistering fast 386/486 32-bit REP MOVSD instruction. (In a dual-processor
implementation, the PP caching proceeds in parallel with the CP computation.)
Bear in mind that PORT utilizes virtual memory, so the amount of RAM available
merely affects speed. Beyond eight Mbytes, the RAM tends to be wasted and is
more profitably employed as cache, but the division is user modifiable.
Although PORT has its own file management, it provides subroutines and
utilities to read, write, and manipulate DOS files. Of course, any use of this
feature is nonportable. It does, however, make PORT fully compatible with
network use and DOS-based applications. Frequently-used files are generally
copied from DOS files into their faster PORT equivalents. (As long as the
application uses PORT files, it remains seamlessly portable.) The 16-bit
realmode implementation of the PP allows PORT to be 100 percent compatible
with DOS, and you can move freely between the two by executing PORT.EXE.


Conclusion


You may find PORT's obsession with 64 bits excessive, but already most
RISC microprocessors are 64 bit, the 80387/486/Weitek math coprocessors target
64 bit--and there's no doubt that the 80586+ will use 64 bit. Likewise, use of
32-Kbyte (soon 64-Kbyte) disk blocks may seem excessive, but disk-transfer
time is becoming insignificant compared to (mechanical) seek time.
High-performance RISC processors are proliferating, and the metacode approach
is ideal for realizing their potential--particularly with multiple RISC
processors. (It's even rumored that the 80586 will provide RISC on-chip.)
UNIX has done much to legitimize portability, but each implementation retains
a strong affinity to its platform. A hardware-independent "virtual computer"
is critical to cost effectively porting multi-megabyte applications.


Bibliography


Amdahl, G.M. "Validity of the Single Processor Approach to Achieving
Large-Scale Computing Capabilities." AFIPS Spring Joint Computer Conference
Proceedings (Volume 30, 1967).
Bowles, K.L., S.D. Franklin, and D.J. Volper. Problem Solving Using UCSD
Pascal. Berlin: Springer-Verlag, 1984.


Portability vs. Performance


Output of a metacode from a compiler is nothing new. Ken Bowles's UCSD Pascal
generated a machine-independent Pcode that was popular in the early '80s.
Metacodes have been used for achieving machine independence, but past
implementations had one big disadvantage--performance (or rather the lack of
it). Last month, I showed how a metacode has an advantage on RISC processors.
How does PORT's metacode stack up on a CISC processor?

Table 1 compares the Dhrystone and Linpack performance of the 386/33 and
486/33 PC vs. the IBM RS/6000, Sun SPARC SLC, Silicon Graphics Indigo, and the
HP 9000 series 720 superworkstation. Accordingly, the HP 720 is far ahead of
the pack and supposedly an order of magnitude faster in floating point than
the 486.
Table 1: RISC performance under popular benchmarks (Personal Workstation, June
1991). Higher numbers are faster.

 Dhrystone Linpack
 2.0/2.1 Single Double
 w/register (32-bit) (64-bit)
 -------------------------------------------------------------------

 CISC
 486/25 via DOS extender (typical) 26,300 1.16 1.08
 486/33 via DOS extender (typical) 34,000 1.50 1.40
 RISC
 i860/33 (Microway Number Smasher) 29,819 1.23 1.11
 SPARCstation SLC 18,255 2.25 1.20
 Silicon Graphics Iris 25D 24,630 2.62 1.35
 Motorola 88000/25 (Everex 8825) 50,033 1.67 1.02
 MIPS 3000/33 (Magnum 3000/33) 56,012 6.48 4.80
 IBM RS/6000 (POWERstation 320) 45,454 8.15 7.29
 HP 9000 series (model 720, 50 MHz) 86,335 17.0 14.4

Table 2 compares PORT on a 386/33, 486/33, and PC+i860/33 vs. the HP 720 using
the DISSPLA (and equivalent GSL) manual sample plots; see Figure 2. This is an
extension of a similar table presented last month and shows that the 20-MHz
386 with the 33-MHz i860 under PORT is not far behind the 50-MHz HP 720. Note
that the 486/33 is not outpaced by the order of magnitude predicted by the
Dhrystone and Linpack results. The DISSPLA timings reflect the composite
performance of the entire system, including large program execution, I/O
service, and graphics output. The operative word is "composite"--popular
benchmarks reflect the performance of a processor on a few tight loops in a
vacuum, not the throughput of a real-world massive program. Native RISC code
(on the RS/6000 and HP 720) has a tremendous advantage when it can iterate on
small loops, but the DISSPLA sample plots reflect a normal program whose loops
have frequent calls and branches.
Table 2: PC+i860 vs. HP 9000 series 720 using DISSPLA sample plots. Times are
in seconds.

 CA-DISSPLA/ PORT G.S.L. CA-DISSPLA
 GSL Manual Vectors Filled 386 486 i860/33+ on HP 720
 reference no. Polygons 33 MHz 33 MHz 386/20 50 MHz
 ----------------------------------------------------------------------

 DM3004/B31-3 3,373 0 18 8 6 3.5
 DM4003/B31-7 9,366 137 21 10 8 2.6
 DM7001/B31-11 18,526 0 28 13 10 4.2
 DM7004/B31-22 22,002 215 110 53 39 37.2
 DM8002/B31-27 15,215 161 55 26 19 13.0

It may seem incongruous that the 486 is not that much slower than the PC+i860
in the DISSPLA plots. I found that the times were identical for the Hyperspeed
i860 card plugged into a 486/33 and a 386/20, indicating that the
2-Mbyte/second ISA bus was saturated. (Figure 2 is an example of dominant
graphics I/O.) Keep in mind that the 486 is a high-performance RISC processor
internally with eight Kbytes of immediate cache. A tight 32-bit program like
the PORT CP, which heavily uses 32-bit registers, tends to exploit this RISC
affinity.
I must emphasize that PORT's graphics subroutine library is a vastly rewritten
version of DISSPLA and probably more efficient than CA-DISSPLA, but the two
produce identical output. Furthermore, PORT operates entirely in 64 bits and uses
software virtual memory. PORT checks array bounds on every array reference,
tests for uninitialized variables on every arithmetic/compare instruction, and
performs a host of other checks not executed by the HP 720.
Which system is faster is not the issue. The point is that the PORT metacode
approach and CP/PP architecture hold their own. A "virtual computer" does not
have to be a dog of an interpreter, as is widely believed.


Amdahl's Law


Actually, the performance of the PORT metacode approach can be predicted from
Amdahl's Equation. According to Amdahl, the average instruction time is T[av]
= F*T[f] + (1-F)*T[s], where T[f] is the time of the fastest instructions,
T[s] is the time of the slowest instructions, and F is the fraction of fast
instructions. For a typical 486 native-code compiler, T[f] is one microsecond
on average for A=B, I=J+K, and other simple integer operations. T[s]
reflects the time for floating-point operations. Accounting for call and
math-coprocessor overhead, 50 microseconds is a reasonable value for T[s].
Even in a highly floating-point-intensive application, we can reasonably
assume 80 percent of the instructions are fast integer ops, so: Native T[av] =
0.8*1 + (1-0.8)*50 = 10.8 microseconds.
Now assume that the PORT metacode is five times slower than the native
compiler in integer and other fast ops (due to its decode overhead, 64-bit
operands, and so on), but that it can reasonably shave 20 percent off the
floating-point ops by eliminating stack overhead and using the coprocessor
more efficiently. In this case, T[f] equals 5 microseconds and T[s] equals 40
microseconds, so: PORT T[av] = 0.8*5 + (1-0.8)*40 = 12 microseconds. This is
not far from the native compiler.
The key result of this exercise is that shaving just 20 percent off the
slowest instructions can make an enormous difference. Amdahl's Law states that
no matter how much you work on the fastest process, the slowest process
ultimately dominates.
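The two worked examples can be checked with a one-line function (F here is the fraction of fast instructions, as used in the arithmetic above):

```c
/* Amdahl's Equation as used in the text's worked examples:
   T_av = F*T_f + (1-F)*T_s, with F the fraction of fast instructions,
   T_f the fast-instruction time, and T_s the slow-instruction time
   (all times in microseconds). */
double t_av(double f, double t_fast, double t_slow)
{
    return f * t_fast + (1.0 - f) * t_slow;
}
```

Plugging in the native case (F=0.8, T_f=1, T_s=50) gives 10.8 microseconds, and the PORT case (F=0.8, T_f=5, T_s=40) gives 12 microseconds, confirming that trimming the slow instructions nearly offsets a fivefold handicap on the fast ones.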
What about compilers, editors, and other pure-integer programs? Here the block
operatives, string operatives, and more efficient disk I/O can have an even
greater impact. The savings are not a few percentage points, but orders of
magnitude. (Our experience at SUPERSET was that in a key subroutine, replacing
a loop or two with block operatives could make a whopping difference.) The
DISSPLA timings indicate that the combination of all effects can substantially
outweigh the decode overhead.
--I.H.















July, 1992
PORTING UNIX TO THE 386 THE FINAL STEP


Running light with 386BSD




William Frederick Jolitz and Lynne Greer Jolitz


Bill was the principal developer of 2.8 and 2.9BSD and was the chief architect
of National Semiconductor's GENIX project, the first virtual-memory,
microprocessor-based UNIX system. Prior to establishing TeleMuse, a market
research firm, Lynne was vice president of marketing at Symmetric Computer
Systems. They conduct seminars on BSD, ISDN, and TCP/IP. Send e-mail questions
or comments to ljolitz@cardio.ucsf.edu. (c) 1992 TeleMuse.


Over the past couple of months, we've discussed the minimal code and
methodology required to fill in the missing pieces of the incomplete Net/2
tape, leading to an operational 386BSD kernel. Still, the step from a few
changes to a real system is great, as is evident from the items that had to be
provided with this kernel: bootstraps, file systems, an installation
mechanism, binaries of utilities, and documentation.
However, we decided it was time to step away from the kernel and make all of
386BSD available and accessible, so that it could become the generic research
and educational platform we envisioned when we wrote "386BSD: A Modest
Proposal," the software specification for 386BSD, in mid-1989. So, on March
17, 1992 we launched 386BSD Release 0.0.


386BSD Release 0.0--Liftoff


386BSD Release 0.0 consisted of:
One distribution installation floppy.
One 8-floppy multivolume compressed TAR-format source distribution (31 Mbytes
uncompressed).
One 6-floppy multivolume compressed TAR-format binary distribution (28 Mbytes
uncompressed).
One 360K MS-DOS difference floppy (for those who want to do all the work
themselves).
Release notes regarding installation procedures, manifests, and
registration/bug report forms. All of 386BSD Release 0.0 was released under
freely redistributable and modifiable terms (with attribution to the authors
maintained--a small and reasonable request for so much).
With the assistance of several dedicated network volunteers (among them John
Sokel, Dan Kionka, and members of the Silicon Valley Computer Society), 386BSD
Release 0.0 was made widely available via the Internet. Within one month, an
estimated 100,000 sites had obtained 386BSD Release 0.0. (Several networks,
particularly in Australia, melted down over the transmission traffic and had
to be regulated.) The enthusiastic response from Internet, BBSs, and various
user groups (through which copies were widely distributed) has far exceeded
our expectations.
Another pleasant surprise was the number of software contributions, bug fixes,
and suggestions from early users. People were eager not only to supply their
code and knowledge, but to help others get their systems running too.
Finally, it was gratifying to learn that our little system, bugs and all, was
complete enough to be used for the rest of its own
development. There's little question that, in less than a month, 386BSD
Release 0.0 was an unqualified success.


386BSD Release 0.1--The Second Stage


386BSD Release 0.1 is the most recent release as of this writing. It consists
of:
A single distribution installation floppy, referred to as "Tiny 386BSD."
One 15-floppy multivolume compressed TAR-format source distribution.
One 10-floppy multivolume compressed TAR-format binary distribution.
Installation notes, manifests, and registration/bug report forms.
The major Internet sites from which you can download 386BSD 0.1 are
agate.berkeley.edu and reyes.stanford.edu. Additionally, we're available to
answer questions and provide some support on CompuServe (CIS#76703,4266) and
in the UNIX/BSD conference on Bix.
For limited program development, a 386/486 system should contain at least two
Mbytes of RAM and a 40-Mbyte hard disk. To make full use of the source tree
and generate new software distributions, a 200-Mbyte disk is recommended. (You
can also use 386BSD's version of NFS to obtain space via the Ethernet off a
central shared server.) Performance obviously improves with faster processors
and more memory.
Just in case anyone is wondering, the entire 386BSD system was created on a
386SX laptop with three Mbytes of RAM and one 100-Mbyte disk.


What's in 386BSD Release 0.1


Thanks to many knowledgeable users, 386BSD 0.1 is a more robust version of
386BSD, supporting broader combinations of PC hardware and simpler
installation procedures. 386BSD also provides more features and
functionality.
386BSD 0.1 also contains many utilities which can be used in development work
(see Table 1), including a C compiler, C++ compiler, loader, network protocol
family (TCP/IP), and so forth. 386BSD also contains a complete set of Internet
networking facilities (including NFS), a program development environment for
gigabyte-sized programs, document-preparation and text-editing tools, and
database mechanisms. And finally, it can rebuild itself from its own source
tree.
Table 1: 386BSD Release 0.1 utilities.

 apropos env lpr pwd timed
 ar eqn lprm query timedc
 arp expand lptest ranlib tip
 as expr ls rcp tn3270
 bad144 false m4 rdist touch

 badsect find machine rdump tput
 basename finger mail reboot tr
 biff fmt mailstats renice trace
 cal fold make restore traceroute
 calendar from man rlogin troff
 cat fsck mesg rm true
 cc fstat mkdep rmail tsort
 checknr ftp mkdir rmdir tty
 chgrp g++ mkfifo rmt tunefs
 chmod gcc mknod route ul
 chown gdb mkstr routed umount
 chpass genclass more rrestore uncompress
 chroot grep mount rsh unexpand
 cksum groff mountd savecore unifdef
 clear grops mset script uniq
 clri grotty mtree sed unvis
 cmp groups mv sh update
 col halt named shar uptime
 colcrt head newfs showmount users
 colrm hexdump nfsd shutdown uudecode
 column hostname nfsiod size uuencode
 comm id nfsstat slattach vacation
 compress ifconfig nice sleep vipw
 config inetd nld sliplogin vis
 cp init nm soelim w
 cpio install nohup split wall
 cpp kdump nroff strings wc
 csh kill nslookup strip what
 ctags ktrace nsquery stty whatis
 cu last nstest su whereis
 cut ld old swapon which
 date leave pac symorder who
 dd lex pagesize sync whoami
 df ln passwd syslogd whois
 dirname locate paste talk write
 disklabel lock pic tar xargs
 diskpart logger ping tbl xstr
 du login portmap tee yacc
 dump logname printenv telnet yes
 dumpfs lpc printf test yyfix
 echo lpd ps tftp zcat
 elvis lpq psroff time



Qualifying a PC to Run 386BSD


Release 0.1 can be run from as little as a single floppy disk, using the Tiny
386BSD diskette available through the DDJ Careware Project. Send us a
formatted, error-free, high-density 3.5- or 5.25-inch floppy diskette and an
addressed, stamped diskette mailer, in care of: Tiny 386BSD, Dr. Dobb's
Journal, 411 Borel Ave., San Mateo, CA 94402, and we'll send you the latest
copy. There's no charge, but if you want to slip in a dollar or so to help out
the kids at the Children's Support League of the East Bay, we know they'd
appreciate it. (You can also obtain this software directly from the sites
mentioned above.) In addition to experimenting with a very minimal 386BSD
system prior to loading any software on the hard disk, Tiny 386BSD allows you
to validate 386BSD operation on a PC.
Simply insert the floppy into the drive and boot up the PC. If it boots and
prompts you for a shell command (#), you're ready for installation of the rest
of the system. If it fails, it's time to skull out the PC configuration,
jumpers, BIOS setup menu, and all the other little "bright spots" that make
for interesting compatibility problems. The general rule here is to isolate
the problem by comparing cases that work with those that don't until an
explanation can be formed.
In general, nifty hardware features are the source of most compatibility
problems, along with bizarre hardware combinations that are "on the edge."
Both should be avoided or defeated when they are behaving suspiciously.
Mainstream hardware from patient and understanding firms helps a great deal.
Another problem arises with non-standard "old" equipment that you just happen
to have lying around. Do yourself a favor and leave these particular PCs to
MS-DOS, since other hidden surprises possibly await. In short, avoid the
"pathological" cases wherever possible.


Installing 386BSD Release 0.1


To install the rest of 386BSD, we must allocate a large portion (greater than
or equal to 40 Mbytes) of formatted disk space on a hard disk--either the
entire contents of a disk drive or the remaining contents of a disk drive
after other systems have been partitioned. The process discussed here is an
exact duplicate of the mechanism we devised five years ago for
system-installation procedures on Symmetric Computer Systems' 375 computers.
(This was contributed to Berkeley, and parts appeared in the 4.3BSD Tahoe
release.) Since 386BSD 0.1 is experimental software, we recommend that it be
run with a dedicated disk on a dedicated system, until it matures.
Disk space must be formatted by a utility. IDE and SCSI drives are already
preformatted. ESDI controllers have formatting programs in ROM that can be run
from a MS-DOS Debug utility, and the geometry of the drive can be obtained
from these programs (in terms of cylinders, tracks or heads, and sectors).
ESDI drives use the last cylinder to hold bad block tables; currently these
are not used, and the last cylinder must be ignored. In addition, since 386BSD
uses its own bad block revectoring (a la DEC standard 144), sector sparing
should not be used.

If possible, the formatted drive should have the same low-level geometry as
the hard drive itself. However, drives with more than 1024 cylinders are run in
a logical translation mode by the disk controller to make up for a limitation
in MS-DOS. While this logical formatting will work, the 386BSD file system
will not be as efficient, since its clever rotational placement algorithms
won't mesh with what the physical drive is actually doing.
With the drive geometry information in hand, a disktab entry describing how
386BSD is to use the disk space must be written; Example 1 is a sample disktab
entry. In general, we prefer a 5- to 10-Mbyte root partition, a swap partition
about twice as big as the amount of RAM memory, and a /usr partition that
contains the remainder. Each partition has a size, offset, and type associated
with it. Both size and offset are in units of sectors, and each partition is
arranged to start at the beginning of a cylinder, so that the rotational
placement algorithm won't be thrown off by a logical partition offset. We then
use the disklabel command from the floppy-based system (analogous to fdisk) to
install a machine-readable version of the disktab entry onto the hard disk,
along with a bootstrap program.
Example 1: A sample disktab entry.

 cp3100|Connor Peripherals 100MB IDE:\
 :dt=ST506:ty=winchester:se#512:nt#8:ns#33:nc#766:sf:\
 :pa#12144:oa#0:ta=4.2BSD:ba#4096:fa#512:\
 :pb#12144:ob#12144:tb=swap:\
 :pc#202224:oc#0:\
 :ph#177936:oh#24288:th=4.2BSD:bh#4096:fh#512:

Once the hard-disk configuration is completed, we must do a "high-level"
formatting of the 386BSD partitions that will hold files. Analogous to the
MS-DOS format program that creates a blank file structure on a hard disk, the
newfs program is executed off the floppy and initializes the root and user
partitions of the hard disk.
Next, the root partition is made accessible as a file system by means of the
mount command, and the contents are transferred from the floppy-based system
to the hard-disk root file system and dismounted, making the hard disk
bootable. The floppy-based system is shut down gracefully, and the hard-disk
version booted in its stead. At this point, the floppy disk can now be used to
load on the remainder of the system. We mount the empty user file system and
proceed to reload the file system with the tar utility to extract the system
from a multivolume floppy dump.
For those with access, the boot floppy allows the restore to occur over the
network, thus eliminating the need for extracting from floppy dumps.
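The installation steps above might look like the following command sequence from the Tiny 386BSD shell. The device names (wd0, fd0), the disktab entry name (cp3100), and the exact flags are assumptions for illustration; consult the release notes and manual pages for your own configuration:

```shell
# Sketch of the 386BSD 0.1 hard-disk installation described above.
# Device names, disktab entry, and flags are assumptions -- adapt them.
disklabel -w -B wd0 cp3100      # install label + bootstrap from disktab
newfs wd0a                      # "high-level" format the root partition
newfs wd0h                      # ...and the /usr partition
mount /dev/wd0a /mnt            # make the new root file system accessible
(cd / && tar cf - .) | (cd /mnt && tar xpf -)   # copy floppy root to disk
umount /mnt                     # dismount; the hard disk is now bootable
# Shut down, reboot from the hard disk, then load the rest:
mount /dev/wd0h /usr            # mount the empty user file system
tar xpf /dev/fd0a               # extract the multivolume floppy dump
                                # (tar prompts for each volume)
```

The sequence mirrors the MS-DOS format/fdisk workflow: label the disk, create blank file structures, then populate them.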


System Configuration


All the configuration files are located in the /etc directory of the root file
system, including brief notes on setting up the system. The elvis text editor
can be used to modify them as needed. Further configuration and expansion can
be accomplished by loading the sources from another multivolume floppy dump,
and recompiling the system and its utilities.
386BSD documentation, including the installation procedures for the rest of
386BSD (binary and/or source), should be available as part of the online
386-BSD release. If it is not online, contact the sysop or moderator to have
it installed, and send e-mail via CompuServe at 76703,4266.


Perspective: The Importance of Thinking Small


In the spirit of "running light without overbyte," 386BSD is a minimalist
system. This approach has allowed us to easily discuss the important paradigms
effectively leveraged in UNIX and other modern operating systems. Over the
course of this series, our minimalist approach has forced us to cleave to a
basic understanding of the functional core of the system, and not get bogged
down in the minutiae of building obscure utilities. It has also provided us
with a bit of "editorial" oversight on the contents of our system.
Occasionally, it is as necessary to discard an item as it is to create
one--otherwise, we would be "hip deep" in relics held past their prime. In
addition, a minimalist system is an excellent educational and training
platform for teaching operating systems, networking, file systems, C++,
and software management.
Another virtue of our minimalist approach is that the source and the operating
binaries are greatly reduced in bulk. Paring down redundant utilities and
source code increases ease of use without loss of functionality.


Where Is 386BSD Heading?


In subsequent releases, we expect 386BSD to grow in both the size of its
distribution and the quality of its functions, without losing its minimalist
design. Some topics now ripe for exploration are as follows:
Many UNIX systems have adjusted only "fitfully" to life on the PC; it's almost
as if they were immiscible from the very start. Yet many good features
developed in MS-DOS and Windows are missing from the UNIX paradigm. It's a
shame that parochial attitudes keep UNIX systems architects from leveraging
the largest programming environment present in the world today.
File I/O and networking transmission rates seem to be the limiting factor in
UNIX systems performance, especially as the audio/video data demands of
multimedia are starting to become significant. The problem is not with the
hardware, but with the software and overall architectures. As such, PCs and
workstations provide actual data transfer rates at only a fraction of the ten
Mbytes per second possible with state-of-the-art hardware technologies.
It's hard to believe that, while we're on the verge of 100-MIP processors,
most PCs will be transferring data at less than an order of magnitude faster
than the original, wheezing 8-bit PC.
Lightweight processes formed in a tiny fraction of a millisecond are a
necessary component of experimentation with new programming architectures.
Multiprocessor versions of these processes should allow extensions into the
time and space domains of multi-threaded models.
The ability to explore the file-system metaphor without the need for kernel
programming is an interesting challenge. File systems are a popular area of
study these days, because they are an ideal vehicle for exploring system
performance (bandwidth), integrity (file stability and recovery), distributed
systems (locality and replication), and the central universal abstraction of
the applications program interface (like that in Plan 9; see DDJ, January
1991).


Farewell to the Porting Series--Onward to New Topics


With the completion of 386BSD and its widespread availability, it's time to
bid farewell to our "Porting UNIX to the 386" series. After 17 installments
(and, believe it or not, quite a few shortcuts), we actually finished our
port, and we are happy that people can finally use the system we have spent
so long writing about.
With each new version of 386BSD, we hope to see it become more affordable,
available, accessible, modifiable, and understandable. 386BSD still has quite
a way to go towards becoming a mature operating system, but already it has
traveled the "useful" portion of the distance.
As such, we intend to explore topics such as networking, which impact not only
386BSD, but modern operating systems in general. While we may revisit 386BSD
in our discussions, it is important to view it in the context of other modern
operating systems approaches on the UNIX side (Plan 9, Mach, Minix, and the
like) and in the broader commercial domain (MS-DOS, Windows-NT, OS/2 2.0, and
so forth). 386BSD is really a microcosm of what these "big" operating systems
are all about.
Given support and encouragement, we also intend to continue the educational
and research direction upon which 386BSD was based, and we will continue to
assist other groups who wish to head in this direction with us. However, the
growth of 386BSD and discussion of new approaches depends on the continued
goodwill and enthusiasm of its user base. Everyone is welcome to participate
in this process.















July, 1992
THE DR. DOBB'S HANDWRITING RECOGNITION CONTEST


Ray Valdes


This month marks the official launch of the Dr. Dobb's Handprinting
Recognition Contest. If you've been following recent issues of Dr. Dobb's
Journal, you'll recall that Ron Avitzur got the ball rolling in the April
issue, by presenting a Macintosh-based handprint recognizer, complete with an
interactive data-collector application. Ron has since written a
platform-independent harness to test recognition engines. This harness works
off stylus data stored in disk files, rather than requiring interactive
digitizing hardware, a pen computer, or a pen operating system.
Before delving into technical details of the harness, here's a quick summary
of contest rules. For this first-ever competition, we're fortunate to be able
to offer an extremely tasty first prize--in the form of a PowerBook 100
generously provided by Apple Computer. The contest begins on June 15th, when
the official version of the DDJ test framework, test data, and contest entry
blank become available electronically. Deadline for submissions is September
15th. We'll announce a winner in our December issue.
Your recognizer can use any platform on which the DDJ test harness runs. The
DDJ harness code assumes only the C standard library. However, even though you
can run the harness on any platform that has a C compiler, we can only test
your code on Macintosh or PC platforms. Assuming your code is portably
written, this should not be a problem.
You must send in both source code and an executable. Any other written
commentary or documentation is also welcome. Source code is for publication
and can be in C (or, on the PC, in any language that can be linked to the OBJ
files of the DDJ test harness).
Submissions will be judged primarily on recognition accuracy. Speed is a
secondary consideration; third is the conciseness and elegance of your
implementation.


How the Harness Works


The test-harness package contains executable, source, object, make, and data
files, as well as a sample recognizer by Ron Avitzur. The READ.ME file
describes all of these in detail.
The DDJ test harness first reads all information from the character-data file
into an in-memory data structure. The character-data file is in binary format.
For each ASCII character, there can be a variable number of character
prototypes (sample characters). Each character prototype, also known as a
gesture, is composed of a variable number of strokes. Each stroke is composed
of a variable number of points. The process of reading in the data therefore
consists of several nested for loops.
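The nesting just described (gestures of strokes, strokes of points) can be sketched in C. The type and field names below are illustrative only, not the harness's actual declarations:

```c
/* Illustrative in-memory layout (not the harness's real types):
   a gesture holds strokes, and a stroke holds (x,y) points. */
typedef struct { short x, y; } Point;
typedef struct { int nPoints; Point *pts; } Stroke;
typedef struct { int nStrokes; Stroke *strokes; } Gesture;

/* Walk one character prototype with the same nested loops the
   harness uses when reading the binary character-data file. */
int CountPoints(const Gesture *g)
{
    int total = 0;
    for (int s = 0; s < g->nStrokes; s++)
        for (int p = 0; p < g->strokes[s].nPoints; p++)
            total++;
    return total;
}
```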
After reading in the data, the harness loops through the top-level CharData[]
array, which contains pointers to lists of prototypes. During the training
phase, characters are passed to your recognizer's Train() routine. Your
training routine should derive from this data a set of features that will
later be used in the recognition phase.
During the recognition phase, the test harness passes a different selection of
characters to your recognizer's Guess() routine, which can return up to three
guesses per character. Each guess must have an associated weight or confidence
value.
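The two entry points might look as follows in C. The names Train and Guess come from the description above, but the signatures and the GuessResult record are assumptions of this sketch, not the harness's real interface:

```c
#define MAX_GUESSES 3   /* up to three guesses per character */

/* Hypothetical result record: a guessed character plus a
   confidence weight (higher means more confident). */
typedef struct {
    char ch;
    int  weight;
} GuessResult;

/* Training phase: derive features from one labeled sample.
   A real recognizer would accumulate per-character features here. */
void Train(char label /*, stroke data omitted in this sketch */)
{
    (void)label;
}

/* Recognition phase: fill in up to MAX_GUESSES guesses and
   return how many were produced. */
int Guess(/* stroke data omitted, */ GuessResult out[MAX_GUESSES])
{
    out[0].ch = '?';      /* stub guess with zero confidence */
    out[0].weight = 0;
    return 1;
}
```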
Writing a general-purpose recognizer can be a large and daunting task. For
purposes of the contest, we've constrained the problem in various ways. In the
test data, segmentation of strokes into individual characters has already
occurred. The sample recognizer works a character at a time, as opposed to
using context information (such as a word dictionary). The character set
consists only of alphanumeric characters plus a few punctuation characters.
Input data consists of stylus datapoints from pen-down to pen-up. There is no
proximity information or velocity data, nor are there timestamps associated
with point coordinates.


Hints for Contestants


The sample recognizer included with the test-harness package performs pretty
well, with better than 90 percent accuracy on certain sample data.
Nevertheless, it suffers from a number of limitations which you can improve
upon:
The sample recognizer uses mostly local information (the relationship of one
point to the next) rather than global information. It may be fruitful to
select five important points from different parts of a character and establish
how these points relate to each other. Another approach would be to set up a
coarse 4x8 grid and color in the pixels in the grid which are touched by a
character.
The sample recognizer filters out raw data points into a smaller number of
points from which the features are then derived. Its simplification routine is
straightforward, and currently "cuts off" corners; that is, it does not
distinguish a corner point from any other point.
The sample recognizer normalizes every character to the same square,
discarding potentially useful information about the aspect ratio of the
character. The current recognizer cannot tell the difference between a tall,
skinny character and a short, squat one, assuming the stroke motions are
similar.
The sample recognizer stores all information about each character in a single
"bin." The features for all versions of a character are therefore muddled
together, which might confuse the current recognizer in the case of very
different legitimate ways of writing a particular character.
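As one example, the coarse-grid idea suggested above can be sketched in a few lines of C. The bounding-box normalization and the function name are assumptions of this sketch:

```c
#define GRID_W 4
#define GRID_H 8

/* Mark each cell of a coarse GRID_W x GRID_H grid touched by a
   character's points. The character's bounding box (minx, miny,
   width w, height h) is scaled onto the grid. */
void GridFeature(const short *x, const short *y, int n,
                 short minx, short miny, short w, short h,
                 unsigned char grid[GRID_H][GRID_W])
{
    for (int r = 0; r < GRID_H; r++)
        for (int c = 0; c < GRID_W; c++)
            grid[r][c] = 0;

    for (int i = 0; i < n; i++) {
        int c = (x[i] - minx) * GRID_W / (w + 1);
        int r = (y[i] - miny) * GRID_H / (h + 1);
        grid[r][c] = 1;   /* "color in" the touched cell */
    }
}
```

The resulting 32 bits make a compact, position-independent feature that complements point-to-point features.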


The Importance of Data


As many researchers have discovered, writing good code is only part of the
problem in building a recognition engine. The rest includes amassing a
suitable collection of test data.
Our sample recognizer works well with our current set of data, but may stumble
on other valid data that it has not previously encountered. For judging the
contest, therefore, we will attempt to run all recognizers on as broad a data
set as possible, including any data that you submit with your entry.






















July, 1992
THE I860 AS A GRAPHICS CONTROLLER


3-D graphics transformations and rendering




Debra Cohen


Debra joined Intel in 1989 after completing her Master's and Engineer's
degrees in EE/CS at MIT. She can be reached through the DDJ offices.


The requirements for a powerful graphics processor include fast spatial
transformations and fast rendering. Fast transformations, which demand fast
matrix manipulations, are necessary because typical three-dimensional graphics
applications translate and rotate objects on a screen in three-dimensional
space. Fast rendering is necessary because applications typically store an
object to be displayed as a collection of vertex locations for polygons
(commonly triangles) which describe the object's surface. Each vertex is
stored along with its corresponding information--color and depth (Z-value),
for example. The set of vertices and their associated information is also
called a "display list."
Before an object can be displayed on a screen, a processor must first flesh
out the attributes for pixels between vertices. This fleshing out of a
complete pixel-by-pixel representation from the vertex information stored in
the graphical database is accomplished by interpolating the vertex attributes
(that is, color and depth values) for all the pixels inside the polygons
determined by the vertices in the database. One of the most common algorithms
for interpolating color values, "Gouraud shading," is simply linear
interpolation.
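In plain C, Gouraud shading of one color component across a scan line is just a fixed-point linear ramp between the two endpoint intensities. This is a reference sketch of the math, not the i860 implementation discussed below:

```c
/* Linearly interpolate one color component across a scan line of
   n pixels, from intensity c0 to c1 inclusive, using 16.16
   fixed-point arithmetic to avoid per-pixel division. */
void GouraudSpan(int c0, int c1, int n, int *out)
{
    long v = (long)c0 << 16;
    long step = (n > 1) ? (((long)(c1 - c0) << 16) / (n - 1)) : 0;
    for (int i = 0; i < n; i++, v += step)
        out[i] = (int)(v >> 16);
}
```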


Fast Matrix Manipulations


Due to its highly parallel architecture, Intel's i860 processor excels at
matrix manipulations. For one thing, the processor can run in dual instruction
mode (DIM), whereby a core instruction (a load or store) can execute at the
same time as a floating-point instruction (a multiply). On the 50-MHz i860 XP
CPU, the 128-bit-wide data-cache-to-register-file path provides sustainable
data throughput of 800 Mbytes per second for accesses to data in the 16-Kbyte
on-chip data cache. For accesses that miss the cache, a 64-bit-wide burst bus
shuttles data in and out of the processor at up to 400 Mbytes/sec.
While the wide data paths continuously feed the floating-point units, the
pipelined architecture ensures that the data is disposed of expeditiously. The
floating-point multiplier sustains one result every clock in single-precision
mode (32-bit results), or one result every other clock in double-precision
mode (64-bit results). The floating-point adder keeps pace, producing one
result per clock in either single- or double-precision mode.
Furthermore, using its dual-operation instructions, the i860 CPU can perform
both floating-point adder and multiplier operations simultaneously. The
multiplier and adder can each sustain a throughput of one result per clock in
single-precision pipelined mode, so the net throughput of dual-instruction
mode and dual-operation mode together is up to three results per clock.


Interpolation Support


Display-list processing requires two types of interpolation: color
interpolation and depth interpolation. With color interpolation, pixel colors
are stored and fed to DACs as red, green, and blue (RGB) values, and each
color component must be interpolated separately. The i860 CPU employs the
faddp.d (floating-point add pixel, double-word length) instruction to
accomplish this. Because faddp.d always operates on 64 bits' worth of pixels
at a time, those 64 bits are interpreted in different ways, depending on the
software-supplied setting of the PS (pixel size) field of the control register
PSR.
The i860 architecture incorporates hardware support for 8-, 16-, and 24- or
32-bit pixels (with 24- and 32-bit pixels being treated identically). For
simplicity, let's illustrate using 32-bit pixels. Sixteen-bit pixels are
similar to 32-bit pixels; 8-bit pixels are more confusing, because there is no
standard for representing or processing them.
We begin by calculating the blue (B), green (G), and red (R) intensities for
the first two pixels (i and i+1) in a triangle scan line; see Figure 1. Each
color intensity is represented as an 8-bit integer portion and a 24-bit binary
fraction for purposes of calculation. Let's assume that we've calculated the
total color delta over the current triangle scan line for each color component
R, G, and B (for example, B_color_delta = B[i+n]-B[i]) and have divided that
color delta by the number of pixels to be interpolated across (pixel_delta =
n). The result of this division, also represented as an 8-bit integer and a
24-bit fraction, is the incremental color delta.
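The 8.24 arithmetic just described can be modeled directly in C. The helper names are made up for this sketch, and the value is held in 64 bits for simplicity where the hardware packs it into 32:

```c
#include <stdint.h>

/* 8.24 fixed point: an 8-bit integer portion and a 24-bit binary
   fraction, as used for 32-bit-pixel color interpolation. */
typedef int64_t Fix824;

Fix824 ToFix824(int intensity)  { return (Fix824)intensity << 24; }
/* Assumes arithmetic right shift for negative values. */
int    FromFix824(Fix824 v)     { return (int)(v >> 24); }

/* Incremental color delta for one component: the total delta over
   the scan line divided by the number of pixels interpolated across. */
Fix824 ColorDelta(int c_start, int c_end, int pixel_delta)
{
    Fix824 total = (Fix824)(c_end - c_start) << 24;  /* color delta */
    return total / pixel_delta;                      /* per-pixel step */
}
```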
Now we're ready to recursively add the incremental color delta for each color
component to the initial values, so that each successive pixel's RGB values
along the triangle scan line are calculated. Here's where faddp.d helps out by
automating and speeding up the process.
First let's calculate B values for the next two pixels (i+2 and i+3) in the
triangle scan line. To do that, just put the initial B values for the first
two pixels of the triangle scan line (i and i+1) into faddp.d's 64-bit op1,
side by side in the op1 register pair, as shown in Figure 2. To use the
instruction properly, you'll need the predefined format of an 8-bit integer
portion and a 24-bit fractional portion. Then let op2 = two instances of 2*
(B_color_delta)/pixel_delta, again side by side in the op2 register pair. The
reason the interpolant value is 2* (B_color_delta)/pixel_delta, rather than
simply B_color_delta/pixel_delta, is that you are interpolating from pixel i
to pixel i+2 in one half of the register pair, and from pixel i+1 to pixel i+3
in the other half.
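A C model makes the lane-wise behavior concrete. This sketch captures only faddp.d's add for 32-bit pixels (two independent 32-bit lanes, no carry between them); the MERGE-register side effect is deliberately left out:

```c
#include <stdint.h>

/* Model of the faddp.d data path for 32-bit pixels: treat two
   64-bit operands as pairs of independent 32-bit 8.24 lanes and
   add lane-by-lane, with no carry crossing the lane boundary. */
uint64_t FaddpPair(uint64_t op1, uint64_t op2)
{
    uint32_t lo = (uint32_t)op1 + (uint32_t)op2;
    uint32_t hi = (uint32_t)(op1 >> 32) + (uint32_t)(op2 >> 32);
    return ((uint64_t)hi << 32) | lo;
}
```

Feeding the result back as the next op1 yields the values for pixels i+4 and i+5, just as the text describes.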
In one clock faddp.d adds the color fields, generating the B values for the
next two pixels. (In fact, like most i860 CPU instructions, all the graphics
instructions execute in just one clock.) The result is placed in the fdest
register pair so that it can be used as the op1 next time around, in order to
generate the B values for pixels i+4 and i+5.
In addition, when PS is set for 32-bit pixels, faddp.d shifts the MERGE
register right by eight bits and then updates certain MERGE fields with the
integer portions of the faddp.d result. That's so that after three
applications of faddp.d--once for R values, once for Gs, and once for Bs--the
RGB values for two pixels will be consolidated ("merged") in the MERGE
register in precisely the arrangement (packed-pixel format) that graphics
hardware typically requires.
After three iterations of faddp.d, one 8-bit field is left unused in the MERGE
register. That field can have any other attribute (such as texture) ORed into
it with the form (floating-point or with merge) instruction. Form also
transfers the MERGE register contents into a floating-point register pair in
preparation for storing to the frame buffer, and it clears the MERGE register
for the next set of interpolations.
With the RGB values for pixels i, i+1, i+2, and i+3 calculated, the next op1
of the faddp.d instruction will be the B values of pixels i+2 and i+3; the B
interpolants in op2 remain the same as they were in the first set of B
interpolations. Likewise, after the B values for pixels i+4 and i+5 are
obtained, their G and R values are interpolated. In this way, the RGB values
for all pixels within a triangle scan line can be quickly and efficiently
calculated.
Sixteen-bit pixels are handled similarly to 32-bit pixels, except that for
purposes of calculation, colors are represented by an integer portion (for
example, Int[Bi]) of six bits and a fractional portion (Frac[Bi]) of ten bits.
As illustrated in Figure 3, one faddp.d sums two sets of four pixels' color
fields (blue, for instance), updates four 6-bit fields of the MERGE register,
and shifts MERGE right by six bits. After two more such instructions, one for
green and one for red, the MERGE register contains RGB values for four pixels
and is ready to be stored out. One difference for 16-bit pixels, however, is
that because there is not room in a 16-bit pixel for six bits each of R, G,
and B intensities, two fields (normally for R and G) are allocated six bits
each, while the third field (for B) is truncated to just four bits during
shifting of the MERGE register. The bits are allocated this way because the
human eye is significantly less sensitive to differing shades of blue than of
red or green.
Because 8-bit pixels are a nonstandard format, color interpolation for them is
often platform dependent. However, because the i860 CPU pixel interpolation
instructions only define operand field sizes, and not their uses, the 8-bit
faddp.d instruction can be easily adapted to a wide variety of
implementations.


Z-value Interpolation


In 3-D graphics applications, objects' surfaces, and the pixels that represent
these surfaces, have depth (Z-values) associated with them. Just like color
values, however, Z-values are only given explicitly for triangle vertices on
objects' surfaces. Z-values for pixels on or inside the triangles must be
interpolated from the vertex values.
Z-values can be either 16 or 32 bits long. To accelerate interpolations, the
graphics instruction faddz (floating-point add with Z merge) interpolates two
16-bit Z-values at a time. Just as in color interpolation, a Z-value
interpolant is recursively added to initial Z-values from pixels at one end of
a triangle scan line to generate the Z-values of pixels along the scan line.
As shown in Figure 4, the interpolation results are stored in a floating-point
register pair. Additionally, the MERGE register is shifted right 16 bits and
then updated with the integer portions of the interpolation sums. That way,
after two successive faddz instructions, the MERGE register contains 16-bit
Z-values for four pixels in a row.
Because 32-bit Z-buffer calculations require more bits of precision than can
be accommodated with faddz, they are more efficiently interpolated using the
64-bit integer add instruction, fiadd.dd.


Z-value Comparisons and Pixel Display


When displaying a 3-D object, not all of its surfaces should be displayed at
once, or the back of the object (with respect to a viewer) might overwrite
the front. Likewise, in a scene consisting of multiple objects, some
objects' surfaces may obscure other objects. This is why we calculate Z-values
during rendering: once Z-values have been calculated for all the different
objects' surfaces, those Z-values can be used to decide which surfaces to
display. Selecting which pixels to display is known as "hidden surface
removal."
One popular method of hidden surface removal is the Z-buffer approach. The
Z-buffer, an area of main memory, holds the Z-value of each pixel currently
displayed. The Z-buffer serves as a reference against which newly computed
pixels' Z-values can be checked.
If a newly computed pixel's Z-value is smaller (closer to the viewer) than the
Z-value of the pixel already displayed at that pixel's (x,y) coordinates, then
the newly computed pixel is displayed instead of the previous one, and the
Z-buffer is updated with the new pixel's Z-value. If the newly computed
pixel's Z-value is larger than the Z-value of the pixel already displayed at
that pixel's (x,y) coordinates, then the newly computed pixel is not displayed
at all, and the Z-buffer retains its value for the given pixel location.
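The update rule above is simple to state in C. This is a reference sketch of the per-pixel logic, not the i860's hardware-assisted path described next:

```c
#include <stdint.h>

/* Z-buffer test for one pixel: draw it only when its new Z is
   smaller (closer to the viewer) than what the Z-buffer holds.
   Returns 1 and updates both buffers when the pixel is visible. */
int ZTestAndStore(uint16_t z_new, uint32_t color,
                  uint16_t *zbuf, uint32_t *framebuf)
{
    if (z_new < *zbuf) {
        *zbuf = z_new;
        *framebuf = color;
        return 1;
    }
    return 0;   /* farther away: keep the existing pixel */
}
```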

The i860 CPU has two kinds of special graphics instructions, fzchks/fzchkl
(floating-point Z-buffer check short/long) and pst.d (double-word pixel
store), which expedite the Z-value comparison and subsequent store operations.
Fzchks compares four pairs of 16-bit Z-values in one swoop. Normally one of the
sets of four Z-values is from newly computed pixels; the other set is from the
Z-buffer. Fzchks first shifts the contents of the 8-bit PM (pixel mask) field
in the PSR control register right by four bits. Then, for each of the four
comparisons in which the newly computed pixel has a smaller Z-value than the
corresponding one stored in the Z-buffer, it sets one of the four high-order
bits of PM.
PM is shifted right so that the results of two successive fzchks instructions
accumulate in the 8-bit PM field. The PM field is used by the pst.d
instruction, which examines the contents of PM and stores to the frame buffer
only those pixels within its 64-bit register pair operand that correspond to
set bits in PM. Thus only those pixels which need to be updated in the frame
buffer are actually written out.
Fzchkl (l for long) is identical to fzchks (short) except that it compares two
pairs of 32-bit Z-values at a time, shifts PM right by only two bits, and only
updates the two high-order bits of PM corresponding to the results of the two
32-bit comparisons.
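The shift-and-set behavior of fzchks can be modeled in a few lines of C. The exact mapping from comparison index to PM bit is an assumption of this sketch:

```c
#include <stdint.h>

/* Model of the fzchks pixel-mask update: shift the 8-bit PM field
   right by four, then set one of the four high-order bits for each
   comparison where the new pixel's Z is smaller than the buffered Z.
   (Which comparison maps to which bit is assumed here.) */
uint8_t FzchksModel(uint8_t pm, const uint16_t z_new[4],
                    const uint16_t z_buf[4])
{
    pm >>= 4;                             /* make room in the mask */
    for (int i = 0; i < 4; i++)
        if (z_new[i] < z_buf[i])
            pm |= (uint8_t)(0x10 << i);   /* one of the high bits */
    return pm;
}
```

Two successive calls fill all eight PM bits, matching the accumulation the text describes.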


PS and PM Unrelated


Here's the only potentially confusing piece of the puzzle. Although PS and PM
are both used by pst.d, they are unrelated. That is, the number of bits
allotted to pixel size and to Z-value size are unrelated. You can have an
8-bit pixel with a 32-bit Z-buffer, a 32-bit pixel with a 16-bit Z-buffer, or
any other combination you please.
Pst.d stores 64 bits at a time, which represents eight pixels if your pixel
size is 8 bits, but only four pixels if your pixel size is 16 bits, or two
pixels if your pixel size is 32 bits. Although PM presumably has eight bits (8
pixels' worth) of information in it from multiple fzchks/fzchkl instructions,
pst.d
only examines the appropriate number of low-order bits of PM. (The
"appropriate" number depends on the pixel size as described in the next
section.) Pst.d also shifts PM right by 8/pixel_size_in_bytes bits, where
pixel_size_in_bytes is determined by PS. That sets up PM for the next pst.d.
Multiple pst.d instructions are executed until eight pixels in a row have been
stored to the frame buffer (or not stored, depending on the contents of PM).
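For 32-bit pixels, the masked store can be modeled as below. Which PM bit governs which pixel is an assumption of this sketch:

```c
#include <stdint.h>

/* Model of pst.d for 32-bit pixels: conditionally store the two
   pixels packed in a 64-bit operand, gated by the two low-order PM
   bits, then shift PM right by 8/4 = 2 bits to consume them. */
uint8_t PstdModel32(uint64_t pixels, uint8_t pm, uint32_t dst[2])
{
    if (pm & 1) dst[0] = (uint32_t)pixels;          /* first pixel  */
    if (pm & 2) dst[1] = (uint32_t)(pixels >> 32);  /* second pixel */
    return (uint8_t)(pm >> 2);                      /* next mask bits */
}
```

Only the pixels whose mask bits are set ever reach the frame buffer, which is the point of the instruction.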


Examples


Assume your pixel size is 8 bits (as determined by the PS field of PSR) and
your Z-values are 16 bits. In order to generate eight pixels' worth of Pixel
Mask information, you must perform two fzchks instructions, which compare four
Z-value pairs at a time. Then you must execute one pst.d, which stores (or
doesn't store, depending on PM) eight 8-bit pixels, exploiting all eight bits
of PM. All eight bits of PM have been "used up," so you must then proceed to
the next round of fzchks instructions before executing another pst.d. This
correlates with the fact that one pst.d shifts PM right by 8/1 = 8 bits--that
is, effectively shifts all eight bits out.
Alternatively, say your pixel size is 16 bits, and your Z-values are 32 bits.
Set up PM with four consecutive fzchkl instructions, each of which compares
two Z-value pairs at a time. Then, because one pst.d only stores (potentially)
four pixels, exploiting only the low-order four bits of PM, you'll need to
execute two pst.d instructions in a row before proceeding to the next fzchkl
instructions. Again, this makes sense because pst.d with 16-bit pixels shifts
PM by 8/2 = 4 bits.
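The bookkeeping in both examples reduces to one small formula; a C helper (hypothetical, just to make the arithmetic concrete):

```c
/* pst.d always stores 64 bits, so the number of pixels per store --
   and therefore the number of PM bits examined and shifted out --
   is 8 / pixel_size_in_bytes. */
int PstdShift(int pixel_bits)
{
    return 8 / (pixel_bits / 8);
}
```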


Summary


Because they provide hardware support for rendering as well as fast
transformations, the i860 CPUs are optimal solutions for demanding graphics
applications. Scientific visualization, CAD/CAM, animation, and other
graphics-oriented applications can all benefit from the i860 CPUs' graphics
features, enjoying performance improvements of up to ten times compared to
conventional integer operations.






































July, 1992
PROGRAMMING QUICKTIME


Multimedia to the Macs


 This article contains the following executables: QUICKT.ARC


Aaron E. Walsh


Aaron is a consultant for Boston College, where he is currently engaged with
projects involving client-server programming, interapplication communications,
and e-mail APIs. He can be reached via AppleLink (A03), Bitnet
(Walshag@BCVMS), or Internet (walshag@BCVMS.BC.EDU).


Apple's QuickTime is a system-wide architecture for handling sophisticated
data elements that provides standard access to "time-based" data in typical
Macintosh fashion: Cut, Copy, and Paste. Time-based (or "dynamic") data is any
data type that can be stored and retrieved as values over time. In other
words, if the data changes over time, it is dynamic. Examples of time-based
data are video, sound, animation, or a graph of laboratory data over time.
When activated, time-based data appears to move, or play, just as movies or
music are played in the real world. Using QuickTime, a word processor can
incorporate into its documents video segments that come to life when selected.
These video segments can be copied and pasted into other applications, such as
spreadsheets or databases. When selected, the segment will display video and
play any sound it contains. In short, walking, talking documents have come to
the Macintosh.
In QuickTime, the term "movie" is used to describe a file which contains
time-based data. A movie is simply a repository for dynamic data--sound,
video, animation, or other. Movies may contain multiple sources of dynamic
data: A video sequence combined with music and animation is not uncommon.
As Figure 1 illustrates, each movie contains one or more "tracks," which are
actually pointers to data structures known as "media." A media is responsible
for the video, sound, or animation elements of a movie. When played, a
QuickTime movie resembles its cinematic counterpart: Video, animation, and
sound are played simultaneously. Thus, the media work together.
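The containment just described can be pictured with a few toy C structs. These are illustrative only; the real Movie Toolbox types are opaque handles:

```c
/* Toy model of QuickTime's containment: a movie holds tracks, and
   each track points at exactly one media, which owns the raw
   video, sound, or animation samples. */
typedef struct {
    const char *kind;   /* "video", "sound", "animation", ... */
} Media;

typedef struct {
    Media *media;       /* a track is essentially a pointer */
} Track;

typedef struct {
    int    nTracks;
    Track *tracks;
} Movie;
```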
QuickTime access is not exclusive to the Macintosh, however. The movie-format
specifications have been fully published by Apple, encouraging the development
of QuickTime applications on other platforms. Apple further reinforced the
cross-platform design of QuickTime by providing programming routines which aid
in the transfer of Macintosh movie files to non-Macintosh computers.


QuickTime Toolbox Managers


At the system-software level, QuickTime consists of three major groups of
calls (Managers) that add over 500 new software routines to the Mac's already
considerable repertoire.
Movie Toolbox is a set of high-level system-software calls that load, play,
record, edit, and store dynamic data.
Component Manager provides applications access to external services without
requiring the application to have detailed knowledge of the resource
(digitizer card, VCR, software extension, and so on) providing those services.
These resources, known as "components," are allowed to register their
capabilities at run time. An application can then ask for a device with
specific capabilities (for example, a compression algorithm capable of
lossless compression), and expect the Component Manager to locate and
communicate with a corresponding device, if one exists.
Image Compression Manager (ICM) handles the interaction among the components
responsible for compressing and decompressing image data. The ICM hides the
actual algorithms, allowing developers to take advantage of numerous
compression schemes (JPEG, MPEG, Group 3 Fax) without becoming immersed in
implementation details.


QuickTime File Formats


At the core of QuickTime is its support for two file formats: one to handle
QuickTime data, the other an extension to the PICT file.
The movie format is the standard way in which QuickTime data is stored and
manipulated, created specifically to handle the inherent complexity and large
file sizes of dynamic data. A movie contains one or more tracks, which in turn
access a single media. The media is responsible for handling the raw data
samples (video, animation, or sound) associated with the track. A media's raw
data may be stored in the movie file itself, on separate disk files, CD-ROMS,
remote volumes, or other storage devices. In short, a movie contains one or
more tracks which use media to access raw data samples.
When a track becomes active, the data associated with its media is accessed
and played by QuickTime. Each track describes how its media data is to respond
when activated: playback speed, screen location, default volume, duration, and
so on. These parameters can vary from track to track, providing independent
control over a movie's various data elements.
When a movie is played, the media associated with each track is located and
played according to type: Graphics are displayed on screen; sound is played
through the speaker(s). If the media data represents a compressed image, the
ICM is called upon to decompress the data into usable form. The ICM, in turn,
asks the Component Manager for an appropriate decompression component (which
may exist in either hardware or software). If available, it is used by the ICM
to decompress the media data. The final image is handed to the Movie Toolbox
to be displayed on screen (see Figure 2).
Each data type has an associated "media handler." The media handler is
responsible for randomly accessing media data segments and playing those
segments at a rate specified by the movie track. Each media may have a
time-coordinate system different than that of the movie, in which case the
media handler is responsible for mapping between time-coordinate systems. The
Movie Toolbox shields programmers from details of this level, handling these
interactions transparently when a movie is played.
Essentially, a movie track consists of a pointer to media samples, the
appearance of which, when played, is controlled via track parameters.
Utilizing pointers and parameters, a movie file is easily edited. Setting a
track pointer to reference a media located on disk, rather than copying the
entire media into a movie file, significantly speeds the process.
In addition to introducing the new movie file format, QuickTime extends the
functionality of the familiar PICT file format. While additional code is
needed to access movies, applications which view PICT files using the
DrawPicture() QuickDraw routine automatically have access to
QuickTime-compressed PICT files, without need for modification. When
DrawPicture() is called, the ICM is automatically invoked, transparently
decompressing the PICT and passing it to the calling application in the form
it expects.
The Movie Toolbox also provides new Standard File toolbox routines which allow
the user to create and view document previews. The original routines displayed
only the names of files in a selected directory, which meant the user had to
remember the filename when retrieving documents. The new routines display a
preview of the document, typically a small thumbnail PICT image, along with
the traditional document name, as shown in Figure 3. This thumbnail preview
provides the user with a visual representation of the file prior to opening
it.
Previews are not limited to movie and graphic documents. Nongraphic files
might contain previews which give descriptions of their content, size, or
creation/modification dates.
If a file is selected for which no preview is available, the system will
generate one upon request, if possible. To create a new preview for a file,
click on the Create button located on the left side of the dialog box. This
button is enabled only if the preview is missing but can be created by the
system. For detailed information on utilizing previews and preview dialogs,
refer to the sample code and documentation.


Programming QuickTime


Apple defines an application as QuickTime "literate" if it supports two or
more of the following five features:
Playback of movies using the standard controller component.
Standard File Preview dialog box.
Still-image compression.
Storing data as a QuickTime movie.
Cutting, copying, and pasting of movie data.
The two programs accompanying this article illustrate the process of playing
and creating QuickTime movies. MoviePlayer (Listing One, page 102) uses the
Movie Toolbox and Component Manager to view an already-created movie.
MovieMaker (Listing Two, page 104) uses the Movie Toolbox, Component Manager,
and Image Compression Manager to create a QuickTime movie.

Both applications use the Gestalt Manager to determine the runtime
environment, ensuring that QuickTime is available. Generically, QuickTime
refers to the Movie Toolbox, the Component Manager, and the Image Compression
Manager; at startup, the QuickTime INIT installs the programming routines for
each manager. Together, the three managers provide a rich environment for
managing time-based data. If QuickTime is not installed, ExitToShell() is
called, which aborts program execution and returns to the Finder. Basic
Macintosh programming skills are assumed and will not be covered in this
article.


MoviePlayer


MoviePlayer (Listing One) illustrates programming techniques for playing
QuickTime movies. The program presents the user with a preview dialog box to
select an existing movie file from disk. The selected movie is displayed on
screen in a window, which has a standard movie controller attached to it. The
controller can be thought of as a remote-control unit attached to the movie
window, allowing the user to start the movie, pause the movie, play the movie
in reverse, adjust the volume, or jump from section to section. See Figure 4.
Although MoviePlayer is not a full-featured Macintosh application (it does not
have a menu, nor does it allow you to drag the movie window to a new screen
location), it does illustrate the general framework all movie players require:
Use the Gestalt Manager to determine if QuickTime is available.
Initialize the Movie Toolbox, providing access to QuickTime.
Select a movie file using StandardGetFilePreview().
Open the selected movie file, preparing it for play.
Create a window in which to view the movie.
Use the Component Manager to obtain a standard movie controller.
Attach the controller to the movie display window.
Play/control the movie utilizing the controller.
Dispose of movie data structures when finished playing.
Exit QuickTime, freeing storage allocated by the Movie Toolbox.
Before calling any of the more than 500 QuickTime routines, you must first
ensure that QuickTime is indeed available on the machine running your
application. The most straightforward way of testing for information of this
type is with the Gestalt Manager. Testing for the presence of QuickTime is not
enough, however. It must then be initialized for use with a call to
EnterMovies(). Only after initialization are you able to call QuickTime
routines.
MoviePlayer then calls StandardGetFilePreview(), prompting the user to select
a movie file from disk. Once selected, the movie file must be prepared for
play. This involves opening the disk file and creating from it a movie in
memory.
To display the movie, you must create a window in which to play it.
Additionally, you may provide a controller to allow user interaction with the
movie. MoviePlayer demonstrates both techniques. The user is able to perform
all the standard actions expected of a QuickTime movie: play, pause,
fast-forward, and reverse.
Once the movie has finished playing, the display window and the controller are
released from memory. The disk file is then closed, and the user is prompted
to select another movie to view. This select-play-dispose loop is repeated
until the user selects Cancel from the StandardGetFilePreview() dialog box. At
this point, the Movie Toolbox is exited, balancing the previous initialization
of QuickTime with EnterMovies(), and the MoviePlayer application is
terminated. (More detailed programmer's notes are available electronically.)


MovieMaker


MovieMaker (Listing Two) is similar to MoviePlayer in that it tests for the
presence of and initializes QuickTime at start-up. It also disposes of
allocated movie components before terminating execution. The major difference
between MoviePlayer and MovieMaker is the main loop, which creates a QuickTime
movie both on disk and in memory. MovieMaker demonstrates how to:
Create a movie file using StandardPutFile() and CreateMovieFile().
Create a movie track.
Create a media associated with the movie track.
Prepare a movie graphics environment.
Allocate a frame buffer using information supplied by the ICM.
Create individual movie frames, compress them, and then add each one to a
media.
Instead of asking the user to select an existing movie file from disk,
MovieMaker prompts the user to specify where on disk he/she would like to
store the movie to be created. Once specified (using StandardPutFile()), an
empty movie file is created.
This empty file is then given a single track and an associated media. At this
point the media references no data; the movie file can be thought of as a
blank VHS tape ready for recording. Adding frames to a QuickTime movie takes
more preparation, however, than popping a tape into your camcorder: Each
frame must be created, compressed, and added to the movie media one at a
time.
Two buffers are allocated for the movie frames. One, known as the Graphics
World (or GWorld) is used to store individual images as they are created with
standard QuickDraw routines. The other, a "frame buffer," is used to store the
compressed version of an image after it is drawn into the GWorld.
Each QuickDraw-generated frame is drawn into the movie GWorld, compressed, and
added to the frame buffer, and finally added to the movie media. These steps
are repeated until the last frame is generated, at which point the movie
preview is created. The movie file is then closed, allocated storage is
released, and the MovieMaker application is terminated.


Conclusion


QuickTime is a scalable technology--it is as powerful as the machine it runs
on.
As computers become faster, movie playback speeds will improve,
decompression/compression times will decrease, and movies will seem even more
realistic. Add-on boards are already available that allow a Macintosh to
display QuickTime movies at 30 frames per second, the NTSC broadcast standard
for video.
QuickTime revolutionizes the way people manipulate complex data on their
computers, bringing Cut, Copy, and Paste to dynamic data.


Multimedia Human-Interface Guidelines


Human-interface guidelines provide developers with a common set of rules for
developing multimedia software. The end result is an application that is more
intuitive to use, easier to learn, and more quickly mastered than an
application developed without such guidelines. Apple is developing a set of
human-interface guidelines specifically for QuickTime, but with applicability
to all multimedia platforms. Among the QuickTime human-interface tenets are:
The user should be able to look at a screen and discern which objects are
movies. Movies not in play should look like a PICT image, with the addition of
a controller or a "badge." A badge is a small graphic superimposed on a
stopped movie. It is used to identify a movie when the controller is not
visible, and should disappear when the movie is in play.
When first opened, a movie should display its first frame or poster. In this
state, the movie is not playing. The user is responsible for beginning movie
play.
When played, the movie will start from the current frame. For a newly opened
movie, this should be the first frame or poster; for a paused movie, it will
be the frame at which the movie was paused.
Movies should be cut, copied, pasted, and resized in the same way as PICT
files.
Resizable movies should maintain their original aspect ratios by default.
The controls for playing a movie should be easily accessible to the user. A
mechanism for play and stop should always be available and intuitive to use.
In the absence of a volume-control mechanism, there must at least be a mute
control. A volume-control mechanism is preferred to an off/on mute control,
however.
Generally, single-clicking a movie will select that movie, not play it. A
selected movie may be cut or copied (depending on the application). Resizing,
hiding controls, getting information about the movie, or various other
operations may be applied to a selected movie, depending on the application.

If double-clicking plays a movie, a second click or double-click must suspend
play.
If single-clicking does not select a movie, it may play the movie as long as a
second single-click suspends play.
When printed, a movie will print its currently displayed frame. A movie will
print with controller or badge visible, distinguishing a printed movie from a
printed graphic.
--A.W.



_PROGRAMMING QUICKTIME_
by Aaron Walsh


[LISTING ONE]

/*****************************************************************************
* MoviePlayer Application -- This QuickTime program demonstrates how to open a
* movie file using a file preview dialog, add a movie controller to the
* movie, and play a movie in a window. Author: Aaron E. Walsh
* Developed using Think C 5.0, & QuickTime headers
******************************************************************************/

#include <Movies.h>
#include <QuickTimeComponents.h>
#include <GestaltEqu.h>
#include <Quickdraw.h>
#include <StandardFile.h>
#include <OSEvents.h>

/***** Global Variables ******/
Boolean gSys7Preview; /* is System 7 Preview routine available*/

Movie theMovie; /* info about the movie returned by OpenMovie*/
Rect dispBounds;
MovieController myMovieController; /* controller component for movie*/

WindowPtr movieWindow; /* window to play the movie in */
OSErr error;

/* Variables used in opening a movie file: */
FSSpec mySpec; /* File System record for System 6 */
short resRefNum; /* Resource reference # of selected file */

SFTypeList types = {'MooV'}; /* show files of type 'MooV' */
short numtypes = 1;

StandardFileReply fReply; /* Standard File Reply (Sys. 7/QuickTime) */
SFReply oldfReply; /* old style (Sys.6) Standard File Reply */

EventRecord *theEvent;

/****** Prototypes ******/
Boolean QuickTimeCapable(void); /* is system QT capable? */
StandardFileReply GetMovie(void); /* user select movie file to play */
void PlayMovie(void); /* play the movie */
void MakeMovieController(void); /* find controller */
void ShowMovieController(void); /* attach controller */

/*****************************************************************************
* main() -- Initialize standard Macintosh toolbox managers, check if QuickTime
* is available, execute small loop prompting user for movies to play. Exit
* when user selects "Cancel" from preview dialog.
*****************************************************************************/
main()
{
 /* Initialize Toolbox Managers and data structures: */
 MaxApplZone();
 InitGraf(&qd.thePort);
 FlushEvents(everyEvent, 0);
 InitWindows();
 InitCursor();
if (QuickTimeCapable()) {
 do {
 fReply = GetMovie(); /* prompt user for a movie file to play */
 if (fReply.sfGood)
 PlayMovie(); /* play the selected movie. */
 } while (fReply.sfGood);
 ExitMovies(); }
 else
 ; /* QuickTime is not available. Normally you would
 * put up an error message for the user. */
}

/****************************************************************************
* GetMovie() -- Allow user to select a movie file to play (file of type
* 'MooV') using the StandardGetFilePreview routine.
*****************************************************************************/
StandardFileReply GetMovie()
{
 Point where; /* for System 6 preview */
if (gSys7Preview)
StandardGetFilePreview(0, numtypes, types, &fReply); /* Sys 7 preview dialog */

else { /* using Sys 6 */
 where.h = where.v = -2; /* center dialog on screen w/"best" display */
 SFGetFilePreview(where, 0l, 0l, numtypes, types, 0l, &oldfReply);
 fReply.sfGood = oldfReply.good;
 if (fReply.sfGood) /* convert the reply record into an FSSpec: */
 FSMakeFSSpec(oldfReply.vRefNum, 0, oldfReply.fName, &mySpec);
 }
return (fReply);
}

/****************************************************************************
* MakeMovieController() -- Uses the Component Manager to locate the default
* movie controller, where it is then displayed at bottom of movie window.
*****************************************************************************/
void MakeMovieController()
{
 Component standardMovieController;
 ComponentDescription controllerDescription;
 ComponentResult theErr;
 Point thePoint;
 Rect controllerBox;
 /* Fill in component descriptor fields. This info is used by the
 * Component Manager to locate a corresponding component. We are
 * looking for the standard movie controller component: */
 controllerDescription.componentType = 'play';
 controllerDescription.componentSubType = 0;
 controllerDescription.componentManufacturer = 0;

 controllerDescription.componentFlags = 0;
 controllerDescription.componentFlagsMask = 0;

standardMovieController = FindNextComponent( (Component) 0,
 &controllerDescription);
 /* Get the controller */
 myMovieController = OpenComponent(standardMovieController);

 if(myMovieController == 0l)
 return; /* return to caller if this is the case */
 /* Place controller in the movie window */
 thePoint.h = movieWindow->portRect.left;
 thePoint.v = movieWindow->portRect.top;

theErr = MCNewAttachedController(myMovieController,theMovie, movieWindow,
 thePoint);
 if (theErr != 0)
 return;
 ShowMovieController();
}

/*****************************************************************************
* ShowMovieController() -- Adjusts size of movie window so movie and movie
* controller are viewable
*****************************************************************************/
void ShowMovieController()
{
 Rect movieBox, controllerBox;
/* Adjust size of movie window to accommodate both movie and controller */
GetMovieBox(theMovie, &movieBox); /* initialize movieBox to the movie's bounds */
MCGetControllerBoundsRect(myMovieController,&controllerBox);
/* Adjust movieBox to accommodate controller: */
UnionRect(&movieBox,&controllerBox,&movieBox);
/* Resize movie window: */
SizeWindow( movieWindow,movieBox.right,movieBox.bottom,true);
}

/****************************************************************************
* PlayMovie() -- Opens the appropriate movie file (file of type 'MooV'),
* creates a window large enough to fit the movie, and plays the movie.
*****************************************************************************/
void PlayMovie()
{
 FSSpec movieFSSpec;
 /* First open the movie file */
 if (gSys7Preview)
 movieFSSpec = fReply.sfFile;
 else
 movieFSSpec = mySpec;
 if ((error = OpenMovieFile(&movieFSSpec, &resRefNum, 0)) != noErr)
 return; /* if error occurred, exit PlayMovie() */
 if ((error = NewMovieFromFile( &theMovie,resRefNum, nil, nil,0, nil ))
 != noErr)
 return; /* if error occurred, exit PlayMovie() */
 /* Find movie bounds and set top left to 0,0 so */
 /* the movie will be properly positioned in our window */
 GetMovieBox(theMovie, &dispBounds);
 OffsetRect(&dispBounds,-dispBounds.left,-dispBounds.top);
 SetMovieBox(theMovie, &dispBounds);


 OffsetRect(&dispBounds,50,50); /* window rect can't hit menu bar */
 movieWindow = NewCWindow(0L,&dispBounds,0l,true,0,(WindowPtr)-1L,
 false,0L); /* window for our movie*/
 SetPort(movieWindow);
 SetMovieGWorld(theMovie,nil,nil);
 MakeMovieController(); /* routine for creating standard controller */
 /* After setup, play the movie: */
 GoToBeginningOfMovie(theMovie); /* rewind movie to beginning */
 PrerollMovie(theMovie,0,0); /* preload portions of movie */
 SetMovieActive(theMovie,true); /* set movie to active for servicing */

 /* Use controller to play movie until it is finished. Events are
 passed to MCIsPlayerEvent which handles controller events: */
 while ( !IsMovieDone(theMovie)) {
 EventRecord anEvent; /* use a local record; the global theEvent pointer is never allocated */
 GetNextEvent(everyEvent, &anEvent);
 MCIsPlayerEvent(myMovieController, &anEvent);
 }
 /* dispose of storage, and return */
 DisposeMovie(theMovie); /* movie */
 CloseMovieFile(resRefNum); /* reference to movie file */
 CloseComponent(myMovieController); /* movie controller */
 DisposeWindow(movieWindow);
}

/****************************************************************************
* QuickTimeCapable() -- Uses Gestalt Manager to check if QuickTime is
* available at runtime. If not, return error.
*****************************************************************************/
Boolean QuickTimeCapable()
{
 long response;
 /* Test if QuickTime is available: */
 error = Gestalt(gestaltQuickTime, &response);
 if (error != 0) /* error=0 if OK, else an error has occurred */
 return false; /* if error, return */

/* if no error finding QuickTime, check for ICM so we can use Stand.Preview */
 error = Gestalt(gestaltCompressionMgr, &response);
 if (error != 0)
 return false; /* Can't use Stand.Preview routines */
 error = Gestalt(gestaltStandardFileAttr, &response);
 if (error != 0) /* if not available, we're playing under System 6 */
 gSys7Preview = false;
 else
 gSys7Preview = true; /* use System 7 standard preview */
 error = EnterMovies(); /* Initialize Movie Toolbox & return result */
 if (error != 0)
 return false; /* error initializing QuickTime */
 else
 return true; /* QuickTime available, ready to play movies */
}






[LISTING TWO]


/*****************************************************************************
* MovieMaker Application -- This QuickTime program demonstrates how to create
* a QuickTime movie with associated track and media. The Movie Toolbox,
* Component Manager, and Image Compression Manager (ICM) are demonstrated.
* Author: Aaron E. Walsh -- Developed using Think C 5.0, & QuickTime headers
*****************************************************************************/

#include <Movies.h>
#include <QuickTimeComponents.h>
#include <ImageCompression.h>
#include <GestaltEqu.h>
#include <Quickdraw.h>

/****** defines ******/
#define kFrameX 150 /* x-coord/width*/
#define kFrameY 125 /* y-coord/height */
#define kPixelDepth 32 /* depth for GWorld */
#define kFrameTotal 30 /* total frames in movie */
#define kTimeScale 15 /* desired frames per second */
#define kFrameRate (Fixed) 1<<16 /* fixed point 1.00 = our frame rate */

/****** Types and globals ******/
/* general: */
OSErr error;
Rect frmRect;
/* movie file: */
Movie gMovie; /* our movie, */
Track gTrack; /* track, */
Media gMedia; /* and media */
short resRefNum;
StandardFileReply fReply;
FSSpec movieFSSpec; /* FSSpec reference to movie file */
/* image data: */
char **frameDatabitsH; /* buffer for compressed frames */
ImageDescription **imageDescriptionH;/* image info used by compressor */
PixMap *pixMap,**pixMapH; /* offscreen pixmaps */

/* graphics world: */
GWorldPtr movieGWorld,oldGWorld; /* offscreen graphics worlds */
GDHandle oldGDevice;
/* compressor:*/
long compressedFrameSize; /* size of compressed frame */
CodecType codecType; /* desired codec */
CompressorComponent codecID; /* variation of codecType */
short colorDepth; /* depth to compress image to*/
CodecQ imageQuality; /* desired compression quality*/

/* media sample: */
TimeValue sampTime; /* generated when adding sample to media */

/****** Prototypes ******/
void BuildMovie(void); /* main routine to assemble a movie */
void MakeMovieFile(void); /* create movie file and movie itself */
void MakeMovieGWorld(void);/* allocate offscreen graphics environ */
void AllocateMovieBuffer(void); /* allocate storage for frames*/
void MakeMovieFrames(void); /* loop to create all movie frames */
void AddMovieFrame(void); /* compress & add single frame to media*/
void CleanUp(void); /* free allocated storage, make preview */
Boolean QuickTimeCapable(void); /* is system QT capable? */
void main(void);


/*****************************************************************************
* main() -- Initialize standard Macintosh toolbox managers, check if QuickTime
* capable and call BuildMovie() to create our movie.
******************************************************************************/
void main(void)
{
 /* Initialize Toolbox Managers and data structures: */
 MaxApplZone();
 InitGraf(&qd.thePort);
 FlushEvents(everyEvent, 0);
 InitWindows();
 InitCursor();
 if (QuickTimeCapable()) {
 BuildMovie(); /* create the movie */
 }
}

/*****************************************************************************
* BuildMovie() -- Sets up display window for movie frames, and calls
* appropriate routines for creating movie.
*****************************************************************************/
void BuildMovie(void)
{
 WindowPtr displayWind;
 Rect windRect;
 windRect.left = windRect.top = 0;
 windRect.right = kFrameX;
 windRect.bottom = kFrameY;
 OffsetRect(&windRect,150,50);
displayWind = NewCWindow(0,&windRect,(StringPtr)"\pMovie Window",true,0,
 (WindowPtr)-1,true,0);
 SetPort(displayWind);
 ClearMoviesStickyError(); /* clear any old movie errors */
 while (!GetMoviesStickyError()) {
 MakeMovieFile(); /*create actual movie file on disk */
 MakeMovieGWorld(); /*set up graphics devices for images */
 AllocateMovieBuffer(); /*allocate buffer space */
 MakeMovieFrames(); /*create, compress & add frames */
 CleanUp(); /* create preview & release storage */
 }
 CloseWindow(displayWind); /* close display window */
}

/*****************************************************************************
* MakeMovieFile() -- Create a new movie file (gMovie) with associated track
* (gTrack) and media (gMedia).
*****************************************************************************/
void MakeMovieFile(void) {
/* StandardPutFile prompt; create movie file */
StandardPutFile((StringPtr) "\pCreate Movie File:",
 (StringPtr)"\pNew Movie",&fReply);
 if (!fReply.sfGood)
 return;
 movieFSSpec = fReply.sfFile; /* reference to our movie file*/
error = CreateMovieFile( &movieFSSpec,'MPLA',0,createMovieFileDeleteCurFile,
 &resRefNum,&gMovie);
 if (error) ExitToShell();
/* Create track and media */

 gTrack = NewMovieTrack(gMovie,(long)kFrameX<<16,(long)kFrameY<<16,0);
 error = GetMoviesError();
 if (error) ExitToShell();
gMedia = NewTrackMedia(gTrack, VideoMediaType, kTimeScale, nil,(OSType) nil);
 error = GetMoviesError();
 if (error) ExitToShell();
 error = BeginMediaEdits( gMedia ); /* needed to add samples to media */
 if (error) ExitToShell();
}

/*****************************************************************************
* MakeMovieGWorld() -- Make a GWorld (offscreen graphics world) for movie.
*****************************************************************************/
void MakeMovieGWorld(void) {
 GetGWorld(&oldGWorld,&oldGDevice);/* save old graphics world/device*/
 frmRect.left = frmRect.top = 0; /* setup size of frame*/
 frmRect.right = (short)(kFrameX);
 frmRect.bottom = (short)(kFrameY);
/*create movieGWorld: */
 error = NewGWorld(&movieGWorld,kPixelDepth,&frmRect,nil,nil,0);
 if (error) ExitToShell();
 /* get handle to pixMap of movieGWorld: */
 pixMapH = GetGWorldPixMap(movieGWorld);
/* lock offscreen pixMap in memory:*/
 LockPixels(pixMapH);
/* lock handle to prevent dangling reference:*/
 HLock((Handle)pixMapH);
 pixMap = *pixMapH; /* make pointer (pixMap) to pixel-map*/
}

/*****************************************************************************
* AllocateMovieBuffer() -- Allocate frame buffer according to our requested
* compression level.
*****************************************************************************/
void AllocateMovieBuffer(void) {
long maxCompressedFrameSize; /* Max size of a compressed frame*/

/* compressor info: */
 codecID = anyCodec;
 codecType = (CodecType) 'rpza'; /* use video compression */
 colorDepth = 1; /* compress to 1 bit depth */
 imageQuality = codecNormalQuality;/* quality range is 0x100 to 0x300 */
 imageDescriptionH = (ImageDescription **)NewHandle( 4 );
/* find needed buffer size: */
 error = GetMaxCompressionSize(&pixMap,&frmRect,colorDepth,imageQuality,
 codecType,codecID,&maxCompressedFrameSize);
 if (error) ExitToShell();
/* Allocate frame buffer */
 frameDatabitsH = NewHandle(maxCompressedFrameSize);
 if (!frameDatabitsH) ExitToShell();
 HLock(frameDatabitsH); /* lock handle to buffer */
}

/*****************************************************************************
* MakeMovieFrames() -- Create a unique series of movie frames using simple
* QuickDraw calls. Compress and add each frame to movie (gMovie). Stop when
* max # of frames is reached
****************************************************/
void MakeMovieFrames(void)

{
 long i; /* loop control */
 Rect r2, r3; /* rects used in creating graphics image */

 for(i = 0; i<kFrameTotal; i++) /* loop until max# frames is created */
 {
 if(error!= noErr)
 ExitToShell();

/* Draw a single frame. Uses QuickDraw calls to create graphics image */
 SetGWorld(movieGWorld,nil);
 EraseRect(&frmRect); /* erase whole area to white */
 r2 = frmRect;
 r2.bottom = (short)((long)r2.bottom * i / (kFrameTotal-1));
 InvertRect(&r2);
 r2 = frmRect;
 FillOval(&r2, black);
 InsetRect (&r2,i,i);
 FillOval(&r2, white);
 InsetRect (&r2,i+2,i+2);
 InvertOval(&r2);
 SetRect(&r3,(frmRect.right - frmRect.left) / 2,(frmRect.bottom -
 frmRect.top) / 2,
 (frmRect.right - frmRect.left) / 2,(frmRect.bottom -
 frmRect.top) / 2 );
 InsetRect(&r3,-(i*2),-(i*2));
 FillOval(&r3,white);
/* draw frame into the old Gworld so creation process can be viewed.
* done for visual feedback */
 SetGWorld(oldGWorld,oldGDevice);
 CopyBits((BitMap *) pixMap,(BitMap *) *(PixMapHandle)
 (qd.thePort->portBits.baseAddr),&frmRect,&frmRect,0,0);
/* compress and add current frame to movie: */
 AddMovieFrame();
 }
}

/*****************************************************************************
* AddMovieFrame() -- Compress current frame then add it to our movies media.
* This is done for each frame.
*****************************************************************************/
void AddMovieFrame(void) {
/* compress frame: */
error = CompressImage(pixMapH, &frmRect, imageQuality, codecType,
 imageDescriptionH, StripAddress(*frameDatabitsH) );
 compressedFrameSize = (**imageDescriptionH).dataSize;
 if (error) ExitToShell();
/* add single frame to media:*/
error = AddMediaSample(gMedia, frameDatabitsH, 0L, compressedFrameSize,
 (TimeValue)1, (SampleDescriptionHandle) imageDescriptionH, 1L, 0, &sampTime);
 if (error) ExitToShell();
}

/*****************************************************************************
* QuickTimeCapable() -- Uses the Gestalt Manager to check if QuickTime is
* available at runtime. If not, return error.
*****************************************************************************/
Boolean QuickTimeCapable()
{

 long response;
 /* Test if QuickTime is available: */
 error = Gestalt(gestaltQuickTime, &response);
 if (error != 0) /* error=0 if OK, else an error has occurred */
 return false; /* if error, not QuickTime capable */
/* if no error finding QuickTime, check for the ICM */
 error = Gestalt(gestaltCompressionMgr, &response);
 if (error != 0)
 return false;
 error = EnterMovies();/* Initialize Movie Toolbox */
 if (error != 0)
 return false; /* error initializing QuickTime */
 else
 return true; /* QuickTime capable; ready to play movie */
}

/*****************************************************************************
* CleanUp() -- Create the movie preview, close the movie file, and release
* all storage allocated while building the movie. Exit if unable to do so.
*****************************************************************************/
void CleanUp()
{
 short resourceId = 1;
 error = EndMediaEdits( gMedia ); /* finished adding samples */
 if (error) ExitToShell();
error = InsertMediaIntoTrack(gTrack,0L,0L,GetMediaDuration(gMedia),kFrameRate);
 if (error) ExitToShell();
error = AddMovieResource( gMovie, resRefNum, &resourceId, movieFSSpec.name );
 if (error) ExitToShell();
error = MakeFilePreview(resRefNum, (ProgressProcRecordPtr) -1);
 error = CloseMovieFile( resRefNum );
 if (error) ExitToShell();
 DisposeMovie(gMovie); /* We don't need the movie anymore */
 DisposHandle(frameDatabitsH); /* dispose frame buffer memory */
 DisposHandle((Handle)imageDescriptionH); /* and other storage: */
 DisposeGWorld(movieGWorld);
 ExitMovies();
}
























July, 1992
 GRAPHICS IMPORT FILTERS FOR WINDOWS APPLICATIONS


Programming the Aldus interface


 This article contains the following executables: TWAIN.ZIP TWAIN.MAC
VIEWER.ARC


Evangelo Prodromou


Evangelo Prodromou is a programmer and writer from San Francisco, CA and has
been programming Windows applications for two years. He works for Access
Softek, a graphics filter vendor, located at 2550 9th Street #206, Berkeley,
CA 94710. He can also be reached via CompuServe at 70661,3174.


Graphics support has always been the hallmark of leading-edge software--and
usually the most difficult part to implement. With Windows 3, an integral part
of graphical software development is the ability to import and manipulate a
graphics file created in another application. Table 1 shows the many (some
say, too many) graphics file formats that a user may want to import into a
document, database, or other application, each format with its own method of
storing graphics file information. Some formats, such as the CGM standard,
encompass several sub-formats created by different drawing programs.
Table 1: Graphics file formats.

 Name                                       Ext   Source                  Comment
 ---------------------------------------------------------------------------------
 Adobe Illustrator file                     .AI   Adobe Illustrator
 OS/2 Bitmap                                .BMP  OS/2                    OS/2 standard
 Windows Bitmap                             .BMP  Microsoft Windows       Windows standard
 Corel Draw! file                           .CDR  Corel Draw!
 Computer Graphics Metafile (CGM)           .CGM                          ANSI/ISO standard
 Freelance file                             .DRW  Lotus Freelance
 Micrografx Drawing file                    .DRW  Micrografx Designer
 AutoCAD Drawing Exchange                   .DXF  Autodesk's AutoCAD
 Encapsulated PostScript                    .EPS  Adobe                   Standard for
                                                                          PostScript printers
 Graphic Environment Manager (GEM) Metafile .GEM  Digital Research's GEM
 General Image File (GIF)                   .GIF  CompuServe              Highly compressed
 OS/2 Metafile                              .MET  OS/2                    OS/2 standard
 QuickDraw Picture file (PICT)              .PCT  Macintosh QuickDraw
 PC Paintbrush                              .PCX  ZSoft PC Paintbrush
 Lotus 1-2-3 Graphics file                  .PIC  Lotus 1-2-3
 HP Graphics Language (HPGL)                .PLT  Hewlett-Packard         Developed for HP
                                                                          plotters
 Tagged Image File Format (TIFF)            .TIF  Microsoft/Aldus
 Windows Metafile                           .WMF  Windows                 Windows standard
 WordPerfect Graphics file                  .WPG  DrawPerfect

It's virtually impossible for an individual developer to account for all
possible formats in an application. Just finding a format's specifications is
difficult and time-consuming enough. After that, writing and testing the code
to parse and display a particular file takes months of development time.
Alternatively, many Windows developers simply license file-import filters from
developers who specialize in the nitty-gritty mechanics of file conversion.
These filters can be used as software components to support a broad range of
file formats without a corresponding increase in development time.
In this article, I describe an easy way to include graphics support in Windows
applications, and then provide a sample application that quickly displays
graphics files.


The Windows Environment



Windows has two features that make modular graphics support feasible:
metafiles and Dynamic Link Libraries (DLLs).
Windows itself supports two standard graphics formats: the metafile and the
bitmap. The libraries included in the Windows Software Development Kit (SDK)
contain functions to manipulate and store both formats. Graphics in these
forms are consequently easy to incorporate into any Windows program.
Although each format has its advantages, the metafile is easier to display.
Unlike a bitmap, which stores a bit-by-bit copy of the image, the metafile
stores only the commands necessary to create the image. This limits the device
dependence of graphics display. The Windows Graphic Device Interface (GDI)
assumes the responsibility for display options. The GDI will execute the
graphics commands stored in a metafile only to the extent of the graphics
device's capability. This makes the metafile a powerful tool for graphics
computing.
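To make the distinction concrete, here is a toy sketch in portable C. The record type and player are hypothetical illustrations only, not the actual Windows metafile format (which stores GDI call records such as MoveTo and LineTo): a "metafile" is simply a list of drawing commands, and the player executes each command only if the target device can honor it.

```c
#include <assert.h>

/* A toy "metafile": a list of drawing commands rather than pixels.
** The record layout is hypothetical; real Windows metafiles store
** GDI call records (MoveTo, LineTo, Arc, and so on). */
typedef enum { CMD_LINE, CMD_ARC } CmdType;

typedef struct {
    CmdType type;
    int x1, y1, x2, y2;          /* command coordinates */
} MetaRecord;

/* "Play" the commands on a device. A device that cannot draw arcs
** simply skips them, mirroring how the GDI executes metafile commands
** only to the extent of the device's capability. Returns the number
** of commands actually executed. */
int PlayToyMetafile(const MetaRecord *rec, int count, int deviceHasArcs)
{
    int drawn = 0, i;
    for (i = 0; i < count; i++) {
        if (rec[i].type == CMD_ARC && !deviceHasArcs)
            continue;            /* unsupported on this device; skip */
        drawn++;                 /* a real player would issue the GDI call */
    }
    return drawn;
}
```

A plotter without arc support still renders all the lines, while a full-featured display renders everything; a bitmap, by contrast, forces every device to accept the same fixed pixels.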
Windows DLL files can contain code not included in the main executable file.
If an application needs the code in the DLL, the file can be loaded into
memory, used, and then removed. DLLs can be used as program "modules" that
work with the central executable to implement all the application's features.
Fusing the concepts of the DLL and the metafile produces a powerful graphics
tool--the graphics import filter. Filters are DLLs that can translate a
foreign-format graphics file into a Windows metafile. Using the metafile
ensures that the resulting image is independent of the capabilities of the
display device. Using DLLs, on the other hand, means that import functionality
can be developed independently of the main executable, and that it can be
loaded by the application only when necessary.


The Aldus Interface


Aldus Corp. recognized the importance of graphics import filters when the
company ported PageMaker to Windows. For PageMaker 3.0 for Windows, Aldus
required each filter (DLLs licensed from third-parties) to have a uniform
functional interface to process import requests from the main application. In
other words, Aldus built an import "slot" into its application and required
that all filter modules fit perfectly into that slot. This approach eased
Aldus's graphics-support development load and made it possible for different
Aldus products with the same interface to use the same filters.
The Aldus interface is now a de facto standard, having been adopted by many
Windows developers, including Microsoft in Word for Windows and PowerPoint,
Lotus in the Ami Pro 2.0 word processor, and Asymetrix in ToolBook.
Fortunately, the Aldus interface is easy to program. Minimal changes in an
application's source code can take full advantage of the graphics power behind
import filters.
As an example, I've included a graphics file viewer I designed for use with
the Windows File Manager. By associating a graphics file extension with this
viewer application, VIEWER.EXE, I can quickly look at a particular file just
by double-clicking on its filename. (The complete system is available
electronically.)


Data Types


The Aldus standard includes two data types not typically found in Windows
applications. These are defined in my application's header file VIEWER.H (see
Listing One, page 108), which also has several defined constants and function
declarations.
A FILESPEC structure represents the graphics file that the application wants
the filter to translate. The structure contains the file's full pathname as
well as other important file information: whether it's open for writing, where
the current file pointer is, what its DOS file handle is, and so on.
Applications need to fill in these fields if they operate on the file before
passing it to the filter. (I used the default values in my code.)
The PICTINFO structure describes the resulting GDI metafile returned from a
filter after translation. It includes a memory handle to the metafile and a
RECT structure that describes the tightest-fitting rectangle this file can fit
into. This makes it easier for an application to place or update an imported
picture, because the filter that actually created the metafile returns its
optimal size and shape.
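Because bbox is the tightest-fitting rectangle, placing the picture in any target rectangle is simple proportion. A portable sketch of that arithmetic (the function name is mine; this is the same relationship the viewer later establishes with window and viewport extents under MM_ANISOTROPIC):

```c
#include <assert.h>

/* Map a coordinate v from a source range (the metafile's bounding
** box, running from srcOrg over srcExt units) onto a destination
** range (say, a window's client area). This is the proportion that
** SetWindowExt()/SetViewportExt() establish for the GDI under
** MM_ANISOTROPIC. */
long MapCoord(long v, long srcOrg, long srcExt, long dstOrg, long dstExt)
{
    return dstOrg + (v - srcOrg) * dstExt / srcExt;
}
```

With a bounding box running from 100 to 300 and a client area from 0 to 400, the box edges map to the client edges and the midpoint lands in the middle.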


Functionality


VIEWER.C (Listing Two, page 108) is the C source file for the viewer
application. Its heart is the function ImportFile(), which handles the actual
translation of the file to a Windows metafile.
ImportFile() uses the "anonymous call" method to access the functionality of a
given filter. To do this, it uses the standard Windows LoadLibrary() function
to load the filter DLL into memory. LoadLibrary() just needs the full pathname
of the filter file (szFilter in my application) to load it. It returns a
memory handle hLibrary to the library.
Then ImportFile() calls a second Windows function, GetProcAddress(), to return
a pointer to a named procedure in that library. GetProcAddress() needs a
handle to the loaded library and the name of the function. You can see that
I've included literal strings as the names of the functions; all
Aldus-standard filters must have exactly these names for their functions, so I
can use these literal strings with impunity.
This is why a stock set of interface functions is so important. These calls to
GetProcAddress() are the "slot" into which all files must fit.
ImportFile() first tries to locate the function GetFilterVersion() within the
DLL. Aldus has defined a basic interface, version 1.0, and an enhanced, more
powerful interface, version 2.0. (Version 2.0 requires the filter to use
Aldus's proprietary DLLs for memory management. Consequently, Aldus is the
only developer to implement this interface, as in PageMaker 4.0.) The purpose
of GetFilterVersion(), therefore, is to tell the viewer which set of functions
to use. At this writing, only version 2.0 filters contain GetFilterVersion().
If the filter doesn't have this function, GetProcAddress() will return a NULL
pointer. This means that the filter uses the basic version 1.0 interface, and
ImportFile() needs to access those simpler functions.
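The probe-and-fall-back logic can be sketched in portable C. Here a filter is modeled as a simple table of exported names (the ToyFilter type is a hypothetical stand-in for a real DLL's export table), and a failed lookup returns NULL exactly as GetProcAddress() does:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DLL: just a table of exported names. */
typedef struct {
    const char **exports;
    int count;
} ToyFilter;

/* Look up an export by name; NULL means "not there," just as
** GetProcAddress() returns NULL for a missing function. */
const char *FindExport(const ToyFilter *f, const char *name)
{
    int i;
    for (i = 0; i < f->count; i++)
        if (strcmp(f->exports[i], name) == 0)
            return f->exports[i];
    return NULL;
}

/* A filter that exports GetFilterVersion speaks the version 2.0
** interface; one that doesn't must be treated as version 1.0. */
int FilterInterfaceVersion(const ToyFilter *f)
{
    return FindExport(f, "GetFilterVersion") ? 2 : 1;
}
```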
Version 1.0 filters have three functions: GetFilterInfo(), GetFilterPref(),
and ImportGR(). GetFilterInfo (nPageMakerVersion, lpIni, lphPrefMem,
lphFileTypes) initializes the filter. The argument nPageMakerVersion is an
artifact from the time when only PageMaker used Aldus-standard filters;
because it's no longer necessary, I set the argument to 0 in my code.
lpIni is a string of information stored in the WIN.INI file about the filter,
and lphFileTypes defines the file types the filter is expected to support.
Some applications may want to dynamically create a table of information on all
available filters, but my viewer is a one-shot importer, so I set these two
arguments to NULL also.
If the filter needs to get specific information from the user (for example,
whether to use color or grayscale in the graphics image), GetFilterInfo()
dynamically allocates memory in which to store the user's preferences. It then
copies a handle to that memory to lphPrefMem.
GetFilterPref(hInst, hWnd, hPrefMem, wFlags) displays a dialog box in the
window hWnd to get the user's import preferences. It stores the options in the
memory in hPrefMem--the format varies from filter to filter--to be used during
the import process.
To actually translate the file, the application calls ImportGR(hPrintIC,
lpFileSpec, lpPictInfo, hPrefMem). The application passes
the filter an information context hPrintIC (which describes the capabilities
of an output device, such as a printer, a plotter, or a video display) for the
supported printer, to determine supported fonts and other system-specific
parameters. It also passes the memory block hPrefMem and a FILESPEC structure
pointed to by lpFileSpec representing the file to import. ImportGR() will
translate the file and fill the PICTINFO structure pointed to by lpPictInfo
with the important information about the resultant metafile.
Filters supporting the version 2.0 interface have two additional functions:
GetFilterVersion() (described earlier) and IsThisMyFile(). The 2.0 interface
standard also replaces ImportGR() with the more versatile OutputGR().
IsThisMyFile(lpFileSpec) determines the format of the file defined by the
FILESPEC structure in lpFileSpec. This function makes it easier for the
application to dynamically match files to filters that can support them.
After checking the file, ImportFile() calls GetFilterPref(), as it would for
the version 1.0 function set. In this case, however, GetFilterPref() allocates
its own preference memory. Just as in the version 1.0 interface, this function
will display a dialog box to retrieve the user's import preferences.
Finally, ImportFile() calls OutputGR (hOutputDC, hPrintIC, lpFileSpec,
lpMetafileName, lpPictInfo, hPrefMem, lpInfoProc, bComplete) to translate the
file lpFileSpec to a GDI metafile in lpPictInfo. Much more powerful than the
version 1.0 ImportGR(), OutputGR() can also display or print the metafile to
the device context hOutputDC. (A device context defines a particular output
device--a printer, a plotter, or a window on the video display. Windows
prohibits an application from writing to a device directly; instead, it sends
commands to the device context, which the GDI processes and implements on the
physical device.) Alternately, it can save the metafile to disk as a file
named lpMetafileName. For simplicity, I chose not to use either option;
viewer.exe just requests a metafile and takes responsibility for displaying
it.
If OutputGR() is passed a valid pointer to an information procedure
lpInfoProc, it will call that procedure for more information about the current
color palette, available printers, and so on. This usually isn't necessary
unless the calling application is going to place the returned image into an
existing graphics file. I pass a NULL pointer to tell OutputGR() just to use
its default settings.
Once the graphics file has been translated by either ImportGR() or OutputGR(),
ImportFile() frees the memory taken up by the filter DLL hfilter and the
user-preference options, hPrefMem. The metafile is now ready to be displayed.


Implementing File Import


VIEWER.C includes a number of functions necessary to implement the
application. WinMain(), the entry point, creates the main window and
handles the message loop; MainWndProc() handles the messages for the main
window; and AboutDlgProc() handles the About dialog box. Although a full
discussion of these functions is beyond the scope of this article, I'd like to
explain how each of them supports ImportFile()'s file-import capability.
With the Windows File Manager, a user can "associate" file types by
extension--such as "PCX"--with a particular application; VIEWER.EXE, for
instance. If the user double-clicks on a filename with that extension, the
associated application is launched and the file's full pathname is passed to
it as the lpCmdLine argument in WinMain(). I copy that filename to the
filename field in the FILESPEC structure so it can be used later by
ImportFile().
WinMain() registers and creates the main window. When CreateWindow() is
called, MainWndProc() receives the message WM_CREATE. Here, I check to see if a
filename has been specified. Then, using the local function GetExtFilter(), I
fill the global variable szFilter with the correct filter's full filename.
Available filter DLLs must be listed in the WIN.INI file in the form
file-format=filter file, ext under the heading [GraphicViewer]. For example:
 [GraphicViewer]
 PCX files=C:\VIEWER\PCXFILT.FLT, PCX
Here, file-format is the file format supported, filter file is the full path
and filename of the DLL to use for importing, and ext is the typical extension
of the file type this filter supports. (If you use commercially available
applications such as Word for Windows or PageMaker, you'll probably see
similar headings in your WIN.INI file.)
These entries should give you a good idea of where you can find existing
filter DLLs on your disk. You may have to do some detective work to discover
which ones work for which file formats.
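Splitting such an entry apart is a small job for strtok(), much as GetExtFilter() in Listing Two does. A portable sketch (the function name is mine; note that strtok() modifies the entry in place, so callers must pass a writable copy):

```c
#include <assert.h>
#include <string.h>

/* Split a [GraphicViewer]-style profile value of the form
** "filter file, ext" into its two parts. Returns 1 on success,
** 0 if either part is missing. The caller should compare ext to
** the file's extension case-insensitively (the viewer uses
** lstrcmpi() for this). */
int SplitFilterEntry(char *entry, char **filterFile, char **ext)
{
    *filterFile = strtok(entry, ",");   /* everything up to the comma */
    *ext = strtok(NULL, ", ");          /* skip comma and spaces */
    return *filterFile != NULL && *ext != NULL;
}
```

Given "C:\VIEWER\PCXFILT.FLT, PCX", this yields the filter's full pathname and the extension "PCX".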
GetExtFilter() checks each entry under [GraphicViewer] using the standard
Windows function GetProfileString(). It fills szFilter with the name of the
filter that matches the file's extension. I then call ImportFile() to
translate the file to a GDI metafile, which finishes the WM_CREATE processing
in MainWndProc(). Flow then returns to WinMain(), which calls ShowWindow() to
show the main window, then UpdateWindow(), which sends the WM_PAINT message to
MainWndProc().
When the main window gets the WM_PAINT message, it prepares its client area to
display the translated image. It then calls PlayMetaFile(), another standard
Windows function, to display the image on the screen. (The client area is the
area of the window on which it can draw, that is, the whole window except
title bar, scroll bars, borders, and menu.)
After updating the main window, WinMain() creates a message loop to process
messages for the window. If the user maximizes or resizes the main window,
MainWndProc() will receive the WM_SIZE message. Clicking the mouse on the
"About Viewer... " menu item will cause MainWndProc() to show my copyright
information with the AboutDlgProc().

Besides VIEWER.C and VIEWER.H, other files necessary to construct this
application (the definition file to be used with Microsoft C; the resource
file containing the viewer's menu and the "About" dialog box; and the make
file for the nmake utility in Microsoft C) are available electronically, as
described on page 3.
For a more complete explanation of the aforementioned functions, I recommend
the Windows SDK's Guide to Programming or Graphics Programming Under Windows
by Meyers and Doner (Sybex, 1987).


Conclusion


Import and export filters are the first widely used examples of a standardized
component approach to software development. A modular software system lets
users adapt applications to their own needs, however specialized. In the
future, both small and large developers will need to rely on modularity to
confront the wide range of possibilities PCs will provide.


Standards the Twain Shall Meet


The Aldus graphics-filter interface constitutes an important part in the
oncoming standardization of the graphics arena. However, the recently proposed
Twain API specification, devised to provide a uniform interface between
graphics-supporting software and image-capturing hardware, may ultimately play
a leading role as well.
Typically, when users want to include an image derived from hardware such as
hand-scanners, flat-bed scanners, slide scanners, or digital cameras, they
must use a dedicated software package to acquire the image and then save it to
a file on disk. To include the image in another document, the user must then
import the file using a graphics filter or other graphics-support software.
The Twain API aims to circumvent these extra stages. Software that supports
Twain can bring graphics images directly into a document, without the
intermediate step of creating an additional file. Hardware vendors, on the
other side, need only develop Twain-compatible drivers to make their
image-source hardware accessible to many leading graphics-mode software
packages.
Between the application and the source lies an intermediate step, the Twain
Source Manager, implemented as a DLL under Windows and a code resource for the
Macintosh. Applications that use the API can call on the Source Manager to
acquire the images using a user-defined source. The process is similar to
choosing and setting up a printer under Windows or on the Mac. Image files are
returned to the application in device-independent bitmap (DIB) format for
Windows or Picture format for the Macintosh.
Twain was developed by a working group from five players in the image
processing and acquisition field: Aldus, Caere, Eastman Kodak,
Hewlett-Packard, and Logitech. Hardware vendors such as Canon and Ricoh and
software
developers including Lotus and Micrografx are also currently including support
of the Twain API in their products. Corel, Adobe, and Ventura have also
endorsed the interface.
Additional information on the Twain API is available in the HP Peripherals
Forum on CompuServe, where you'll find the Twain Developer's Disk 1.0 for
Windows (which uses Borland C++ or Microsoft C6 and includes Source Manager,
DC.H file, application "glue" code, sample application, and sample Twain
source). Likewise available is the Twain Developer's Disk 1.0 for Macintosh
(for System 6 or 7 using Think C5 and including the Source Manager, DC.h file,
application "glue" code, sample app, and sample Twain source). You can also
contact the Twain Working Group at 1-800-722-0379 for a technical paper
describing the Twain API (ask for document #9155) and the Twain Toolkit order
form (document #9154).
--E.P.



_GRAPHICS IMPORT FILTERS FOR WINDOWS APPLICATIONS_
by Evangelo Prodromou


[LISTING ONE]

/****************************************************************************

 FILE : Viewer.h

 PURPOSE: header file for graphic viewer

(C) 1992, Evangelo Prodromou. All rights reserved.
****************************************************************************/

/* Viewer Menu item definitions */

#define IDM_ABOUT 100
#define IDC_STATIC -1

/* general-use string size. */

#define STRINGSIZE 511

/* The following definitions are data types defined by the Aldus
** Interface. */

typedef DWORD FILETYPE;

typedef struct {
 unsigned slippery : 1; /* TRUE if file may disappear. */
 unsigned write : 1; /* TRUE if open for write. */
 unsigned unnamed : 1; /* TRUE if unnamed. */
 unsigned linked : 1; /* Linked to an FS FCB. */
 unsigned mark : 1; /* Generic mark bit. */
 FILETYPE fType; /* The file type. */

#define IBMFNSIZE 124
 short handle; /* MS-DOS open file handle. */
 char fullName[IBMFNSIZE]; /* Device, path, file names. */
 DWORD filePos; /* Our current file posn. */
} FILESPEC, FAR *LPFILESPEC;

typedef short DC;

typedef struct { /* --- PICTINFO for Windows --- */
 HANDLE hmf; /* Global memory handle to the metafile */
 RECT bbox; /* Tightly bounding rectangle in metafile units */
 DC inch; /* Length of an inch in metafile units */
} PICTINFO, FAR* LPPICTINFO;


/* The following types are pointers to functions with the same arguments
** as those found in an Aldus-standard function. They are necessary to
** make anonymous function calls. */

/* Version 1.0 filter functions */

typedef WORD (FAR PASCAL *PFN_INFO) (short, LPSTR, HANDLE FAR*, HANDLE FAR*);
typedef WORD (FAR PASCAL *PFN_IMPORT) (HDC, LPFILESPEC, LPPICTINFO, HANDLE);
typedef void (FAR PASCAL *PFN_PREF) (HANDLE, HWND, HANDLE, WORD);

/* Version 2.0 filter functions */

typedef WORD (FAR PASCAL *PFN_VER) (DWORD, BOOL FAR *, WORD FAR *, WORD FAR
*);
typedef WORD (FAR PASCAL *PFN_ISMY) (LPFILESPEC);
typedef WORD (FAR PASCAL *PFN_PREF2) (HANDLE, HANDLE, HANDLE FAR *, DWORD,
FARPROC, LPFILESPEC);
typedef WORD (FAR PASCAL *PFN_OUTPUT) (HDC, HDC, LPFILESPEC, LPSTR,
LPPICTINFO, HANDLE, FARPROC, BOOL);

/* The following are function declarations for functions local to
** this application. */

int PASCAL WinMain( HANDLE, HANDLE, LPSTR, int );
long FAR PASCAL MainWndProc( HWND, unsigned, WORD, LONG );
BOOL FAR PASCAL AboutDlgProc( HWND, unsigned, WORD, LONG );
BOOL NEAR ImportFile( HWND );
BOOL NEAR GetExtFilter( void );
HDC NEAR GetPrinterIC( void );






[LISTING TWO]

/****************************************************************************

 FILE : Viewer.c

 PURPOSE: Graphics file viewer

 FUNCTIONS:

 WinMain() - calls initialization function, processes message loop
 MainWndProc() - processes messages

 ImportFile() - Converts graphics file to Windows Metafile
 GetExtFilter() - Determines correct filter for chosen file
 GetPrinterIC() - Determines current printer
 AboutDlgProc() - processes messages for "About" dialog box

 COMMENTS:

(C) 1992, Evangelo Prodromou. All rights reserved.
****************************************************************************/

/* include the general Windows header and the header for this application. */

#include "windows.h"
#include "viewer.h"

/* These strings are used repeatedly. */

char szAppName[ ] = "GraphicViewer";
char szClassName[ ] = "ViewerWClass";
char szMenuName[ ] = "ViewerMenu";

char szString[ STRINGSIZE ]; /* General use string. */
char szFilter[ IBMFNSIZE ]; /* full path name of import filter. */

/* Info struct of the imported metafile. */

PICTINFO PictInfo = { NULL,0,0,0,0,0 };

/* Spec struct for file to import. */

FILESPEC FileSpec = { 0,0,0,0,0,0L,NULL };

HANDLE hInst; /* Handle for this instance. */

/***************************************************************************

FUNCTION: WinMain
PURPOSE: Entrance point of application

ARGS: hInstance : handle to this instance.
 hPrevInstance : handle to previous instance of the application.
 lpCmdLine : command line string
 nCmdShow : show full or iconic? Passed to ShowWindow()

COMMENTS: If no previous instance, registers the window class.
 Saves this instance handle as a global variable.
 Parses command line for a file name to view.
 Creates window for this instance.
 Shows the window.
 Takes and translates messages from the message loop and passes
 them on to MainWndProc().

***************************************************************************/
int PASCAL WinMain(HANDLE hInstance, HANDLE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow)
{
 MSG msg; /* Windows message structure */
 WNDCLASS wc; /* Window class structure. */
 HWND hMain; /* handle to main window. */


/* If there is no previous instance of the application, fill in the
** window class structure and register the class. */

 if (!hPrevInstance)
 {
 wc.style = NULL;
 wc.lpfnWndProc = MainWndProc;
 wc.cbClsExtra = 0;
 wc.cbWndExtra = 0;
 wc.hInstance = hInstance;
 wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
 wc.hCursor = LoadCursor(NULL, IDC_ARROW);
 wc.hbrBackground = GetStockObject(WHITE_BRUSH);
 wc.lpszMenuName = (LPSTR) szMenuName;
 wc.lpszClassName = (LPSTR) szClassName;

 if (!RegisterClass(&wc))
 return FALSE;
 }

/* Save the instance handle as a global variable. */

 hInst = hInstance;

/* Copy the command line argument to the filename field of FileSpec. */

 lstrcpy( FileSpec.fullName, lpCmdLine );

/* Create the main window (will send WM_CREATE to the MainWndProc). */

 hMain = CreateWindow( szClassName, szAppName,
 WS_OVERLAPPEDWINDOW,
 CW_USEDEFAULT, CW_USEDEFAULT,
 CW_USEDEFAULT, CW_USEDEFAULT,
 NULL, NULL, hInstance, NULL );

 if (!hMain)
 return (FALSE);

/* Show the window and update it ( sends WM_PAINT to MainWndProc ). */

 ShowWindow(hMain, nCmdShow);
 UpdateWindow(hMain);

/* Get window messages for this instance and send them to the MainWndProc.
** Loops until WM_QUIT message is received. */

 while ( GetMessage( &msg,NULL,NULL,NULL ) )
 {
 TranslateMessage( &msg );
 DispatchMessage( &msg );
 }
 return ( msg.wParam );
}

/***************************************************************************

FUNCTION: MainWndProc

PURPOSE: Processes messages for main window

ARGS: hWnd : handle to main window
 message : windows message
 wParam : extra message info
 lParam : extra message info

MESSAGES: WM_CREATE : If a file was specified on the command line, gets
 the correct filter by extension, and tries to
 import it. If no file specified, or if no matching
 filter is found, asks for a new file.

 WM_PAINT : If a file has been imported, displays the
 resulting metafile. Otherwise, passes the message
 to default.

 WM_SIZE : Invalidate whole client area, and repaint.

 WM_DESTROY : Terminates program.

 WM_COMMAND : IDM_ABOUT : Displays About dialog box.

***************************************************************************/

long FAR PASCAL MainWndProc(HWND hWnd, unsigned message,
 WORD wParam, LONG lParam)
{
 FARPROC lpProc;
 BOOL bError;
 RECT rc;

 switch (message)
 {
 case WM_CREATE:

 /* Assume an error to begin with. */

 bError = TRUE;

 if ( FileSpec.fullName[0] == '\0' ) /* No file name on command line */
 {
 lstrcpy( (LPSTR) szString, "No file name specified." );
 }
 else if ( !GetExtFilter( ) ) /* Unable to find filter */
 {
 lstrcpy( (LPSTR) szString,
 "No filter specified in WIN.INI for file extension." );
 }
 else if ( !ImportFile( hWnd ) ) /* Unable to import */
 {
 lstrcpy( (LPSTR) szString, "Unable to import file." );
 }
 else bError = FALSE; /* Import was a success */

 if (bError)
 {
 MessageBox( hWnd, (LPSTR) szString,
 (LPSTR) szAppName,
 MB_OK | MB_ICONHAND );

 DestroyWindow( hWnd );
 }
 break;


 case WM_PAINT: /* Paint window by playing metafile. */

 if ( PictInfo.hmf ) /* A file has been imported. */
 {
 PAINTSTRUCT ps;
 HDC hDC = BeginPaint( hWnd, &ps );

 SetMapMode( hDC, MM_ANISOTROPIC );
 SetWindowExt( hDC, PictInfo.bbox.right - PictInfo.bbox.left,
 PictInfo.bbox.bottom - PictInfo.bbox.top );
 GetClientRect( hWnd, &rc );
 SetViewportExt( hDC, rc.right - rc.left, rc.bottom - rc.top );

 SetWindowOrg( hDC, PictInfo.bbox.left, PictInfo.bbox.top );
 PlayMetaFile( hDC, PictInfo.hmf );

 EndPaint( hWnd, &ps );
 }
 break;

 case WM_SIZE: /* Invalidate, and paint full window. */

 GetClientRect( hWnd, &rc );
 InvalidateRect( hWnd, &rc, TRUE);
 UpdateWindow( hWnd );
 break;

 case WM_DESTROY: /* Send WM_QUIT message to message loop to end. */

 PostQuitMessage( 0 );
 break;

 case WM_COMMAND: /* Creates About dialog box. */

 if ( wParam == IDM_ABOUT )
 {
 lpProc = MakeProcInstance( AboutDlgProc, hInst );

 DialogBox( hInst, "AboutBox", hWnd, lpProc );

 FreeProcInstance( lpProc );

 break;
 } /* Otherwise, fall through to default. */

 default:

 return ( DefWindowProc( hWnd, message, wParam, lParam ) );
 }
 return (NULL);
}

/***************************************************************************
* *

* FUNCTION: ImportFile *
* PURPOSE : Translate a graphic file to a metafile using a filter *
* *
* ARGS : HWND hWnd - handle to main window. *
* RETURN : TRUE if file import is successful, otherwise FALSE. *
* *
* COMMENTS: Loads a filter DLL into memory, and uses its import   *
* functionality to translate a graphic file to a metafile. *
* Uses Aldus-standard interface functions, version 1 or 2. *
* *
***************************************************************************/

BOOL NEAR ImportFile(HWND hWnd)
{
 HANDLE hFilter = NULL, hPrefMem = NULL;
 WORD wFilterResult = -1;
 HDC hPrintIC = NULL;

/* Version 1.0 Filter functions */

 PFN_INFO lpfnGetFilterInfo = NULL;
 PFN_IMPORT lpfnImportGR = NULL;
 PFN_PREF lpfnGetFilterPref = NULL;

/* Version 2.0 Filter functions */

 PFN_VER lpfnGetFilterVersion = NULL;
 PFN_ISMY lpfnIsThisMyFile = NULL;
 PFN_PREF2 lpfnGetFilterPref2 = NULL;
 PFN_OUTPUT lpfnOutputGR = NULL;

/* Load appropriate filter. */

 hFilter = LoadLibrary( (LPSTR) szFilter );

/* Try to find "GetFilterVersion." */

 lpfnGetFilterVersion = GetProcAddress( hFilter,
 "GetFilterVersion" );

 if ( !lpfnGetFilterVersion ) /* This is a v 1.0 filter. */
 {

 lpfnGetFilterInfo = GetProcAddress( hFilter,
 "GetFilterInfo" );

 if ( lpfnGetFilterInfo )
 {
 wFilterResult = (*lpfnGetFilterInfo)( 2,
 NULL,
 &hPrefMem,
 NULL );
 }

/* Call filter's GetFilterPref function, which creates a "filter preference"
** dialog box to set import options. */

 lpfnGetFilterPref = GetProcAddress( hFilter,
 "GetFilterPref" );


 if ( lpfnGetFilterPref )
 {
 (*lpfnGetFilterPref)( hInst, hWnd, hPrefMem, 1 );
 }

/* Call filter's ImportGR function to convert the file to a Windows
** metafile (information is stored in FileSpec). */


 lpfnImportGR = GetProcAddress( hFilter, "ImportGR" );
 if ( lpfnImportGR )
 {
 hPrintIC = GetPrinterIC();

 wFilterResult = (*lpfnImportGR)( hPrintIC,
 &FileSpec,
 &PictInfo,
 hPrefMem );
 }
 }

 else /* This is a v. 2.0 or higher filter. */
 {

/* Ensure that current file is compatible with current filter. */

 lpfnIsThisMyFile = GetProcAddress( hFilter,
 "IsThisMyFile" );

 if ( lpfnIsThisMyFile )
 {

 (*lpfnIsThisMyFile) ( &FileSpec );

 }

/* Get user's import preferences. */

 lpfnGetFilterPref2 = GetProcAddress( hFilter, "GetFilterPref" );
 if (lpfnGetFilterPref2)
 {
 wFilterResult = (*lpfnGetFilterPref2) ( hInst,
 hWnd,
 &hPrefMem,
 FileSpec.fType,
 NULL,
 &FileSpec);
 }
/* Convert chosen file to a metafile. */

 lpfnOutputGR = GetProcAddress( hFilter, "OutputGR" );
 if (lpfnOutputGR)
 {
 hPrintIC = GetPrinterIC();

 wFilterResult = (*lpfnOutputGR) ( NULL,
 hPrintIC,
 &FileSpec,

 NULL,
 &PictInfo,
 hPrefMem,
 NULL,
 FALSE);
 }
 }

/* Free the memory allocated to the filter DLL and the preference memory.*/

 FreeLibrary( hFilter );
 GlobalFree( hPrefMem );

/* Set the window title to the file name, or return false. */

 if ( wFilterResult == 0 )
 {
 SetWindowText( hWnd, FileSpec.fullName );
 return TRUE;
 }
 else
 {
 return FALSE;
 }
}

/***************************************************************************
* *
* FUNCTION: GetExtFilter *
* PURPOSE : Get filename of filter appropriate for current file. *
* *
* ARGS : none *
* RETURN : TRUE if able to find filter, otherwise FALSE. *
* *
* COMMENTS: Gets all filter names listed under [GraphicViewer] *
* heading in WIN.INI. Checks which one entry supports *
* files with the same extension as FileSpec.fullName. *
* *
***************************************************************************/

BOOL NEAR GetExtFilter( void )
{
 PSTR pDesc, pExt, pSupExt;
 int nLen = lstrlen( FileSpec.fullName );
 char szItem[ IBMFNSIZE + 4 ];

 /* Set pExt to last char in FileSpec.fullName. */

 if (!nLen) return FALSE;

 pExt = FileSpec.fullName + nLen - 1;

 while ( *(pExt - 1) != '.' )
 {
 pExt--;
 if (pExt == FileSpec.fullName) return FALSE;
 }

 /* get all profile string entries (description of filter). */


 nLen = GetProfileString( szAppName, NULL, NULL, szString, STRINGSIZE );

 /* start with the first description. */

 pDesc = szString;

 /* while we still have a string to check... */

 while ( pDesc < szString + nLen )
 {
 /* get the entry for this filter
 ("[filter file name],[ext]") */

 GetProfileString( (LPSTR) szAppName, (LPSTR) pDesc,
 NULL, (LPSTR) szItem, IBMFNSIZE + 4 );

 /* if one exists, and its extension matches the file
 ** extension... */

 strcpy( szFilter, strtok( szItem, "," ) );
 pSupExt = strtok( NULL, ", " );

 if( lstrcmpi( (LPSTR) pExt, (LPSTR) pSupExt ) == 0 )
 {
 *(pSupExt-1) = '\0';
 return TRUE;
 }
 else
 {
 /* move on to next filter. */
 pDesc += lstrlen( (LPSTR) pDesc ) + 1;
 }
 }

 /* if we get here, we couldn't find one. Make sure szFilter is blank.*/

 szFilter[0] = '\0';

 /* Report failure. */

 return FALSE;
}

/***************************************************************************
* *
* FUNCTION: GetPrinterIC *
* PURPOSE : Get information context for active print device. *
* *
* ARGS : none *
* RETURN : handle to printer's information context or NULL if fail. *
* *
* COMMENTS: Gets device description from [windows] heading in *
* WIN.INI. Breaks down the description into device name, *
* driver name, and output port. Uses CreateIC() to get *
* an information context. *
* *
***************************************************************************/


HDC GetPrinterIC( void )
{
    PSTR pDevice, pDriver, pOutput;
    HDC  hReturn = NULL;

    /* Get the information on the current printer, listed under "windows" in
    ** WIN.INI as "device=[device name],[driver],[output port]". */

    GetProfileString( "windows", "device", NULL,
                      (LPSTR) szString, STRINGSIZE );

    if ( ( pDevice = strtok( szString, "," ) ) &&
         ( pDriver = strtok( NULL, ", " ) ) &&
         ( pOutput = strtok( NULL, ", " ) ) )
    {
        hReturn = CreateIC( (LPSTR) pDriver, (LPSTR) pDevice,
                            (LPSTR) pOutput, NULL );
    }

    return ( hReturn );
}

/***************************************************************************
 *
 *  FUNCTION: AboutDlgProc
 *  PURPOSE : Handles messages for AboutBox dialog box.
 *
 *  ARGS    : Standard callback arguments.
 *  RETURN  : N/A.
 *
 *  COMMENTS: Closes AboutBox when OK button, Enter or ESC are pressed.
 *
 ***************************************************************************/


BOOL FAR PASCAL AboutDlgProc(HWND hDlg, unsigned message,
                             WORD wParam, LONG lParam)
{
    switch (message)
    {
        case WM_INITDIALOG:         /* Beginning. No functionality. */
            return (TRUE);

        case WM_COMMAND:
            switch (wParam)
            {
                case IDOK:          /* OK or Enter were pressed. */
                case IDCANCEL:      /* Esc was pressed. */
                    EndDialog(hDlg, TRUE);
                    return (TRUE);
            }
            break;
    }
    return (FALSE);
}
































































July, 1992
PROGRAMMING PARADIGMS


Multimedia and the Art (or Science?) of UI Design




Michael Swaine


Last month I claimed that a new approach to user-interface design was needed,
and cited developments like multimedia and virtual reality as the forces
behind the need.
My inspiration in this was Brenda Laurel. Laurel is currently one of the
principals in Telepresence Research, a research and development company
focusing on technology that makes people feel physically present in remote or
computer-generated environments. Before that, she was a software designer and
programmer; marketeer; producer; researcher; protegee of Alan Kay; and
human-interface consultant to Apple Computer, LucasArts Entertainment, and the
School of Computer Science at Carnegie-Mellon University. Laurel edited a fat
book on human-interface design, The Art of Human-Computer Interface Design
(Addison-Wesley, 1990; henceforth herein, TAOHCID), with contributions from
most of the recognized and/or self-styled human-computer interface experts.
Although the book was originally conceived as an in-house Apple project, by
the time Laurel got through with it, half the contributors were from outside.
This month, I discuss that and another Laurel book, Computers as Theater
(Addison-Wesley, 1991). But first, a quick look at yet another Addison-Wesley
book on interface design, Tog on Interface (1992) by Bruce "Tog" Tognazzini.


Tog Soup


Tog is an Apple veteran, employee number 66, and was, as of the publication of
this book, Apple's Human-Interface Evangelist. Tog on Interface is largely
drawn from the column he wrote for Apple's developer publication, Apple
Direct, so if you've read those columns, you've read a lot of this book.
There is much new material, though, some drawn from other sources and some
augmenting the columns. But while the columns may be augmented, they have not
been updated: Some issues discussed here are no longer issues. This is the
right approach, I think. The book is not intended to be the latest word on
Apple's human-interface guidelines; there are better sources for that
information.
The book is something more broadly useful: the thoughts of a human-interface
expert who has thought long and hard about the issues, and who has actually
spent time working with both users and developers. A lot of what's in the book
should be useful to Windows developers or anyone who writes software.
I particularly liked Tog's advice on user testing on the cheap, his data on
Fitts' Law and menus, and his critique of Ashlar Vellum.
The chapter on Vellum touches on a subject that will come up again later in
this column: agents. Vellum is a CAD program that solves the user problem of
locating specific targets, like the midpoint of a line, precisely. Vellum aids
the user in this problem by the use of what Tog (but not Ashlar) calls an
"agent." Ashlar calls it the Drafting Assistant, but Tog points out that it
satisfies Alan Kay's basic definition of agents: "computer processes that act
as guide, coach, and...amanuensis."


Blowing up Interface


Brenda Laurel is an interface iconoclast.
In TAOHCID she immediately tosses up the prevailing notion of interface as "a
discrete tangible thing that we can draw, design, implement, and attach to a
bundle of functionality." Her intention, she announces, is "to explode that
notion."
Not all the book's contributors contribute to the pyrotechnics; some of the
contributions help to define the target. Thomas Erickson of Apple's Advanced
Technology Group (ATG), for example, starts off the book with an illuminating
explanation of how one of the more mystifying aspects of the Mac interface
came to be. Countless critics have marvelled at the method Apple designers hit
upon for the user to cause a disk to be ejected: by throwing it in the trash.
Most of these critics have probably said or thought something like, "What
could they have been thinking of?" Erickson tells all.
But the book quickly dives into the problems of interface design. Don Norman,
author of The Design of Everyday Things (Doubleday, 1989), says that the best
designs come when the necessary knowledge, which includes programming and
graphic-design knowledge, is incorporated in the same person. Since
programmers and graphic artists seem to think in different ways, this is not
always practical; Erickson shows one way to fake it. Even if you don't think
of yourself as creative, he says, there is hope in what he calls design by
"symmetry."
Symmetry is a very powerful concept, but it isn't likely to turn a
right-brained person into a left-brained one, or vice versa. Scott Kim, who a
few years ago set out quite deliberately to be the programmer-artist Norman
imagines, probes into the thinking styles of software developers and graphic
artists and comes up with a surprising conclusion: It isn't the big conceptual
or goal-defining notions like problem-solving and communication that separate
the programmers from the painters; it's the mundane survival skills of coding
and making pictures. Laurie Vertelney of Apple's ATG pitted a programmer
against a graphic artist on the task of redesigning the user interface of a
familiar piece of software, and reports on the results.
Apple product engineer Annette Wagner gives a graphic artist's view of a
conversation with the development team, and fantasizes about the ideal
prototyping tool. It brought back my own experience when I used to sit in
cover meetings for Dr. Dobb's. I would struggle to put my programming notions
into some form that would evoke something for Art Director Mike Hollister.
Mike, who had had less experience in dealing with programmers back then, would
struggle to pull some useful idea out of me. Our prototyping tool was
Associate Art Director Joe Sikoryak. Joe, an accomplished cartoonist, would
quickly sketch what he thought I was talking about and show it to us. Usually
it showed how graphically ridiculous my idea was, but even that moved the
brainstorming process along. Wagner's prototyping tool seems to me like an
attempt to bottle Joe after teaching him to program.


Boris and Natasha


But the bulk of the book consists of articles that are not only about
user-interface design but also at least partly relevant to developing
interfaces for multimedia.
Game designer Chris Crawford shows that many of the innovations in
user-interface design that will be heralded as multimedia takes off will have
been implemented commercially years earlier by game designers. Sometimes the
ease of use of game products doesn't translate to general-purpose computers:
Don Norman compares Nintendo machines with current computers, to the detriment
of computers, but it's not even clear how you would get to Nintendo-like ease
of installation on general-purpose computers.
Apple Fellow Alan Kay interprets Marshall McLuhan. "Message receipt is really
message recovery; anyone who wishes to receive a message embedded in a medium
must first have internalized the medium so it can be 'subtracted' out to leave
the message behind." "The medium is the message" means to Kay that, in order
to decode the message, you have to become the medium. Scary thought.
The idea of computer as medium rather than as vehicle inspired Kay to
introduce kids to computers before they take driver's ed. Maybe it's his work
with kids that has kept him young and flexible; despite the fact that the Mac
embodies a lot of his own ideas, Kay compares the continuing tweaking of the
desktop interface with a biological organism attempting to live in its own
waste products, a possibly apt but certainly unkind analogy.
Kay, an able analogist and metaphorist, is down on metaphor. "Metaphor," he
says, "is a poor metaphor for what needs to be done." Agents now, he likes
agents.
Agents run through the book like they do through an old Rocky & Bullwinkle
episode, but movies also figure heavily in the plot. Ted Nelson explains "the
right way to think about software design," a way that touches on "the movie
analogy." Joy Mountford of Apple's Human Interface Group reprises "the
metaphor of filmmaker," and Paul Heckel's The Elements of Friendly Software
Design (Sybex, 1991), with its copious analogies between filmmaking and
software making, gets cited or quoted repeatedly in TAOHCID. These writers,
though, are all talking about a movie metaphor or analogy. In fact, computers
really are becoming tools for moviemaking. What about computer-controlled
video and animation, and what about virtual reality? Laurel's contributors
generally believe in these technologies; they think that multimedia has a lot
to contribute to the user's experience.
Ronald Baecker and Ian Small of the Dynamic Graphics Project, University of
Toronto, explain why animation should be thought of as a serious element in
the design of the user's experience, rather than as a toy. Animation, they
say, should be thought of as "movements that are drawn" rather than as
"drawings that move." It's the dynamic aspect of the information that
animation captures. One example they present is the animation of algorithms,
showing how a sieve sieves, a bubble sort bubbles.
Gitta Solomon gives a corresponding argument for the virtues of color, and
Gordon Kurtenbach, who was an Apple summer intern, and Eric Hulteen, ATG, do
the same for gestures. No, we don't need to wave our hands to dismiss files,
but gestures would provide a richness and naturalness of input that would be
invaluable to composers, choreographers, sculptors, and designers of
simulations. Joy Mountford and William Gaver, another summer intern, give the
arguments for speech input and nonspeech audio output, and Chris Schmandt of
MIT's Media Lab follows up with descriptions of several experiments in sound
output: Phone Slave, the Conversational Desktop, and the Grunt system.
Grunt is interesting for what it shows about the possibilities of extremely
limited speech recognition. Grunt responds strictly on the basis of the
duration of the utterance, and in its domain of discourse this proves to be
adequate. Users regard interaction with Grunt as remarkably natural, believe
it or not.


Bow Ties Optional


As I mentioned, the themes of agents and multimedia run through this book, but
they might not be two distinct themes at all. Agents and multimedia could
end up being aspects of a single approach to the design of the user
experience, according to some of the contributors.
Nick Negroponte, director of MIT's Media Lab, talks about moving beyond
direct-manipulation interfaces to interfaces that let users delegate tasks.
"The desktop metaphor," he says, "is subject to serious change, soon." What
Negroponte pictures is an interface populated with agents, but not necessarily
the anthropomorphic bow-tied Phil of Apple's Knowledge Navigator videos.
Negroponte sees the personalization of agents as a matter of user taste: "If
you want your agents to wear bow ties, they will. If you prefer talking to
parallelopipeds, fine."
Which sets Laurel up to give her views on agents. Laurel defends
anthropomorphism, in a limited sense. Her argument is that as a species we
have a large investment in some pretty subtle skills for predicting behavior
from knowledge of character, and that it is entirely possible to give computer
processes enough aspects of character that we can use these skills to predict
their behavior.

There is a working example of some of this in Apple's Guides project. In this
system, highly anthropomorphic entities serve as travel agents to aid users in
navigating through a large educational hypermedia database. Tim Oren, ATG,
discusses Guides in the book.
Beyond multimedia is the realm of virtual reality: datagloves, heads-up
displays. Myron Krueger of Artificial Realities and Scott Fisher, Laurel's
partner in Telepresence, sketch the benefits of VR and telepresence, while
Autodesk president John Walker and author Howard Rheingold talk seriously
about cyberspace (aka the Gibson Interface).
All of which leads--where? To the call for the creation of a new medium,
according to Tim Oren. Or perhaps the recognition that what we already have is
a medium, with all the associated problems like media bias.


It's Just a Stage


In Computers as Theater, Laurel makes that assumption.
Ted Nelson has maintained for decades that software design has much in common
with making movies. Since 1967 he has been pointing out techniques and
concepts that software design can profitably borrow from film. Paul Heckel
developed the parallels between the two fields in The Elements of Friendly
Software Design, maintaining that software is mainly about communication, and
the communication medium from which we can draw the most useful analogies is
film. But another human-interface expert draws parallels with a more ancient
art form: theater.
In Computers as Theater, Laurel presents a radically different way of looking
at the interaction of humans and computers; so radical that she is entirely
serious when she says that "the concept of interface itself is a hopeless
hash" and that "we might do better to throw it out and begin afresh." Her
model is theater, and she means to apply the model more seriously than Heckel
or Nelson apply their movie metaphor. Because for Laurel, computers as theater
is not a metaphor at all, but a model. Where Heckel presents a collection of
D.W. Griffith's moviemaking techniques and shows how they apply to
computer-interface design, Laurel digs deeper, presenting a well-developed
theory of the structure of drama and showing how it fits the structure of
human-computer interaction.
The theory she presents is the theory of drama: Aristotle's poetics.
Aristotle's theory has stood the test of time and is still used as a tool for
understanding and criticizing the structure of drama. It has survived, the
argument goes, because of its generality; it's not a theory of acting or stage
direction or any specifically dramatic activities, but a general theory of the
structure of things interactive. What Laurel does with it is to derive a
poetics of (computer-based) interactive form. That's her interface model, and
it's extremely basic.
Because the model is so basic, it can be hard to see how it applies in
specific cases, while in other cases it can be too easy to slip into the obvious
analogy. Laurel herself relies too heavily, I think, on examples that are too
superficially cinematic, games in particular. But if I get her right, she's
not talking about adding drama to software, but rather about a particular
theory of the structure of interaction, a theory that ought to apply equally
to the design, analysis, and criticism of software.
The example that Laurel is most prepared to offer is a rich source of data on
interactivity and narrative techniques in human-computer interaction, but it
is probably not ideal for exemplifying her theory. That's the Guides project
she worked on at Apple, an experiment in the use of agents. In this project,
the user is aided in exploring a rich database like a history of the American
West by agents. The agents of the Guides project, unlike the frequently
proposed on-line agents that would scan the nets for material of interest to
you, represent their own interests. Each agent--the settler, the Native
American--is intensely personalized, down to the digitized photo of a person
in appropriate costume. When consulted, they offer suggestions about next
moves in the database and share anecdotes, both based on their own carefully
constructed biases and perspectives. When the user is examining material of
little or no interest to a particular agent, its picture will change,
slumping, falling asleep. Users can also construct their own agents.
The Guides project is an intriguing examination of subjectivity and point of
view and the use of some very human and natural modes of interaction and the
use of narrative structure--but I don't see that it says all that much about
Laurel's theory. Possibly no current work in human interface can say much
about the model, since it is a model of something that doesn't exist yet.
An interaction should be designed, according to Laurel, to have an
identifiable beginning, middle, and end, and to take up no more than a few
hours. These could be considered the minimum standards for getting into
Laurel's game, and nobody is really designing software this way. In
particular, adventure games traditionally violated this rule, packing stupid
puzzles into the games to keep them going as long as possible. Games whose
only exit is the symbolic death of the player violate another Laurel
prerequisite: that an interactive session ought to leave you with something of
value.
If nobody is playing Laurel's game yet, it's not possible to point to good or
bad examples. All current software is pretty much irrelevant. Computers as
Theater, unlike the other books discussed here, is not immediately practical
as a guide for developing software. It presents a new approach to thinking
about software, and asks for a deeper intellectual commitment than the other
books do.
But assuming she's on the right track, it should pay off big for those ready
to make the investment.












































July, 1992
C PROGRAMMING


Of Jazz, C++, and D-Flat Controls


 This article contains the following executables: DFLT12.ARC D12TXT.ARC


Al Stevens


I have just returned from the Borland Developer's Conference in Monterey,
California. This is the conference to attend if you use Borland products and
like parties. The atmosphere is laid back, yet the technical information flows
fast and free. The only stuffy part of the conference is the dreary Microsoft
bashing that permeates every Borland presentation. Other than that, we had
a great time. I sat on an evening panel of distinguished authors who advised
the attendees how to become writers and what to expect from writing in the
field of computer literature. The session followed the Borland wine and cheese
party and was marked by jet lag, beer on tap, and an occasional difference of
opinion. A good time was had. A party at the Monterey aquarium featured a
string quartet, a flute and bass jazz duo with Philippe Kahn on flute, buffet
banquets of all kinds of food, and, of course, lots of live fish under glass
to look at. Another evening ended with Philippe's Turbo Jazz band conducting a
sit-in jam session. After listening to about an hour of bone-jarring fusion, I
took a turn at the Steinway and asked about playing some real jazz. We did,
and the room swung. Philippe has found his voice in the flute.
Grady Booch addressed the conference about the methods in his book,
Object-Oriented Design with Applications (Benjamin Cummings, 1991). This book
is the authoritative contemporary work on object-oriented analysis, design,
and programming. It very nicely filled in most of the blanks in my experience
and corrected a few misconceptions. After reading the first two chapters, I
drew some conclusions. First, I must reluctantly admit that if I had not
already learned what object-oriented programming is by using C++, I would not
have understood the book. Not that the book is hard to read. Quite the
opposite, in fact. However, a programmer needs some personal experience to
relate to the concepts and examples in any explanation of OOP, and although
Booch's is the best, it still remains that the paradigm shift cannot be
taught. It can only be learned. My second conclusion is that you can probably
teach object-oriented programming to non-programmers as if nothing else
existed. If they have no paradigm already in place, no shift is required, and
the shift is the real problem -- not the paradigm.


D-Flat++


I guess it's going to be called that. I'm well into the first layer of
D-Flat++, and I am forming some positive opinions about the suitability of C++
as an alternative to the event-driven, message-based paradigm of Windows,
D-Flat, and others. The first thing you will see when you compare similar
processes in the two D-Flat libraries is that the C++ code is a lot less
complex. For one thing, when you build member functions into classes, the
compiler manages all the pointer dereferencing. You don't see it in your code,
and the results are much cleaner.
You will see another obvious improvement in DF++'s string handling. Everybody
builds a string class, and so did I. Where D-Flat has long sequences of
specialized strcpys, strcats, *cp++, memsets, and so on, DF++ simply
instantiates a string, concatenates it with another string, pulls a substring
out of it, and so on. The code is easier to read, and there are a lot fewer
lines of it. The notational improvements are dramatic.
C++ influences a program's design. You tend to use more classes because
encapsulation protects data items from other parts of the program. You feel
more secure about the integrity of data variables knowing that encapsulation
will prevent you from making unbridled accesses to hidden members from distant
parts of the program. A properly designed C++ program will rarely hear the
telltale slap on the forehead of a programmer who just uncovered some
long-forgotten use of an innocent variable.
If you haven't tried C++ yet, you might not see much point to it. If you have,
I'm singing to the choir. That's the consensus. C programmers do not see the
advantages of C++ until they have plunged, whereupon they become converts.
This is no evangelization of OOP, however. I'll leave that to others. This is
an endorsement of the C++ language's facility to build abstract data types,
hide information, and improve program notation. Call it what you will, but I'm
a third of the way through, and so far the notation is clearer, the code is
smaller, and there are no external variables in D-Flat++ other than the ones
that are static members of classes, and it would be unseemly to build any.
Grady Booch holds that if you are using C++ merely as an improved C, then you
are missing and perhaps even abusing the power of the language. I am coming
through experience to agree with that opinion.


Lovelier the Second Time Around


I have long believed that you should build a software system twice. The first
one is for teaching you how the system ought to work. You throw it away and do
the job right the second time. We seldom get that opportunity because the bean
counters would never allow it. Version 2 always builds on version 1, which has
already been paid for, correct or not. But D-Flat++ is not a C++ shroud
wrapped around the D-Flat C library. It is a complete rewrite in C++ because
one of the objectives is to use the advantages of the C++ language to build a
user-interface API. So, knowing that I am going to rewrite all the code
anyway, when I prepare to implement a particular feature, I stop to consider
how that feature gave me trouble in D-Flat, and I design the trouble away. The
result is a second chance to do the job right. There are no bean counters in
charge of D-Flat, so I get to make those decisions. Programmers 1, bean
counters 0.


D-Flat Message Boxes et al.


Back to the old days. Last month I described how D-Flat implements the
dialog-box window class. A dialog box consists of the window itself and a
number of control windows into which the user enters data or commands. In
months past you learned about the edit-box and list-box window classes, both
of which can be control windows on a dialog box. There are several more
control window classes, which are derived from other classes and which
implement the buttons and boxes that you can put on a dialog box. The dialog
boxes in the memopad example application use all of these control windows. I
will discuss each one by describing the source file that implements it. They
all have the usual format for a class's source file. The window-processing
module and supporting functions are in a stand-alone C-source file.


The Box


Listing One, page 142, is box.c, the source file that implements the box
window class. Its purpose is to draw a rectangle with a label around other
controls on the dialog box. The box is simple. It does its job unobtrusively
by refusing to accept the focus and passing mouse messages to its parent
window, the dialog box that hosts it. The border message displays the box's
text label over the top edge of the border starting in the first position past
the upper-left corner.
Listing Two, page 142, is button.c, the source file that implements
pushbuttons. A pushbutton displays as a single line of text with a different
color than its parent window and with a shadow made of half-height block
characters from the graphics character set. When the user presses the button,
either with the mouse or the keyboard, the program displays the button in a
pushed configuration and then waits for the key or button release to paint the
button in its original configuration. Then it sends the button's associated
COMMAND message to the parent window.
Listing Three, page 142, is checkbox.c, the source file that implements the
check-box control. A check box records a toggled option setting. It displays
with the text characters [X] when the option is toggled on and with [ ] when
it is off. The user toggles the setting by clicking the check box with the
mouse or by selecting it with the keyboard and pressing the space bar. The
keyboard cursor displays where the X goes when the check box is selected to
provide the user a visual clue. The CheckBoxSetting function returns True if
the specified check box is toggled on and False if it is not. The setting for
a dialog box persists between uses of the dialog box, so the program can call
the function at any time.
Listing Four, page 142, is combobox.c, the source file that implements the
combo-box control. A combo box is a combination of a single-line edit box and
a drop-down list box, thus its name. The combo-box class is derived from the
edit-box class. When the program creates the control, the combo box's
CREATE_WINDOW message creates the associated list box, but does not display
it. The PAINT message adds a down-pointing arrow at the end of the single-line
edit box. That token is the scroll button that the user clicks to drop down
the list box. If the user clicks the token or presses the down-arrow key while
the combo box has the focus, the program sends the SETFOCUS message to the
list box, which then displays itself. The ListProc function is the
window-processing module for the combo box's drop-down list box. When the user
moves the list box's selection cursor, the list-box class sends itself the
LB_SELECTION message, which this module intercepts. It sends the text of the current
selection to the edit-box component of the combo box. The application program
calls PutComboListText to add lines of text to the drop-down list box after it
opens the dialog box.
Listing Five, page 143, is msgbox.c, the code that implements a generic
message dialog box and some specialized ones. Several macros in dflat.h call
the GenericMessage function to display and process canned message boxes, among
them the MessageBox, the YesNoBox, the ErrorBox, the CancelBox, and the
InputBox. Each of these has its own window-processing module, to process
messages in its own way. The MomentaryMessage function displays a message box
that the caller must close. This process allows a program to post a "please
stand by" message box while some lengthy process is under way. The program
closes the window when the process is done. The slider box and the watch icon,
described soon, are two other ways to tell the user that a time-consuming
process is going on.
Listing Six, page 147, is radio.c, the code that implements the radio button.
Besides the window-processing module which paints the radio button and
processes its keyboard and mouse actions, the source file includes the
PushRadioButton function which selects a specified radio button and the
RadioButtonSetting function which tests the on/off state of a specified radio
button. Both functions accept a pointer to the dialog box and the command code
associated with the radio button.
Listing Seven, page 147, is slidebox.c, the code that implements the slider
box. A slider box is a temporary display that a program uses to show the user
the progress of a lengthy process. The program calls the SliderBox function
with the length in characters of the slider box, a title for the slider box's
dialog window, and a text message to display above the slider box itself. The
function builds and displays the slider box dialog and returns its WINDOW
handle to the caller. The caller then sends frequent PAINT messages with a
percentage of completion expressed in the second parameter. The value will
range from 0 to 100. Until the last value is sent, the PAINT message returns a
true value. When the PAINT message arrives with a percent that is equal to or
greater than 100, the window closes, and the PAINT message returns a false
value. If the user selects the Cancel command button before the process is
complete, the window closes and the PAINT message returns a false value. The
memopad example program uses a slider box to display the progress of
printing a file.
Listing Eight, page 148, is spinbutt.c, the code that implements the spin
button. A spin button is a one-line list box with up and down arrows at the
extreme right of the text. The user can scroll through the values with the up-
and down-arrow keys or by clicking the up- and down-arrow characters. At any
given time, the spin button control represents the currently displayed value.
The print-setup dialog box in the memopad example program uses spin buttons to
control the print margins.
Listing Nine, page 148, is text.c, the source code that implements the static
text-display control on a dialog box. The text control is used to label other
controls, and it can have a highlighted character to indicate the Alt key
combination that selects the associated control.
Listing Ten, page 148, is watch.c, the source file that implements the
wristwatch icon. A program can call the WatchIcon function to change the mouse
cursor into a tiny window that resembles a watch dial as well as text mode can
do it. The WatchIcon function returns the WINDOW handle of the icon. The
program can send the icon window a CLOSE_WINDOW message to return to the
normal mouse cursor. The watch icon is D-Flat's version of the Windows
hourglass cursor.


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of Dr.
Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you the latest
version of D-Flat. The software is free, but if you'd care to, stuff a dollar
bill in the mailer for the Brevard County Food Bank. They help the homeless
and hungry. We call it DDJ's program of "careware." If you want to discuss
D-Flat with me, use CompuServe. My ID is 71101,1262, and I monitor the DDJ
Forum daily.



Trouble Right Here...


Professor Harold Hill mounts the base of the town-square statue to alert the
citizenry of an impending disaster. To the citizens the trouble is real or
imagined, depending on how they feel about pool tables and their social
consequences. Professor Hill's cure was a boy's band. We've got trouble right
here, too. Please observe me if you will....
Time was when C compilers had no debuggers. They just compiled the best code
they could, and we used printf and getchar to look at variables and set
breakpoints -- to debug. It's hard to believe that we ever debugged that way,
but we did. It was slow, but it worked. Along came source-level debuggers, and
we were hooked. You'll never catch me going back to the old ways, says I.
There was a cost, however. The compilers had to put some debugging data into
the executable files so the debuggers could associate the executable code with
the source code. That's OK, they said, the debugging information is passive
except when you are debugging. It adds to the size of the executable file, but
there is no performance penalty when you are not debugging, and compiling
without debugging information merely strips the inert data from the executable
file without changing the effects of the executing code. Sounds like a good
idea to me.
Not long ago I got the early incarnations of D-Flat++ running with Borland C++
3.0. The executable file was 175K, and I wanted to see how it looked without
debugging information. I compiled without it and was pleased to see that the
executable file was now only 48K. So far, so good. But when I tried to run the
program, it blew up right away, leaving bug droppings all over the phosphor. I
put the debugging information back in, and the program ran fine. What to do?
How can I debug a bug that is only a bug when the debugger is not there? How,
indeed? printf and getchar, oops, I mean cout and cin, that's how. Rats. I bit
that bullet, and lo and behold discovered that the compiler's peekb function
was reading bad values from BIOS RAM. My program thinks the screen dimensions
are 0 by 118. That will never do.
To continue the quest, I recompiled to assembly language and learned that the
compiler's inline peekb function was compiling into bum code. OK, so BC++ 3.0
has a bug of its own. Stuff happens. They'll fix it. But wait. Why does the
program work when the debugging code is there? Isn't debugging information
supposed to be nonintrusive? I compiled to assembly language with debugging
information and found that the peekb function call is not expanded inline like
a macro, but that it actually calls a real function named peekb. The compiled
debuggable code is completely different from the debuggerless version. What
kind of debuggery is this?
Pete Becker of Borland set the matter straight, and it's something you should
know about. When you compile with debugging information included, BC++ does
not expand inline functions into inline code unless you use the -vi option.
Therefore, you should do most of your final testing either without debugging
information or with the -vi option enabled. Otherwise you might be
distributing a different program than you tested, and that would be trouble, I
said trouble right here, and pretty soon we'd be needing a boy's band.
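To make the fix concrete, here is a sketch of the two command lines (the file name is a stand-in, and the switch behavior is as described above; check your BC++ 3.0 documentation before relying on the exact spellings):

```
bcc -v memopad.c        compiles with debugging information; inline
                        functions become real function calls
bcc -v -vi memopad.c    compiles with debugging information AND inline
                        expansion, matching the code you ship
```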
Oh think, my friends, how can any compiler ever hope to compete with a slide
trombone...


_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]

/* ----------- box.c ------------ */
#include "dflat.h"

int BoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 CTLWINDOW *ct = GetControl(wnd);
 if (ct != NULL) {
 switch (msg) {
 case SETFOCUS:
 case PAINT:
 return FALSE;
 case LEFT_BUTTON:
 case BUTTON_RELEASED:
 return SendMessage(GetParent(wnd), msg, p1, p2);
 case BORDER:
 rtn = BaseWndProc(BOX, wnd, msg, p1, p2);
 if (ct != NULL)
 if (ct->itext != NULL)
 writeline(wnd, ct->itext, 1, 0, FALSE);
 return rtn;
 default:
 break;
 }
 }
 return BaseWndProc(BOX, wnd, msg, p1, p2);
}





[LISTING TWO]

/* -------------- button.c -------------- */
#include "dflat.h"
void PaintMsg(WINDOW wnd, CTLWINDOW *ct, RECT *rc)
{
 if (isVisible(wnd)) {
 if (TestAttribute(wnd, SHADOW) && cfg.mono == 0) {

 /* -------- draw the button's shadow ------- */
 int x;
 background = WndBackground(GetParent(wnd));
 foreground = BLACK;
 for (x = 1; x <= WindowWidth(wnd); x++)
 wputch(wnd, 223, x, 1);
 wputch(wnd, 220, WindowWidth(wnd), 0);
 }
 if (ct->itext != NULL) {
 unsigned char *txt;
 txt = DFcalloc(1, strlen(ct->itext)+10);
 if (ct->setting == OFF) {
 txt[0] = CHANGECOLOR;
 txt[1] = wnd->WindowColors
 [HILITE_COLOR] [FG] | 0x80;
 txt[2] = wnd->WindowColors
 [STD_COLOR] [BG] | 0x80;
 }
 CopyCommand(txt+strlen(txt),ct->itext,!ct->setting,
 WndBackground(wnd));
 SendMessage(wnd, CLEARTEXT, 0, 0);
 SendMessage(wnd, ADDTEXT, (PARAM) txt, 0);
 free(txt);
 }
 /* --------- write the button's text ------- */
 WriteTextLine(wnd, rc, 0, wnd == inFocus);
 }
}
void LeftButtonMsg(WINDOW wnd, MESSAGE msg, CTLWINDOW *ct)
{
 if (cfg.mono == 0) {
 /* --------- draw a pushed button -------- */
 int x;
 background = WndBackground(GetParent(wnd));
 foreground = WndBackground(wnd);
 wputch(wnd, ' ', 0, 0);
 for (x = 0; x < WindowWidth(wnd); x++) {
 wputch(wnd, 220, x+1, 0);
 wputch(wnd, 223, x+1, 1);
 }
 }
 if (msg == LEFT_BUTTON)
 SendMessage(NULL, WAITMOUSE, 0, 0);
 else
 SendMessage(NULL, WAITKEYBOARD, 0, 0);
 SendMessage(wnd, PAINT, 0, 0);
 if (ct->setting == ON)
 PostMessage(GetParent(wnd), COMMAND, ct->command, 0);
 else
 beep();
}
int ButtonProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 CTLWINDOW *ct = GetControl(wnd);
 if (ct != NULL) {
 switch (msg) {
 case SETFOCUS:
 BaseWndProc(BUTTON, wnd, msg, p1, p2);
 p1 = 0;

 /* ------- fall through ------- */
 case PAINT:
 PaintMsg(wnd, ct, (RECT*)p1);
 return TRUE;
 case KEYBOARD:
 if (p1 != '\r')
 break;
 /* ---- fall through ---- */
 case LEFT_BUTTON:
 LeftButtonMsg(wnd, msg, ct);
 return TRUE;
 case HORIZSCROLL:
 return TRUE;
 default:
 break;
 }
 }
 return BaseWndProc(BUTTON, wnd, msg, p1, p2);
}





[LISTING THREE]

/* -------------- checkbox.c ------------ */
#include "dflat.h"
int CheckBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 CTLWINDOW *ct = GetControl(wnd);
 if (ct != NULL) {
 switch (msg) {
 case SETFOCUS:
 if (!(int)p1)
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 case MOVE:
 rtn = BaseWndProc(CHECKBOX, wnd, msg, p1, p2);
 SetFocusCursor(wnd);
 return rtn;
 case PAINT: {
 char cb[] = "[ ]";
 if (ct->setting)
 cb[1] = 'X';
 SendMessage(wnd, CLEARTEXT, 0, 0);
 SendMessage(wnd, ADDTEXT, (PARAM) cb, 0);
 SetFocusCursor(wnd);
 break;
 }
 case KEYBOARD:
 if ((int)p1 != ' ')
 break;
 case LEFT_BUTTON:
 ct->setting ^= ON;
 SendMessage(wnd, PAINT, 0, 0);
 return TRUE;
 default:
 break;

 }
 }
 return BaseWndProc(CHECKBOX, wnd, msg, p1, p2);
}
BOOL CheckBoxSetting(DBOX *db, enum commands cmd)
{
 CTLWINDOW *ct = FindCommand(db, cmd, CHECKBOX);
 if (ct != NULL)
 return (ct->isetting == ON);
 return FALSE;
}





[LISTING FOUR]

/* -------------- combobox.c -------------- */
#include "dflat.h"
int ListProc(WINDOW, MESSAGE, PARAM, PARAM);
int ComboProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 wnd->extension = CreateWindow(
 LISTBOX,
 NULL,
 wnd->rc.lf,wnd->rc.tp+1,
 wnd->ht-1, wnd->wd+1,
 NULL,
 GetParent(wnd),
 ListProc,
 HASBORDER | NOCLIP | SAVESELF);
 ((WINDOW)(wnd->extension))->ct->command =
 wnd->ct->command;
 wnd->ht = 1;
 wnd->rc.bt = wnd->rc.tp;
 break;
 case PAINT:
 foreground = FrameForeground(wnd);
 background = FrameBackground(wnd);
 wputch(wnd, DOWNSCROLLBOX, WindowWidth(wnd), 0);
 break;
 case KEYBOARD:
 if ((int)p1 == DN) {
 SendMessage(wnd->extension, SETFOCUS, TRUE, 0);
 return TRUE;
 }
 break;
 case LEFT_BUTTON:
 if ((int)p1 == GetRight(wnd) + 1)
 SendMessage(wnd->extension, SETFOCUS, TRUE, 0);
 break;
 case CLOSE_WINDOW:
 SendMessage(wnd->extension, CLOSE_WINDOW, 0, 0);
 break;
 default:
 break;

 }
 return BaseWndProc(COMBOBOX, wnd, msg, p1, p2);
}
int ListProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 DBOX *db = GetParent(wnd)->extension;
 WINDOW cwnd = ControlWindow(db, wnd->ct->command);
 char text[130];
 int rtn;
 WINDOW currFocus;
 switch (msg) {
 case CREATE_WINDOW:
 wnd->ct = DFmalloc(sizeof(CTLWINDOW));
 wnd->ct->setting = OFF;
 break;
 case SETFOCUS:
 if ((int)p1 == FALSE) {
 SendMessage(wnd, HIDE_WINDOW, 0, 0);
 wnd->ct->setting = OFF;
 }
 else
 wnd->ct->setting = ON;
 break;
 case SHOW_WINDOW:
 if (wnd->ct->setting == OFF)
 return TRUE;
 break;
 case BORDER:
 currFocus = inFocus;
 inFocus = NULL;
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 inFocus = currFocus;
 return rtn;
 case LB_SELECTION:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 SendMessage(wnd, LB_GETTEXT,
 (PARAM) text, wnd->selection);
 PutItemText(GetParent(wnd), wnd->ct->command, text);
 SendMessage(cwnd, PAINT, 0, 0);
 cwnd->TextChanged = TRUE;
 return rtn;
 case KEYBOARD:
 switch ((int) p1) {
 case ESC:
 case FWD:
 case BS:
 SendMessage(cwnd, SETFOCUS, TRUE, 0);
 return TRUE;
 default:
 break;
 }
 break;
 case LB_CHOOSE:
 SendMessage(cwnd, SETFOCUS, TRUE, 0);
 return TRUE;
 case CLOSE_WINDOW:
 if (wnd->ct != NULL)
 free(wnd->ct);
 wnd->ct = NULL;

 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
void PutComboListText(WINDOW wnd, enum commands cmd, char *text)
{
 CTLWINDOW *ct = FindCommand(wnd->extension, cmd, COMBOBOX);
 if (ct != NULL) {
 WINDOW lwnd = ((WINDOW)(ct->wnd))->extension;
 SendMessage(lwnd, ADDTEXT, (PARAM) text, 0);
 }
}





[LISTING FIVE]

/* ------------------ msgbox.c ------------------ */
#include "dflat.h"
extern DBOX MsgBox;
extern DBOX InputBoxDB;
WINDOW CancelWnd;
static int ReturnValue;
int MessageBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 GetClass(wnd) = MESSAGEBOX;
 ClearAttribute(wnd, CONTROLBOX);
 break;
 case KEYBOARD:
 if (p1 == '\r' || p1 == ESC)
 ReturnValue = (int)p1;
 break;
 default:
 break;
 }
 return BaseWndProc(MESSAGEBOX, wnd, msg, p1, p2);
}
int YesNoBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 GetClass(wnd) = MESSAGEBOX;
 ClearAttribute(wnd, CONTROLBOX);
 break;
 case KEYBOARD: {
 int c = tolower((int)p1);
 if (c == 'y')
 SendMessage(wnd, COMMAND, ID_OK, 0);
 else if (c == 'n')
 SendMessage(wnd, COMMAND, ID_CANCEL, 0);
 break;
 }
 default:

 break;
 }
 return BaseWndProc(MESSAGEBOX, wnd, msg, p1, p2);
}
int ErrorBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 GetClass(wnd) = ERRORBOX;
 break;
 case KEYBOARD:
 if (p1 == '\r' || p1 == ESC)
 ReturnValue = (int)p1;
 break;
 default:
 break;
 }
 return BaseWndProc(ERRORBOX, wnd, msg, p1, p2);
}
int CancelBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW:
 CancelWnd = wnd;
 SendMessage(wnd, CAPTURE_MOUSE, 0, 0);
 SendMessage(wnd, CAPTURE_KEYBOARD, 0, 0);
 break;
 case COMMAND:
 if ((int) p1 == ID_CANCEL && (int) p2 == 0)
 SendMessage(GetParent(wnd), msg, p1, p2);
 return TRUE;
 case CLOSE_WINDOW:
 CancelWnd = NULL;
 SendMessage(wnd, RELEASE_MOUSE, 0, 0);
 SendMessage(wnd, RELEASE_KEYBOARD, 0, 0);
 p1 = TRUE;
 break;
 default:
 break;
 }
 return BaseWndProc(MESSAGEBOX, wnd, msg, p1, p2);
}
void CloseCancelBox(void)
{
 if (CancelWnd != NULL)
 SendMessage(CancelWnd, CLOSE_WINDOW, 0, 0);
}
static char *InputText;
static int TextLength;
int InputBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 switch (msg) {
 case CREATE_WINDOW:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 SendMessage(ControlWindow(&InputBoxDB,ID_INPUTTEXT),
 SETTEXTLENGTH, TextLength, 0);
 return rtn;
 case COMMAND:

 if ((int) p1 == ID_OK && (int) p2 == 0)
 GetItemText(wnd, ID_INPUTTEXT, InputText, TextLength);
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
BOOL InputBox(WINDOW wnd,char *ttl,char *msg,char *text,int len)
{
 InputText = text;
 TextLength = len;
 InputBoxDB.dwnd.title = ttl;
 InputBoxDB.dwnd.w = 4 +
 max(20, max(len, max(strlen(ttl), strlen(msg))));
 InputBoxDB.ctl[1].dwnd.x = (InputBoxDB.dwnd.w-2-len)/2;
 InputBoxDB.ctl[0].dwnd.w = strlen(msg);
 InputBoxDB.ctl[0].itext = msg;
 InputBoxDB.ctl[1].dwnd.w = len;
 InputBoxDB.ctl[2].dwnd.x = (InputBoxDB.dwnd.w - 20) / 2;
 InputBoxDB.ctl[3].dwnd.x = InputBoxDB.ctl[2].dwnd.x + 10;
 InputBoxDB.ctl[2].isetting = ON;
 InputBoxDB.ctl[3].isetting = ON;
 return DialogBox(wnd, &InputBoxDB, TRUE, InputBoxProc);
}

BOOL GenericMessage(WINDOW wnd,char *ttl,char *msg,int buttonct,
 int (*wndproc)(struct window *,enum messages,PARAM,PARAM),
 char *b1, char *b2, int c1, int c2, int isModal)
{
 BOOL rtn;
 MsgBox.dwnd.title = ttl;
 MsgBox.ctl[0].dwnd.h = MsgHeight(msg);
 MsgBox.ctl[0].dwnd.w = max(max(MsgWidth(msg),
 buttonct*8 + buttonct + 2), strlen(ttl)+2);
 MsgBox.dwnd.h = MsgBox.ctl[0].dwnd.h+6;
 MsgBox.dwnd.w = MsgBox.ctl[0].dwnd.w+4;
 if (buttonct == 1)
 MsgBox.ctl[1].dwnd.x = (MsgBox.dwnd.w - 10) / 2;
 else {
 MsgBox.ctl[1].dwnd.x = (MsgBox.dwnd.w - 20) / 2;
 MsgBox.ctl[2].dwnd.x = MsgBox.ctl[1].dwnd.x + 10;
 MsgBox.ctl[2].class = BUTTON;
 }
 MsgBox.ctl[1].dwnd.y = MsgBox.dwnd.h - 4;
 MsgBox.ctl[2].dwnd.y = MsgBox.dwnd.h - 4;
 MsgBox.ctl[0].itext = msg;
 MsgBox.ctl[1].itext = b1;
 MsgBox.ctl[2].itext = b2;
 MsgBox.ctl[1].command = c1;
 MsgBox.ctl[2].command = c2;
 MsgBox.ctl[1].isetting = ON;
 MsgBox.ctl[2].isetting = ON;
 rtn = DialogBox(wnd, &MsgBox, isModal, wndproc);
 MsgBox.ctl[2].class = 0;
 return rtn;
}
WINDOW MomentaryMessage(char *msg)
{

 WINDOW wnd = CreateWindow(
 TEXTBOX,
 NULL,
 -1,-1,MsgHeight(msg)+2,MsgWidth(msg)+2,
 NULL,NULL,NULL,
 HASBORDER | SHADOW | SAVESELF);
 SendMessage(wnd, SETTEXT, (PARAM) msg, 0);
 if (cfg.mono == 0) {
 WindowClientColor(wnd, WHITE, GREEN);
 WindowFrameColor(wnd, WHITE, GREEN);
 }
 SendMessage(wnd, SHOW_WINDOW, 0, 0);
 return wnd;
}
int MsgHeight(char *msg)
{
 int h = 1;
 while ((msg = strchr(msg, '\n')) != NULL) {
 h++;
 msg++;
 }
 return min(h, SCREENHEIGHT-10);
}
int MsgWidth(char *msg)
{
 int w = 0;
 char *cp = msg;
 while ((cp = strchr(msg, '\n')) != NULL) {
 w = max(w, (int) (cp-msg));
 msg = cp+1;
 }
 return min(max(strlen(msg),w), SCREENWIDTH-10);
}





[LISTING SIX]

/* -------- radio.c -------- */
#include "dflat.h"
static CTLWINDOW *rct[MAXRADIOS];
int RadioButtonProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 DBOX *db = GetParent(wnd)->extension;
 CTLWINDOW *ct = GetControl(wnd);
 if (ct != NULL) {
 switch (msg) {
 case SETFOCUS:
 if (!(int)p1)
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 case MOVE:
 rtn = BaseWndProc(RADIOBUTTON,wnd,msg,p1,p2);
 SetFocusCursor(wnd);
 return rtn;
 case PAINT: {
 char rb[] = "( )";

 if (ct->setting)
 rb[1] = 7;
 SendMessage(wnd, CLEARTEXT, 0, 0);
 SendMessage(wnd, ADDTEXT, (PARAM) rb, 0);
 SetFocusCursor(wnd);
 break;
 }
 case KEYBOARD:
 if ((int)p1 != ' ')
 break;
 case LEFT_BUTTON:
 PushRadioButton(db, ct->command);
 break;
 default:
 break;
 }
 }
 return BaseWndProc(RADIOBUTTON, wnd, msg, p1, p2);
}
void PushRadioButton(DBOX *db, enum commands cmd)
{
 CTLWINDOW *ct = FindCommand(db, cmd, RADIOBUTTON);
 if (ct != NULL) {
 SetRadioButton(db, ct);
 ct->isetting = ON;
 }
}
void SetRadioButton(DBOX *db, CTLWINDOW *ct)
{
 CTLWINDOW *ctt = db->ctl;
 int i;
 /* --- clear all the radio buttons in this group on the dialog box --- */
 /* -------- build a table of all radio buttons at the
 same x vector ---------- */
 for (i = 0; i < MAXRADIOS; i++)
 rct[i] = NULL;
 while (ctt->class) {
 if (ctt->class == RADIOBUTTON)
 if (ct->dwnd.x == ctt->dwnd.x)
 rct[ctt->dwnd.y] = ctt;
 ctt++;
 }
 /* ----- find the start of the radiobutton group ---- */
 i = ct->dwnd.y;
 while (i >= 0 && rct[i] != NULL)
 --i;
 /* ---- ignore everything before the group ------ */
 while (i >= 0)
 rct[i--] = NULL;
 /* ----- find the end of the radiobutton group ---- */
 i = ct->dwnd.y;
 while (i < MAXRADIOS && rct[i] != NULL)
 i++;
 /* ---- ignore everything past the group ------ */
 while (i < MAXRADIOS)
 rct[i++] = NULL;
 for (i = 0; i < MAXRADIOS; i++) {
 if (rct[i] != NULL) {
 int wason = rct[i]->setting;

 rct[i]->setting = OFF;
 if (wason)
 SendMessage(rct[i]->wnd, PAINT, 0, 0);
 }
 }
 ct->setting = ON;
 SendMessage(ct->wnd, PAINT, 0, 0);
}
BOOL RadioButtonSetting(DBOX *db, enum commands cmd)
{
 CTLWINDOW *ct = FindCommand(db, cmd, RADIOBUTTON);
 if (ct != NULL)
 return (ct->setting == ON);
 return FALSE;
}





[LISTING SEVEN]

/* ------------- slidebox.c ------------ */
#include "dflat.h"
static int (*GenericProc)
 (WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2);
static BOOL KeepRunning;
static int SliderLen;
static int Percent;
extern DBOX SliderBoxDB;
static void InsertPercent(char *s)
{
 int offset;
 char pcc[5];

 sprintf(s, "%c%c%c",
 CHANGECOLOR,
 color[DIALOG][SELECT_COLOR][FG]+0x80,
 color[DIALOG][SELECT_COLOR][BG]+0x80);
 s += 3;
 memset(s, ' ', SliderLen);
 *(s+SliderLen) = '\0';
 sprintf(pcc, "%d%%", Percent);
 strncpy(s+SliderLen/2-1, pcc, strlen(pcc));
 offset = (SliderLen * Percent) / 100;
 memmove(s+offset+4, s+offset, strlen(s+offset)+1);
 sprintf(pcc, "%c%c%c%c",
 RESETCOLOR,
 CHANGECOLOR,
 color[DIALOG][SELECT_COLOR][BG]+0x80,
 color[DIALOG][SELECT_COLOR][FG]+0x80);
 strncpy(s+offset, pcc, 4);
 *(s + strlen(s) - 1) = RESETCOLOR;
}
static int SliderTextProc(
 WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2)
{
 switch (msg) {
 case PAINT:

 Percent = (int)p2;
 InsertPercent(GetText(wnd) ?
 GetText(wnd) : SliderBoxDB.ctl[1].itext);
 GenericProc(wnd, PAINT, 0, 0);
 if (Percent >= 100)
 SendMessage(GetParent(wnd),COMMAND,ID_CANCEL,0);
 if (!dispatch_message())
 PostMessage(GetParent(wnd), ENDDIALOG, 0, 0);
 return KeepRunning;
 default:
 break;
 }
 return GenericProc(wnd, msg, p1, p2);
}
static int SliderBoxProc(
 WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 WINDOW twnd;
 switch (msg) {
 case CREATE_WINDOW:
 AddAttribute(wnd, SAVESELF);
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 twnd = SliderBoxDB.ctl[1].wnd;
 GenericProc = twnd->wndproc;
 twnd->wndproc = SliderTextProc;
 KeepRunning = TRUE;
 SendMessage(wnd, CAPTURE_MOUSE, 0, 0);
 SendMessage(wnd, CAPTURE_KEYBOARD, 0, 0);
 return rtn;
 case COMMAND:
 if ((int)p2 == 0 && (int)p1 == ID_CANCEL) {
 if (Percent >= 100 ||
 YesNoBox("Terminate process?"))
 KeepRunning = FALSE;
 else
 return TRUE;
 }
 break;
 case CLOSE_WINDOW:
 SendMessage(wnd, RELEASE_MOUSE, 0, 0);
 SendMessage(wnd, RELEASE_KEYBOARD, 0, 0);
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
WINDOW SliderBox(int len, char *ttl, char *msg)
{
 SliderLen = len;
 SliderBoxDB.dwnd.title = ttl;
 SliderBoxDB.dwnd.w =
 max(strlen(ttl),max(len, strlen(msg)))+4;
 SliderBoxDB.ctl[0].itext = msg;
 SliderBoxDB.ctl[0].dwnd.w = strlen(msg);
 SliderBoxDB.ctl[0].dwnd.x =
 (SliderBoxDB.dwnd.w - strlen(msg)-1) / 2;
 SliderBoxDB.ctl[1].itext =

 DFrealloc(SliderBoxDB.ctl[1].itext, len+10);
 Percent = 0;
 InsertPercent(SliderBoxDB.ctl[1].itext);
 SliderBoxDB.ctl[1].dwnd.w = len;
 SliderBoxDB.ctl[1].dwnd.x = (SliderBoxDB.dwnd.w-len-1)/2;
 SliderBoxDB.ctl[2].dwnd.x = (SliderBoxDB.dwnd.w-10)/2;
 DialogBox(NULL, &SliderBoxDB, FALSE, SliderBoxProc);
 return SliderBoxDB.ctl[1].wnd;
}





[LISTING EIGHT]

/* ------------ spinbutt.c ------------- */
#include "dflat.h"
int SpinButtonProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 CTLWINDOW *ct = GetControl(wnd);
 if (ct != NULL) {
 switch (msg) {
 case CREATE_WINDOW:
 wnd->wd -= 2;
 wnd->rc.rt -= 2;
 break;
 case SETFOCUS:
 rtn = BaseWndProc(SPINBUTTON, wnd, msg, p1, p2);
 if (!(int)p1)
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 SetFocusCursor(wnd);
 return rtn;
 case PAINT:
 foreground = FrameForeground(wnd);
 background = FrameBackground(wnd);
 wputch(wnd,UPSCROLLBOX,WindowWidth(wnd), 0);
 wputch(wnd,DOWNSCROLLBOX,WindowWidth(wnd)+1,0);
 SetFocusCursor(wnd);
 break;
 case LEFT_BUTTON:
 if (p1 == GetRight(wnd) + 1)
 SendMessage(wnd, KEYBOARD, UP, 0);
 else if (p1 == GetRight(wnd) + 2)
 SendMessage(wnd, KEYBOARD, DN, 0);
 if (wnd != inFocus)
 SendMessage(wnd, SETFOCUS, TRUE, 0);
 return TRUE;
 case LB_SETSELECTION:
 rtn = BaseWndProc(SPINBUTTON, wnd, msg, p1, p2);
 wnd->wtop = (int) p1;
 SendMessage(wnd, PAINT, 0, 0);
 return rtn;
 default:
 break;
 }
 }
 return BaseWndProc(SPINBUTTON, wnd, msg, p1, p2);

}






[LISTING NINE]

/* -------------- text.c -------------- */
#include "dflat.h"
int TextProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int i, len;
 CTLWINDOW *ct = GetControl(wnd);
 char *cp, *cp2;
 switch (msg) {
 case PAINT:
 if (ct == NULL ||
 ct->itext == NULL ||
 GetText(wnd) != NULL)
 break;
 cp2 = ct->itext;
 len = min(ct->dwnd.h, MsgHeight(cp2));
 cp = cp2;
 for (i = 0; i < len; i++) {
 int mlen;
 char *txt = cp;
 char *cp1 = cp;
 char *np = strchr(cp, '\n');
 if (np != NULL)
 *np = '\0';
 mlen = strlen(cp);
 while ((cp1=strchr(cp1,SHORTCUTCHAR)) != NULL) {
 mlen += 3;
 cp1++;
 }
 if (np != NULL)
 *np = '\n';
 txt = DFmalloc(mlen+1);
 CopyCommand(txt, cp, FALSE, WndBackground(wnd));
 txt[mlen] = '\0';
 SendMessage(wnd, ADDTEXT, (PARAM)txt, 0);
 if ((cp = strchr(cp, '\n')) != NULL)
 cp++;
 free(txt);
 }
 break;
 default:
 break;
 }
 return BaseWndProc(TEXT, wnd, msg, p1, p2);
}





[LISTING TEN]


/* ----------- watch.c ----------- */

#include "dflat.h"

static int WatchIconProc(
 WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 int rtn;
 switch (msg) {
 case CREATE_WINDOW:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 SendMessage(wnd, CAPTURE_MOUSE, 0, 0);
 SendMessage(wnd, HIDE_MOUSE, 0, 0);
 SendMessage(wnd, CAPTURE_KEYBOARD, 0, 0);
 return rtn;
 case PAINT:
 SetStandardColor(wnd);
 writeline(wnd, " @ ", 1, 1, FALSE);
 return TRUE;
 case BORDER:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 writeline(wnd, "M", 2, 0, FALSE);
 return rtn;
 case MOUSE_MOVED:
 SendMessage(wnd, HIDE_WINDOW, TRUE, 0);
 SendMessage(wnd, MOVE, p1, p2);
 SendMessage(wnd, SHOW_WINDOW, 0, 0);
 return TRUE;
 case CLOSE_WINDOW:
 SendMessage(wnd, RELEASE_MOUSE, 0, 0);
 SendMessage(wnd, RELEASE_KEYBOARD, 0, 0);
 SendMessage(wnd, SHOW_MOUSE, 0, 0);
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}

WINDOW WatchIcon(void)
{
 int mx, my;
 WINDOW wnd;
 SendMessage(NULL, CURRENT_MOUSE_CURSOR,
 (PARAM) &mx, (PARAM) &my);
 wnd = CreateWindow(
 BOX,
 NULL,
 mx, my, 3, 5,
 NULL,NULL,
 WatchIconProc,
 VISIBLE | HASBORDER | SHADOW | SAVESELF);
 return wnd;
}


July, 1992
STRUCTURED PROGRAMMING


It's Easy, Class!




Jeff Duntemann, KG7JF


Last week someone said that I had to have gone to Catholic school, because she
could read my signature. Guilty! And proud of it.
I've heard Catholic schools blamed for everything from frigidity to morbid
fear of rulers, and I just don't buy any of it. In eight years I never saw a
nun lift a hand against a child, and I suspect it happens more in urban legend
than it ever did in reality.
Come on, already. The nuns worked for God, and God was the guy who turned
Arius the Heretic to worms before he was even dead or anything. The lesson was
not lost on us.
No. What the poor old dears were guilty of was demanding that we work, and
learn, and excel, without regard to things like self-esteem, which (if they
had ever heard the term) they rightfully assumed was something like
self-abuse. They knew that literacy was possible, and expected without
exception that we would become literate.
Hey, it worked.
I'll admit the road was rocky at times. In third grade, Sister Agnes Eileen
explained what definitions were: it was always possible to explain a
word in terms of other words so that no one would mistake its meaning.
"It's easy, class!" she said with that boundless Irish enthusiasm. "Like this:
The definition of 'hose' is, 'a rubber pipe.' How hard can that be?"
Not too, we agreed. So she passed out worksheets with a list of words that we
were to define before the lunch bell.
And what was the first word on the list?
"Love."


Drowning in the Stream


I've been fussing with Turbo Vision streams for a while now, and that same old
feeling keeps coming back, that I felt in Room 1 at Immaculate Conception
Grade School. What's easy in theory is not always easy in practice -- and not
everything sums up as easily as "a rubber pipe."
Streams, for instance. Streams are perhaps the most abysmally documented part
of the abysmally documented Turbo Vision, with the sole exception of the
standard dialogs, which are not documented at all. The Turbo Vision Guide is
guilty of what I might as well call the Rubber Pipe Fallacy: Demonstrating
that something is easy by giving a trivial example, and then entirely avoiding
the issue of what happens when truly useful things need to be done.
Having read the TV Guide explanations and run the code examples, I felt that I
understood how stream I/O was to be done. Then I attempted to add stream I/O
to HCALC.PAS.
Hello, wall.
Oh, I figured it out, with the help of some people who make their living
writing Turbo Vision code. And while I freely admit that I'll probably be glad
someday that I learned it (as the nuns would relentlessly remind us), from
here, well...
Let me see if I can save you some grief.


Filing Objects


An early Turbo Pascal disappointment for people who don't read the fine print
in their manuals is that you can't create a FILE OF OBJECT. It seems a little
arbitrary until you think for a while about the nature of objects and the
nature of traditional Pascal file I/O. Record-oriented I/O is easy. A record
is all data, and you can write the whole thing to disk without fear of
violating any beneath-the-surface connections to other parts of the
application. (You may have gotten a hint of the problems with filing objects
if you ever tried to save a linked list -- or, worse, a more complex data
structure -- from the heap to disk with a hope of later bringing it back to
the heap intact.)
The #1 complication with objects and files is that when objects go to disk,
they don't take their code with them. The threads of connection between object
instances and method code are broken in the act of writing an object to disk
as though it were a record, and reconnecting those threads in bringing back
objects from disk is not trivial.
The #2 complication with objects and files is that we'd like to be able to
read and write objects to disk polymorphically. In other words, if we have a
collection of objects of different types, we'd like to be able to iterate over
the collection and write each object to disk without necessarily knowing its
exact type at run time. And that implies that the file system must support
variable-length records and do it well, because the size of all those
different object types is certainly not going to be identical.
The #3 complication with objects and files is that, especially under Turbo
Vision, object-oriented programming hangs heavily on pointers and objects
linked by pointers, in what can become pretty hairy dynamic structures. If
your application makes heavy use of numerous objects on the heap, you'll end
up reconstructing most of the heap every time you read the application and its
objects from disk. This I find ugly work, rather like standing in the dark,
swinging a hammer at a nail you can't see. Now and then you're bound to miss,
and the misses will hurt.


The Streams Solution


Streams were designed to solve these problems, or at least allow them to be
addressed. Streams actually predate Turbo Vision, and were present in Turbo
Pascal 5.5. Because of scheduling and production constraints, streams didn't
quite make it into the OOP Guide, and many people never looked closely enough
at the example programs to discover them.
A stream is itself an object, encapsulating physical file support with the
ability to wrestle objects out to disk and bring them back alive.
It's a two-way street, however. If they are to be written to a stream, objects
must know about streams, and have methods within them to write themselves to a
stream. It works like this: You instruct a stream to put a specified object
onto itself. The stream then instructs that object to store itself as required
onto the stream.
The stream has a pair of virtual methods called Get and Put. Put takes a
single parameter, which can be a pointer to any object type that descends from
TObject. Put puts an object onto the stream. Get is a function method, which
brings an object back from the stream and returns a PObject pointer to that
object.
To be streamable, an object must (among other things) have a pair of virtual
methods called Load and Store. Store writes the object's data onto the stream
by making a call to a method named Write once for each data item in the
object. Similarly, Load brings the object's data back from disk by reading the
object's fields, each with a separate call to a method named Read. And whose
methods are Read and Write? The stream's, of course.
Assume you have an open stream S, and a pointer PP to some object on the heap.
The following call writes object PP onto stream S:
 S.Put(PP);
Whew. Your application calls the stream's method Put. Put calls the Store
method belonging to the object, and Store calls -- perhaps repeatedly -- the
Write method belonging to the stream.
You make the following call to bring back an object from stream S and allocate
it on the heap as the referent of pointer PP:
 PP:=S.Get;
Here, Get calls the Load method of the next object stored on the stream, and
Load recreates the object it belongs to by allocating space for itself on the
heap and then retrieving the stored values of its various fields by repeated
calls to S.Read. Load can do this because Load is a constructor, and
represents an alternate way to build an object, different from your old
familiar Init constructor but ending up with the same result: a new object on
the heap that wasn't there before.
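As a sketch of what that looks like in source (the fields shown are hypothetical stand-ins, not the actual TMortgage fields from HCALC.PAS, and stream registration is omitted here), a streamable object pairs Store and Load like this:

```pascal
type
  PMortgage = ^TMortgage;
  TMortgage = object(TObject)
    Principal : Real;     { hypothetical fields, for illustration only }
    Rate      : Real;
    constructor Load(var S : TStream);
    procedure Store(var S : TStream); virtual;
  end;

{ Store writes each field with a separate call to the stream's Write }
procedure TMortgage.Store(var S : TStream);
begin
  S.Write(Principal, SizeOf(Principal));
  S.Write(Rate, SizeOf(Rate));
end;

{ Load is a constructor: it rebuilds the object field by field }
constructor TMortgage.Load(var S : TStream);
begin
  S.Read(Principal, SizeOf(Principal));
  S.Read(Rate, SizeOf(Rate));
end;
```

With that pair in place, S.Put and S.Get behave just as described above.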

That's how the stream process works from a height. It's easy, class! Well --
sort of.


Preparation


If you're sharp, you may be asking some pretty pointed questions about now.
Like -- how does the stream know which constructor method to call when Get
fetches the next object from the stream? Don't make the naive mistake of
asking how the stream can call the methods of an object that doesn't really
exist yet. The object doesn't yet exist, but its methods are always in the
code segment, whenever the application that uses the objects is running. The
true question is, how does the stream find the right constructor among the
many in the code segment? The short answer is that it has to peek a little,
and for that essential peeking to happen, you have to set things up just so.
First and most fundamental, to use the stream's mechanisms to store objects as
objects, those objects must be descended from TObject. In other words,
virtually all objects in Turbo Vision are already eligible, because they all
descend directly or indirectly from TObject. However, if you create your own
"mute objects" (my TMortgage object type from HCALC is a good example), you
must explicitly make them descend from TObject.
The reason for this may surprise you. Most of the time you make an object
descend from a particular parent object in order to inherit some particular
methods or fields from the parent object. In this case, what your objects
inherit from TObject is not any specific method or field (in fact, TObject has
no fields of its own) but only an assurance that the first field in the child
object will be the pointer to its virtual method table (VMT).


VMTs First!


The TObject type may well have some other purpose than to guarantee the
position of the VMT pointer, but in truth I've never heard of one. Some quick
recap here on VMTs: All virtual method tables are present in Turbo Pascal's
single data segment, and there is one VMT there for every object type that
contains virtual methods. Every instance of that object type contains a 16-bit
pointer to its VMT in the data segment. This pointer, which we call the VMT
pointer, is nothing more than the offset into the data segment at which the
VMT itself exists.
Consider this: Unless an object has virtual methods, it has no VMT and hence
no VMT pointer. If a child object descends from a parent object without a VMT,
but the child object defines one or more virtual methods, a VMT pointer will
be added to the child object's structure. However, the parent object's fields
will be present in the object's image before the VMT pointer is. In other
words, if you're mapping out an object's fields in memory, the parent's fields
will exist at lower memory addresses than the child object's VMT pointer.
Now, TObject has no fields of its own at all. It does have a virtual method,
however (its Done destructor), and therefore it has a VMT. Because there are
no fields in TObject to come before the VMT pointer, the VMT pointer is right
there at offset 0 from the start of the object. And this will always be the
case in any object that inherits from TObject, because parent fields are
always "ahead" of child fields in the object's image.
Therefore, if some object you define ultimately descends from TObject as the
root of its inheritance tree, your object is guaranteed to have a VMT pointer
at the very start of its image. This is important, because (as I'll explain a
little later) the stream doesn't have the ability to go searching through an
object's fields to find its VMT pointer. The VMT pointer must be in a totally
predictable place -- like at the very start of the object -- for the object to
be considered streamable. This is the reason that all streamable objects must
trace back to
TObject as their ultimate ancestor.


The Registration Record


Another requirement for a streamable object type is that it be registered with
the stream. This sounds more exotic than it actually is. When an object type
is registered with a stream, it only means that the stream has obtained a
small amount of information about that object type. This information allows
the stream to connect an incoming object with its methods and its VMT, none of
which go out to disk with the object's fields.
When you define an object type and want to make it streamable, you must also
define a registration record for that object type. This record is usually
created as a typed constant, since once defined, it's generally not altered at
run time. The record's definition is shown in Example 1.
Example 1: The stream-registration record definition.

 TStreamRec = RECORD
 ObjType : Word; { You define a unique code for this field }
 VMTLink : Word; { The offset of the type's VMT in the dataseg }
 Load : Pointer; { The full address of the type's Load method }
 Store : Pointer { The full address of the type's Store method }
 END;

You must define one of these records for each object type you intend to make
streamable. The ObjType field is -- literally -- key; it's a unique code that
you the programmer define, and cannot be present in any other registration
record. It is how the stream tells registration records apart, and how it
identifies the one it needs. Any word-sized value greater than 99 is legal
here. I picked 1100 out of my hat when streamizing HCALC, and started
numbering my registration records from that value.
The VMTLink field contains the offset portion of the registered object type's
VMT pointer. This can be derived by using the built-in TypeOf function, which
returns a 32-bit pointer to the VMT belonging to the object type passed as its
parameter. The segment portion of the pointer is discarded, and only the
offset is used. See Example 2 for the actual syntax I used to derive the
VMTLink value for my TMortgage type. You'll need a typed constant definition
like this for every object type you intend to make streamable. (Note that
Turbo Vision provides its own registration records for all of its provided
types. You create registration records only for object types that you create
from scratch or derive from the "stock" object types.)
Example 2: The registration record for TMortgage.

 CONST
 RMortgage : TStreamRec =
 (ObjType : 1200;
 VMTLink : Ofs(TypeOf(TMortgage)^);
 Load : @TMortgage.Load;
 Store : @TMortgage.Store);

Finally, the Load and Store fields simply contain pointers to the Load and
Store methods of the registered type. The address-of operator @ is used to
derive these pointers; see Example 2.
The stream-registration system contains a serious design pitfall: There's no
promise that the ObjType you select for your own objects won't conflict with
objects you may be using that were designed by others. It's particularly
sticky when you don't have the source to the objects you're using and are
linking from TPUs. You can read the ObjType code from a source-code file, but
I know of no easy way to divine the ObjType codes embedded in a compiled .TPU
file. Keep this in mind, since the compiler will not warn you when two ObjType
codes conflict!


Registering Types


Defining a registration record for an object type is not enough. You must
explicitly pass that record to a registration procedure in order to register
the object type. It's easy enough to do:

 RegisterType(RMortgage);
The RegisterType procedure is global if you USE the OBJECTS.TPU unit, and it
adds your stream-registration record to a list of such records that it
maintains. Once registered through RegisterType, your object type is
registered for any and all stream objects your application uses.
Another of the countless sources of confusion in Turbo Vision is that while
Turbo Vision provides stream-registration records for all of the "stock"
object types, you must still explicitly register all object types you intend
to write to a stream. This includes all the collections and views and controls
provided with the product, not only those that you subclass and modify!
Fortunately, there are canned routines that gather together the registration
calls from each unit and allow you to register all types defined in that unit
with one call. Look in the interface of each unit for a routine beginning with
Register, such as RegisterApp, RegisterMenus, and so on.
My opinion is that all this rigmarole is totally unnecessary. The runtime
library could easily handle registration by itself, beneath the surface,
including creating its own registration records and assigning unique ObjType
codes. (The RTL is actually the only entity with enough information about an
application to avoid ObjType code collisions.) This is just another area where
Turbo Vision's black box needs to be a great deal blacker.



Going Out to Disk


Since Turbo Vision insists on keeping all of this registration stuff in your
face while you work with streams, you might as well understand what all the
funny numbers do. In particular, knowing how streams work internally can be
essential when you're trying to debug a streams problem that seems like it
came from Mars.
The first step in the Put process (writing an object onto a stream) is for the
stream to fetch the VMT pointer from offset 0 of the object being written.
This is the benefit of having all streamable objects descend from
TObject -- the stream doesn't have to search for the VMT pointer. The stream
searches its linked list of registration records to find a registration record
containing a VMT reference that matches the VMT pointer in the object. It then
takes the ObjType code from the found registration record and writes this out
to the stream as a sort of "who am I" header value. This header value is
crucial when we go to read the object back into memory, as I'll discuss
shortly.
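The registration-and-dispatch idea described above can be sketched outside of Pascal entirely. The short C program below is a conceptual model only -- none of these names (Registration, Widget, stream_put, stream_get) are Turbo Vision's, and the real mechanism matches on VMT pointers rather than a typecode field inside the object -- but it shows how writing a unique type code ahead of the fields lets a later Get find the right constructor:

```c
/* Conceptual C sketch of stream registration -- not Turbo Vision source.
   All names here are hypothetical. A "registration record" pairs a unique
   type code with the type's load (constructor) and store routines; Put
   writes the code as a "who am I" header ahead of the fields, and Get
   reads the code back and looks up the matching constructor. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { unsigned typecode; int x, y; } Widget;

typedef struct {
    unsigned typecode;                        /* unique, like ObjType      */
    void *(*load)(FILE *f);                   /* like the Load constructor */
    void  (*store)(const void *obj, FILE *f); /* like the Store method     */
} Registration;

static void widget_store(const void *obj, FILE *f)
{
    const Widget *w = obj;
    fwrite(&w->x, sizeof w->x, 1, f);  /* only the data fields go out;     */
    fwrite(&w->y, sizeof w->y, 1, f);  /* the typecode/"VMT" does not      */
}

static void *widget_load(FILE *f)
{
    Widget *w = malloc(sizeof *w);     /* the constructor allocates...     */
    w->typecode = 1100;                /* ...and restores the local "VMT"  */
    fread(&w->x, sizeof w->x, 1, f);
    fread(&w->y, sizeof w->y, 1, f);
    return w;
}

static Registration registry[] = {     /* like the TStreamRec constants    */
    { 1100, widget_load, widget_store },
};
enum { NREG = sizeof registry / sizeof registry[0] };

static void stream_put(FILE *f, unsigned code, const void *obj)
{
    for (int i = 0; i < NREG; i++)
        if (registry[i].typecode == code) {
            fwrite(&code, sizeof code, 1, f);  /* "who am I" header value  */
            registry[i].store(obj, f);
            return;
        }
}

static void *stream_get(FILE *f)
{
    unsigned code;
    if (fread(&code, sizeof code, 1, f) != 1)
        return NULL;
    for (int i = 0; i < NREG; i++)
        if (registry[i].typecode == code)
            return registry[i].load(f);        /* dispatch to constructor  */
    return NULL;                               /* unregistered type: fail  */
}
```

Note that stream_get never needs the object to exist before it reads the header; the code segment's routines (here, widget_load) are always present, which is exactly the point made above.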
The VMT pointer cannot itself be used as the ObjType code, in case you suspect
(as I did) that the VMT pointer makes a separate ObjType code unnecessary.
It's true that every object type has its own unique VMT, and thus within a
single application every VMT pointer should in fact be unique. The kicker is
this: You may want to write out a stream of objects from one application and
read them back in to another application. And while the second application
must have the code linked into itself for any objects it reads from a stream,
there's nothing to indicate the order in which those objects were linked into
the second application. All the VMTs have to be there in the first part of the
data segment, but they do not have to be in the same order. Hence, a VMT
pointer is not enough to identify an object uniquely from one application to
another.
The ObjType header value serves to say, "What follows is a Widget object
type." The stream then calls the object's Store method to write the individual
data fields of the object to the stream. Store, in turn, uses the stream's
Write method to write the individual fields to the stream.


Who Calls Store?


Another Turbo Vision confusion seed: In a couple of places in the TV Guide,
you're told that you never call the Load or Store methods directly.
Wrong!
If you derive a new object type from an existing object type that has its own
Load or Store methods, your child object's Load and Store methods must call
its parent object's Load and Store methods. Each object takes care of writing
its own fields to the stream -- and Put calls the child object's Store method,
which only writes the fields defined within the child object. If the parent
object (or grandparent object, or further ancestor objects up the line) has
its own fields, it must take responsibility for writing those fields to the
stream.
So when you write a Store method for some object that descends from an object
that has its own Store method, you must call the parent's Store method before
writing any of your own fields to the stream. It's very much like calling the
parent object's constructor before executing your own constructor code, so
that the parent object can get its own house built before you build yours.
So the TV Guide has it half right, sorta. It's true that you never call your
own object's Load or Store methods directly. But it's just as true that you
must call the parent object's Load and Store methods from within your own Load
and Store methods if you expect the stream mechanism to work correctly. I'll
provide a solid code example next month, once I've laid a little more
groundwork.


Bringing an Object Back from Disk


When you call a stream's Get method to bring the next object in from the
stream, Get reads the first word from the stream and assumes it to be the
ObjType code of the object that follows. It looks up the code in its list of
registration records until it finds a registration record containing a
matching ObjType code. Recall that the registration record contains a full
32-bit pointer to the object's Load method. The stream can thus use this
pointer to call the object's own Load without having yet read the object in
from the stream.
The Load method (which is a constructor) allocates space on the heap for the
new object and begins reading it in from the stream, field by field, using the
stream's Read method. And while I don't see it mentioned in the TV Guide
anywhere, the object gets a new VMT pointer when it comes in from disk.
Remember, an object can be read from a stream into any application that
contains the code implementing that object. The VMTs in different applications
may be in a different order and at different offsets from the start of the
data segment. So the newly read object gets a new VMT pointer from the VMTLink
field of its registration record.


Partway There


Well, I'm bumping my head on my word count, and we're a long way from getting
streams under control. This column is at best an overview of how streams
operate -- and as such, it's probably pretty easy to understand. The worst
part of streams is all the little details and the multitude of ways to go
wrong. We'll take on some of those next month, and start to see how
streamability can be added to MORTGAGE.PAS and finally HCALC.PAS.


_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann


July, 1992
GRAPHICS PROGRAMMING


3-D Shading


 This article contains the following executables: XSHRP20.ZIP


Michael Abrash


This month, we return to X-Sharp, the real-time animation package that we
started developing in January. When last we saw X-Sharp, it had just acquired
basic hidden-surface capability, and performance had been vastly improved
through the use of fixed-point arithmetic. This month, we're going to add
quite a bit more: support for 8088 and 80286 PCs, a general color model, and
shading. That's an awful lot to cover in one column (actually, it'll spill
over into the next column), so let's get to it!


Sub-386 Support


To date, X-Sharp has run on only the 386 and 486, because it uses 32-bit
multiply and divide instructions that sub-386 processors don't support. I
chose 32-bit instructions for two reasons: They're much faster for 16.16
fixed-point arithmetic than any approach that works on the 8088 and 286; and
they're much easier to implement than any other approach. In short, I was
after maximum performance, and I was perhaps just a little lazy.
I should have known better than to try to sneak this one by you. The most
common feedback I've gotten on X-Sharp is that I should make it support the
8088 and 286. Well, I can take a hint as well as the next guy. Listing One
(page 150) contains the latest versions of the FixedMul() and FixedDiv()
functions, in a form that can be conditionally assembled to use either
386-specific or generic 8088 instructions. The complete new version of
FIXED.ASM, containing dual 386/8088 versions of CosSin(), XformVec(), and
ConcatXforms(), as well as FixedMul() and FixedDiv(), can be found in the
full X-Sharp archive, the availability of which is described further on.
Given the new version of FIXED.ASM, with USE386 set to 0, X-Sharp will run on
any processor. That's not to say that it will run fast on any processor, or at
least not as fast as it used to. The switch to 8088 instructions makes
X-Sharp's fixed-point calculations about 2.5 times slower overall. Since a PC
is perhaps 40 times slower than a 486/33, we're talking about a hundred-times
speed difference from the low end to the high end. A 486/33 can animate a
72-sided ball, complete with shading (as discussed later), at 60 frames per
second (fps), with plenty of cycles to spare; an 8-MHz AT can animate the same
ball at about 6 fps. Clearly, the level of animation an application uses must
be tailored to the available CPU horsepower.
The implementation of a 32-bit multiply using 8088 instructions is a simple
matter of adding together four partial products. A 32-bit divide is not so
simple, however. In fact, in Listing One I've chosen not to implement a full
32x32 divide, but rather only a 32x16 divide. The reason is simple:
performance. A 32x16 divide can be implemented on an 8088 with two DIV
instructions, but a 32x32 divide takes a great deal more work, so far as I can
see. (If anyone has a fast 32x32 divide, or has a faster way to handle signed
multiplies and divides than the approach taken by Listing One, please drop me
a line.) In X-Sharp, division is used only to divide either X or Y by Z in the
process of projecting from view space to screen space, so the cost of using a
32x16 divide is merely some inaccuracy in calculating screen coordinates,
especially when objects get very close to the Z = 0 plane. This error is not
cumulative (doesn't carry over to later frames), and in my experience doesn't
cause noticeable image degradation; therefore, given the already slow
performance of the 8088 and 286, I've opted for performance over precision.
At any rate, please keep in mind that the non-386 version of FixedDiv() is not
a general-purpose 32x32 fixed-point division routine. In fact, it will
generate a divide-by-zero error if passed a fixed-point divisor between -1 and
1. As I've explained, the non-386 version of FixedDiv() is designed to do
just what X-Sharp needs, and no more, as quickly as possible.
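For readers who'd rather see the four-partial-product trick in a higher-level form, here's the same idea in C. This is my own illustration, not X-Sharp source: split each 16.16 operand into 16-bit halves, form the four products, and recombine, letting the high x high product's upper bits fall off the top just as the 8088 version does.

```c
#include <stdint.h>

typedef int32_t Fixedpoint;   /* 16.16 fixed-point, as in X-Sharp */

/* 16.16 multiply built from 16x16 partial products, mirroring the 8088
   code path in Listing One (with rounding on). Illustrative C only. */
Fixedpoint fixed_mul_8088_style(Fixedpoint m1, Fixedpoint m2)
{
    /* figure out the result's sign, then work unsigned */
    int negative = (m1 < 0) != (m2 < 0);
    uint32_t a = (m1 < 0) ? 0u - (uint32_t)m1 : (uint32_t)m1;
    uint32_t b = (m2 < 0) ? 0u - (uint32_t)m2 : (uint32_t)m2;
    uint32_t ah = a >> 16, al = a & 0xFFFFu;   /* whole and fractional */
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;   /*   halves             */

    uint32_t r = (ah * bh) << 16;              /* high x high (overflow */
    r += ah * bl;                              /*   discarded)          */
    r += al * bh;                              /* low x high            */
    r += ((al * bl) + 0x8000u) >> 16;          /* low x low, rounded by */
                                               /*   adding 2^(-17)      */
    return negative ? -(int32_t)r : (int32_t)r;
}
```

For example, 2.5 x 3.0 (that is, 0x28000 x 0x30000) comes out to 0x78000, which is 7.5 in 16.16 form.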


Shading


So far, the polygons out of which our animated objects have been built have
had colors of fixed intensities. For example, a face of a cube might be blue,
or green, or white, but whatever color it is, that color never brightens or
dims. Fixed colors are easy to implement, but they don't make for very
realistic animation. In the real world, the intensity of the color of a
surface varies depending on how brightly it is illuminated. The ability to
simulate the illumination of a surface, or shading, is the next feature we'll
add to X-Sharp.
The overall shading of an object is the sum of several types of shading
components. Ambient shading is illumination by what you might think of as
background light, light that's coming from all directions; all surfaces are
equally illuminated by ambient light, regardless of their orientation.
Directed lighting, producing diffuse shading, is illumination from one or more
specific light sources. Directed light has a specific direction, and the angle
at which it strikes a surface determines how brightly it lights that surface.
Specular reflection is the tendency of a surface to reflect light in a
mirrorlike fashion. There are other sorts of shading components, including
transparency and atmospheric effects, but the ambient- and diffuse-shading
components are all we're going to deal with for a while, with specular shading
not too far in the future.


Ambient Shading


The basic model for both ambient and diffuse shading is a simple one. Each
surface has a reflectivity between 0 and 1, where 0 means all light is
absorbed and 1 means all light is reflected. A certain amount of light energy
strikes each surface. The energy (intensity) of the light is expressed such
that if light of intensity 1 strikes a surface with reflectivity 1, then the
brightest possible shading is displayed for that surface. Complicating this
somewhat is the need to support color; we do this by separating reflectance
and shading into three components each -- red, green, and blue -- and
calculating the shading for each color component separately for each surface.
Given an ambient-light red intensity of IA[red] and a surface red reflectance
of R[red], the displayed red ambient shading for that surface, as a fraction
of the maximum red intensity, is simply min(IA[red] x R[red], 1). The green and blue
color components are handled similarly. That's really all there is to ambient
shading, although of course we must design some way to map displayed color
components into the available palette of colors, which I'll discuss next
month. Ambient shading isn't the whole shading picture, though. In fact,
scenes tend to look pretty bland without diffuse shading.
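The ambient calculation above is simple enough to sketch in a few lines of C. This is my own notation, not X-Sharp's actual color structures:

```c
/* Ambient shading per color component: displayed = min(IA x R, 1).
   Illustrative C only; X-Sharp's real data structures differ. */
typedef struct { double r, g, b; } RGB;

RGB ambient_shade(RGB intensity, RGB reflectance)
{
    RGB s;
    s.r = intensity.r * reflectance.r;  /* red component   */
    s.g = intensity.g * reflectance.g;  /* green component */
    s.b = intensity.b * reflectance.b;  /* blue component  */
    if (s.r > 1.0) s.r = 1.0;           /* clamp each to   */
    if (s.g > 1.0) s.g = 1.0;           /*   the brightest */
    if (s.b > 1.0) s.b = 1.0;           /*   displayable   */
    return s;
}
```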


Diffuse Shading


Diffuse shading is more complicated than ambient shading, because the
effective intensity of directed light falling on a surface depends on the
angle at which it strikes the surface. According to Lambert's law, the light
energy from a directed light source striking a surface is proportional to the
cosine of the angle at which it strikes the surface, with the angle measured
relative to a vector perpendicular to the polygon (a polygon normal), as shown
in Figure 1. If the red intensity of directed light is ID[red], the red
reflectance of the surface is R[red], and the angle between the incoming
directed light and the surface's normal is theta, then the displayed red
diffuse shading for that surface, as a fraction of the largest possible red
intensity, is min(ID[red] x R[red] x cos(theta), 1).
That's easy enough to calculate -- but seemingly slow. Determining the cosine
of an angle can be sped up with a table lookup, but there's also the task of
figuring out the angle, and, all in all, it doesn't seem that diffuse shading
is going to be speedy enough for our purposes. Consider this, however:
According to the properties of the dot product (denoted by the operator "*",
as shown in Figure 2), cos(theta) = (v*w)/(|v| x |w|), where v and w are
vectors, theta is the angle between v and w, and |v| is the length of v.
Suppose, now,
that v and w are unit vectors; that is, vectors exactly one unit long. Then
the above equation reduces to cos(theta)=v*w. In other words, we can calculate
the cosine between N, the unit-normal vector (one-unit-long perpendicular
vector) of a polygon, and L', the reverse of a unit vector describing the
direction of a light source, with just three multiplies and two adds. (The
reason the light-direction vector must be reversed is explained later.) Once
we have that, we can easily calculate the red diffuse shading from a directed
light source as min(ID[red] x R[red] x (L' * N), 1) and likewise for the green
and blue color components.
The overall red shading for each polygon can be calculated by summing the
ambient-shading red component with the diffuse-shading component from each
light source, as in min((IA[red] x R[red]) + (ID[red0] x R[red] x (L[0]' * N)) +
(ID[red1] x R[red] x (L[1]' * N)) + ..., 1), where ID[red0] and L[0]' are the
red intensity and the reversed unit-direction vector, respectively, for
spotlight 0. Listing Two, page 152, shows the X-Sharp module DRAWPOBJ.C, which
performs ambient and diffuse shading. Toward the bottom, you will find the
code that performs shading exactly as described by the above equation, first
calculating the ambient red, green, and blue shadings, then summing that with
the diffuse red, green, and blue shadings generated by each directed light
source.
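The arithmetic above can be sketched in a few lines of C. This is my own illustration of the equation, not Listing Two; in particular, clamping a negative cosine to zero (a surface facing away from the light) is an assumption on my part:

```c
typedef struct { double x, y, z; } Vec3;

/* Three multiplies and two adds: cos(theta), when v and w are unit vectors */
double dot(Vec3 v, Vec3 w)
{
    return v.x * w.x + v.y * w.y + v.z * w.z;
}

/* One color component's overall shading: ambient plus one spotlight's
   diffuse term, clamped to 1. l_rev is the REVERSED unit light-direction
   vector; n is the polygon's unit normal. Zeroing a negative cosine
   (surface facing away from the light) is my assumption. */
double shade_component(double ia, double id, double refl, Vec3 l_rev, Vec3 n)
{
    double cos_theta = dot(l_rev, n);
    if (cos_theta < 0.0)
        cos_theta = 0.0;
    double s = ia * refl + id * refl * cos_theta;
    return (s > 1.0) ? 1.0 : s;
}
```

Extending to several spotlights just means summing one id * refl * cos_theta term per light before clamping, exactly as in the equation above.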


Shading: Implementation Details


In order to calculate the cosine of the angle between an incoming light source
and a polygon's unit normal, we must first have the polygon's unit normal.
This could be calculated by generating a cross-product on two polygon edges to
generate a normal, then calculating the normal's length and scaling to produce
a unit normal. Unfortunately, that would require taking a square root, so it's
not a desirable course of action. Instead, I've made a change to X-Sharp's
polygon format. Now, the first vertex in a shaded polygon's vertex list is the
end-point of a unit normal that starts at the second point in the polygon's
vertex list, as shown in Figure 3. The first point isn't one of the polygon's
vertices, but is used only to generate a unit normal. The second point,
however, is a polygon vertex. Calculating the difference vector between the
first and second points yields the polygon's unit normal. Adding a unit-normal
endpoint to each polygon isn't free; each of those end-points has to be
transformed, along with the rest of the vertices, and that takes time. Still,
it's faster than calculating a unit normal for each polygon from scratch.
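Under that vertex-list convention, recovering the unit normal after transformation is just a vector subtraction, with no square root in sight. A minimal sketch, with hypothetical names rather than X-Sharp's actual polygon structure:

```c
typedef struct { double x, y, z; } Vec3;

/* pts[0] is the unit-normal endpoint and pts[1] the polygon vertex the
   normal starts from; both have already been transformed along with the
   rest of the vertices. Subtracting yields the unit normal directly --
   no length calculation or scaling needed. */
Vec3 polygon_unit_normal(const Vec3 *pts)
{
    Vec3 n = { pts[0].x - pts[1].x,
               pts[0].y - pts[1].y,
               pts[0].z - pts[1].z };
    return n;
}
```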
We also need a unit vector for each directed light source. The directed light
sources I've implemented in X-Sharp are spotlights; that is, they're
considered to be point light sources that are infinitely far away. This allows
the simplifying assumption that all light rays from a spotlight are parallel
and of equal intensity throughout the displayed universe, so each spotlight
can be represented with a single unit vector and a single intensity. The only
trick is that in order to calculate the desired cos(theta) between the polygon
unit normal and a spotlight's unit vector, the direction of the spotlight's
unit vector must be reversed, as shown in Figure 4. This is necessary because
the dot product implicitly places vectors with their start points at the same
location when it's used to calculate the cosine of the angle between two
vectors. The light vector is incoming to the polygon surface, and the unit
normal is outbound, so only by reversing one vector or the other will we get
the cosine of the desired angle.
Given the two unit vectors, it's a piece of cake to calculate intensities, as
shown in Listing Two. The sample program DEMO1, in the X-Sharp archive (built
by running K1.BAT), puts the shading code to work displaying a rotating ball
with ambient lighting and three spot lighting sources that the user can turn
on and off. What you'll see when you run DEMO1 is that the shading is very
good -- face colors change very smoothly indeed -- so long as only green
lighting sources are on. However, if you combine spotlight two, which is blue,
with any other light source, polygon colors will start to shift abruptly and
unevenly. As configured in the demo, the palette supports a wide range of
shading intensities for a pure version of any one of the three primary colors,
but a very limited number of intensity steps (four, in this case) for each
color component when two or more primary colors are mixed. While this
situation can be improved, it is fundamentally a result of the restricted
capabilities of the 256-color palette, and there is only so much that can be
done without a larger color set. Next month, I'll talk about some ways to
improve the quality of 256-color shading.
Shading problems pretty much vanish in 15-bpp or better modes, such as the
32K-color mode supported by the Sierra Hicolor DAC. I've designed X-Sharp's
color model looking forward to this emerging generation of highly
color-capable adapters, and DEMO1 gives you a taste of how terrific shading
will look on such adapters.
More on colors next month.



Where to get X-Sharp


The full source for X-Sharp is available electronically as XSHARPn.ARC in the
DDJ Forum on CompuServe, on M&T Online, and in the graphic.disp conference on
Bix (as a ZIP file). Alternatively, you can send me a 360K or 720K formatted
diskette and an addressed, stamped diskette mailer, care of DDJ, 411 Borel
Ave., San Mateo, CA 94402, and I'll send you the latest copy of X-Sharp.
There's no charge, but it'd be very much appreciated if you'd slip in a dollar
or so to help out the folks at the Vermont Association for the Blind and
Visually Impaired. Your response so far has been great. I just took a bundle
of checks and money over to VABVI this week, and they were very happy indeed.
Thanks!
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
name mabrash in both cases).


Graphics Debugging Update


A while back, I lamented that Turbo Debugger had some annoying quirks when it
came to debugging graphics in a dual-monitor set-up (using the -do switch).
However, most of these quirks can be worked around by combining two of TD's
display-handling modes. As I reported, TD blocks manual access (performed via
the I/O menu in the CPU window) to some of the VGA's registers, and sometimes
alters the page flipping and other registers when break-points occur. As it
turns out, however, all that only happens in Smart display mode. If you use
None display mode, TD doesn't touch any VGA registers at all. The drawback to
using None mode is that in None mode, for some reason, program output via DOS
functions goes to the debugging screen rather than the target screen. However,
you can actively switch between Smart and None mode via the Options menu, so a
reasonably complete solution is to start out in Smart mode, then switch to
None mode when you encounter problems in manually accessing the VGA's
registers from TD, or if you think TD is otherwise interfering with the state
of the VGA.


Recommended Reading


Andrew Glassner's Graphics Gems (Academic Press, 1990) is an oddly enjoyable
book. Odd, because there's no overall coherency to the book; it's a collection
of more than 100 largely unrelated contributions by various authors on a
hodgepodge of graphics subjects. Enjoyable, because it's that rarest sort of
graphics programming book: one that you can open at random and start reading
for fun. A good example of the nature of Graphics Gems is a chapter on mapping
RGB colors into a 4-bit color space; this chapter features somewhat arcane
theory, an interesting perspective on color space, and a fast technique for
RGB mapping in 16-color modes. On balance the chapter is a little uneven, but
useful, informative, and interesting -- a description that would serve well
for Graphics Gems as a whole.


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash


[LISTING ONE]

; Fixed point multiply and divide routines. Tested with TASM 3.0.
USE386 equ 1 ;1 for 386-specific opcodes, 0 for 8088 opcodes
MUL_ROUNDING_ON equ 1 ;1 for rounding on multiplies, 0 for no
 ; rounding. Not rounding is faster, rounding is
 ; more accurate and generally a good idea
DIV_ROUNDING_ON equ 0 ;1 for rounding on divides, 0 for no rounding.
 ; Not rounding is faster, rounding is more
 ; accurate, but because division is only
 ; performed to project to the screen, rounding
 ; quotients generally isn't necessary
ALIGNMENT equ 2
 .model small
 .386
 .code
;=====================================================================
; Multiplies two fixed-point values together. C near-callable as:
; Fixedpoint FixedMul(Fixedpoint M1, Fixedpoint M2);
FMparms struc
 dw 2 dup(?) ;return address & pushed BP
M1 dd ?
M2 dd ?
FMparms ends
 align ALIGNMENT
 public _FixedMul
_FixedMul proc near
 push bp
 mov bp,sp
if USE386
 mov eax,[bp+M1]
 imul dword ptr [bp+M2] ;multiply
if MUL_ROUNDING_ON
 add eax,8000h ;round by adding 2^(-17)
 adc edx,0 ;whole part of result is in DX
endif ;MUL_ROUNDING_ON

 shr eax,16 ;put the fractional part in AX
else ;!USE386
 ;do four partial products and add them
 ; together, accumulating the result in CX:BX
 push si ;preserve C register variables
 push di
 ;figure out signs, so we can use unsigned multiplies
 sub cx,cx ;assume both operands positive
 mov ax,word ptr [bp+M1+2]
 mov si,word ptr [bp+M1]
 and ax,ax ;first operand negative?
 jns CheckSecondOperand ;no
 neg ax ;yes, so negate first operand
 neg si
 sbb ax,0
 inc cx ;mark that first operand is negative
CheckSecondOperand:
 mov bx,word ptr [bp+M2+2]
 mov di,word ptr [bp+M2]
 and bx,bx ;second operand negative?
 jns SaveSignStatus ;no
 neg bx ;yes, so negate second operand
 neg di
 sbb bx,0
 xor cx,1 ;mark that second operand is negative
SaveSignStatus:
 push cx ;remember sign of result; 1 if result
 ; negative, 0 if result nonnegative
 push ax ;remember high word of M1
 mul bx ;high word M1 times high word M2
 mov cx,ax ;accumulate result in CX:BX (BX not used until
 ; next operation) assume no overflow into DX
 mov ax,si ;low word M1 times high word M2
 mul bx
 mov bx,ax
 add cx,dx ;accumulate result in CX:BX
 pop ax ;retrieve high word of M1
 mul di ;high word M1 times low word M2
 add bx,ax
 adc cx,dx ;accumulate result in CX:BX
 mov ax,si ;low word M1 times low word M2
 mul di
if MUL_ROUNDING_ON
 add ax,8000h ;round by adding 2^(-17)
 adc bx,dx
else ;!MUL_ROUNDING_ON
 add bx,dx ;don't round
endif ;MUL_ROUNDING_ON
 adc cx,0 ;accumulate result in CX:BX
 mov dx,cx
 mov ax,bx
 pop cx
 and cx,cx ;is the result negative?
 jz FixedMulDone ;no, we're all set
 neg dx ;yes, so negate DX:AX
 neg ax
 sbb dx,0
FixedMulDone:
 pop di ;restore C register variables
 pop si
endif ;USE386
 pop bp
 ret
_FixedMul endp
;=====================================================================
; Divides one fixed-point value by another. C near-callable as:
; Fixedpoint FixedDiv(Fixedpoint Dividend, Fixedpoint Divisor);
FDparms struc
 dw 2 dup(?) ;return address & pushed BP
Dividend dd ?
Divisor dd ?
FDparms ends
 align ALIGNMENT
 public _FixedDiv
_FixedDiv proc near
 push bp
 mov bp,sp
if USE386
if DIV_ROUNDING_ON
 sub cx,cx ;assume positive result
 mov eax,[bp+Dividend]
 and eax,eax ;positive dividend?
 jns FDP1 ;yes
 inc cx ;mark it's a negative dividend
 neg eax ;make the dividend positive
FDP1: sub edx,edx ;make it a 64-bit dividend, then shift
 ; left 16 bits so that result will be in EAX
 rol eax,16 ;put fractional part of dividend in
 ; high word of EAX
 mov dx,ax ;put whole part of dividend in DX
 sub ax,ax ;clear low word of EAX
 mov ebx,dword ptr [bp+Divisor]
 and ebx,ebx ;positive divisor?
 jns FDP2 ;yes
 dec cx ;mark it's a negative divisor
 neg ebx ;make divisor positive
FDP2: div ebx ;divide
 shr ebx,1 ;divisor/2, minus 1 if the divisor is
 adc ebx,0 ; even
 dec ebx
 cmp ebx,edx ;set Carry if the remainder is at least
 adc eax,0 ; half as large as the divisor, then
 ; use that to round up if necessary
 and cx,cx ;should the result be made negative?
 jz FDP3 ;no
 neg eax ;yes, negate it
FDP3:
else ;!DIV_ROUNDING_ON
 mov edx,[bp+Dividend]
 sub eax,eax
 shrd eax,edx,16 ;position so that result ends up
 sar edx,16 ; in EAX
 idiv dword ptr [bp+Divisor]
endif ;DIV_ROUNDING_ON
 shld edx,eax,16 ;whole part of result in DX;
 ; fractional part is already in AX
else ;!USE386
;NOTE!!! Non-386 division uses a 32-bit dividend but only the upper 16 bits
; of the divisor; in other words, only the integer part of the divisor is
; used. This is done so that the division can be accomplished with two fast
; hardware divides instead of a slow software implementation, and is (in my
; opinion) acceptable because division is only used to project points to the
; screen (normally, the divisor is a Z coordinate), so there's no cumulative
; error, although there will be some error in pixel placement (the magnitude
; of the error is less the farther away from the Z=0 plane objects are). This
; is *not* a general-purpose divide, though; if the divisor is less than 1,
; for instance, a divide-by-zero error will result! For this reason, non-386
; projection can't be performed for points closer to the viewpoint than Z=1.
 ;figure out signs, so we can use
 ; unsigned divisions
 sub cx,cx ;assume both operands positive
 mov ax,word ptr [bp+Dividend+2]
 and ax,ax ;first operand negative?
 jns CheckSecondOperandD ;no
 neg ax ;yes, so negate first operand
 neg word ptr [bp+Dividend]
 sbb ax,0
 inc cx ;mark that first operand is negative
CheckSecondOperandD:
 mov bx,word ptr [bp+Divisor+2]
 and bx,bx ;second operand negative?
 jns SaveSignStatusD ;no
 neg bx ;yes, so negate second operand
 neg word ptr [bp+Divisor]
 sbb bx,0
 xor cx,1 ;mark that second operand is negative
SaveSignStatusD:
 push cx ;remember sign of result; 1 if result
 ; negative, 0 if result nonnegative
 sub dx,dx ;put Dividend+2 (integer part) in DX:AX
 div bx ;first half of 32/16 division, integer part
 ; divided by integer part
 mov cx,ax ;set aside integer part of result
 mov ax,word ptr [bp+Dividend] ;concatenate the fractional part of
 ; the dividend to the remainder (fractional
 ; part) of the result from dividing the
 ; integer part of the dividend
 div bx ;second half of 32/16 division
if DIV_ROUNDING_ON
 shr bx,1 ;divisor/2, minus 1 if the divisor is
 adc bx,0 ; even
 dec bx
 cmp bx,dx ;set Carry if the remainder is at least
 adc ax,0 ; half as large as the divisor, then
 adc cx,0 ; use that to round up if necessary
endif ;DIV_ROUNDING_ON
 mov dx,cx ;absolute value of result in DX:AX
 pop cx
 and cx,cx ;is the result negative?
 jz FixedDivDone ;no, we're all set
 neg dx ;yes, so negate DX:AX
 neg ax
 sbb dx,0
FixedDivDone:
endif ;USE386
 pop bp
 ret
_FixedDiv endp
 end





[LISTING TWO]

/* Draws all visible faces in the specified polygon-based object. The object
 must have previously been transformed and projected, so that all vertex
 arrays are filled in. Ambient and diffuse shading are supported. */
#include "polygon.h"

void DrawPObject(PObject * ObjectToXform)
{
 int i, j, NumFaces = ObjectToXform->NumFaces, NumVertices;
 int * VertNumsPtr, Spot;
 Face * FacePtr = ObjectToXform->FaceList;
 Point * ScreenPoints = ObjectToXform->ScreenVertexList;
 PointListHeader Polygon;
 Fixedpoint Diffusion;
 ModelColor ColorTemp;
 ModelIntensity IntensityTemp;
 Point3 UnitNormal, *NormalStartpoint, *NormalEndpoint;
 long v1, v2, w1, w2;
 Point Vertices[MAX_POLY_LENGTH];

 /* Draw each visible face (polygon) of the object in turn */
 for (i=0; i<NumFaces; i++, FacePtr++) {
 /* Remember where we can find the start and end of the polygon's
 unit normal in view space, and skip over the unit normal endpoint
 entry. The end and start points of the unit normal to the polygon
 must be the first and second entries in the polygon's vertex list.
 Note that the second point is also an active polygon vertex */
 VertNumsPtr = FacePtr->VertNums;
 NormalEndpoint = &ObjectToXform->XformedVertexList[*VertNumsPtr++];
 NormalStartpoint = &ObjectToXform->XformedVertexList[*VertNumsPtr];
 /* Copy over the face's vertices from the vertex list */
 NumVertices = FacePtr->NumVerts;
 for (j=0; j<NumVertices; j++)
 Vertices[j] = ScreenPoints[*VertNumsPtr++];
 /* Draw only if outside face showing (if the normal to the polygon
 in screen coordinates points toward the viewer; that is, has a
 positive Z component) */
 v1 = Vertices[1].X - Vertices[0].X;
 w1 = Vertices[NumVertices-1].X - Vertices[0].X;
 v2 = Vertices[1].Y - Vertices[0].Y;
 w2 = Vertices[NumVertices-1].Y - Vertices[0].Y;
 if ((v1*w2 - v2*w1) > 0) {
 /* It is facing the screen, so draw */
 /* Appropriately adjust the extent of the rectangle used to
 erase this object later */
 for (j=0; j<NumVertices; j++) {
 if (Vertices[j].X >
 ObjectToXform->EraseRect[NonDisplayedPage].Right)
 if (Vertices[j].X < SCREEN_WIDTH)
 ObjectToXform->EraseRect[NonDisplayedPage].Right =
 Vertices[j].X;
 else ObjectToXform->EraseRect[NonDisplayedPage].Right =
 SCREEN_WIDTH;
 if (Vertices[j].Y >
 ObjectToXform->EraseRect[NonDisplayedPage].Bottom)
 if (Vertices[j].Y < SCREEN_HEIGHT)
 ObjectToXform->EraseRect[NonDisplayedPage].Bottom =
 Vertices[j].Y;
 else ObjectToXform->EraseRect[NonDisplayedPage].Bottom=
 SCREEN_HEIGHT;
 if (Vertices[j].X <
 ObjectToXform->EraseRect[NonDisplayedPage].Left)
 if (Vertices[j].X > 0)
 ObjectToXform->EraseRect[NonDisplayedPage].Left =
 Vertices[j].X;
 else ObjectToXform->EraseRect[NonDisplayedPage].Left=0;
 if (Vertices[j].Y <
 ObjectToXform->EraseRect[NonDisplayedPage].Top)
 if (Vertices[j].Y > 0)
 ObjectToXform->EraseRect[NonDisplayedPage].Top =
 Vertices[j].Y;
 else ObjectToXform->EraseRect[NonDisplayedPage].Top=0;
 }
 /* See if there's any shading */
 if (FacePtr->ShadingType == 0) {
 /* No shading in effect, so just draw */
 DRAW_POLYGON(Vertices, NumVertices, FacePtr->ColorIndex, 0, 0);
 } else {
 /* Handle shading */
 /* Do ambient shading, if enabled */
 if (AmbientOn && (FacePtr->ShadingType & AMBIENT_SHADING)) {
 /* Use the ambient shading component */
 IntensityTemp = AmbientIntensity;
 } else {
 SET_INTENSITY(IntensityTemp, 0, 0, 0);
 }
 /* Do diffuse shading, if enabled */
 if (FacePtr->ShadingType & DIFFUSE_SHADING) {
 /* Calculate the unit normal for this polygon, for use in dot
 products */
 UnitNormal.X = NormalEndpoint->X - NormalStartpoint->X;
 UnitNormal.Y = NormalEndpoint->Y - NormalStartpoint->Y;
 UnitNormal.Z = NormalEndpoint->Z - NormalStartpoint->Z;
 /* Calculate the diffuse shading component for each active
 spotlight */
 for (Spot=0; Spot<MAX_SPOTS; Spot++) {
 if (SpotOn[Spot] != 0) {
 /* Spot is on, so sum, for each color component, the
 intensity, accounting for the angle of the light rays
 relative to the orientation of the polygon */
 /* Calculate cosine of angle between the light and the
 polygon normal; skip if spot is shining from behind
 the polygon */
 if ((Diffusion = DOT_PRODUCT(SpotDirectionView[Spot],
 UnitNormal)) > 0) {
 IntensityTemp.Red +=
 FixedMul(SpotIntensity[Spot].Red, Diffusion);
 IntensityTemp.Green +=
 FixedMul(SpotIntensity[Spot].Green, Diffusion);
 IntensityTemp.Blue +=
 FixedMul(SpotIntensity[Spot].Blue, Diffusion);
 }
 }
 }
 }
 /* Convert the drawing color to the desired fraction of the
 brightest possible color */
 IntensityAdjustColor(&ColorTemp, &FacePtr->FullColor,
 &IntensityTemp);
 /* Draw with the cumulative shading, converting from the general
 color representation to the best-match color index */
 DRAW_POLYGON(Vertices, NumVertices,
 ModelColorToColorIndex(&ColorTemp), 0, 0);
 }
 }
 }
}













































July, 1992
PROGRAMMER'S BOOKSHELF


The Ethics of Computing




Ray Duncan


It's easy to become so dazzled by the inexorable march of computing technology
that we lose sight of the profound moral and ethical questions that it raises.
Every year brings with it a new generation of CPUs, smaller and faster disk
drives, cheaper and higher-density RAM chips, more capable development tools
-- gosh, this must be the best of all possible worlds! But these advances
carry with them as well the potential for more tragic, more far-reaching
abuses of a computer's totally amoral abilities to crunch numbers, manage
data, or control devices. "Foo!" you may respond (or perhaps "Foobar!"), "my
programming [or engineering] job has no social implications and my heart is
pure." The reality, unfortunately, is that virtually all of us work with
computers in such a way that we encounter important ethical and moral issues
on a daily basis -- whether we recognize them or not.
We often pride ourselves on the cleanness and elegance of programming, the
tidy little digital universe that we can control so absolutely with our
properly structured routines, and the way we can distance ourselves from the
more mundane concerns of "ordinary people" who have to wear suits and go to
work at fixed hours. But no system is completely closed; the effects of
everything we do ripple out into the society we live in, and can sometimes
have an impact beyond our wildest imaginings. Think about the power Bill Gates
has over the direction of an industry worth hundreds of billions of dollars,
and the lives of all the people that work within that industry! Bill Gates
didn't acquire his power by inheritance, military action, or demagoguery -- he
literally just thought it into existence. Your actions and mine as programmers
or engineers aren't likely to have quite the same cosmic, long-term effects as
those of Bill Gates -- but the potential is there.
As a sensitization exercise, let's look aside at the use and misuse of a
rather different high technology. During the recent civil unrest in Los
Angeles (which I experienced at a somewhat closer perspective than I would
have preferred), roving journalists employed minicams to bring arson,
assaults, rioting, and looting to the television screens in real time. Among
the specific images that stick in my mind are an "action shot" of hoodlums
throwing trash cans through windows of one of the city government's office
buildings, a close-up of an innocent Hispanic driver being dragged out of his
car and beaten by a mob, a protracted scene of streams of people emptying a
furniture store while two policemen sat in their car watching, and a panoramic
view of the home of one of the Rodney King trial jurors that included the
juror's street address and a glimpse of his four-year-old daughter peeking
through the door.
I am sure that the minicam journalists would tell us that they were just doing
their job and that we shouldn't blame the messengers for bad tidings. But how
does the possession of a minicam and a press badge release a citizen from his
obligation to aid his fellow citizens in distress and uphold public order? At
what point do real-time images of looters operating with impunity under the
very noses of a paralyzed police force cease to be news and become an
invitation to other opportunists to join in the looting? Imagine that the
angry mob who thrashed the Hispanic passerby had instead decided to burn
down the home of the juror I mentioned or (worse yet) had injured or
kidnapped his daughter. What entitles some journalist with a thousand dollars worth of
high-tech hardware on his shoulder to put a four-year-old's life at risk in
the pursuit of "news?"
Before we get too self-righteous about the minicam wielders, though, let's
direct our attention back to our own field. Violent injuries or deaths that
can be directly attributed to computer-logic errors are mercifully rare so
far, but they definitely have occurred: In Europe, an incorrectly programmed
microcomputer-controlled fuel-injection system led to a fatal truck accident;
in Texas, a misprogrammed X-ray machine delivered radiation overdoses that
brought about the demise of at least one cancer patient. No doubt we can look
forward to more of these "computer homicides" with the introduction of
fly-by-wire airliners, driverless rapid transit trams, "closed-loop" therapy
in intensive care units, robotized factories, and the like. But wait!
Computers don't kill people, people kill people. So who should shoulder the
responsibility for these unfortunate events? The authors of the compilers that
were used to build the faulty applications, the application programmers
themselves, the programming managers or equipment manufacturers that did not
insist on sufficiently thorough testing, the managers who selected and
purchased the faulty equipment, or the equipment operators who put human lives
at the mercy of computerized machinery without adequate safeguards? Not an
easy question.
Perhaps the cases I just mentioned are too dramatic for your tastes. Let's
reach back, therefore, to the Jurassic age of computers for an utterly
different sort of ethical problem: one of the first recorded accounts of a
major breakdown in privacy protection for computer users. Here's Fernando J.
Corbato discussing Project MAC and the Compatible Time Sharing System (CTSS)
circa 1964 (IEEE Annals of the History of Computing, vol. 14, 1992):
In order to build the system simply, we put in a lot of constraints, and one
of the constraints was that a single user couldn't change the directory, and
conversely only one user at a time could be logged into a single directory...
Now for the system there was a single system programming directory where the
commands were [stored], and a couple of system programmers were twisting my
arm saying, "Gee, it's really awkward, but we have to get two installations
queued up in line. Just let us take off the interlock on that directory only,
and we'll let at least two people be logged in at once on the same directory."
That sounded innocent enough sort of, but what happened was that the editor
that was commonly used created a temporary file in that directory called
something like .temp. Unfortunately, it had no way of distinguishing who had
invoked it.
So it turned out that the system administrator of the day who was trying to
change the message of the day and another system programmer who was worrying
about entering some new passwords both used the editor. They both ended up
working at the same time, they both created .temp files and/or wrote over the
.temp file. And, lo and behold, before anyone quite realized what had
happened, the message of the day turned into being the password file. And the
way it was discovered was that somebody started to log in soon thereafter
(they were on the ninth floor) and they stared in wonderment as the passwords
unfolded in front of them...
This incident sounds pretty benign, until you remember that it occurred at MIT
on one of the very first systems to be regarded as an "information utility,"
in daily use by graduate students, researchers, and many of the foremost
computer scientists of that era. In a cut-throat academic environment like
MIT, careers can be made or broken by an indiscreet memo falling into the
wrong hands, the premature disclosure of research data that ultimately proves
to be wrong, or the preemptive publication of some new technique that should
have been credited to someone else. In other words, this little anecdote shows
all too graphically how lives can be changed (and not for the better) as a
result of someone failing to think through all the implications of a minor
change to the system software. For another incident that lies ethically
somewhere between the two extremes of death and passwords, consider the
notorious Internet Worm.
On November 2, 1988, Robert T. Morris, a 23-year-old computer science student
at Cornell University, unleashed a "worm" program that took advantage of a
trapdoor in the UNIX sendmail program and a bug in the finger program to
propagate itself from one BSD 4.3 system to another. (The term "worm,"
incidentally, was coined by John Brunner in his 1975 novel The Shockwave
Rider, which foreshadowed the cyber-punk novels of Gibson and Sterling.)
Within a matter of hours, the worm had spread across the continent and brought
thousands of UNIX systems to their knees. Whatever Morris's intentions, it's
clear that the worm was not simply an idle prank; the worm used a
sophisticated, multipronged attack strategy and was based on a deep knowledge
of UNIX system internals. Also, inspection of the Cornell system logs revealed
that Morris was testing his worm as early as October 19, and tried several
implementations before arriving at the "successful" version. Some people have
defended Morris with the observation that the worm didn't damage any files and
(due to the efforts of teams at MIT, Berkeley, and other sites) was fairly
rapidly dissected and neutralized. But what about the hundreds of thousands of
man-hours that were lost forever by those programmers, engineers, professors,
students, and researchers who depended on the worm-infested UNIX machines?
Some machines were not reconnected to the Internet until as late as November
10, and the mail backlog was not cleared until November 12.
Like a gun or a minicam, the immense power and the even more immense stupidity
of the computer make it a dangerous device in the hands of the malicious, the
ignorant, the morally blind, or the merely oblivious. Every time we create a
programming tool, automate some task that previously required a real-live
person, change someone's password for our own convenience as the network
manager, "loan" a copy of Lotus 1-2-3 or Microsoft Excel to a friend, consult
for the government, or even put a customer's name and credit rating into a
database, we should be thinking carefully about the broader ethical and moral
implications of our actions. Whose time might we be stealing, whose
expectations might we be confounding, whose trust might we be betraying, whose
position might we be compromising, whose children might we be endangering?
It's required, but hardly sufficient, to act honorably according to our own
lights; we must also seek to broaden our understanding of the moral and
ethical issues and learn how others have dealt with challenging situations or
ambiguous predicaments. While I'm certainly no expert in these areas, there
are three books that I would commend to you as a starting point.
The seminal work on the social issues of computing is Joseph Weizenbaum's
Computer Power and Human Reason: From Judgment to Calculation. This book,
which was published in 1976, had a deep influence on me at the time and I give
it a lot of credit for steering me away from an isolated, antisocial, hacker
existence. Weizenbaum's description of the "compulsive programmer" is classic:
The computer programmer is a creator of universes for which he alone is the
lawgiver. So, of course, is the designer of any game. But universes of
virtually unlimited complexity can be created in the form of computer
programs. Moreover, and this is a crucial point, systems so formulated and
elaborated act out their programmed scripts. They compliantly obey their laws
and vividly exhibit their obedient behavior. No playwright, no stage director,
no emperor, however powerful, has ever exercised such absolute authority to
arrange a stage or a field of battle and to command such unswervingly dutiful
actors or troops.
One would have to be astonished if Lord Acton's observation that power
corrupts were not to apply in an environment in which omnipotence is so easily
achievable. It does apply. And the corruption evoked by the computer
programmer's omnipotence manifests itself in a form that is instructive in a
domain far larger than the immediate environment of the computer. To
understand it, we will have to look at a mental disorder that, while actually
very old, appears to have been transformed by the computer into a new genus:
the compulsion to program.
Wherever computer centers have become established, that is to say, in
countless places in the United States, as well as in virtually all other
industrial regions of the world, bright young men of disheveled appearance,
often with sunken glowing eyes, can be seen sitting at computer consoles,
their arms tensed and waiting to fire their fingers, already poised to strike,
at the buttons and keys on which their attention seems to be as riveted as a
gambler's on the rolling dice. When not so transfixed, they often sit at
tables strewn with computer printouts over which they pore like possessed
students of a cabalistic text. They work until they nearly drop, twenty,
thirty hours at a time. Their food, if they arrange it, is brought to them:
coffee, Cokes, sandwiches. If possible, they sleep on cots near the computer.
But only for a few hours, then back to the console or the printouts. Their
rumpled clothes, their unwashed and unshaven faces, and their uncombed hair
all testify that they are oblivious to their bodies and to the world in which
they move. They exist, at least when so engaged, only through and for the
computers.
Weizenbaum's book was written to be accessible to laymen, so the first part of
the book is tutorial in nature: "Where the Power of the Computer Comes From,"
and "How Computers Work." Once the introductory material is out of the way,
however, the book is equally valuable to the experienced computer
practitioner. Weizenbaum discusses problem solving and modeling, the
psychology of computer interactions, natural-language processing,
self-organizing complex programs (he refers to these as "incomprehensible
programs"), and artificial intelligence. (Weizenbaum was the author of the
ELIZA program.) In retrospect, nearly 20 years later, much of his discussion
of AI proved prophetic, even if it is now largely moot: the field has lost
most of its credibility and is frankly floundering. Weizenbaum's book ends
with a chapter called "Against the Imperialism of Instrumental Reason" that
should be required reading for every computer science student.
The second book I would like to bring to your attention is Computer Ethics:
Cautionary Tales and Ethical Dilemmas in Computing, by Tom Forester and Perry
Morrison. This short but superb text surveys the entire gamut of
computer-related ethical, moral, and social issues, including
computer-assisted crimes, software theft, hacking, viruses, unreliable
computers, invasion of privacy, artificial intelligence, the Strategic Defense
Initiative ("Star Wars"), and computerization of the workplace. Many detailed
case histories are included, and these are supported by an extensive
bibliography. The authors have a direct, vivid, precise writing style that I
admire enormously!
The third book that I think you would enjoy is Computers Under Attack:
Intruders, Worms, and Viruses, edited by Peter J. Denning, past
editor-in-chief of Communications of the ACM. The book begins with a history
and description of the Internet by Denning, then continues with a collection
of articles on various security topics by authors such as Ken Thompson,
Clifford Stoll, Maurice Wilkes, Richard Stallman, and Paul Saffo. The
extremely detailed analysis of the Internet Worm, some of which was previously
published in Communications of the ACM, is particularly interesting. The
common PC viruses are also listed and their implementations are explained. The
last part of the book surveys the reactions to the Internet Worm by all
sectors of society, ranging from legislators to hackers.































July, 1992
OF INTEREST





In addition to announcing support of Microsoft C7 and Borland C++ 3.1,
MultiScope has released its debugger for OS/2 2.0. The OS/2 2.0 version offers
full 32-bit support and the ability to debug Presentation Manager (PM)
applications with a PM-hosted debugger. New features include an advanced
message-spying capability, complete use of the 80386/486 processor's
watch-point debugging capabilities, a VCR-style remote-control interface for
allocating most of the PM screen to the application, and network debugging
over any NetBIOS-compatible network.
Also included are a runtime debugging capability for controlling program
execution and a Crash Analyzer System with MED (Monitor Execution and Dump), a
utility to be sent to Beta sites for testing. If the application crashes, a
dump file is automatically created. The developer can then run the file
through the Crash Analyzer Debugger to determine the cause of the crash.
MultiScope Debuggers for OS/2 2.0 retail for $449.00; upgrades are $129.00.
Reader service no. 21.
MultiScope Inc. 1235 Pear Avenue Mountain View, CA 94043 415-968-4892
Now shipping from Genus Microprogramming is the PCX Toolkit for Windows. The
new toolkit includes over 30 functions for displaying, saving, scaling, and
manipulating PCX bitmapped graphics from within Windows programs. An image
display and capture utility called pcxShow comes with the toolkit, as do
sample programs in C, Pascal, and Basic.
Among the features included are: bitmap and DIB support; palette manipulation;
easy display functions and clipboard usage; and the ability to save PCX images
and store PCX files as resources.
The PCX Toolkit for Windows costs $249.00 ($599.00 with source code) and
supports Microsoft C, QuickC for Windows, Borland C++, Turbo Pascal for
Windows, Visual Basic, and languages that support DLLs. Reader service no. 22.
Genus Microprogramming 2900 Wilcrest, Suite 145 Houston, TX 77042-3355
800-227-0918 or 713-870-0737
PEXtk, a graphics library for developers of mechanical computer-aided design,
scientific visualization, molecular modeling, and other three-dimensional
applications, is available through the MIT X Consortium. PEX extends the
X-Window System and features multi-mode operation, enabling developers to use
both immediate-mode, three-dimensional graphics commands, and retained
structures in the same program. Use of the low-level PEXlib library from the X
Consortium ensures portability across multiple platforms.
A complete source code version of PEXtk 1.0 is available free of charge from
the MIT X Consortium. For more information, contact SHOgraphics. Reader
service no. 23.
SHOgraphics 1890 N. Shoreline Boulevard Mountain View, CA 94043 415-903-3880
The Michaelangelo VRAM 1280 graphic accelerator from IOcomm is a Windows
accelerator based on S3's 86C911 graphics processor. Michaelangelo's most
outstanding feature is its capability to generate non-interlaced 1280x1024
graphics. It paints in up to 65,768 colors in lower resolutions (using
built-in Sierra HiColor RAMDAC technology) and up to 256 colors in
non-interlaced 1280x1024 mode with 2 Mbytes of VRAM.
Non-interlaced operation ensures flicker-free performance, and the 70/72/
75-Hz refresh rates enhance stability and provide VESA compliance. To boost
speed and performance, Michaelangelo uses dual-ported fast page VRAM in 1- and
2-Mbyte standard configurations and proprietary ASICs, BIOS, and caching
techniques.
Michaelangelo retails for $395.00 and is register- and BIOS-compatible with
VGA, EGA, CGA, MDA, and Hercules software. Reader service no. 24.
IOcomm International Technology 12700 Yukon Avenue Hawthorne, CA 90250
213-644-6100
The PHIGS Programming Manual has recently been published by O'Reilly and
Associates. The volume documents the PHIGS and PHIGS PLUS graphics standards
and provides guidelines for using PHIGS within the X environment (for Xlib,
Motif, OLIT, and XView). The book opens with the PEX Sample Implementation,
the publicly available base for commercial PHIGS products and goes on to
discuss output primitives, attributes, color, and structures; viewing,
lighting, and shading; the SIS ISO C binding; use of PHIGS and PHIGS PLUS in
interactive programs; and all the PHIGS and PHIGS PLUS functions.
The PHIGS Programming Manual costs $52.95; ISBN #0-937175-92-7. Reader service
no. 25.
O'Reilly and Associates Inc. 103 Morris Street, Suite A Sebastopol, CA 95472
707-829-0515
PC Interrupts: A Programmer's Reference to BIOS, DOS, and Third-Party Calls,
by Ralf Brown and DDJ contributor Jim Kyle, has been published by
Addison-Wesley. This is the first complete reference to all IBM system calls,
covering MS-DOS, the ROM BIOS, over 25 major APIs, and dozens of resident
utilities.
PC Interrupts provides a concise description and other essential information
about each call, plus information on potential conflicts between calls from
different APIs. Also covered are: multitaskers, DPMI, networking calls,
hardware and video, low-level and serial I/O, Windows and Netware, and DOS
extenders.
The suggested retail price is $32.95; ISBN #0-201577-97-6. Reader service no.
26.
Addison-Wesley 1 Jacob Way Reading, MA 01867 617-944-3700
The American National Standards Committee on Pascal, X3J9, has made available
a draft of the Technical Report on Object-Oriented Extensions to Pascal. To
receive a copy of the draft for review and/or comment, contact:
Thomas N. Turba Chairman X3J9, Pascal Unisys Corp. MS: 4672 P.O. Box 64942 St.
Paul, MN 55164-0942 612-635-6774
Two new C++ libraries, View.h++ and MouseWrapper.h++, have been released by
Rogue Wave.
View.h++ is a library based on Motif that offers two levels of functionality.
At the lower level, it offers the functionality of the Open Software
Foundation's Motif GUI in a C++ library. At the higher level, it offers a
Model-View-Controller architecture with an abstract, powerful programming
interface which ensures that all of a user's data is refreshed and up-to-date
whenever the data changes. Complex objects such as a file dialog box can be
created with a single function call, and many precanned views are provided.
Custom views can be created by inheriting from an existing view.
MouseWrapper.h++ is a system for managing object picks, drags, rubber banding,
and interactive graphics. Any object created with it contains mouse
sensitivity. Grabbers are included for interactive construction of complex
graphical composite objects.
In both View.h++ and MouseWrapper.h++ all objects are fully persistent, and
graphical views of data can be recreated in a new application or on a
different operating system.
Prices start at $795.00 for View.h++ and $495.00 for MouseWrapper.h++. Reader
service no. 27.
Rogue Wave Software Inc. P.O. Box 2328 Corvallis, OR 97333 503-754-3010
Contexture is shipping Contessa 2.0, a graphical development toolset for
building X-Window based, client/server applications. The latest version
supports the OSF/Motif interface standard and is designed for creating
applications that simultaneously access multiple, diverse data sources such as
relational databases, real-time feeds, spreadsheets, graphics packages, and so
on. Data can be mixed and matched in real time, allowing the programmer to
access any combination of information, regardless of where it resides on the
network.
Contessa comprises a Motif GUI builder, a TCP/IP and DECnet networking
component, a data-integration interface component and a high-level scripting
language.
Development licenses cost $5000.00; runtime licenses are $500.00. Reader
service no. 28.
Contexture Systems One Exeter Plaza Boston, MA 02116 617-424-8340
The Workshop from Optibase allows you to add JPEG image compression to Windows
applications. It includes DDE (Dynamic Data Exchange) capabilities to link
image compression to Windows databases and other applications. The Workshop
performs JPEG compression and expansion on color images with 16 or 24 bits per
pixel and gray-scale images of 8 bits per pixel.
The Workshop is packaged with both Windows and DOS versions and includes basic
image-processing features such as image flipping, brightness and contrast
control, and a preview mode for image compression. It supports the Optibase
Model 100 and 500 DSP-based PC boards for acceleration of JPEG and additional
processing features, and incorporates the Optibase Visual Model, which
displays 16- and 24-bit-per-pixel color images on a standard 256-color VGA
screen without degradation of image quality.
The Workshop sells for $149.00 and supports RIFF, BMP, Targa, and PCX file
formats. Reader service no. 29.
Optibase Inc. 7800 Deering Avenue Canoga Park, CA 91304 818-719-6566
Two window classes are available from SE International: PrimaryWindow Class, a
tool and methods library for developing OS/2 Presentation Manager (PM)
applications with C; and FormattedEntryfields, which dynamically validates and
formats numeric user input in PM entry fields.
PrimaryWindow Class lets you create SAA/CUA-conforming, sizable, scrollable
primary windows that contain controls in their client areas. The layout of the
primary windows can be designed using the OS/2 Dialogbox Editor.
FormattedEntryfields has upper and lower bounds determined by C data types and
application-specific ranges. It validates the number of decimal digits, omits
leading zeros, and permits logical checks. The OS/2 Dialogbox Editor embeds
FormattedEntryfields in Dialog templates.
The PrimaryWindow Class costs $420.00 per workstation, $2095.00 for LAN
server; FormattedEntryfields is $329.00 per workstation, $1495.00 for LAN
server. Reader service no. 30.
SE International Inc. One Park Place, Suite 240 621 NW 53rd Street Boca Raton,
FL 33487 407-241-3428
Icarus has announced SourceSafe for DOS version control. SourceSafe enables
you to track old versions of source code and other text and binary files
without storing every version independently. It also coordinates access by
multiple developers, thus preventing collisions that cause one programmer to
overwrite the code of another. SourceSafe is targeted at programmers who rely
on modular, shared code -- those who have a base of core code that they port
to different products, who tailor one program to individual clients, or who
use object-oriented languages. SourceSafe stores code in user-defined projects
that can literally share the same file; a file update is automatically
reflected in every project that uses that file.
DDJ spoke with Gary Loew of MCG Ltd. in West Orange, New Jersey, a company
that develops accounting, distribution, and document-assembly software.
According to Loew, SourceSafe's outstanding feature is its transparency. "It
requires few commands and does its work in the background," he said, "plus it
has incredible ease of setup and use and very good documentation."
Prices for SourceSafe start at $245.00 for a single-user license. Reader
service no. 31.
Icarus Software P.O. Box 11639 Raleigh, NC 27604 800-397-2323 or 919-821-2300
The Manifold Editor from Fuzzy Systems provides an intuitive interface for
capturing the expert judgments needed to build any fuzzy system.
The Manifold Editor allows two-dimensional matrix display of the fuzzy system
rules for viewing and editing. It permits up to five input dimensions and two
output dimensions in the fuzzy estimation surface design; output selection in
each rule is performed by point and click. Piecewise and freeform fuzzy-set
editing are both provided. Up to 77 arbitrary fuzzy sets may be defined and
edited. Editing tools include: simple graphical editing of triangular and
trapezoidal fuzzy sets; detailed graphical editing of arbitrary fuzzy sets;
and translation from the simple piecewise fuzzy sets to the arbitrary freeform
fuzzy sets. All expert information is output in an include file for use in
your applications code. Output languages include C, Basic, and Fortran.
Also included is the Manifold Walker, a testing tool that allows static
testing of the fuzzy estimation surface at any time in the design cycle.
The Manifold Editor costs $250.00. Reader service no. 32.

Fuzzy Systems Engineering 12223 Wilsey Way Poway, CA 92064 619-748-7384


July, 1992
SWAINE'S FLAMES


DoonsDay




Michael Swaine


I was sitting in one of the hot spots of downtown Bonny Doon, California,
where I work and live, when it occurred to me that I might get a fascinating
column, or at least a fast one, out of halting the handiest local, or
"Doonie," as we call us, and soliciting his or her opinion on -- well, on
whatever he or she wanted to opine on, frankly. My column deadline was
looming: I had 60 minutes to finish. The result was this minute with a handy
Doonie.
"Didja ever notice how practically every major hotel chain was founded by
someone whose name starts with the letter 'H'? Hilton, Helmsley, Hyatt,
Holiday. Why do you suppose that is? Do these people go into the hotel
business because they enjoy the alliteration? Hyatt Hotel, Helmsley
hospitality, Honolulu Hilton. Then again, maybe it's some ethnic thing. If so,
I hope I'm not offending anyone. God knows I don't want to offend anyone.
"All I know is, it's confusing. On a recent business trip I stayed at a Hilton
and a Hyatt. My company paid for the Hyatt, and I was on my own for the
Hilton. Or maybe it was the other way around. I had a terrible time keeping my
lunch receipts straight when I was filling out my expense report. And the cab
driver who took me from the Hyatt to the Hilton was as confused as I was. He
took me to another Hyatt. I had to walk 12 blocks.
"Didja ever wonder what it would take to clone a Macintosh? Benjamin Chou
thinks he knows. Here's his list. Oh, I took the liberty of simplifying it so
I could understand it."
1. a box, a bus, a tube, a keyboard, and a mouse
2. a host processor, a Motorola sixty-eight-oh-whatever
3. a SCSI controller chip
4. a LOT of RAM
5. some glue logic
6. some magic chips
7. a clone of the Mac's GUI
8. a clone of the Mac's API
9. a very good lawyer
"Chou is the president of NuTek, the company that showed a functioning Mac
clone in Europe this spring. He says that NuTek has the last four items and
that OEMs can supply the other five. The magic chips are 1-micron CMOS ASICs.
There are three of them, and they supply all the logic for DMA, burst-mode
memory transfer, sound generation, disk-drive control, addressing more memory
than Macs can, and talking to the NuBus at up to 33 MHz. That sort of stuff.
"NuTek was very careful about copyright and patent questions. The NuTek
chipset and API were laboriously developed using cleanroom procedures. At
least that's what Chou says. And the GUI is Motif from the Open Software
Foundation, which Apple hasn't sued yet. So wouldn't you think it would be
good news for NuTek that U.S. District Judge Vaughn Walker threw out most of
Apple's copyright claims in its suit against Microsoft?
"Well, you'd be wrong. The latest word is that Apple is now considering legal
action against NuTek. Why, Apple? If you really need to sue somebody, can't
you find an easier target? Can't you track down some school children making
illegal copies of the calculator desk accessory? But I guess bottom feeders
have to eat whatever drifts their way.
"It looks like NuTek is going to need that 9.
"The people at Apple probably won't like that business about bottom feeders.
No sense of humor. Isn't it nice when people have a sense of humor about their
work? But doesn't it make you a little uneasy that the people who work in the
Pentagon call the little park in the center of their building 'ground zero?' I
guess it's funny, but I kinda expect the Pentagon to be a more serious place.
You know, everyone walking around with their heads down, muttering about
'mutually assured destruction' and that sort of thing. More like Apple."
Next month, I investigate the technical expertise of Sears sales people.


August, 1992
EDITORIAL


DDJ's 1993 Editorial Calendar and Networks -- Neural and Otherwise




Jonathan Erickson


Every year at this time, we present the editorial lineup for the coming
months. In addition to the topics listed below, we'll be offering up our
regular monthly fare of object-oriented programming tools and techniques,
solutions to Windows development problems, embedded systems programming
projects, network programming, advanced algorithms, UNIX systems, nitty-gritty
DOS, and more. Our basic approach remains unchanged: one programmer talking to
other programmers, sharing ideas and techniques. And there'll always be a lot
of source code. With all this in mind, here's what you'll be reading about
in next year's Dr. Dobb's Journal:
January 32-bit Programming
February Cognitive Computing
March Data Structures and File Formats
April Algorithms
May Operating Environments
June ASM and Architectures
July Graphics Programming
August C/C++ Programming
September Numeric Programming
October Object-oriented Programming
November Debugging and Profiling
December Interoperability
If you have a particular article in mind on these or other topics, or if you'd
like a copy of our author guidelines, send us a note (DDJ, 411 Borel Ave., San
Mateo, CA 94402) or call (415-358-9500).


Networked Systems


We've long believed that you should write every program to be network aware,
even if you don't initially plan on running it on a network. To underscore the
importance of this, in this issue we're launching "Networked Systems," a new
monthly section focusing on network programming.
In the coming months, we'll be examining all facets of software development
for networked environments, from LANs and WANs to open (and closed) systems.
Again, if you have an article you'd like to contribute to this section or have
problems you'd like to see us address, give us a call.


Dr. Dobb's C++ Sourcebook


Each year we try to give you a bonus edition of DDJ that supplements your
regular monthly issues. This year's special issue is entitled Dr. Dobb's C++
Sourcebook, and if you're a subscriber, you'll receive it at no extra charge
with your December 1992 issue.
Included in this bonus supplement will be interviews with C++ language
designers, discussions of why some programmers think C++ is better than
Fortran for numeric programming, source code for some powerful class
libraries, useful C++ utilities, and much more.


Neural Net News, or Who's On First?


Neural nets are in the news again. For starters, Bellcore has announced it's
developing a neural-net computer that processes 100,000 signals per second.
The system eventually will be used for routing telephone calls, assigning
radio frequencies for wireless telephones, and speech recognition.
Caere's soon-to-be-released Windows-based FaxMaster software uses neural nets
to recognize and re-create characters lost or eroded during data transmission
of low-resolution faxes.
Ricoh, the Japanese company known for printers and photocopiers, has announced
a high-speed, hardware-only neural-network computer that will be used as an
embedded controller for copiers. Ricoh, which is manufacturing its own
16-neuron chips, claims bragging rights to what it calls the first neural-net
computer developed solely in hardware.
VeriFone has unveiled a check reader that processes up to 20,000 images per
second at an accuracy rate of 99.6 percent. The VeriFone reader is built
around the Synaptics I-1000 neural-net chip that's capable of 1 billion
operations per second. The PC-based reader has the I-1000 on a daughterboard,
plugging into VeriFone's Motorola 68HC11-based motherboard. VeriFone and
Synaptics claim the check reader is the first commercial application of
neural-network technology.
Hmm ... this might be news to Nestor Inc. and others who for years have been
developing neural-net software for handwriting recognition, predictive
modeling, and speech recognition. I'm sure VeriFone and Synaptics meant to say
"the first commercial hardware application." If so, I'll leave it to them
and Ricoh to work out who's really on first with neural-net hardware.


August, 1992
LETTERS







C is the Pitts


Dear DDJ,
Reader Marty Leisner ("Letters," April 1992) seems to have intoxicated himself
on C water. I have no quarrel with his claim that "C lets the programmer do
things the computer is capable of." So also, with greater or lesser efficiency
respectively, do assembler and Basic, but somehow that fact slipped by
unnoticed. I want a programming language that stops me from doing the things
the computer is capable of -- when they are the things I don't want the
computer to do.
I have seen over the years perhaps a dozen independent benchmarks comparing
some aspect of C to a better language like Pascal or Modula-2, and I find it
remarkable that C was beaten without exception. Several of the benchmarks were
reported by persons as biased toward C as Leisner, and these poor cheerleaders
were reduced to a forlorn hope that the future would bring improvements to
their favorite language sufficient to even the score.
I think C is a wonderful programming language, and I hope all my competitors
make full use of it.
Tom Pittman
Spreckels, California


It's the Thought that Counts


Dear DDJ,
Regarding Jeff Duntemann's April "Structured Programming" column about OOP and
collection objects, I don't think Jeff should get so depressed about the rise
of inheritance (driven by the economics of code reuse) ruining the beauty and
potential of purely encapsulated/abstracted object design. In fact, I see a
ray of hope emanating from an idea of using collection classes as a basis for
a runtime object-oriented system.
We want a system wherein all objects are compatible and polymorphism is taken
to an extreme. All objects should be able to respond to a given method
invocation, either by doing something, doing nothing, passing the buck, or
complaining. We also want all of our data to be objects, built on other
objects solely by composition. To do this we have to avoid inheritance and
extend polymorphism to a composition paradigm.
Another thing to work towards is a system that would allow us to create and
use object classes at runtime. And wouldn't it be great if programs were
object classes and an invocation of a program would really be an instantiation
of a program object? Both an object's data and methods would then also be
objects. (I once worked on a research system where even the state of the
processor was a data object!)
I think one key to this system could be an extension of the collection-class
concept. Essentially, a collection object is a set of pointers to differing
objects. It's almost a way of performing runtime composition/encapsulation.
However, the collection object and therefore its methods are written in stone
by the collection-class definition at compile time.
A first step might be to create a collection class with a master list of
methods applicable to its intended collection of members. At run time, the
collection object could use a qualification-for-membership function that would
interrogate nominated objects to see if they possessed the proper data or
methods. Or it could be left up to the nominee to provide a function that could
prove membership qualifications. One could then use the collection object as a
runtime abstraction layer, providing a form of runtime encapsulation.
Our collection object might allow a nominated member object to adopt some
default generic methods or data defined in the collection class. This is
almost a form of runtime inheritance. Really, the nominee would not inherit
the method, but would allow the collector to provide a method for it.
For those wild collections, maybe the nominated member and the collection
object together would identify which methods on the collection object's master
list the nominee supports on its own, which of the collector's default methods
the nominee would accept, and which ones it won't deal with at all. This
extends polymorphism to the runtime arena using dynamic encapsulation (or
perhaps "dynamic object overloading").
If we took this a small step further, the collection object could adopt the
methods of its members. If member A had a method x, and member B had a method
y, a collection object C containing A and B could have methods x(A) and y(B).
This would really be runtime composition of objects. How would a program use
an object that did not have defined methods at compile time? If our program
was itself a collection object, it could dictate that it needs an x and y
method from its member objects. Or our program that used object C could be one
that does something with C's methods regardless of what the methods actually
are, like placing them on a menu or running them all in some order.
All this could be tracked internally to the collection object in much the same
way virtual methods are tracked today, but allowing runtime modification of
the virtual method tables. Windows does this same sort of runtime processing
when you define farprocs and pass them to the Windows API.
Somewhere we might need to develop a communication facility so that objects
could tell each other what their methods do instead of their names. Right now,
there are only two things one could use to determine what a method does: its
name and its actual code. Both are unsatisfactory, as names are ambiguous and
comparing pieces of code is sometimes impossible (Turing) and a bad idea
(ruins abstraction). We need an interobject language that either conforms
precisely to a standard or can handle exact translations between different
corporate dialects.
Going way out into science fiction, imagine an operating-system environment
wherein everything was an object. Processors would run objects. Virtual
processors would be objects. Peripherals would put interface objects into the
operating-system space. Active operating-system collection objects would
continuously evaluate other objects for membership. If a new printer were
hooked up, it would put a services-offered object into the system environment
and a system printer-collection object would collect it, providing a
higher-level printer interface to other objects.
What's missing is that in current static object systems, an object class is
still defined at compile time, limiting the potential for extending or
modifying an instance of that class during run time. It's really not that far
from a system in which objects could be created dynamically. The difference
between an object-class definition and an instance of an object could just
slowly disappear, as an object class will itself be an object.
In any case, the column on collection objects sure got me thinking. Thanks!
Mike Matchett
North Little Rock, Arkansas


Checking Up on Checksums


Dear DDJ,
In the article "Fletcher's Checksum," John Kodis concluded that CRC
calculation is some 20 times slower than calculating Fletcher's checksum. I
disagree. One does not have to shift and XOR bit by bit to calculate CRC.
As pointed out by Mark Nelson in his article, "File Verification Using CRC"
(May, 1992), one can "use a table lookup that exchanges a small increase in
storage space for fast calculation."
For example, the 16-bit CRC-CCITT (X^16+X^12+X^5+1) can be calculated with the
Pascal code in Example 1, using two 256-byte tables. When the code in Example
1 is optimized in assembly, I believe it can be just as fast as calculating
Fletcher's checksum.
Example 1

 var Temp, Data, CRC_Low, CRC_High: byte;
 const Table_Low: array[0..255] of byte = ...;
 Table_High: array[0..255] of byte = ...;
 ...
 Temp := Data xor CRC_Low;
 CRC_Low := Table_Low[Temp] xor CRC_High;
 CRC_High := Table_High[Temp];


Lichen Wang
Palo Alto, California
Dear DDJ,
John Kodis's May 1992 article, "Fletcher's Checksum," was interesting. I like
to fool around with CRC and PRN algorithms. My comments fall into two major
areas: misinformation on CRC computational speed and weaknesses in the
"standard" CRCs.
There are several interesting ways to speed up the traditional CRC
calculations. The biggest speed-up comes from the most obvious design: table
lookup. You can convince yourself that the effect of adding another byte of
data depends only on the prior state of the most significant eight bits of the
CRC and the new byte of data. Any CRC generator can be supported, but its
syndrome table must be precomputed.
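The byte-at-a-time table lookup Mr. Hanson describes can be sketched in C. This is an illustration, not code from the letters: the polynomial is the CRC-CCITT generator cited in Mr. Wang's letter, and the 0xFFFF starting value is the common CCITT convention, which neither letter specifies.

```c
#include <stddef.h>

/* Table-driven CRC-CCITT (generator X^16+X^12+X^5+1, i.e., 0x1021).
   The effect of each new data byte depends only on that byte and the
   high eight bits of the current CRC, so a 256-entry syndrome table
   covers every case. */
unsigned short crc16_ccitt(const unsigned char *p, size_t n)
{
    static unsigned short table[256];
    static int ready = 0;

    if (!ready) {                       /* precompute the syndrome table */
        for (int i = 0; i < 256; i++) {
            unsigned short r = (unsigned short)(i << 8);
            for (int b = 0; b < 8; b++)
                r = (unsigned short)((r & 0x8000u) ? (r << 1) ^ 0x1021 : r << 1);
            table[i] = r;
        }
        ready = 1;
    }

    unsigned short crc = 0xFFFFu;       /* common CCITT starting value */
    while (n--)
        crc = (unsigned short)((crc << 8) ^ table[((crc >> 8) ^ *p++) & 0xFF]);
    return crc;
}
```

With these conventions, the nine ASCII digits "123456789" produce the well-known check value 0x29B1. The inner loop is a shift, two XORs, and one table fetch per byte, which is where the speed-up over bit-at-a-time processing comes from.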
From my experience, any 16-bit CRC can be calculated in seven instructions per
byte--not 52, as stated. Fletcher's checksum, at seven instructions per byte,
is still faster; just not by as much as stated.
Also, while I'm not usually an 80x86 fan, note that the lowly 8088 can
calculate 16-bit CRCs in only nine instructions per 16-bit word, while the
best 68020 loop I could find requires 11 instructions per word. This is
because the 68000 EOR instruction is "crippled," and the '86 family can operate
directly on AH, while the 68000 must use shift instructions to access upper
register bytes.
On these machines, 16-bit CRC calculation appears to be modestly faster than
Fletcher's checksum.
Mr. Kodis notes several properties of 16-bit CRCs. Not being the main focus, a
very short summary was presented. One important property not mentioned that
all standard CRCs share is that any block containing an odd number of errors
is detected. This is accomplished by ensuring that G(x) is divisible by (x+1).
Unfortunately, I know of no communication or storage method which tends to
make mostly odd numbers of errors. Further, differential codes tend to make
errors in pairs. Thus, this property of these codes is, at best, useless. Note that
in a channel which pairs errors, only the 15-bit component G15 = G/(x+1) is
actively detecting errors. This doubles the undetected-error rate expected
with a prime generator. Also, for high error rates, some standard CRC codes
degenerate: The undetected-error rate can actually climb above 2^-16 (!) (In
this regard, CRC-ANSI is much worse than CRC-CCITT, for some reason.)
Thus, for compatibility you should select a standard CRC, preferably
CRC-CCITT. Otherwise, you should select a prime generator if detection
performance is key, especially at high-link error rates.
Another important property of CRCs is that an N-bit CRC always detects blocks
with a single burst of errors of length<=N. If your error-generation process
tends to make bursts of errors, this fact can aid in your analysis of how well
the resulting system will work with CRC.
Peter Hanson
Santa Clara, California
John responds: I would like to thank Messrs. Wang and Hanson for the feedback
on my article, particularly Mr. Hanson's explanation of some additional
properties of the CRC. They both make a valid point: that performing CRC
generation a byte or a word at a time can provide significant speed
improvements over the bit-oriented approach I referred to. I still believe
that Fletcher's technique will be significantly faster than the table-driven
CRC approaches cited, since Fletcher's method requires fewer instructions and
no memory-fetch time for the table-lookup operations.
When selecting a data-validation method, a trade-off must be made between high
data integrity and high computational effort. The main point I wanted to make
in my article was that Fletcher's technique provides designers with an
excellent alternative to the more common checksum and CRC methods.
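For comparison, Fletcher's method itself is only a few operations per byte and needs no table at all. A minimal Fletcher-16 sketch (two running 8-bit sums reduced mod 255; the variable names are mine, not from the article):

```c
#include <stddef.h>

/* Fletcher-16: sum1 accumulates the data bytes, sum2 accumulates the
   running value of sum1, both modulo 255.  No memory fetches beyond
   the data itself -- the basis of the speed claim in the article. */
unsigned short fletcher16(const unsigned char *p, size_t n)
{
    unsigned sum1 = 0, sum2 = 0;
    while (n--) {
        sum1 = (sum1 + *p++) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (unsigned short)((sum2 << 8) | sum1);
}
```

In tuned implementations the per-byte divisions are deferred by batching many additions before reducing, which is how the low instructions-per-byte counts quoted in the letters are achieved.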


A Programmer by Any Other Name


Dear DDJ,
In his April 1992 "C Programming" column, Al Stevens insists that programmers
will not become obsolete. He's right, but he's also wrong.
He cites doctors, accountants, and car salespeople as examples of people who
could never write their own programs. All, however, will create a spreadsheet
or request a report from a database. These tasks would have required a
programmer to write a formal program only a few years ago. More likely, they
would not have been done.
I would argue that the doctor, accountant, or auto seller is programming. Not
only can accountants write programs, they do. But calling it "programming"
would scare off the people who do it. Programming has become so easy it isn't
called programming any more.
When Fortran was developed in the late 1950s, it was intended to eliminate
programmers. This can be regarded as laughable, but in fact it succeeded --
within the context of the time. Computing meant scientific or engineering
number crunching. Those programs are still being written, almost always by a
scientist or engineer rather than a professional programmer.
Today much computing consists of common applications such as word processing
or spreadsheets. Even very sophisticated applications can be purchased, and
the people who buy them work in a user department rather than in the
information services department.
Our idea of programming has changed over the last 30 years. There will
probably always be programmers--but they won't always do what they do now.
Jim Caughran
Willowdale, Ontario


August, 1992
LARGE CHARACTER SETS FOR C


The internationalization of C is underway




P.J. Plauger


P.J. Plauger, whose most recent book is The Standard C Library (Prentice Hall,
1992), is a member of the ISO JTC1/SC22/WG14. He can be contacted at
uunet!plauger!pjp.


C is the only standardized programming language that supports large character
sets. That will not always be true. The Japanese have made their position
clear to ISO, the international organization that standardizes programming
languages. Several years ago, they announced their intention to veto any
future language standards that do not contain similar support. Wisely, ISO
passed a resolution endorsing the Japanese position.
C could have become the last standardized programming language that did not
support large character sets. The Japanese were willing to exempt the C
Standard because, at the time, it was very near completion. Many of us had
already put five or more years into standardizing C. We were tired and ready
to quit. But we didn't want to be the last of an old breed, not after all that
work. Rather, we chose to do the extra work that made us the first of the new
breed.
To do so, we had to bend our self-imposed rules a bit. Standard C is highly
compatible with the C of Kernighan & Ritchie. We resisted rather well the
numerous temptations to "fix" the language--particularly where such fixes
would require existing C code to change. We did add a number of features. Some
of the additions are changes to the language proper. Most are pure add-ons,
such as new library functions. All of the additions, however new they may
appear to many, were based on some form of prior art. Even function prototypes
and type qualifiers (such as const) were derived from C++ and other dialects
of C.
It was harder to find precedents for manipulating large character sets, at
least the way we chose to do so. True, several companies have provided Kanji
support libraries for a number of years. A few have permitted limited
inclusion of Kanji characters within C source code itself. Nobody had chosen
to be as ambitious as we felt we had to be. Like it or not, we had to be
inventive.
We were equally inventive in adding "locales." That's the machinery we added
to make the Europeans happy. They were just ahead of the Japanese in
requesting that C be made more international. A locale summarizes many of the
conventions of a given culture. Francophones want their dates spelled out in
French. Accountants want negative numbers to print with a trailing "DB"
instead of a leading minus. Dictionary writers want words to sort in a funny
way. The locale machinery added to C is intended to support an open-ended set
of such cultural conventions by defining and mixing locales.
If you are not conversant with locales and large-character support in C, don't
fret. The material is so new that many experienced C programmers barely
understand the basic concepts. More important, most C programmers don't need
to care. At least not yet. The push for internationalization has just begun,
and it may be years before your corner of the world will feel the impact.
But don't feel you can ignore this topic indefinitely. The international
marketplace for software is growing fast. Whoever pays your salary will soon
care very much about meeting that growing demand with an economy of effort.
Standard C offers that economy better than any other programming language in
use today. It behooves all professional programmers to understand the issues
involved in this new field of "internationalization."
My focus in this article is primarily on support for large character sets. If
you want to learn more about locales as well, see my book, The Standard C
Library (Prentice Hall, 1992). It discusses the entire C library, but pays
particular attention to features added for internationalization.


Representing Large Character Sets


When Europeans talk about large character sets, they usually mean sets with
extra language-specific characters. The 95 graphics defined in the U.S. form
of ISO 646 (also known as ASCII) are not enough. Practically every European
alphabet defines additional characters or accented versions of the English
characters. In fact, I am told that only three languages in the world can get
by with just the 26 letters of the common subset of ISO 646--English,
Hawaiian, and Swahili.
Still, these extra characters number only in the dozens. You can throw in
every known accented character in European alphabets, the funny extra
characters, Cyrillic, and Greek--and still fit all the graphics comfortably in
a 256-character set. So the European notion of a large character set is one
that uses all 8 bits of a byte. Forget any comfortable C-ish notions that all
printable characters have positive values.
The Japanese face an entirely different problem. They inherited tens of
thousands of Kanji characters from the Chinese. They also use several phonetic
alphabets--Hiragana, Katakana, and Romaji (the Western alphabet). But they
refuse to give up the compactness and delightful ambiguity of Kanji. They are
not alone. The Chinese, Koreans, and Arabs likewise have huge alphabets that
form an important part of their respective cultures. Nobody wants to stop
using something that works well just because it's inconvenient for American
software to process.
(In fairness to the Japanese, I should make an important observation here.
They do not insist that new programming language standards support Kanji. They
want them to support all large character sets, from all cultures around the
world. That also helps with an internal political/technical problem in Japan.
Several coding schemes are in common use for Kanji, just as both ASCII and
EBCDIC are used in the U.S. The C Standard has always been general enough to
accommodate both of the latter. It now also accommodates all the known ways to
encode Kanji. And it allows for a variety of ways to encode the other large
character sets of the world.)
Over the years, Japanese programmers have developed two distinct ways to
augment text-processing software for large character sets. In the language of
the C Standard, these are called "multibyte characters" and "wide characters."
We included both in C because each has its uses. Naturally, that means we also
had to include ways to convert between conventional, multibyte, and wide
characters.


Multibyte Characters


An old trick for expanding a character set is to give each code multiple
meanings. The old Teletype Model 37, for example, could print both English and
Greek characters. Send the terminal a "shift out" code (SO) and it began
speaking Greek. A "b" printed as a beta, as I recall. Subsequent characters
also printed funny until you sent a "shift in" code (SI). The terminal then
reverted to more customary behavior; see Figure 1.
Figure 1: Model 37 code for printing Greek.

 TEXT: A [SO]b[SI]-ray is an electron.

 DISPLAY: A beta-ray is an electron.

You get more mileage out of each character code this way, but at a price. How
you interpret each code depends on what has gone before. You might assume, for
example, that each sequence of characters begins in an "initial shift state."
For our Model 37, that would be printing English characters. Most characters
that follow are interpreted in this context to determine the "metacharacter"
you really mean to designate. Some characters simply alter the current shift
state. They specify no metacharacter at all, at least not by themselves. The
Model 37 code may have (almost) doubled the number of characters you can
represent, but it must maintain one bit of state information to determine each
metacharacter.
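That one bit of shift state is easy to carry through a decoding loop. The
sketch below (not from the article) spells out Greek letters by name as it
decodes; the function name and the single 'b'-to-beta mapping are illustrative
only:

```c
#include <string.h>

#define SO '\16'  /* "shift out": start speaking Greek */
#define SI '\17'  /* "shift in": revert to English     */

/* Decode a Model-37-style shifted string into dst, spelling
   Greek letters out by name.  Only 'b' -> "beta" is mapped;
   the function name and mapping are illustrative. */
void decode37(const char *src, char *dst)
{
    int greek = 0;                       /* the one bit of shift state */
    for (; *src != '\0'; ++src) {
        if (*src == SO)
            greek = 1;                   /* alters state, no metacharacter */
        else if (*src == SI)
            greek = 0;
        else if (greek && *src == 'b') {
            strcpy(dst, "beta");         /* metacharacter in Greek state */
            dst += 4;
        } else
            *dst++ = *src;               /* ordinary English character */
    }
    *dst = '\0';
}
```

Feed it the text from Figure 1 and the display line falls out.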
The Japanese JIS code takes this approach a step further. In the initial shift
state, each character defines a single metacharacter. ASCII is ASCII. You
shift out to Kanji with the three-character sequence \33$B (ESC, dollar sign,
capital B). In this state, each subsequent pair of characters determines a
single metacharacter. Both the first and second characters of a Kanji pair
must be in the range [0x21, 0x7e]. You shift in to ASCII with the
three-character sequence \33(B; see Figure 2.
Some simple arithmetic tells you that you can specify nearly 10,000 distinct
metacharacters with JIS. That's nowhere near all the Kanji characters--only
the more popular ones are included. Still, it's worlds better than the mere
256 codes supported by a single 8-bit character. The price once more is added
complexity. Parsing a JIS string takes work. It requires state memory just
like the Model 37 code. And opportunities abound for making malformed strings.
It is possible to eliminate the need for state memory. The Japanese Shift JIS
code sets aside certain character codes to signal the start of a two-character
sequence. A character in the range [0x81, 0x9f] or [0xe0, 0xfc] must be
followed by a character in the range [0x40, 0xfc]. Together, these define a
single Kanji metacharacter. Any other first character defines the
metacharacter all by itself. (Again, ASCII is ASCII). See Figure 3.
Extended UNIX Code is a variation on the same thing. It was contrived to
simplify the conversion of many UNIX utilities to processing Kanji text.
Essentially, any character with its sign bit set (in the range [0x80, 0xff])
is part of a two-character sequence. No shift state need be retained. But you
still need to keep track of where you are within a multiple-character
sequence.
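Because the lead-byte ranges alone identify a two-character sequence, a Shift
JIS string can be scanned with no shift-state memory at all. A sketch using
the ranges given above (the function name is my own):

```c
#include <stddef.h>

/* Count the metacharacters in a Shift JIS string.  A byte in
   [0x81, 0x9f] or [0xe0, 0xfc] starts a two-character Kanji
   sequence; any other byte stands alone (ASCII is ASCII). */
size_t sjis_count(const unsigned char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        int lead = (*s >= 0x81 && *s <= 0x9f) ||
                   (*s >= 0xe0 && *s <= 0xfc);
        s += (lead && s[1] != '\0') ? 2 : 1;  /* skip trail byte if present */
        ++n;
    }
    return n;
}
```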
(I have studiously avoided using the obvious terms "byte" and "multibyte" in
this description. I have been equally careful to distinguish between
"characters" and "metacharacters." C has long had the rule that a character
occupies a single byte. C often lives on machines where a byte consists of 8
bits. That has led to endemic confusion between the notions of character,
byte, and octet of bits, and the confusion will not soon disappear.)
The reasons for using multibyte sequences should be obvious. We live in a
world of character streams. Disks, diskettes, parallel ports, and serial ports
all traffic in sequences of 8-bit bytes. To ignore this world would be
foolish. A large character set must be representable as sequences of bytes.
Yet there are equally obvious drawbacks to multibyte sequences. You can't
manipulate individual characters without a lot of parsing. You can't paste
strings together without careful thought about shift states (in the general
case). At the very least, you may have to introduce many redundant shift
sequences to be on the safe side.


Wide Characters



If you want to manipulate characters inside a program, it's easiest if they're
all the same size. An alternate representation for large character sets has
just this property. A wide character is an integer large enough to represent
distinct codes for all the characters in the set. It can be type char, short,
int, or long. Or it can be one of the unsigned versions of these types.
Standard C provides the type definition wchar_t for the wide-character type.
Include either of the headers <stddef.h> or <stdlib.h> to define this type.
Just as there are several multibyte encodings for Kanji, there are also
several wide-character encodings. The more popular ones are easily derived
from one of the multibyte encodings. Essentially, you cut and paste bits from
the two characters in the multibyte representation to make the wide-character
code; see Figure 4.
Figure 4: Converting Shift JIS to wide character.

 MULTIBYTE: ['i']['s'][' ']['1'][0x8C][0x8E]['.']

 WIDE CHAR: ['i']['s'][' ']['1'][0x8C8E]['.']
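The cut-and-paste of bits amounts to shifting the lead character into the high
byte. A minimal sketch of the conversion shown in Figure 4 (the function name
is illustrative):

```c
/* Paste the bits of a Shift JIS pair into one wide-character
   code: the lead character becomes the high byte.  This matches
   the 0x8C 0x8E -> 0x8C8E example in Figure 4. */
unsigned int pair_to_wc(unsigned char lead, unsigned char trail)
{
    return ((unsigned int)lead << 8) | trail;
}
```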

Wide-character encodings tend to be a private matter for each implementation.
Imagine trying to exchange data between two different systems by shipping wide
characters. First you must make sure that both implementations of C use the
same number of bits to represent wchar_t. Then you have to worry about whether
the byte orders are the same. In the general case, you have to transform the
code values in some way. It's much easier simply to read and write a common
multibyte code. Then you don't much care about the various internal forms for
wide characters.
C programmers do care somewhat about wide-character codes. Code value 0, for
example, must be reserved for the null wide character. Otherwise,
wide-character strings are a nuisance to manipulate. And you want 'a' to have
the same numeric value when converted to a wide character. In fact, any value
you can store in an unsigned character should have the same numeric value when
converted to a wide character. Otherwise, all sorts of subtle but nuisancy
problems arise. The C Standard endorses no particular wide-character encoding,
but it does impose a few restrictions on acceptable code sets.
The C Standard imposes similar constraints on multibyte encodings, by the way.
Code value 0 always stands for the null character. It can never appear as part
of a longer character sequence representing some metacharacter. If the
encoding has shift states, then the initial shift state is somewhat
constrained. All the basic C characters (the ones you need to express a C
source file) stand for themselves. Put another way, 'a' stands for lowercase
"a" in the initial shift state. It is never the first character of a longer
character sequence. Again, these few constraints let the C programmer use
proven techniques to manipulate even multibyte strings. (For a discussion of
the implications of wide characters for C++, see the accompanying text box,
"So What About C++?".)


Extensions to C


We added as little as possible to C to support multibyte and wide characters.
In a C source file you can write a multibyte sequence in one of the following
ways:
As part of a comment.
Within a "wide-character constant" such as L'x'.
Within a "wide-character string literal" such as L"kon ban wa".
In the last two cases, you specify one or more wide characters in the
executable code by writing multibyte sequences in the C source. In all cases,
the multibyte sequence must begin and end in the initial shift state (if shift
states matter). It is up to each implementation to choose multibyte and
wide-character encodings. Note that an identifier cannot include a multibyte
sequence--you're still confined to the English alphabet for contriving names.
(Several proposals are kicking around ISO to generalize the rules for writing
identifiers in all programming languages, however.)
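The two literal forms look like this in use; a minimal sketch relying only on
the guarantees described above:

```c
#include <stddef.h>   /* defines wchar_t */

wchar_t letter = L'x';               /* wide-character constant       */
wchar_t greeting[] = L"kon ban wa";  /* wide-character string literal */

/* One wchar_t per metacharacter, plus the terminating null
   wide character (code value 0). */
size_t greeting_len = sizeof greeting / sizeof greeting[0] - 1;
```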
You can also write multibyte sequences in all the formats used by the library
print and scan functions. That lets you intermix multibyte literal text with
converted values on output. It also lets you match such text on formatted
input to a limited degree. The problem with input comes, as usual, with shift
sequences. They let you specify the same sequence of metacharacters many
different ways. But the scan functions still match literal text character by
character. That can lead to all sorts of unpleasant surprises for innocent
users.
The only other addition to the C Standard is a handful of library functions.
The header <stdlib.h> now declares the following functions:
mblen, for determining how many characters in a multibyte sequence constitute
the next metacharacter.
mbtowc, for converting a single metacharacter from multibyte to wide
character.
wctomb, for converting a single wide character to a multibyte sequence.
mbstowcs, for converting a null-terminated, multibyte string to a
null-terminated, wide-character string.
wcstombs, for converting a null-terminated, wide-character string to a
null-terminated, multibyte string.
Besides the type wchar_t mentioned earlier, the library also defines two
macros. These help you allocate work buffers for code that converts between
multibyte and wide-character encodings:
MB_CUR_MAX, defined in <stdlib.h>, is the length of the longest permissible
multibyte sequence for a single metacharacter in the current locale.
MB_LEN_MAX, defined in <limits.h>, is the same length across all locales.
Yes, an implementation can change its multibyte and wide-character encoding
when it changes locales, at least in principle. Such antics are fraught with
peril, however. I suspect that only the more ambitious implementations will
permit such games.
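The conversion functions can be exercised even where each multibyte character
is a single byte, as in the default "C" locale; the same calls work unchanged
under a richer encoding. A sketch (the function name is my own):

```c
#include <stdlib.h>
#include <string.h>

/* Round-trip a multibyte string through wide characters and
   back.  Returns 1 on success.  In the default "C" locale each
   multibyte character is one byte and MB_CUR_MAX is 1, so the
   conversions are trivial; the same calls work unchanged under
   a Shift JIS or EUC locale. */
int round_trip(const char *s)
{
    wchar_t wbuf[64];
    char mbuf[64];

    if (mbstowcs(wbuf, s, 64) == (size_t)-1)   /* invalid sequence? */
        return 0;
    if (wcstombs(mbuf, wbuf, 64) == (size_t)-1)
        return 0;
    return strcmp(s, mbuf) == 0;
}
```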


Future Additions


All sorts of additional functions would be useful for manipulating large
character sets:
Wide-character analogs of the <ctype.h> and <string.h> functions.
Wide-character analogs of the conversion functions in <stdlib.h>, such as
strtod and strtol.
Wide-character analogs of the string I/O functions sprintf, vsprintf, and
sscanf.
I/O functions that convert automatically between multibyte files in the
outside world and wide characters inside the program.
We did think about these issues when we drafted the C Standard. But remember
we already felt that we were running late. So we chose to include only the
bare minimum of functionality. We figured that more extensive library support
would emerge as people understood better how to manipulate large character
sets in C.
We figured right. The Japanese have proposed an extensive addition to the
Standard C library. It includes all the functions outlined above. It also
describes some of the subtler semantic issues in greater detail. I've glossed
over many such issues here because of space limitations.
The ANSI C Standard was approved in 1989. ISO C followed in 1990. Normally, a
language standard remains stable for at least five years before it gets
revisited. You'd think the Japanese proposal had missed the boat, but thanks
to an accident of ISO politics, that's not the case. For a variety of reasons,
the ISO C committee has the charter to produce a "normative addendum" to the C
Standard. It wasn't hard to convince the committee to include the Japanese
proposal as part of that addendum.
The net result is that the C Standard will likely be changed within the next
year or so. Essentially, that change will incorporate the Japanese extensions
to large-character support. The extensions are confined to the library, and
they are fairly pure. That means that existing C programs should not change
meaning when these new functions are added. Your biggest worry will be whether
any existing external names collide with the names of added functions. And
that, as we all know, is a perennial problem with progress.


Living with Large Character Sets


Now you know the basics of large-character set support in Standard C. What
should you do about it? As I mentioned at the outset, you probably don't have
to do much of anything right now. What you do in the near future depends on
your expectations for the code you write.
If you believe your code will never care about large character sets, you can
generally ignore them. We tried to contrive the C Standard so the cost is low
for those who don't use large character sets. Even implementors can get off
cheap. A C compiler for a small microprocessor can, for example, define
wchar_t as type char. The five conversion functions then become trivial. The
print and scan functions don't have to change. Your code can stay lean and
mean.
For many applications, a wiser approach is to make your code multibyte
tolerant. Remember that a multibyte string often looks like any other
null-terminated string. You wouldn't second-guess the structure of a filename
in a portable program, would you? Then learn to be just as tolerant of text
strings you read
and write. They might one day be multibyte strings. If you don't try to chop
them up or paste other characters in the middle, they will probably survive
passage through your code. Who knows, your application may one day start
speaking Japanese or Arabic.
Some applications must learn to be multibyte aware. You use the multibyte
parsing functions religiously when manipulating strings. You probably want to
adapt to the locale preferred by each user. (My book The Standard C Library
contains complete code for manipulating locales and large character sets with
varied encodings.) You may even want to use arrays of wide characters for
manipulating some text.
A few applications will have to be wide-character oriented. These work
exclusively with wide characters instead of conventional characters. They
convert to and from multibyte characters only when communicating with the
outside world. Such applications really benefit from the additions to Standard
C proposed by the Japanese. (I understand that Windows NT fits this
description.)
My personal belief is that conventional character strings will not soon go
away. They meet most of our needs, even when dealing with large character
sets. But I also see a growing use of wide characters in the years to come.
Internationalization is a major driving force, but it is not the only one.
Remember that large character sets have uses well beyond Japanese word
processors. They can also be handy for representing characters of different
point sizes or colors in a typesetting package. Or they can represent musical
notes of different pitches and durations. I leave other uses to your
imagination.



So What About C++?


C++ is still being standardized jointly by ISO WG21 and ANSI X3J16. Upward
compatibility with Standard C is a clearly stated goal. Thus, all the current
support for large character sets has already been adopted as part of C++.
What to do with the proposed Japanese extensions is another matter. These
include literally dozens of new functions to manipulate wide-character
strings. All are direct analogs to the old C standbys for manipulating
conventional character strings. To name just two examples, strlen begets
wcslen and sprintf begets wcsprintf.
C++ provides function-name overloading. It is considered much better to
overload one name than to introduce a trivial variant of that name. Thus, C++
may very well overload strlen for both character and wide-character arguments.
Do the same for all those dozens of functions and you can see a real
improvement.
At least one technical problem remains to be solved for this approach. In C,
the wide-character type wchar_t is simply a synonym for some existing integer
type. That might very well be char or int. So the two declarations

 size_t strlen(const char *);
 size_t strlen(const wchar_t *);

may be indistinguishable on some implementations. This does not make for
portable code.
C++ must find some way to distinguish wchar_t from other integer types with
the same representation. It must do so without severely compromising upward
migration of C code. Several approaches can work, but the C++ standards
committee has yet to choose one.
It is an open issue whether C++ includes the Japanese proposal as is. Even if
it does, however, function overloading will almost certainly be provided as
well.

--P.J.P.



August, 1992
NUMERICAL EXTENSIONS TO C


An NCEG status report




Robert Jervis


Bob is an independent consultant, programming, writing, teaching C courses,
and developing new operating-system and language software. He can be reached
at bjervis!rbj@uunet.uu.net.


There's a joke in the supercomputing community that scientists don't know what
programming language they'll be using in the 21st century, but they do know
it'll be spelled "Fortran." If the extensions being designed by the NCEG
succeed, 21st century scientists may instead spell it "C."
The Numerical C Extensions Group (NCEG) began as a working group of members of
the ANSI C committee interested in numerical programming issues, and it has
been meeting regularly since mid-1989. In March 1991, the group was officially
recognized as a subcommittee working under ANSI with the designation of
X3J11.1. This group will not produce a full standard, but instead will write a
technical report. In the world of standards a technical report does not carry
as much weight because government agencies will not demand conformance to it.
However, the NCEG report will act as practical guidance to anyone wanting to
extend C in the directions covered by the report. When the C standard is next
revised around 1995, the directions of the NCEG report will be important input
for the new standard.
The group has organized its activities into eight topics: aliasing, array
syntax, IEEE floating-point support, complex arithmetic, variably dimensioned
arrays, exceptions and errno, aggregate initializers, and extended-integer
range.
Progress on each of these topics has proceeded at its own pace, with widely
varying degrees of activity and interest. Some of these topics, such as IEEE
floating point, have proceeded to the point that drafts of their section of
the report are circulating among other committees for comment. Others, like
the recently added extended-integer ranges, still have fundamental issues to
resolve.


Aliasing


An alias exists whenever you have more than one way to reach a piece of
memory. When a pointer contains the address of a static or automatic variable,
that pointer is an alias of the variable. When two pointers point at the same
memory location, they are aliases for the same object.
Pointers in C have always presented formidable problems for optimizing
compilers, because C code has been written for years with numerous aliases and
no way to tell the compiler where the aliases are present. C code routinely
bumps a pointer through an array, for example, instead of using subscripting.
When a compiler tries to optimize code that contains references using several
pointers, the compiler must do one of two things: It must either trace the
history of those pointers to determine whether they are aliases or not, or
assume the worst and act as if the pointers might be aliases. Most advanced
optimizing compilers do go to the trouble of tracing histories, because the
benefits can be profound.
On a PC, knowing that two pointers are not aliases can help the compiler keep
a few temporaries in registers rather than have to recompute values. In a
tight inner loop that might mean generating half as many memory references. On
a supercomputer, however, knowing two pointers are not aliases can determine
whether or not the compiler can use vector instructions. In the same tight
inner loop, using vector instructions can mean more than a factor-of-ten
improvement in speed!
When tracing the history of a pointer, a C compiler runs up against a brick
wall at the top of a function. A compiler doesn't know whether the arguments
passed to a function are aliases. Unless the compiler can find all the calls
to the function to continue tracing back (and I don't know of a C compiler
that even tries), the compiler must give up and assume that aliases might be
present.
Some of you may remember that the original ANSI C committee put a keyword into
the language called "noalias" to solve this optimization problem but withdrew
it. Since that attempt, people have gone back and tried to eliminate the major
problems with noalias. The proposal currently being considered uses a
different keyword, "restrict." The basic idea is that any pointer can be
declared as being a restricted pointer, which means that when the pointer is
created, it is not an alias of anything else. This is really a promise from
the programmer.
If you declare a pointer-function parameter to be restricted, you are
promising that no one will call this function with an alias in that parameter.
When the compiler traces back to the top of the function, it finds your
promise and can conclude that no aliases are present. The variable in Example
1(a) is restricted. Only pointers used to modify memory really matter as far
as aliases are concerned, so b and c do not need to be marked as restricted
pointers.
Example 1: (a) A restricted variable; (b) calls involving undefined behavior.

 (a)

 void f3(int n, float *restrict a, float *b, float *c) {
 int i;
 for (i = 0; i < n; i++)
 a[i] = b[i] + c[i];
 }

 (b)

 float x[100];
 float *c;

 void f5(int n, float *restrict a, float *restrict b) {
 int i;
 for (i = 0; i < n; i++)
 a[i] = b[i] + c[i];
 }
 void g5(void) {
 float d[100], e[100];
 c = x;

 f5(100, d, e); /*Behavior defined. */
 f5( 50, d, d+50); /*Behavior defined. */
 f5( 99, d+1, d); /*Behavior undefined.*/

 c = d;
 f5(100, d, e); /*Behavior undefined.*/
 f5(100, e, d); /*Behavior defined.*/
 }

In Example 1(b), two of the five calls involve undefined behavior because some
sort of overlap exists that involves modified values. The first call works
because it is just adding arrays x and e together and storing them in d. The
second call illustrates that contiguous portions of arrays can be used, as
long as they don't overlap. The third call involves adding d to itself, but
shifted by one element. An optimizing compiler might perform f5 using vector
hardware that would produce different results, depending on the size of the
vectors and the order in which pieces are calculated. The fourth call fails
because d is being aliased. This particular example will probably get it
right, but that depends on exactly what f5 does internally. A slightly
modified f5 (for example with a[i] = b[i] + c[i+1]) could get the wrong
answer. The last example works fine because the alias is between two pointers
that are not used to modify data. A draft of this proposal is now circulating
among other standards bodies for comment.


Array Syntax


The subgroup examining array syntax has undergone a significant transformation
since the forming of NCEG. The original topic was focused on syntax to make
generating code for vector processors easier. Supercomputers like the Cray
X/MP are designed to perform arithmetic on whole blocks of an array at a time
by providing a separate floating-point unit for each element of a vector. It
is a natural fit, then, for a programmer to be able to write expressions
involving whole arrays at a time.
In the last couple of years, however, emerging interest in massively parallel
computers has shifted the focus of the group. In a nutshell, Amdahl's Law says
that speedup is limited by whatever fraction of a computation cannot be
parallelized; as you add more processors attached to a shared memory, you
reach a point of diminishing returns where additional processors block each
other from access to the memory. Massively parallel computers try to solve that
problem by giving each separate processor its own memory. Advances in
microprocessor technology have made that approach economically attractive.
Unfortunately, most existing programming languages like C were not designed
for a massively parallel computer. [Editor's note: For more discussion on
Amdahl's Law, see "Personal Supercomputing: Seamless Portability" by Ian
Hirschsohn, DDJ, July 1992.]
What does it mean to have a pointer when there may be 10,000 processors, each
with 10,000 separate memories? Massively parallel computers solve complex
problems by distributing arrays across the many memories of the machine. Each
processor solves the problem on its local piece of the array, without
competing with other processors for access to memory. The conventional C
approach of bumping a pointer through an array simply does not work
efficiently in that environment.
At the January 1992 NCEG meeting, the array-syntax group decided to focus on
solving this problem. Several existing dialects of C have tackled this
problem, including C* from Thinking Machines and MPC from MasPar. The
array-syntax group will be hammering out a compromise that merges the
capabilities of those dialects into a data-parallel C.
The interest in this group is high and a special meeting was held at Thinking
Machines in April to move the group rapidly forward. The group has decided to
use the C* Reference Manual as its base document for future work. This
decision was difficult to achieve because this group has received 14 different
significant proposals, whereas the other NCEG groups have received only one or
two proposals each.
The C* language introduces a new aggregate type called "shape" that has
dimensions like an array, except that elements are not stored contiguously in
memory and are usually distributed across multiple memories. The C operators
are overloaded to operate element-wise on objects of the same shape.
Operations which are not element-wise or involve operands of different shapes
require the use of a new left-indexing subscript operator. Example 2(a)
illustrates the use of a shape.
Example 2: (a) Sample use of a shape; (b) this assignment can be performed 100
times; (c) this code is the equivalent of that in (b).

 (a)

 shape [10][10]Sa;

 double:Sa a, b, c;

 f() {
 double n;
 int i;

 a = b + c;
 n = [4][i]b + 2.0;
 where (b < 5) {
 a++;
 c = b;
 }
 }

 (b)

 void f (void) {
 iterator I = 100;
 float a[100], b[100];

 a[I] = b[I] + 1;
 }

 (c)

 void f (void) {
 int I;
 float a[100], b[100];

 for (I = 0; I < 100; I++)
 a[I] = b[I] + 1;
 }

The arrays b and c can be added together as if they were normal C variables,
but because they have been declared with a shape, corresponding elements of
the shape will be added instead. The left indexing illustrates how a single
element of a shape can be selected. Left indexing is rarely needed, however,
because other operations exist that select subsets of a shape. For example,
the where statement above selects only those positions of the shape for which
the test is true, and then executes the block of code (incrementing a and
copying b to c) only for those selected positions.
There is still some controversy concerning whether parallel operations and
data distribution can be made independent. An alternative proposal has been
put forward involving a concept called iterators to express parallel
operations. An "iterator" is a new kind of object that signals the need for
"iteration" when it is used in an expression. Example 2(b) causes the
assignment to be performed 100 times. An easy way to think about iterators is
that expressions are executed as if a for loop were wrapped around the
expression using the iterator variable in the loop. So Example 2(b) can be
equivalent to Example 2(c).
Iterators do not address the problems of data distribution in massively
parallel computers, but they may be integrated into a larger proposal that
does address these issues.



IEEE Floating-point Support


Since more and more machines are standardizing on IEEE floating-point
hardware, the floating-point subgroup has been working to specify how C should
be implemented on machines supporting IEEE floating point. The most important
elements of the IEEE floating-point standard (ANSI/IEEE 754-1985) not found in
Standard C are the notions of infinity and NaN (not-a-number), along with the
notion of the floating-point environment and a more complete computational
library.
An infinity is a value so large in magnitude that it cannot be represented in
the range of a floating-point value, so it is more than just the mathematical
concept of infinity. Any nonzero number divided by 0 produces infinity, but so
does a very large number divided by a very small number, as long as the result
overflows the floating-point range.
A NaN is a mechanism that IEEE defined to carry information about erroneous
conditions that arise during computations. For example, trying to compute the
logarithm of a negative number produces a NaN. A programmer need not check
hardware error flags or C's errno after every step of a computation. Once a
NaN has appeared in a computation, any further expressions using that NaN also
produce NaN. Rather than having errors introduce numbers that might be
produced by normal computations, a NaN is uniquely identifiable as an error
condition.
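These behaviors can be observed directly on IEEE hardware. A small sketch, not
from the article; the helper names are illustrative, and the volatile
qualifier keeps the compiler from folding the divisions away:

```c
/* The IEEE special values in action: dividing a finite nonzero
   number by zero yields infinity; 0.0/0.0 is invalid and yields
   a NaN, which propagates through further arithmetic and
   compares unequal even to itself (the standard test for one). */
int is_nan(double x) { return x != x; }

double make_inf(void) { volatile double z = 0.0; return 1.0 / z; }
double make_nan(void) { volatile double z = 0.0; return z / z; }
```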
A floating-point environment is a set of dynamic runtime flags that control
rounding of intermediate results and reporting of exceptional conditions.
The data types used to hold intermediate results are another area addressed by
this group. The current C rules demand that at each operator, the computation
occur with a precision at least as wide as the wider of the operands, but
otherwise leave it up to the compiler. One of the alternative evaluation
methods being discussed, called "widest need," would widen all values to the
widest type appearing anywhere in the expression, not just at each operator.
In Example 3, the value of x + y is computed in at least float precision,
according to the C standard. Using widest need, this addition is computed
using long double precision because the assignment (and thus the multiply and
the additions) all need long double.
Example 3: The value of x+y is computed in at least float precision according
to the C standard.

 float x, y, z;
 long double u, v;

 u = (x + y) * (z + v);

An additional area not demanded by the needs of IEEE arithmetic, but deemed to
be useful in general, is overloaded functions. The IEEE group defines a new
header (fp.h) that contains a set of overloaded functions, such as sqrt, that
use a distinct implementation that depends on the types of their arguments.
User-supplied overloading is not provided, but C++'s overloading could
accomplish most of the capabilities needed. Widest-need evaluation presents a
unique problem because it is supposed to affect the choice of overloaded
functions. In effect, C++ overloading only uses the types of the function
arguments to pick which function to call, while widest-need evaluation uses
the needed return type as well.
The IEEE subgroup has produced a document that is now circulating among other
standards groups for comment in draft form. As much as possible, the
extensions have been defined so that any floating-point hardware could take
some advantage of them.


Complex Arithmetic


The complex arithmetic group has produced a specification for complex data
types of three precisions: float complex, double complex, and long double
complex, representing values in Cartesian, not polar, coordinates.
These types are accompanied by a new floating-constant suffix, i, which
promotes the constant to have the corresponding complex type. Example 4
assigns a value to x, with the real part being 3.0 and the complex part being
5.0. This notation very closely mimics the notation used by mathematicians.
Example 4: Assigning a value to x with the real part being 3.0 and the complex
part being 5.0.

 double complex x;

 x = 3 + 5i;

The proposal also extends the normal arithmetic operators for complex types
and adds conversion rules for combining complex, floating-point, and integral
types in expressions.
A new header (complex.h) has been proposed that contains prototypes for
complex functions like complex sines, cosines, exponentials, logarithms,
powers, and square roots. In addition, functions exist to extract the real and
imaginary parts of complex numbers, as well as the conjugate of a complex
number.
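This NCEG work was later adopted into C99, so the proposal can be sketched in
C99 terms. Note that C99 spells the imaginary unit with the macro I from
<complex.h> rather than the 5i constant suffix of Example 4; the function name
below is illustrative:

```c
#include <complex.h>

/* Build the value from Example 4 and pull its parts back out. */
double complex_parts(void)
{
    double complex x = 3.0 + 5.0 * I;  /* real part 3.0, imaginary 5.0 */
    return creal(x) * 10.0 + cimag(x); /* extract both parts           */
}
```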
An issue which has not been completely resolved is whether there should be a
type called "imaginary." This would strictly be the imaginary part of a
complex number. Mathematically, an imaginary is just a complex number with a
zero real part. For C, this type would solve some exotic issues relating to
NaNs.


Variably Dimensioned Arrays


The ability to specify varying array bounds for multidimensional arrays would
greatly simplify the code used to access such array parameters. For example, a
matrix-multiply routine that works on two-dimensional square arrays today must
pass those arrays as simple pointers and explicitly perform the index
arithmetic. Clearly, the indexing in Example 5(a) would be much more obvious
if it could be written as in Example 5(b).
Example 5: (a) Indexing; (b) clearer indexing; (c) arrays of variable
dimension.

 (a)

 void f(int m, float *a) {
 ... a[i * m + j] ...
 }

 (b)

 void f(int m, float a[m][m]) {
 ... a[i][j] ...
 }

 (c)

 void f(int n) {
 float a[n];

 /* ... */
 }

There are currently two competing proposals. In the "fat-pointer" proposal,
one can declare pointers to variably dimensioned arrays. These objects are
pointers that include not only an address but also a descriptor for each of
the dimensions of the array being pointed at.
The alternative proposal allows variable dimensions on array parameters to
functions as a way to simplify writing library routines for multidimensional
arrays.
Both proposals now allow automatic arrays of variable dimension for more
flexible local storage; see Example 5(c).
Some implementations of C provide a nonstandard function alloca that allows
one to allocate auto storage that is automatically recovered on function
return (usually done by directly manipulating the hardware stack pointer).
This function is not possible on some hardware, however, which is why it is
not part of the Standard. Variable-length automatic arrays have the same
problems. The proposal that allows automatic variable-length arrays allows an
implementation to use malloc to allocate space for the object. As a result,
setjmp and longjmp may not work properly around variable-length arrays.


Exceptions and errno


The specification of Standard C requires that errno be set when certain errors
occur in math functions. This mechanism is not the approach that IEEE has
advocated for signaling error conditions. A number of hardware implementations
are significantly slowed down by the need to set errno correctly.
Not coincidentally, the math functions in the IEEE floating-point support
header <fp.h> are specifically not required to set errno.
At this point a specific proposal has not been formulated. The general feeling
is that the support for errno may not be present in an extended math library,
which will perhaps use the IEEE floating-point environment instead.


Aggregate Initializers


Aggregate initializer extensions have been proposed to solve two slightly
different problems. First, programmers must be able to initialize selected
elements of an array or structure. Second, they must be able to specify an
initializer as a constant in an expression.
A proposal has been accepted that allows either an array subscript expression
or a structure member name to appear in an aggregate initializer. This
selector may appear before any individual value in the initializer. That value
is then assigned to the named element or member, and subsequent values are
assigned to consecutive elements or members after that.
Initializing selected elements of an array or members of a structure
simplifies several different situations. In Example 6(a), the div_t type is
specified as part of the standard as having two members, quot and rem, for the
quotient and remainder of a division, respectively. But which member comes
first? With the above initializer, it doesn't matter.
Example 6: (a) The div_t type is specified as part of the standard as having
two members; (b) initializing an array with the same elements as an
enumerator; (c) an array that needs nonzero entries near each end of the
array; (d) combining subscripting and member selection; (e) using an extension
to initialize a union; (f) constructing aggregate constants in expressions.

 (a)

 div_t answer = { .quot = 1, .rem = 0};

 (b)

 enum etag { Mem1, Mem2, /* ... */ };
 const char *nm[] = { [Mem2] = "Mem2", [Mem1] = "Mem1",
 /* ... */};

 (c)

 int a[MAX] = {
 1, 3, 5, 7, 9, [MAX-5] = 8, 6, 4, 2, 0
 };

 (d)

 struct s { int a[3], b; };
 struct s w[] = { [0].a = { 1 }, [1].a[0] = 2 };
 struct s z[] = { { 1 }, 2 };

 (e)

 union utag { /* ... */ } u = { .any_member = 42 };

 (f)

 struct point { int x, y; };
 void drawCircle(struct point center, int radius);

 struct point p;


 p.x = 11; p.y = 12;

 drawCircle(p, 15);

 drawCircle ((struct point) { 11, 12 }, 15);

For an array whose elements must correspond to the enumerators of an
enumeration, the code in Example 6(b) can be used. You can be confident that
the strings in the table will correspond to the correct array elements, even
if you later add more enumerators or change their values.
Suppose you have an array that only needs nonzero entries near each end of the
array, as in Example 6(c). Subscripting and member selection can be combined;
see Example 6(d).
Both w and z are initialized to the same values. The initializers are both
inconsistently bracketed, but in the first case, it is clear which elements
are being set to which values. This extension can even be used to initialize a
union; see Example 6(e).
There has also been some sentiment for supporting initialization for ranges of
array elements. Often, you wish to set all the elements of an array to
something other than 0. Several suggestions have been made for possible
syntax, but nothing has yet been accepted. There are other extensions that may
also need to express ranges of values, so it is likely that once some notation
can be agreed upon for expressing ranges, they will be adopted wherever
appropriate.
Another accepted proposal provides the ability to construct aggregate
constants in expressions; see Example 6(f). In the second call, an
aggregate constant is used to avoid creating an extra variable. Nonconstant
runtime values can appear in the aggregate constants.


Extended Integer Range


Support for extended integer range is the newest addition to the work items of
NCEG. Numerous hardware vendors either offer machines with 64-bit integers, or
are planning to introduce such machines. C programmers have always thought of
char, short, and long integers as being 8, 16, and 32 bits wide. Standard C
requires that each of these types be at least that wide. These new machines
with 64-bit integers present some problems. If such a machine makes long 64
bits wide, any software that assumes that long is exactly 32 bits wide will
fail. Similarly, if int is made 64 bits wide, software that assumes int is
either 16 or 32 bits wide will fail. Probably most troublesome of all, on
machines that support 64-bit integers, programmers will still want 8-, 16-,
and 32-bit integers as well. C can make char, short, int, and long each into
distinctly sized integers, but what happens when -- some day -- 128-bit
integers appear?
Surveys of actual C code show that the biggest problem lies with
configurations in which the size of int differs from the size of pointers.
After years of telling programmers that ints and pointers are not
interchangeable, compiler writers find that significant amounts of code still
make that assumption. And although we may condemn such code, we must accept
that it exists and that customers will not be happy if they are forced to
convert.
Because of these difficulties, some implementations have decided to create a
new type, "long long," that is used for 64-bit integers. The members of NCEG
feel strongly that "long long" is a very poor design choice. The
extended-integer range group has decided that despite disapproving of "long
long" as a new integral type, they will produce some guidelines as part of the
technical report. These guidelines will point out practical portability issues
involved in defining a "long long" type.
Given the strains of incorporating an additional integer size into the C type
system, there is a strong desire among group members to find a more general
solution. The solutions suggested have centered around being able to declare
the minimum size of an integer. Integers could be specified as being some
minimum number of bits, or else as a range of values. In Example 7, for
instance, the declarations would make x a variable that is a signed integer at
least 32 bits wide. The type of y is some integer large enough to hold the
values from 1 through 100. This capability is similar to the ability to define
integer ranges in Pascal.
Example 7: The declarations make x a variable that is a signed integer at
least 32 bits wide. The type of y is some integer large enough to hold the
values from 1 through 100.

 int:32 x;
 int {1 .. 100} y;

While the work is still very preliminary, the type system of C is not likely
to be as strict as that for Pascal. Integer ranges in C would simply be
synonyms for one of the supported integral types. Significant work is still
needed on conversion rules, along with the exact meaning of these
declarations.
One question still unanswered is whether a programmer will only be able to
specify types that are at least as wide as desired. Some wish to specify that
they want an integer exactly N bits wide. This is analogous to the existing
bit-fields of C, except perhaps allowing arrays of these new integer types.


Conclusion


There has been a subtle shift in the standards-making process since the
beginning of the ANSI C committee in 1983. At that time, a standards committee
was assembled after a language had been in use for some time and a de facto
standard had been determined by the marketplace. The standards committee was
charged with ironing out the differences between implementations and writing a
clear specification for all the dark corners.
Nowadays, standards committees are being used to create standards from
scratch. Companies are using these neutral forums as a way to arrive at
sensible solutions without having to risk years of possibly expensive conflict
between competing products. The Beta/VHS battle is an obvious example from the
consumer-electronics industry of the costs of letting the market pick winners.
The NCEG group is part of the new spirit in standards. Competing vendors can
come together and share their expertise before they invest large sums of money
in building C compilers and before customers invest huge sums of money in
competing designs. The customers are the ultimate winners. They can buy a C
compiler that supports an NCEG extension and feel confident that they can port
that code to another vendor.
Will C, with the addition of the NCEG extensions, replace Fortran? The NCEG
committee does not see this as the goal of the group, nor is it likely to be
accomplished. Programmers continue to use Fortran for reasons that have
nothing to do with whether it is the best language in the world. Decades worth
of code have built up in Fortran that no one wants to convert to C.
Programmers with decades of experience aren't willing to change languages
easily.
What will emerge from the NCEG extensions is a better C for those who want to
use it for numerical programming. The data-parallel extensions promise the
possibility that C will become one of the first truly portable languages
available for massively parallel supercomputers. If that happens, C will be
leading the way into the 21st century and the brave new world.


_NUMERIC EXTENSIONS TO C_
by Robert Jervis

Example 1:

(a)

 void f3(int n, float *restrict a, float *b, float *c) {
 int i;
 for (i = 0; i < n; i++)
 a[i] = b[i] + c[i];
 }



(b)

 float x[100];
 float *c;


 void f5(int n, float *restrict a, float *restrict b) {
 int i;
 for (i = 0; i < n; i++)
 a[i] = b[i] + c[i];
 }
 void g5(void) {
 float d[100], e[100];
 c = x;

 f5(100, d, e); /* Behavior defined. */
 f5( 50, d, d+50); /* Behavior defined. */
 f5( 99, d+1, d); /* Behavior undefined. */
 c = d;
 f5(100, d, e); /* Behavior undefined. */
 f5(100, e, d); /* Behavior defined. */
 }



Example 2:

(a)
 shape [10][10] Sa;

 double:Sa a, b, c;

 f() {
 double n;
 int i;

 a = b + c;
 n = [4][i]b + 2.0;
 where (b < 5){
 a++;
 c = b;
 }
 }

(b)

 void f(void) {
 iterator I = 100;
 float a[100], b[100];

 a[I] = b[I] + 1;
 }


(c)

 void f(void) {
 int I;
 float a[100], b[100];

 for (I = 0; I < 100; I++)
 a[I] = b[I] + 1;
 }

Example 3:



 float x, y, z;
 long double u, v;

 u = (x + y) * (z + v);

Example 4:

 double complex x;

 x = 3 + 5i;


Example 5:

(a)

 void f(int m, float *a) {
 ... a[i * m + j] ...
 }


(b)

 void f(int m, float a[m][m]) {
 ... a[i][j] ...
 }




(c)


 void f(int n) {
 float a[n];
 /* ... */
 }



Example 6:

(a)


 div_t answer = { .quot = 1, .rem = 0 };


(b)

 enum etag { Mem1, Mem2, /* ... */ };

 const char *nm[] = { [Mem2] = "Mem2", [Mem1] = "Mem1",
 /* ... */ };


(c)


 int a[MAX] = {
 1, 3, 5, 7, 9, [MAX-5] = 8, 6, 4, 2, 0
 };


(d)

 struct s { int a[3], b; };
 struct s w[] = { [0].a = { 1 }, [1].a[0] = 2 };
 struct s z[] = { { 1 }, 2 };


(e)

 union utag { /* ... */ } u = { .any_member = 42 };


(f)

 struct point { int x, y; };
 void drawCircle(struct point center, int radius);

 struct point p;

 p.x = 11; p.y = 12;
 drawCircle(p, 15);

 drawCircle((struct point){ 11, 12 }, 15);

Example 7:

 int:32 x;
 int {1 .. 100} y;



August, 1992
MULTIPLE-PRECISION ARITHMETIC IN C


Big jobs require big numbers




Burton S. Kaliski, Jr.


Burt is chief scientist of RSA Laboratories, a division of RSA Data Security.
He received a PhD in computer science from MIT in 1988 and is interested in
cryptography and fast arithmetic techniques. He can be contacted at
burt@rsa.com and at 10 Twin Dolphin Drive, Redwood City, CA 94065.


Computer arithmetic gets better and faster all the time. Once you could only
add 8-bit numbers, then 16 bits became the standard, now 32 bits, soon 64....
But what if you want to add, subtract, multiply, or divide 512-bit numbers?
Few computers have machine-language instructions for big-number arithmetic
and, for obvious reasons, even fewer programming languages have built-in
operations supporting it.
The ability to manipulate 512-bit (or larger) numbers is necessary, for
instance, when you're implementing mathematically oriented encryption schemes;
both the RSA and DSS schemes involve such numbers. (For more-detailed
information on RSA and DSS, see the sidebar entitled, "Public-Key Cryptography
Meets the Real World" on page 22 of the May 1992 DDJ.) So the question arises,
how can you operate on 512-bit numbers in a language such as C that only goes
as far as 32 bits?
For the answer, we can look at one simple implementation of what is called
"multiple-precision" (MP) arithmetic. MP arithmetic has been implemented in
RSA Laboratories' cryptographic toolkit, RSAREF, and the code has been ported
to many machines without modification. (RSAREF is available to U.S. and
Canadian citizens at no charge for non-commercial use; contact RSA
Laboratories, 10 Twin Dolphin Drive, Redwood City, CA 94065, or at
rsaref@rsa.com.)


Representing MPs


MPs are represented as arrays of type NN_DIGIT, where NN_DIGIT depends on the
machine. For instance, NN_DIGIT might be defined as an unsigned long, which
has 32 bits on most machines. For technical reasons, the number of bits per
digit, b, must be even. Each element of the array is a digit in the base-B
representation of the MP, where B = 2^b. The minimum digit is 0 and the
maximum digit is B-1. Lower-indexed elements of the array are less
significant than higher-indexed elements. We call a[0] the 1s digit of array
a, a[1] the Bs digit (similar to a 10s digit), a[2] the B^2s digit (similar
to a 100s digit), and so on.
For example, the ninth Fermat number, 2^512 + 1, would be represented as an
array of 17 32-bit NN_DIGITs:
 a[0]=1
 a[1]=a[2]=...=a[14]=a[15]=0
 a[16]=1
Listing One (page 116) defines NN_DIGIT and some other items helpful in
handling MPs, including function prototypes.


MP Tools


We start with four tools: C's built-in addition, subtraction, multiplication,
and division operators. The tools let us:
Add two NN_DIGITs, and get the 1s digit of the sum (but not the carry-out).
Subtract an NN_DIGIT from an NN_DIGIT, and get the 1s digit of the remainder
(but not the borrow-out).
Multiply two NN_DIGITs, and get the 1s digit of the product (but not the Bs
digit).
Divide an NN_DIGIT by an NN_DIGIT, and get the quotient, also an NN_DIGIT.
MP arithmetic operates on top of these tools. Adding and subtracting MPs is pretty easy;
multiplying them is harder; and dividing is hardest. We tackle the problems in
that order. We stop along the way to add tools for multiplying and dividing
NN_DIGITs.


First Step: MP Addition


In grade school you were taught that to add two numbers, you wrote them down,
one above the other, then added columns of digits from right to left. You
wrote down the 1s digit of the sum at the bottom of the column. If the sum had
a 10s digit, you wrote it down at the top of the column to the left. This was
the "carry," and the last carry became the leftmost sum digit.
Adding two MPs is much the same: We have a carry-in that's either 0 or 1 and
two addend digits; we want a sum digit and a carry-out. But it's harder to get
the carry-out than in the grade-school method because our tools only let us
see the 1s digit of a sum. So let's take it a step at a time.
We first add the carry-in to the first addend digit, and look at the 1s digit
of the sum. Let's call this the "subsum". Here's where we do a "twist" on the
grade-school addition. We don't know directly whether there is a carry. But we
do know that if there is, then the subsum must be less than the carry-in,
because the real sum has wrapped past the maximum digit--but it can't get as
far as the carry-in. (We detect carries this way throughout our MP
implementation.)
So if the subsum is less than the carry-in, we have to carry out. We also know
that the carry-in is 1, not 0 (otherwise we couldn't have carried out); and
the subsum is 0 (since it's less than the carry-in). Since the subsum is 0, we
write down the second addend digit as the sum digit, and go on to the next
digit.
If we don't carry out right away, we next add the subsum to the second addend
digit and look at the 1s digit of the sum, which we write down as the sum
digit. If the sum digit is less than the second addend digit, we have to carry
out. Otherwise, we don't. Figure 1 gives an example of this twist on
grade-school addition. The first row (1s and 0s) are the carries; the third is
the subsum. In the 1s column, the subsum is more than the carry, and the sum
digit is more than the second addend digit, so there's no carry out. In the
10s and 100s column, the subsum is more than the carry, and the sum digit is
less than the second addend digit, so there is a carry out. In the 1000s
column, the subsum is less than the carry, so there is a carry out. The twist
is that we can do everything with an addition operation that gives only the 1s
digit of its result.
Figure 1: Grade-school addition with a twist.

 11100 <--carry
 9876 <--first addend

 0976 <--subsum

 5432 <--second addend

 15308 <--sum

The procedure NN_Add (Listing Two, page 116) adds MPs with our limited tools.
The addends are b and c, and the sum is a. They all have length digits. Index
variable i moves through the digits from right to left, with carry as the
carry-in and -out. The procedure returns the final carry-out.
NN_Add adds MPs with a loop around a simple three-part conditional. The first
part computes the subsum. If there's a carry out, the first part sets the sum
digit to the second addend digit, and stops. The second part computes the sum
digit. If there's a carry-out, the second part sets carry to 1. The third part
sets carry to 0 when there's no carry-out.
The sum can share memory with the addends, since NN_Add stores intermediate
results in the variable ai. (The sum must share exactly the same memory; if it
overlaps only partially, the result is undefined.) All procedures described
here have the shared-memory feature.


MP Subtraction


Subtracting two MPs is just like adding two MPs, except that we borrow instead
of carry. Here, we have a borrow-in that's either 0 or 1, a subtrahend digit,
and a minuend digit (how's that for terminology!); we want a remainder digit
and a borrow-out.
We first subtract the borrow-in from the subtrahend digit and look at the 1s
digit of the remainder. We call this the "subremainder." If the subremainder
is more than the maximum digit minus the borrow-in, we have to borrow out. We
also know that the borrow-in is 1, and the subremainder is the maximum digit. (This
is just like addition, except everything's reversed: "Less" is "more," "plus"
is "minus," "0" is "maximum digit.")
Since the subremainder is the maximum digit, we write down the maximum digit
minus the minuend digit as the remainder digit and go on to the next digit.
If we don't borrow out right away, we next subtract the minuend digit from the
subremainder and look at the 1s digit of the remainder, which we write down as
the remainder digit. If the remainder digit is more than the maximum digit
minus the minuend digit, we have to borrow out. Otherwise, we don't.
The procedure NN_Sub, shown in Listing Three (page 117), subtracts MPs. The
subtrahend is b, the minuend is c, and the remainder is a. As in NN_Add, they
all have length digits, and the index variable i moves through the digits. The
variable borrow is the borrow-in and -out. The procedure returns the final
borrow-out, which is 1 when the subtrahend is less than the minuend.


More Tools


Before going on to MP multiplication and division, we need two additional
tools. These add to those built into the C language, and let us:
Multiply two NN_DIGITs and get a two-NN-DIGIT product with a 1s digit and a Bs
digit.
Divide a two-NN_DIGIT dividend with a 1s digit and a Bs digit by an NN_DIGIT
and get the quotient, also an NN_DIGIT.
We need another type for these tools: NN_HALF_DIGIT, which has half as many
bits as NN_DIGIT. (This explains the technical constraint that the number of
bits in an NN_DIGIT must be even.) The type depends on the machine; it might
be an unsigned short, which has 16 bits on most machines. The minimum half
digit is 0 and the maximum half digit is the square root of B, minus 1.
NN_HALF_DIGIT is
defined in Listing One.
(These extra tools are in the instruction sets of most machines, even when
NN_DIGIT is 32 bits, but they're not in Standard C. So if you're working in
assembly language, your work is done. Read on to see how to do it in C.)


Digit Multiplication


We multiply two NN_DIGITs by combining four NN_HALF_DIGIT x NN_HALF_DIGIT
multiplications. We can multiply two NN_HALF_DIGITs with the tools already
built into the C language; the product is an NN_DIGIT. (The C language's
type-conversion rules require that we cast the NN_HALF_DIGITs to an NN_DIGIT
to get an NN_DIGIT product. A clever compiler will notice the casting and
generate an NN_HALF_DIGITxNN_HALF_DIGIT machine-language multiplication.)
The procedure NN_DigitMult, shown in Listing Four (page xx), multiplies
NN_DIGITs. The multiplicand and multiplier are b and c, and the product is a
(a two-NN_DIGIT array).
A familiar algebraic formula describes the mapping to four multiplications.
Let bHigh, bLow, cHigh, and cLow denote the high-order and low-order halves of
b and c, so that
 b = bHigh x sqrt(B) + bLow
 c = cHigh x sqrt(B) + cLow
Then we have:
 b x c = (bHigh x cHigh) x B + (bHigh x cLow + bLow x cHigh) x sqrt(B)
 + (bLow x cLow)
NN_DigitMult computes the four multiplications and stores the product of
low-order halves in a[0], the product of high-order halves in a[1], and the
two cross products in t and u.
The low-order halves of the cross products need to be added into the
high-order half of a[0]. Similarly, the high-order halves need to be added
into the low-order half of a[1].
NN_DigitMult adds the intermediate values t and u. If the sum is less than u
-- that is, if there is a carry out -- the procedure adds 1 to the high-order
half of
a[1]. NN_DigitMult then adds the low-order half of t+u to a[0], carrying into
a[1]. It finally adds the high-order half of t+u to a[1].


Digit Division


We divide a two-NN_DIGIT dividend by an NN_DIGIT divisor by combining two
NN_DIGITxNN_HALF_DIGIT divisions, four NN_HALF_DIGITxNN_HALF_DIGIT
multiplications, and some other arithmetic. We can divide an NN_DIGIT dividend
by an NN_HALF_DIGIT divisor with the C-language tools; the quotient is an
NN_HALF_DIGIT, assuming the high half of the dividend is less than the
divisor. (It's worth mentioning that we're interested in the "floor" of the
quotient--the greatest integer not more than the quotient.)
The procedure NN_DigitDiv, shown in Listing Five (page 117), divides a
two-NN_DIGIT dividend by an NN_DIGIT divisor. The dividend is b (a
two-NN_DIGIT array), the divisor is c, and the quotient is a. The two-NN_DIGIT
variable t holds a copy of the dividend. NN_DigitDiv assumes that the quotient
is not more than the maximum digit. We guarantee this later when we call
NN_DigitDiv.
NN_DigitDiv follows the estimate-and-correct method. First, it estimates the
high-order half of the quotient with an NN_DIGITxNN_HALF_DIGIT division. It
then subtracts the product of the estimate and the divisor from the dividend.
It finally corrects the estimate by subtracting the divisor (shifted left a
half digit) from the dividend, until the dividend is less than the divisor
(shifted left a half digit). Next, NN_DigitDiv estimates the low-order half of
the quotient, given what remains of the dividend; it subtracts the product of
the estimate and the divisor from the dividend, and it finally corrects the
estimate. Usually there's only one correction for each half, so NN_DigitDiv
gets the correct quotient pretty fast.
NN_DigitDiv estimates the high-order half of the quotient by dividing t[1] by
the high-order half of c, plus 1. (If the high-order half of c were the
maximum half digit, division would be by square root of B, which is too large,
so the estimate is just the high-order half of t[1].)
NN_DigitDiv subtracts the product of the estimate aHigh and the divisor,
shifted left a half digit, from t. Then it subtracts the divisor, shifted left
a half digit, and increments the estimate until what remains of t is less than
the divisor shifted left a half digit.
The assumption that the quotient is not more than the maximum half digit
ensures that the estimate doesn't exceed the maximum half digit.
NN_DigitDiv then estimates the low-order half of the quotient by similar
means. It divides the low-order half of t[1], shifted left a half digit, plus
the high-order half of t[0], by the high-order half of c, plus 1. (If the
high-order half of c is the maximum half digit, the estimate is just the
low-order half of t[1].) Again, the estimate aLow is an underestimate, and the
procedure corrects it.
It's a good idea to normalize the divisor by shifting as far left as possible
before calling NN_DigitDiv, because otherwise it can take a long time to
correct the estimates. If the divisor is normalized, then the initial estimate
can't be off by more than 2. NN_DigitDiv thus makes at most two and usually
one correction. We guarantee that the divisor is normalized, too, when we call
NN_DigitDiv later.


MP Multiplication



Returning to grade school, remember that to multiply two numbers (a multiplier
and a multiplicand), you wrote them down, one above the other. Then you took
the 1s digit of the multiplier, multiplied the multiplicand by it, and wrote
down the product. You took the 10s digit of the multiplier, multiplied the
multiplicand by it, and wrote down the product, below and one digit to the
left of the first product. And so on; after you wrote down all the
intermediate products, you added them up, and the sum became the final
product.
Now let's consider grade-school multiplication with a twist. Instead of
waiting until the end to add, suppose we accumulate the sum as we go along.
Then we don't have to store the intermediate products, just the accumulator,
and the accumulator becomes the final product.
Multiplying two MPs thus involves two procedures: one that multiplies an MP by
a digit and adds the product to an accumulator, and one that moves through the
digits of the multiplier.
The procedure NN_AddDigitMult, shown in Listing Six (page 118), does the first
part. The digit is c, the multiplicand is d, the input accumulator is b, and
the output is a. The multiplicand and accumulators all have length digits.
Index variable i moves through the digits of the multiplicand from right to
left, with carry as the carry-in and -out. NN_AddDigitMult returns the final
carry-out.
NN_AddDigitMult is similar to NN_Add, except for two things. It calls the tool
NN_DigitMult to multiply digits, rather than adding them with a built-in
NN_DIGIT add (since we're multiplying, not adding); and carry is a digit, not
just 0 or 1. Here there are two conditionals. The first conditional adds the
carry. If there's a carry-out, the first conditional sets the carry to 1;
otherwise, it sets the carry to 0. The second conditional adds the low-order
digit of the NN_DIGITxNN_DIGIT product. If there's a carry out, the second
conditional increments the carry. NN_AddDigitMult then unconditionally adds
the high-order digit of the NN_DIGITxNN_DIGIT product to the carry and
continues.
The procedure NN_Mult, shown in Listing Seven (page 118), multiplies MPs. The
multiplier is b, the multiplicand is c, and the product is a. The inputs have
length digits and the output has length 2 x digits. Index variable i moves
through the digits of the multiplier from right to left. In fact it doesn't go
all the way to the left, only as far as the leftmost non-zero digit. (The
procedure NN_Digits, in Listing Ten, page 119, determines how far to go.)
NN_Mult similarly reduces the length of the multiplicand to avoid unnecessary
multiplications. It's easy to see how NN_Mult works with NN_AddDigitMult to
accumulate the product. Each call has a new multiplier digit, b[i], and
operates on a new part of the accumulator, &t[i]. NN_Mult stores
NN_AddDigitMult's carry as a new high-order digit of the accumulator. This is
all quite similar to the illustration of grade-school multiplication that
Figure 2 describes, in which the first row is the multiplicand, the second is
the multiplier, and the last is the product. Intermediate rows give
accumulator values and intermediate products. Intermediate products are
between a multiplier digit and the multiplicand. They are shifted left as many
digits as the multiplier digit. The accumulator value is the sum of the
previous accumulator value and an intermediate product. Intermediate products
can be added directly into the accumulator. (NN_Mult also calls two other
procedures we haven't discussed: NN_Assign, which copies an MP, and
NN_AssignZero, which sets an MP to 0. See Listing Ten.)
Figure 2: Grade-school multiplication with a twist.

 5432 <-- multiplicand
 9876 <-- multiplier
 ________
 0000 <-- accumulator value
 32592 <-- intermediate product
 ________
 32592 <-- accumulator value
 38024 <-- intermediate product
 ________
 412832
 43456
 ________
 4758432
 48888
 ________
 53646432 <-- final product




Last Step: MP Division


We're now ready to put together all our tools to do what's generally
considered the most difficult arithmetic operation: division.
We've already seen how to divide a two-NN_DIGIT dividend by an NN_DIGIT
divisor. The estimate-and-correct method has served us well. We extend that
method to MP division.
For each digit of the quotient, we first estimate the digit by dividing the
high-order two digits of the dividend (or what remains of it) by the
high-order digit of the divisor, plus 1. (If the high-order digit of the
divisor is the maximum digit, the estimate is just the high-order digit of the
dividend.) We then subtract the product of the estimate and the divisor,
shifted left some number of digits, from what remains of the divisor.
The estimate is an underestimate, so we may have to increase it, just as in
NN_DigitDiv. We do so by subtracting the divisor, shifted left some number of
digits, from the dividend, until what remains of the dividend is less than the
shifted divisor. When we reach the 1s digit of the quotient, we have a
remainder that's less than the divisor, and we're done. Figure 3 illustrates
grade-school long division with the latest twist. The first row is the
quotient. The first part of the second row is the divisor, and the second part
is the dividend. The last row is the remainder. Quotient digits are estimated
by dividing the two high-order digits of what remains of the dividend by the
high-order digit of the divisor, plus 1. If what remains after subtracting a
shifted multiple of the divisor is too large, the estimate is corrected. For
instance, the digits 53 are divided by (5 + 1) to get an estimate 8 for the
1000s digit of the quotient. What remains after subtracting 8000x5432 is too
large, so the estimate is incremented, and another 1000x5432 is subtracted.
Figure 3: Grade-school division with a twist.

          09876
      _________
 5432)053646432
       0000        <-- estimate: 5/6 = 0
      _________
       53646432
       43456       <-- estimate: 53/6 = 8
      _________
       10190432    (too large)
        5432       <-- corrected estimate: 9
      _________
        4758432
        38024      <-- estimate: 47/6 = 7
      _________
         956032    (too large)
         5432      <-- corrected estimate: 8
      _________
         412832
         32592     <-- estimate: 41/6 = 6
      _________
          86912    (too large)
          5432     <-- corrected estimate: 7
      _________
          32592
          27160    <-- estimate: 32/6 = 5
      _________
           5432    (too large)
           5432    <-- corrected estimate: 6
      _________
              0    <-- remainder

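The estimate-and-correct steps of Figure 3 can be traced with a small
base-10 sketch. This is a hypothetical helper for illustration, not library
code; the parameters est_div and top_shift encode the divisor's high digit
plus 1 (scaled to the divisor's length) and the highest quotient position.

```c
#include <assert.h>

/* Estimate-and-correct long division in base 10, mirroring Figure 3.
   For each quotient digit we underestimate by dividing what remains of the
   dividend by (high digit of divisor + 1), scaled, then correct by
   subtracting the shifted divisor until the remainder is smaller than it. */
static long div_estimate_correct (long dividend, long divisor,
                                  long est_div,    /* (high digit + 1) * 10^(divisor digits - 1) */
                                  long top_shift,  /* 10^(quotient digits - 1) */
                                  long *remainder)
{
    long quotient = 0;
    long s;

    for (s = top_shift; s > 0; s /= 10) {
        long e = dividend / (s * est_div);   /* underestimate this digit */
        dividend -= e * s * divisor;         /* subtract shifted product */
        while (dividend >= s * divisor) {    /* correct: a few passes at most */
            dividend -= s * divisor;
            e++;
        }
        quotient = quotient * 10 + e;
    }
    *remainder = dividend;
    return quotient;
}
```

For the figure's numbers, div_estimate_correct (53646432L, 5432L, 6000L,
10000L, &r) yields quotient 9876 and remainder 0, with exactly the
corrections shown in Figure 3.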
Dividing MPs, like multiplying MPs, involves two new procedures: one that
multiplies an MP by a digit and subtracts the product from an accumulator, and
one that moves through the digits of the quotient.
The procedure NN_SubDigitMult, shown in Listing Eight (page 118), does the
first part. It is just like NN_AddDigitMult. NN_SubDigitMult returns the final
borrow-out.
The procedure NN_Div, shown in Listing Nine (page 118), divides MPs. The
dividend is c, the divisor is d, the quotient is a, and the remainder is b.
The dividend and quotient have length cDigits, while the divisor and remainder
have length dDigits. (Notice that NN_Div returns both quotient and remainder,
whereas NN_DigitDiv returns only the quotient.)
The index variable i moves through the digits of the quotient from left to
right. (This is the first time we've moved in that direction, and we do so
because division reveals the most significant digit first.) NN_Div estimates
the quotient digit with NN_DigitDiv, subtracts the product of the estimate and
the divisor with NN_SubDigitMult, and corrects the estimate with NN_Cmp (MP
comparison, see Listing Ten) and NN_Sub.
NN_Div is a little more complicated than the other procedures; this is due to
NN_DigitDiv's two requirements. For efficiency, NN_Div normalizes the divisor.
It does this with NN_LShift (MP left shift, Listing Ten). The number of bits
to shift left is determined with NN_DigitBits. NN_Div shifts both the divisor
and the dividend, and at the end, shifts back the remainder with NN_RShift (MP
right shift, Listing Ten). The quotient is unaffected by the normalization.
For correctness, NN_Div extends the dividend with a leading 0 digit. This
guarantees that the leading digit of the dividend is less than the leading
digit of the divisor, as NN_DigitDiv requires. This condition continues to
hold throughout the main loop.
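The normalization shift is just the word size minus the number of significant
bits in the divisor's high digit. The sketch below is our own illustration of
that bit count; the library's actual NN_DigitBits appears in Listing Ten.

```c
#include <assert.h>

/* Count the significant bits of a digit -- the role NN_DigitBits plays
   when NN_Div computes the normalization shift. Illustrative sketch only. */
static unsigned digit_bits (unsigned long a)
{
    unsigned i = 0;

    while (a != 0) {
        a >>= 1;       /* discard one bit per pass */
        i++;
    }
    return i;
}

/* NN_Div would then use: shift = NN_DIGIT_BITS - digit_bits (high digit),
   so that after shifting, the divisor's high digit has its top bit set. */
```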


Conclusions


In this article, we've gone from built-in C tools to multiple-precision
addition, subtraction, multiplication, and division. One possible next step is
modular arithmetic, where all results are divided by a predetermined quantity
called the "modulus," and only the remainder is kept. Modular arithmetic, and
modular exponentiation in particular, is essential to the RSA and DSS
schemes. You can also implement some number-theoretic operations such as
greatest common divisor with the tools presented here. Other next steps
include converting an MP to base 10--useful if you want to print your
results--and even computing the digits of pi.
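As a machine-word sketch of one such next step, Euclid's algorithm for the
greatest common divisor needs only division with remainder; an MP version
would substitute NN_Div (keeping the remainder b) for the % operator below.

```c
#include <assert.h>

/* Euclid's algorithm on machine words. An MP version would replace the
   % operator with NN_Div, which returns the remainder directly. */
static unsigned long gcd (unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;   /* divide, keep only the remainder */
        a = b;
        b = r;
    }
    return a;
}
```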
It's not difficult to implement the arithmetic operations more efficiently
than we've done here, since we've tried to keep things simple.
Assembly-language speedups would help a lot. Some technical refinements would
help, too, and indeed are available in various commercial and public-domain
implementations. Better and faster implementations are encouraged.


Editor's Note


Public-key cryptosystems such as RSA and DSS are subject to U.S. patents
4,218,582 and 4,405,829, as well as other patents issued to Stanford
University and MIT. Public Key Partners of Sunnyvale, California holds
exclusive licensing rights. Licensed software embodying the cryptosystems is
available from several vendors, including RSA Data Security.


Recommended Reading


Baker, Henry G. "Computing A*B (mod N) Efficiently in ANSI C." ACM SIGPLAN
Notices 27 (January, 1992).
Buell, Duncan A. and Robert L. Ward. "A Multiprecise Integer Arithmetic
Package." The Journal of Supercomputing (vol. 3, 1989).
Dusse, Stephen R. and Burton S. Kaliski, Jr. "A Cryptographic Library for the
Motorola DSP56000." Advances in Cryptology--EUROCRYPT '90 Proceedings, volume
473 of Lecture Notes in Computer Science, New York: Springer-Verlag, 1991.
Knuth, Donald E. "Seminumerical Algorithms." The Art of Computer Programming,
Volume 2, Second edition. Reading, MA: Addison-Wesley, 1983.
Montgomery, Peter L. "Modular Multiplication Without Trial Division."
Mathematics of Computation (vol. 44, 1985).


_MULTIPLE-PRECISION ARITHMETIC IN C_
by Burton S. Kaliski, Jr.



[LISTING ONE]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* PROTOTYPES should be set to one if and only if the compiler supports
function argument prototyping. The following makes PROTOTYPES default to 0
if it has not already been defined with C compiler flags. */

#ifndef PROTOTYPES
#define PROTOTYPES 0
#endif


/* UINT2 defines a two byte word */
typedef unsigned short int UINT2;

/* UINT4 defines a four byte word */
typedef unsigned long int UINT4;

/* PROTO_LIST is defined depending on how PROTOTYPES is defined above. If
   using PROTOTYPES, PROTO_LIST returns the list; otherwise it returns an
   empty list. */
#if PROTOTYPES
#define PROTO_LIST(list) list
#else
#define PROTO_LIST(list) ()
#endif

/* RSA key lengths. */
#define MAX_RSA_MODULUS_BITS 1024
#define MAX_RSA_MODULUS_LEN ((MAX_RSA_MODULUS_BITS + 7) / 8)

typedef UINT4 NN_DIGIT;
typedef UINT2 NN_HALF_DIGIT;

/* Length of digit in bits */
#define NN_DIGIT_BITS 32
#define NN_HALF_DIGIT_BITS 16
/* Length of digit in bytes */
#define NN_DIGIT_LEN (NN_DIGIT_BITS / 8)
/* Maximum length in digits */
#define MAX_NN_DIGITS \
 ((MAX_RSA_MODULUS_LEN + NN_DIGIT_LEN - 1) / NN_DIGIT_LEN + 1)
/* Maximum digit values */
#define MAX_NN_DIGIT 0xffffffff
#define MAX_NN_HALF_DIGIT 0xffff

/* Macros. */
#define LOW_HALF(x) (NN_HALF_DIGIT)((x) & MAX_NN_HALF_DIGIT)
#define HIGH_HALF(x) \
 (NN_HALF_DIGIT)(((x) >> NN_HALF_DIGIT_BITS) & MAX_NN_HALF_DIGIT)
#define TO_HIGH_HALF(x) (((NN_DIGIT)(x)) << NN_HALF_DIGIT_BITS)
#define DIGIT_MSB(x) (unsigned int)(((x) >> (NN_DIGIT_BITS - 1)) & 1)
#define DIGIT_2MSB(x) \
 (unsigned int)(((x) >> (NN_DIGIT_BITS - 2)) & 3)

/* NOTE: A bug in the MPW 3.2 C compiler causes an incorrect sign extension in
the routine NN_DigitDiv. To overcome this bug, change the definition of the
macro HIGH_HALF to:
#define HIGH_HALF(x) (((x) >> NN_HALF_DIGIT_BITS) & MAX_NN_HALF_DIGIT) */

void NN_Assign PROTO_LIST ((NN_DIGIT *, NN_DIGIT *, unsigned int));
void NN_AssignZero PROTO_LIST ((NN_DIGIT *, unsigned int));

NN_DIGIT NN_Add PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, NN_DIGIT *, unsigned int));
NN_DIGIT NN_Sub PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, NN_DIGIT *, unsigned int));
void NN_Mult PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, NN_DIGIT *, unsigned int));
int NN_Cmp PROTO_LIST ((NN_DIGIT *, NN_DIGIT *, unsigned int));
unsigned int NN_Bits PROTO_LIST ((NN_DIGIT *, unsigned int));
unsigned int NN_Digits PROTO_LIST ((NN_DIGIT *, unsigned int));


NN_DIGIT NN_LShift PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, unsigned int, unsigned int));
NN_DIGIT NN_RShift PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, unsigned int, unsigned int));
void NN_Div PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, NN_DIGIT *, unsigned int, NN_DIGIT *,
 unsigned int));
NN_DIGIT NN_AddDigitMult PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, NN_DIGIT, NN_DIGIT *, unsigned int));
NN_DIGIT NN_SubDigitMult PROTO_LIST
 ((NN_DIGIT *, NN_DIGIT *, NN_DIGIT, NN_DIGIT *, unsigned int));
unsigned int NN_DigitBits PROTO_LIST ((NN_DIGIT));
void NN_DigitMult PROTO_LIST ((NN_DIGIT [2], NN_DIGIT, NN_DIGIT));
void NN_DigitDiv PROTO_LIST ((NN_DIGIT *, NN_DIGIT [2], NN_DIGIT));






[LISTING TWO]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Computes a=b+c. Returns carry. Lengths: a[digits], b[digits], c[digits]. */
NN_DIGIT NN_Add (a, b, c, digits)
NN_DIGIT *a, *b, *c;
unsigned int digits;
{
 NN_DIGIT ai, carry;
 unsigned int i;

 carry = 0;

 for (i = 0; i < digits; i++) {
 if ((ai = b[i] + carry) < carry)
 ai = c[i];
 else if ((ai += c[i]) < c[i])
 carry = 1;
 else
 carry = 0;
 a[i] = ai;
 }
 return (carry);
}






[LISTING THREE]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Computes a=b-c. Returns borrow. Lengths: a[digits], b[digits], c[digits].
*/
NN_DIGIT NN_Sub (a, b, c, digits)
NN_DIGIT *a, *b, *c;
unsigned int digits;
{
 NN_DIGIT ai, borrow;
 unsigned int i;

 borrow = 0;

 for (i = 0; i < digits; i++) {
 if ((ai = b[i] - borrow) > (MAX_NN_DIGIT - borrow))
 ai = MAX_NN_DIGIT - c[i];
 else if ((ai -= c[i]) > (MAX_NN_DIGIT - c[i]))
 borrow = 1;
 else
 borrow = 0;
 a[i] = ai;
 }
 return (borrow);
}






[LISTING FOUR]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Computes a=b*c, where b and c are digits. Lengths: a[2]. */
void NN_DigitMult (a, b, c)
NN_DIGIT a[2], b, c;
{
 NN_DIGIT t, u;
 NN_HALF_DIGIT bHigh, bLow, cHigh, cLow;

 bHigh = HIGH_HALF (b);
 bLow = LOW_HALF (b);
 cHigh = HIGH_HALF (c);
 cLow = LOW_HALF (c);

 a[0] = (NN_DIGIT)bLow * (NN_DIGIT)cLow;
 t = (NN_DIGIT)bLow * (NN_DIGIT)cHigh;
 u = (NN_DIGIT)bHigh * (NN_DIGIT)cLow;
 a[1] = (NN_DIGIT)bHigh * (NN_DIGIT)cHigh;

 if ((t += u) < u)
 a[1] += TO_HIGH_HALF (1);
 u = TO_HIGH_HALF (t);

 if ((a[0] += u) < u)
 a[1]++;
 a[1] += HIGH_HALF (t);
}







[LISTING FIVE]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Sets a=b/c, where a and c are digits. Lengths: b[2]. Assumes b[1] < c
 and HIGH_HALF (c) > 0. For efficiency, c should be normalized. */
void NN_DigitDiv (a, b, c)
NN_DIGIT *a, b[2], c;
{
 NN_DIGIT t[2], u, v;
 NN_HALF_DIGIT aHigh, aLow, cHigh, cLow;

 cHigh = HIGH_HALF (c);
 cLow = LOW_HALF (c);

 t[0] = b[0];
 t[1] = b[1];

 /* Underestimate high half of quotient and subtract product
 of estimate and divisor from dividend. */
 if (cHigh == MAX_NN_HALF_DIGIT)
 aHigh = HIGH_HALF (t[1]);
 else
 aHigh = (NN_HALF_DIGIT)(t[1] / (cHigh + 1));
 u = (NN_DIGIT)aHigh * (NN_DIGIT)cLow;
 v = (NN_DIGIT)aHigh * (NN_DIGIT)cHigh;
 if ((t[0] -= TO_HIGH_HALF (u)) > (MAX_NN_DIGIT - TO_HIGH_HALF (u)))
 t[1]--;
 t[1] -= HIGH_HALF (u);
 t[1] -= v;

 /* Correct estimate. */
 while ((t[1] > cHigh) ||
 ((t[1] == cHigh) && (t[0] >= TO_HIGH_HALF (cLow)))) {
 if ((t[0] -= TO_HIGH_HALF (cLow))
 > MAX_NN_DIGIT - TO_HIGH_HALF (cLow))
 t[1]--;
 t[1] -= cHigh;
 aHigh++;
 }
 /* Underestimate low half of quotient and subtract product of
 estimate and divisor from what remains of dividend. */
 if (cHigh == MAX_NN_HALF_DIGIT)
 aLow = LOW_HALF (t[1]);
 else
 aLow =
 (NN_HALF_DIGIT)
 ((NN_DIGIT)(TO_HIGH_HALF (t[1]) + HIGH_HALF (t[0]))
 / (cHigh + 1));
 u = (NN_DIGIT)aLow * (NN_DIGIT)cLow;
 v = (NN_DIGIT)aLow * (NN_DIGIT)cHigh;
 if ((t[0] -= u) > (MAX_NN_DIGIT - u))
 t[1]--;
 if ((t[0] -= TO_HIGH_HALF (v)) > (MAX_NN_DIGIT - TO_HIGH_HALF (v)))
 t[1]--;
 t[1] -= HIGH_HALF (v);

 /* Correct estimate. */
 while ((t[1] > 0) || ((t[1] == 0) && (t[0] >= c))) {
 if ((t[0] -= c) > (MAX_NN_DIGIT - c))
 t[1]--;
 aLow++;
 }

 *a = TO_HIGH_HALF (aHigh) + aLow;
}






[LISTING SIX]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Computes a=b+c*d, where c is a digit. Returns carry.
 Lengths: a[digits], b[digits], d[digits]. */
NN_DIGIT NN_AddDigitMult (a, b, c, d, digits)
NN_DIGIT *a, *b, c, *d;
unsigned int digits;
{
 NN_DIGIT carry, t[2];
 unsigned int i;

 carry = 0;
 for (i = 0; i < digits; i++) {
 NN_DigitMult (t, c, d[i]);
 if ((a[i] = b[i] + carry) < carry)
 carry = 1;
 else
 carry = 0;
 if ((a[i] += t[0]) < t[0])
 carry++;
 carry += t[1];
 }
 return (carry);
}






[LISTING SEVEN]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */

/* Computes a=b*c. Lengths: a[2*digits], b[digits], c[digits].
 Assumes digits < MAX_NN_DIGITS. */
void NN_Mult (a, b, c, digits)
NN_DIGIT *a, *b, *c;
unsigned int digits;
{
 NN_DIGIT t[2*MAX_NN_DIGITS];
 unsigned int bDigits, cDigits, i;


 NN_AssignZero (t, 2 * digits);

 bDigits = NN_Digits (b, digits);
 cDigits = NN_Digits (c, digits);

 for (i = 0; i < bDigits; i++)
 t[i+cDigits] += NN_AddDigitMult (&t[i], &t[i], b[i], c, cDigits);
 NN_Assign (a, t, 2 * digits);
}






[LISTING EIGHT]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Computes a=b-c*d, where c is a digit. Returns borrow.
 Lengths: a[digits], b[digits], d[digits]. */
NN_DIGIT NN_SubDigitMult (a, b, c, d, digits)
NN_DIGIT *a, *b, c, *d;
unsigned int digits;
{
 NN_DIGIT borrow, t[2];
 unsigned int i;

 borrow = 0;
 for (i = 0; i < digits; i++) {
 NN_DigitMult (t, c, d[i]);
 if ((a[i] = b[i] - borrow) > (MAX_NN_DIGIT - borrow))
 borrow = 1;
 else
 borrow = 0;
 if ((a[i] -= t[0]) > (MAX_NN_DIGIT - t[0]))
 borrow++;
 borrow += t[1];
 }
 return (borrow);
}






[LISTING NINE]

/* Copyright (C) 1991-2 RSA Laboratories, a division of RSA Data
 Security, Inc. All rights reserved. */
/* Computes a=c div d and b=c mod d. Lengths: a[cDigits], b[dDigits],
 c[cDigits], d[dDigits]. Assumes d > 0, cDigits < 2 * MAX_NN_DIGITS,
 dDigits < MAX_NN_DIGITS. */
void NN_Div (a, b, c, cDigits, d, dDigits)
NN_DIGIT *a, *b, *c, *d;
unsigned int cDigits, dDigits;
{
 NN_DIGIT ai, cc[2*MAX_NN_DIGITS+1], dd[MAX_NN_DIGITS], t;
 int i;
 unsigned int ddDigits, shift;

 ddDigits = NN_Digits (d, dDigits);
 if (ddDigits == 0)
 return;








 /* Normalize operands. */
 shift = NN_DIGIT_BITS - NN_DigitBits (d[ddDigits-1]);
 NN_AssignZero (cc, ddDigits);
 cc[cDigits] = NN_LShift (cc, c, shift, cDigits);
 NN_LShift (dd, d, shift, ddDigits);
 t = dd[ddDigits-1];

 NN_AssignZero (a, cDigits);

 for (i = cDigits-ddDigits; i >= 0; i--) {
 /* Underestimate quotient digit and subtract. */
 if (t == MAX_NN_DIGIT)
 ai = cc[i+ddDigits];
 else
 NN_DigitDiv (&ai, &cc[i+ddDigits-1], t + 1);
 cc[i+ddDigits] -=
 NN_SubDigitMult (&cc[i], &cc[i], ai, dd, ddDigits);
 /* Correct estimate. */
 while (cc[i+ddDigits] || (NN_Cmp (&cc[i], dd, ddDigits) >= 0)) {
 ai++;
 cc[i+ddDigits] -= NN_Sub (&cc[i], &cc[i], dd, ddDigits);
 }
 a[i] = ai;
 }
 /* Restore result. */
 NN_AssignZero (b, dDigits);
 NN_RShift (b, cc, shift, ddDigits);
}


















August, 1992
PERSONAL SUPERCOMPUTING: VIRTUAL MEMORY, 64-BIT


Multi-megabyte applications on the PC




Ian Hirschsohn


Ian holds a BSc in Mechanical Engineering and an MS in Aerospace Engineering.
He is the principal author of DISSPLA and cofounder of ISSCO. He can be
reached at Integral Research, 249 S. Highway 101, Suite 270, Solana Beach, CA
92075.


In last month's installment, I introduced the concept of a "virtual computer"
patterned on the Cray supercomputer model--an architecture defined by software
rather than hardware. Using the PORT system (a PC-based environment that
adheres to Cray's design principles, as described in my June 1992 article), I
showed how the virtual computer concept can achieve surprising performance on
PCs and exploit RISC processors, and do so in a seamlessly portable
environment. This month, I'll discuss salient features of the PORT system
itself, focusing on design decisions rather than implementation details. As
you recall, PORT has a philosophy diametrically different from that of UNIX
and OS/2. It unabashedly plagiarized the best features of a number of
mainframe systems, especially Seymour Cray's CDC 6600. In short, PORT's architecture
is more an incarnation of several proven systems than an untested fresh
approach.
The key difference between PORT and most other systems is that PORT is totally
dedicated to running multi-megabyte Fortran applications. (Although PORT was
originally designed for Fortran, its Fortran/C compiler is being modified to
accept C syntax.) For example, the file manager treats all directories as
libraries (there is no need to MAKE libraries), the linker automatically
resolves OBJ-type files from any specified directories, and FIND C=WORKSP
searches OBJ files (ignoring all other types) for a COMMON block called
WORKSP.
It's no secret that ANSI Fortran is unsuitable for systems programming. So
rather than throw the baby out with the bathwater, we vastly extended Fortran
(while preserving Fortran 77 compatibility) instead of creating some new
language. PORT's Fortran/C incorporates most of the functions found in
C--Boolean operations, shifts, bit-field operators, a pointer type, and
low-level I/O intrinsics--and has proven an effective systems-programming
vehicle. The entire PORT system--including the compiler, linker, editor,
virtual memory management, and system libraries (about 1 million lines)--is
written in Fortran/C.


Virtual Memory Without Virtual Memory


Mainframe-grade Fortran applications often expect virtual memory, or the
ability to run programs larger than available RAM by transparently swapping
pieces to disk. Virtual memory has proved to have an even more important
benefit: the ability to call massive programs from within one another. For
example, you can call the editor from within your program, then the compiler
from within the editor, the file manager from within the compiler, and so on.
Your own program may eventually be totally swapped out to disk, but this is
transparent.
Chaining multiple programs, which is often confused with multitasking, enables
the development of gigantic applications. For example, if you can depend on
calling the editor, there is no need to write editor functions into your
program.
Being focused on Fortran/C, PORT achieves virtual memory simply by basing its
page size on the longest practical subroutine (or function). As long as a
subroutine never crosses a page, all variables and branches within that
subroutine are local and don't need virtual-memory address translation. This
idea is somewhat similar to the segment organization of the DOS huge model,
except that PORT pages are a fixed length and can hold multiple subroutines.
As I pointed out in my June article, keeping most references local is
equivalent to about an 85 percent "cache hit ratio" right off the bat. PORT
local addressing is relative to the start of the current subroutine
instruction and data areas rather than a segment base address or other
hardware-dependent artifice. The absolute RAM offset address to the subroutine
is reset automatically upon any call or return, like the DOS huge model.
Unlike the huge model, however, if the PORT metacode does not find the target
page in RAM memory, it generates a page fault and the page is rolled in from
disk (behind the programmer's back). Thus the program size is limited by the
disk-staging area you assign, not the available RAM.
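The page-fault path just described can be caricatured in C. This is a toy
model, not PORT's code; the page and frame counts, the names, and the trivial
replacement policy are all ours, chosen only to make the mechanism visible.

```c
#include <string.h>
#include <assert.h>

#define PAGE_WORDS 8    /* toy size; PORT uses 4096 64-bit words */
#define NUM_PAGES  4    /* size of the disk-staging area, in pages */
#define NUM_FRAMES 2    /* RAM frames available */

typedef unsigned long long word64;

static word64 disk[NUM_PAGES][PAGE_WORDS];     /* disk-staging area */
static word64 ram[NUM_FRAMES][PAGE_WORDS];     /* resident page frames */
static int resident[NUM_FRAMES] = { 0, 1 };    /* page held by each frame */
static unsigned long faults;

/* Return the RAM frame holding `page`, rolling it in from disk on a miss. */
static word64 *frame_of (int page)
{
    int f, victim;

    for (f = 0; f < NUM_FRAMES; f++)
        if (resident[f] == page)
            return ram[f];                      /* hit: no disk traffic */

    victim = (int)(faults++ % NUM_FRAMES);      /* trivial replacement */
    memcpy (disk[resident[victim]], ram[victim], sizeof ram[victim]);
    memcpy (ram[victim], disk[page], sizeof ram[victim]);
    resident[victim] = page;
    return ram[victim];
}

/* A load: split the virtual address into page number and word offset. */
static word64 vm_load (unsigned long vaddr)
{
    return frame_of ((int)(vaddr / PAGE_WORDS))[vaddr % PAGE_WORDS];
}
```

A reference within a resident page costs only the table lookup; a miss costs
two copies (write-back and roll-in), behind the program's back.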
The current PORT page is 4096 64-bit words, or 32 Kbytes. (Support for
64-Kbyte pages will be available soon.) The instruction and data areas for
each subroutine are generally kept in separate pages, so instruction pages
don't have to be written back to disk. Measurements on numerous programs since
1979 have shown that the Fortran/C compiler generates 1.2 to 1.4
meta-instructions (64 bits each) per executable source statement. Hence a PORT
page can accommodate a subroutine of up to about 3000 executable statements,
or roughly 4500 typical lines. A single subroutine longer than that tends to
be unwieldy and is best broken into smaller ones.
The linker packs as many subroutines as possible into an instruction/data page
pair, which holds around 60 to 200 typical-length subroutines (32-Kbyte
pages). If any one of those routines is called and the page pair is not in
RAM, the remaining 59-199 ride in with it. The linker exploits this to improve
performance by grouping routines that call each other into the same page pair
and also provides user-specified grouping. Indeed, the linker gives precedence
to grouping the branch of a call tree over page packing efficiency. Thus, a
"miss" on one routine brings in many related routines on the same two disk
references. By contrast, most virtual-memory systems go through a whole
sequence of disk references before they develop a "working set." Disk I/O
being a pacing factor in many virtual-memory applications, disk efficiency can
outweigh processor performance.
A key difference between the DOS huge model and PORT is that PORT permits
arrays to cross pages: Arrays can be multi-megabyte with an overall limit of
32 Mbytes per program (310 Mbytes with 64-Kbyte pages). This may not be
anywhere near the 4-gigabyte limit of 32-bit addressing, but in practice, a
single program of even 5-10 Mbytes involves thousands of subroutines, and I
have found such programs unwieldy to develop beyond a certain point. (Bear in mind
that a megabyte amounts to about 100,000 lines of Fortran/C, accounting for
data areas and typical arrays.) The 4-gigabyte number is a bit
academic--simply dedicating a 4-gigabyte disk-staging area is beyond most PCs,
workstations, and even mainframes (per user). Just think of the disk-swap
time! In practice, since PORT programs can call one another, even 32 Mbytes
per program has yet to prove a limitation. What is not academic is that 32-bit
addressing wastes 1-2 bytes 99.99 percent of the time, hogging the processor's
critical prefetch queue.
Virtual-memory addressing is applied only to accesses across pages. Since no
subroutine can straddle pages, virtual-memory translation is only necessary
for array references, call/return, and pointers. (COMMON blocks, strings, and
other structures are presented as arrays to the metacode so there is no
distinction at the metacode level.)


All 64 Bit


Like mainframes and supercomputers, PORT targets a 64-bit bus: Addressing is
in terms of 64-bit words, and all instructions plus data items are processed
as multiples of 64-bit words. A byte is regarded as just an 8-bit field, so
the metacode changes a byte value by bringing in its 64-bit word, modifying
the 8-bit field, and writing out the whole word. (The cost of accessing a byte
and a 64-bit word is identical on a 64-bit bus.) "All 64-bit" execution may be excessive
for 80x86 PCs, but most plug-in RISC cards (such as i860-based cards) use a
64-bit bus locally. Boards like the Hyperspeed D860 allow multiple cards to
interconnect via ribbon cables carrying a blisteringly fast 64-bit local bus.
Although the advantage of 64 bits for performance is clear, it has proved
surprisingly beneficial to Fortran/C coding. All Fortran/C variables are
64-bit, or a multiple thereof. (INTEGER*2, REAL*4, and so on, are accepted but
ignored.) Strings are 64-bit aligned and padded to a word boundary. Thus
integers, floating point, pointers, and strings can all be freely intermixed.
For example, INTEGER, REAL, CHARACTER, and POINTER type variables can be
interchangeably passed through the same subroutine argument and EQUIVALENCEd
to one another, contrary to both ANSI Fortran and C rules. Interchangeability
simplifies dynamic record structures containing variable-length components or
omitted items. 64-bit integers have proved invaluable for packing and
manipulating multiple values simultaneously; for example, A=B moves eight
bytes at a pop.
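The read-modify-write byte update described above can be sketched with shifts
and masks. This is our illustration of the technique, not the metacode's
actual implementation.

```c
#include <assert.h>

typedef unsigned long long word64;

/* Change one byte of a 64-bit word the way the metacode does: bring in
   the whole word, replace the 8-bit field, and write the word back.
   n selects the byte, 0 (lowest) through 7 (highest). */
static word64 set_byte (word64 w, int n, unsigned char v)
{
    int shift = n * 8;

    w &= ~(0xffULL << shift);     /* clear the 8-bit field */
    w |= (word64) v << shift;     /* insert the new value; caller stores w */
    return w;
}
```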


The Metacode


As mentioned in the previous articles, PORT defines its own
machine-independent "instruction set" customized to Fortran/C. This metacode
is executed via a processor-dependent decoder supplied for each platform. Each
meta-instruction is a 64-bit word organized as shown in Figure 1. All
meta-instructions have the form A=B op C. The tag for each operand A, B, and C
determines the operand mode--151, J, X(I), Y(I,J), for instance. The text box
"PORT's Metacode" describes metacode operation.
Table 1 lists all three-operand instructions (A=B op C). Because it's wasteful
to use three operand slots for two-operand instructions (A = op B), these are
grouped as instruction 25h using the value of C to distinguish them; see Table
2. Note how specific the instructions are to Fortran and C.
Table 1: Metacode three-operand instructions.

 Code (hex)  Mnemonic  Description
 -------------------------------------------------------------------------
 00          --        Illegal op code (catches 0 as an instr.).
 01          (+):i     Integer ADD.
 02          (-):i     Integer SUBTRACT.
 03          (*):i     Integer MULTIPLY.
 04          (/):i     Integer DIVIDE.
 05          (//):i    Integer INCLUSIVE DIVIDE.
 06          B**IC:i   Real-to-integer power.
 07          MOD:i     Integer remainder.
 08          (=):i     Integer test EQUAL.
 09          (<):i     Integer test LESS THAN.
 0A          (>):i     Integer test GREATER THAN.
 0B          (#):i     Integer test NOT EQUAL.
 0C          (<=):i    Integer test LESS THAN OR EQUAL.
 0D          (=>):i    Integer test GREATER THAN OR EQUAL.
 0E          arith_L   Arithmetic LEFT shift (preserve sign).
 0F          arith_R   Arithmetic RIGHT shift.
 10          B**C:r    Real to power of a real.
 11          (+):r     Real ADD.
 12          (-):r     Real SUBTRACT.
 13          (*):r     Real MULTIPLY.
 14          (/):r     Real DIVIDE.
 15          POLYNML   Polynomial expansion evaluate.
 16          INCADR    Increment POINTER.
 17          KCHECK    Test for operand error.
 18          (=):r     Real test EQUAL (r/tests use fuzz factor).
 19          (<):r     Real test LESS THAN.
 1A          (>):r     Real test GREATER THAN.
 1B          (#):r     Real test NOT EQUAL.
 1C          (<=):r    Real test LESS THAN OR EQUAL.
 1D          (=>):r    Real test GREATER THAN OR EQUAL.
 1E          SBYTE8    Replace 8-bit byte (in array or word).
 1F          LBYTE8    Extract 8-bit byte.
 20          BreakPt   Breakpoint.
 21          GO TO*    List GO TO.
 22          jmp#0     Branch if NONZERO.
 23          jmp=0     Branch if 0.
 24          ARYMOV    Block op (array search, init, copy, and so on).
 25          2op       Two-operand instructions (see Table 2).
 26          DO end    Loop test, increment/decrement, and branch.
 27          CALL      Call subroutine.
 28          AND       Logical AND.
 29          OR        Logical OR.
 2A          XOR       Logical XOR.
 2B          ANDF      Boolean AND.
 2C          ORF       Boolean OR.
 2D          XORF      Boolean XOR.
 2E          SBYTE6    Replace 6-bit byte (in array or word).
 2F          LBYTE6    Extract 6-bit byte.
 30          I_FIELD   Extract arbitrary field from word.
 31          S_FIELD   Replace arbitrary field in word.
 32          ATAN2     Ratio arc tangent.
 33          SIN_COS   Compute both sine and cosine of C.
 34          shift_L   Boolean LEFT SHIFT.
 35          shift_R   Boolean RIGHT SHIFT.
 36          StrngOP   Text op (copy, search, concatenate, and so on).
 37          ByteOP    Byte-string op (nonspecific StrngOP).
 38          Unpack8   Unpack string to word-per-byte array.
 39          Pack8     Pack eight bits/word array as string.
 3A          CALL_P    Call program on another processor.
 3B          Load_P    Load program on another proc.
 3C          Reset_P   Restart program on another proc.
 3D-3F       --        Not used.
 40          --        Illegal op code (catches -0 as instr.).
 41          (+):c     Complex ADD (c/tests use fuzz factor).
 42          (-):c     Complex SUBTRACT.
 43          (*):c     Complex MULTIPLY.
 44          (/):c     Complex DIVIDE.
 45          B**IC:c   Complex to integer power.
 46          CMPLX     Real and imaginary to complex.
 47          --        Not used.
 48          (=):c     Complex test EQUAL.
 49          (<):c     Complex test LESS THAN.
 4A          (>):c     Complex test GREATER THAN.
 4B          (#):c     Complex test NOT EQUAL.
 4C          (<=):c    Complex test LESS THAN OR EQUAL.
 4D          (=>):c    Complex test GREATER THAN OR EQUAL.
 4E-7F       --        Not used.

Table 2: Metacode two-operand instructions.

 Code (hex) C Mnemonic Description
 -------------------------------------------------------------

00 neg Negate.
01 NOT Logical invert.
02 NOTF Boolean complement.
03 PP req Flag peripheral processor and halt.
04 GO TO Unconditional branch.
05 (:=) Transfer one word.
06 RETURN Subroutine RETURN.
07 Intrpt Suppress interrupts (ignore PP flag).
08 ABS Get absolute value.
09 INT Truncate real to integer.

0A FLOAT Integer to real.
0B SIGN Assign sign of B to A.
0C STATS Get memory-error statistics.
0D VADDR Set POINTER to address.
0E EXECUTE Execute local procedure.
0F MEMCHK Force memory reference (test RAM).
10 EXCHNG Swap contents of A and B.
11 IROUND Round real to integer.
12 -- Unassigned.
13 SQRT Square Root.
14 SIN Sine.
15 COS Cosine.
16 TAN Tangent.
17 ArcSIN Inverse sine.
18 ArcCOS Inverse cos.
19 ArcTAN Inverse tan.
1A LOG Log to base e.
1B LOG10 Log to base 10.
1C EXP e to power B.
1D EXP10 10 to power B.
1E (+):i Direct INTEGER ADD.
1F (-):i Direct INTEGER SUBTRACT.
20 (*):i Direct INTEGER MULTIPLY.
21 (/):i Direct INTEGER DIVIDE.
22 arith_L Direct arithmetic LEFT SHIFT.
23 arith_R Direct arithmetic RT SHIFT.
24 shift_L Direct Boolean LEFT SHIFT.
25 shift_R Direct Boolean RT SHIFT.
26 MOD:i Direct remainder.
27 ORF Direct Boolean OR.
28 ANDF Direct Boolean AND.
29 XORF Direct Boolean XOR.
2A (+):r Direct REAL ADD.
2B (-):r Direct REAL SUBTRACT.
2C (*):r Direct REAL MULTIPLY.
2D (/):r Direct REAL DIVIDE.
2E REAL Real part of complex.
2F CMPLX Real to complex.
30 CONJG Complex conjugate.
31 SINH Hyperbolic sine.
32 COSH Hyperbolic cosine.
33 TANH Hyperbolic tangent.

Since each meta-instruction needs to be interpreted, the decode time is
critical. The metacode is therefore designed to be decoded as fast as
possible. For example, even though the PORT page is 4096 words, needing only
12 bits, it is assigned a 16-bit field because most processors have
instructions that can operate on 16 bits directly. Likewise, the 3-bit tags
describing the contents of A, B, C are grouped as a 9-bit field, rather than
being tacked onto their respective A, B, and C--allowing the decoder
programmer to use a fast 512-way indexed branch to interpret all three tags in
a single assembly instruction.
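The 9-bit tag group can be sketched as follows. This is our illustration of
the packing, not the decoder's source; a real decoder would index a 512-entry
table with the packed value instead of unpacking the fields.

```c
#include <assert.h>

/* Pack three 3-bit operand tags into one 9-bit field so a decoder can
   dispatch on all three with a single 512-way indexed branch. */
static unsigned pack_tags (unsigned a, unsigned b, unsigned c)
{
    return ((a & 7u) << 6) | ((b & 7u) << 3) | (c & 7u);   /* 0..511 */
}

static void unpack_tags (unsigned tags,
                         unsigned *a, unsigned *b, unsigned *c)
{
    *a = (tags >> 6) & 7u;     /* a table indexed by `tags` replaces */
    *b = (tags >> 3) & 7u;     /* these three extractions in an      */
    *c = tags & 7u;            /* actual decoder                     */
}
```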
Without exception, all meta-instructions have an identical format.
Furthermore, the metacode operand format is independent of the actual
operation. (Both the 386/486 and i860 decoders interpret the operands before
they even look at the Op Code.) This rigorous independence not only speeds the
decoding on existing processors, but has an eye on future chips, which will
incorporate multiple independent processors on the same silicon (Motorola's
recently announced M88110 RISC multiprocessor, for instance). A context-free
format lends itself to parallel decoding the A, B, and C operands and even to
pre-interpreting instructions. Obsolescence being reality in computers, we
have tried to think as far ahead as possible.


The Peripheral Processor


As previously mentioned, all I/O functions are handled by the peripheral
processor (PP) program. Although the PP is designed to execute on a processor
separate from the metacode computation processor (CP), as discussed previously
it coexists surprisingly well with the CP in a single-processor environment. Each PP
action is specified via a 5x64-bit word structure deposited in the CP memory.
Table 3 lists the current PP action codes. The PP is free to honor the
requisite action in any way it chooses, thus enabling PORT to be implemented
seamlessly in diverse host environments.
Table 3: Peripheral processor actions.

 Code Action Description
 -------------------------------------------------------------------------

 01 Output text line Sends line to screen, COM or LPT port.
 02 Input text line Gets line/char from keyboard or port.
 03 Set flag chars Defines chars for line feed, prompt, and so on.
 04 Set date/time Resets current date and time.
 05 Get error stats Gets error counts for disk and PP memory.
 06 Jumper ports Redirects text-line output to specific ports.
 07 Get date/time Retrieves current date and time.
 09 Read/write disk Retrieves/replaces disk page.
 10 Discard interrupts Clears any interrupt flag from PP to CP.
 11 Jump history Activates/gets last 2048 branches, calls.
 12 Mag tape op Reads/writes/positions 9-track, 8mm, 3480,
 DAT.
 13 Set COM params Sets serial-port baud rate, and so on.
 14 Generate tone Sends tone sequence to speaker.
 16 Exec host program Executes DOS(UNIX) based program.
 18 Set spooling Activates/discards output to DOS file buffer.
 21 Set COM protocol Sets port handshaking (XON/XOFF, RTS).
 24 Swap disk page Exchanges page in memory with page on disk.
 25 High-speed I/O Block I/O to IEEE 488, SCSI, and so on.
 27 Terminate PORT Signs off PORT and returns to host system.
 28 Switching programs Tells PP that PORT starting different
 program.
 29 Host file op Reads/writes/finds/copies/deletes DOS file.
 30 Screen graphics op Outputs line/fill/block/color to screen.
 31 Window/mouse op Creates/removes/retrieves/outputs a window.
 128 Fatal error PORT detected internal error, sign off.

Again, all I/O data items lie on a 64-bit boundary and always have a length
which is a multiple of 64 bits. (Strings are padded, as necessary.) This
enables the PP implementor to utilize the maximum bandwidth available on the
machine; for example, the PP lends itself readily to 32-bit EISA and Micro
Channel implementation. By way of comparison, UNIX, OS/2, and DOS I/O are
specified in terms of a byte offset and byte count, which requires a costly
shift of the data block if the start byte does not fall on a 32-bit boundary.
To avoid the PP becoming involved in the virtual-memory process, the CP
guarantees that no I/O block crosses a page boundary. As an example, standard
Seismic SEG-B tapes have records of several megabytes apiece. The Fortran/C
program can readily define a multi-megabyte array to hold the record, but the
PP transfers these records as a sequence of one-page blocks. This places the
onus on PORT and the CP to process any virtual-memory page faults on the huge
array, between the record pages, thereby simplifying the PP implementation.
The CP and PP are designed to execute in parallel. Even on a single 386/486
processor, the PP caches tape I/O using the DOS timer interrupt to service the
drive concurrent with the CP program.
To illustrate the effectiveness of these ideas, PORT is able to copy a
2500-foot, 9-track, 6250-bpi Seismic SEG tape in 4.5 minutes between two
streaming 125-ips drives using a Fortran program, on a stock 486/33 EISA PC
(sans i860). Dividing 2500 feet (30,000 inches) by 125 inches per second
gives a theoretical lower limit of four minutes. By comparison, an oil company had run
under UNIX on a Sun-compatible rated faster than a SPARC 4 and, after a year
of custom device driver coding plus C conversion, was able to reduce the time
from 12 to 6.5 minutes. The 4.5 minutes reflects a sustained data-transfer
rate of 2x6250x125x(4/4.5)=1.3 Mbytes/sec (accounting for interrecord gaps)
from Fortran. (At the last SEG show in November 1991, the PC was hidden under
the table because the company felt nobody would believe a PC could do what it
was doing.)


Debug Insecticide


True to the maxim that the last 5 percent of the bugs take 95 percent of the
development time, a bug in the DISSPLA shading algorithm took us almost two
years to find. It showed up on one mainframe, but not another. It even
disappeared from execution to execution. Not even the most sophisticated
debuggers were any help, because the program never failed when the debuggers
were active. New DISSPLA releases were held up because every time we thought
we had it, the bug popped up anew. (Sometimes we thought it depended on the
phases of the moon.) It turned out to be a truncation from floating point to
integer used to compute the index to a workspace array, whose contents
depended on memory at the time the program was run. (I=INT(A) is highly
processor dependent when A falls between 0.999... and 1.000...01.) Experiences
like these on standard systems almost drove me into a different line of work,
and I resolved that when we designed what is now PORT, debugging features
would take priority over any performance considerations. The first time it was
run under the SUPERSET precursor to PORT, the bug popped up--and was nailed.
The design of PORT assumes that nothing is working properly. For example, the
file-management system attaches an internal software checksum to every record
of every file. Whenever a file is copied on the disk or transferred to or from
tape, this checksum is validated by the PORT software. We've even caught
problems in the error-checking circuitry of the hardware!
The PORT metacode performs numerous checks on every instruction without
exception; not even the PORT system itself is exempt. These include:
Array-bounds checking on every array reference, including COMMON blocks,
strings, and subroutine arguments.
Uninitialized-variable checks on all arithmetic and compare instructions (but
not Boolean). The linker initializes memory to -0 (0 with sign bit set), which
the metacode never generates on any arithmetic operation.
Invalid pointers. Pointers are 64-bit and carry the length of the original
item together with a mandatory check pattern.
Unnormalized floating point. Usually an integer is passed through a real
argument.
Field overflow (such as attempting to store a value greater than 255 in an
8-bit byte).
Field, or shift count, off either end of a 64-bit word.
Invalid text strings--strings not terminated by a null (0 byte).
Invalid loop limits (DO J=1,N with N<=0, for example).
These bugs tend to occur during execution, and no compiler "lint" can pick
them up. The DISSPLA bug was detected by the uninitialized-variable check. In
most cases, an uninitialized fault is just a nuisance, but every so often it
stumbles across something really serious and pays for itself. The checks are
conducted by the metacode decoder program (in the CP) as part of each
meta-instruction's execution. As I pointed out in June, a RISC-based decoder
can often bury the checks in free cycles so that they add little to the
overhead.
The metacode approach provides other bonuses for code development. For
example, the decoder can be requested to store the last 2048 branches
(providing a software "logic analyzer" execution history). The decoder stores
the count of meta-instructions executed by subroutine, furnishing a "profiler"
that has proved invaluable for optimizing multi-megabyte applications. The
PORT debugger can request the decoder to test for an event on every
meta-instruction. Since the tests are embedded in the decoder, the debugger
slows the program by just a few percent. Unlike debuggers that insert code,
there are no loopholes and no need to recompile. Basically, the metacode
decoder represents modifiable "microcode" that can be programmed to do
whatever you like.
Another subtle use of the modifiable microcode is the application of a "fuzz
factor" to all floating-point compares (for example, IF(B=1.0) THEN succeeds
if B=0.999...). The fuzz factor is user modifiable and circumvents the
nightmare DISSPLA bug I described.
An important aspect of virtual memory is that the PORT debugger is available
at all times on all programs, especially on release versions at user sites.
The linker attaches a copy of its tables to the end of each executable image
(no exceptions). These include the location, programmer name, date/time, and
source installation of all the OBJ files.
Upon any fault (like an array bounds or bad pointer), the debugger swings into
action, swaps the program back to disk, and loads the linker attached tables.
Using this roadmap, the debugger can probe any program--even itself. The key
difference between the PORT debugger and Codeview is that PORT can debug the
program at the user site with the user's proprietary data in the exact state
of its failure. (PORT doesn't know "memory dumps.") In multi-megabyte apps
developed by a team, the ability to pinpoint the subroutine version by
date/time and the programmer responsible has proved worth its weight in gold.


Multiprocessor


The philosophy of PORT is to throw as many processors as possible at the task,
not as many tasks as possible at the processor. Dedicating the entire system
to the current application simplifies multi-processor handling, thereby
improving throughput rather than wasting cycles in overhead. The elegant
simplicity of Cray's memory-mapped CP/PP communication scheme proved easily
extensible to multiple processors.
All processors are under the control of the CP, which in turn is under the
control of your program. The program treats each processor as if it were
executing a subroutine. To see how this works, consider using an auxiliary
processor to do three-dimensional rendering; see Figure 2. The rendering is
done by a call to RENDER_3D (OBJ_LIST, FRAME_BUF), for example, where OBJ_LIST
is the list of solid-object coordinates, and FRAME_BUF is the target frame
buffer. In a single-processor environment, RENDER_3D would just be a normal
Fortran/C subroutine, and indeed, the linker expects a Fortran/C version.
Prior to calling RENDER_3D, the program requests the CP to load RENDER_3D on
processor 1 via a LOAD_PROCESSOR system call. If there is no processor 1, or
no assembly version of RENDER_3D for it, the CP returns an error status to the
user program. The program can then opt to terminate or proceed using the
Fortran/C version. Given that everything is kosher, the metacode decoder on
the CP intercepts the RENDER_3D call. Instead of executing the normal
Fortran/C call, the decoder makes sure that the OBJ_LIST and FRAME_BUF arrays
are loaded in RAM, passes their addresses to the RENDER_3D version in
processor 1, and fires up processor 1. The decoder then executes a return to
the user program as if it had executed the Fortran/C RENDER_3D call.
The user program executing in the CP and RENDER_3D in processor 1 proceed in
parallel. When RENDER_3D is done, it sets a bit in a word tested by the CP
prior to executing each meta-instruction (an interrupt without a hardware
interrupt). A user-supplied interrupt-processing routine is then called.
Alternatively, the program can just opt to poll a common status word it
defines, to avoid interrupt handling. Upon detecting completion by processor
1, the program can send FRAME_BUF to the VGA and make another call to
RENDER_3D or other routine loaded in processor 1. The bottom line is that your
program is in control of all processors at all times.
Note that RENDER_3D has two versions: a high-speed, machine-dependent assembly
version and a slower Fortran/C version. Thus the seamless portability of the
program is preserved, albeit with the slower version. This feature is also
useful for the orderly development of multiprocessor applications. If you go
to the trouble of multiprocessing, speed is usually your criterion; PORT
therefore focuses on running tight assembly modules, rather than a
bureaucratic, symmetric, multithreaded C system. (Silicon Graphics
workstations attest to the speed of a tightly coupled MIPS 3000 RISC processor
rendering 3-D.)



Conclusion


The underlying PORT theme is to minimize entropy--to concentrate the
information and focus on the objective. Bug checks increase entropy, but a
program with a serious bug has infinite entropy.
In these three articles on "Personal Supercomputing," I've endeavored to
introduce an alternative to the UNIX and OS/2 approach. My intent is not to
disparage their multitasking philosophy, but to show that it is not gospel and
that there is room for diverse system designs. PORT shows that by targeting a
specific class of applications, it can overcome many of the limitations of the
UNIX approach, for its constituency. Conversely, PORT is not the ideal vehicle
for agenda managers, word processors, and multiple TSRs. The PORT design
realizes this and endeavors to complement the host system rather than replace
it. Nevertheless, technology has moved far beyond the PDP 11 and PC/XT. DOS,
UNIX, and OS/2 must come to terms with high-speed, multiple RISC processors
and 64-bit operation to realize the full potential of the next wave of
technology.


PORT's Metacode


In examining the operation of the PORT metacode, let's begin with an example.
Consider the Fortran/C source statement: Y(J)=X**11. Assume that Y is a local
array with DIMENSION Y(100).
A=B**C is a single 64-bit meta-instruction; see Op Code 06 in Table 1. As
shown in Figure 3, C is 11, B is the scalar X, and A the indirect reference to
Y(J). The tags differentiate operand modes: C has tag 1 (immediate
integer<64K), B tag 2 (local scalar), and A tag 3 (indirect). Tag 1 uses the
value of the 16-bit field itself (11). Tag 2 adds the field (257) to the start
address of the subroutine data area giving the word address, so X at
[d/area+257] has the 64-bit value 5.7. Tag 3 is similar to tag 2, except that
the word points to the operand value, hence the value at [d/area+316] points
to Y(J). Using all 64 bits as a single address would be excessive, so the
array base virtual address is in the upper half-word and the lower 32 bits
hold the length and index. In our example, Y starts at virtual page 5, word
862, and J
has the value at [d/area+541], namely 25.
The metacode decoder first shifts out the 16-bit fields corresponding to A, B,
and C. In the 386/486 version, the decoder then masks out the 9-bit
combined-tags value in the upper 16 bits (1*64+2*8+3=83), uses this as an
index to a table, and branches to entry 83. This simple branch decodes the
tags 1,2,3 in one 386/486 instruction. Operands C and B are straightforward
(tags 1 and 2). For A, the decoder proceeds as with B, except that it now
masks out the lower 16 bits and retrieves the 32-bit pair for J. The second
16-bit field in the indirect word has the length of Y, namely 100; if J is
less than 1 or greater than 100 an array-bounds fault is produced. In the
example, the decoder adds J-1=24 to the base v/address for Y (5:0862), giving
5:0886 as the location of Y(J). Page 5 is a virtual page, so the decoder
references entry 5 of its page-translation table for the real-memory page
address, adds offset 886, and obtains the real-memory word address for Y(J).
Having the values for B and C plus the location of A, the decoder shifts out
the Op Code and performs the operation listed in Table 1.
If entry 5 of the page-translate table is negative, it flags that Y(J) is
rolled out to disk, and the decoder generates a page fault. If the value for
either B or C is -0, it produces an uninitialized-variable fault. If the
exponent in the upper 12 bits of B is 0, an unnormalized floating-point fault
results. If A's exponent will be greater than 1023, a floating-point overflow
fault is produced. These tests are each a matter of one or two instructions
and add little to the overhead, but add a whole dimension of debugging power
and implement virtual memory with simplicity.
The upper six bits of the packed indirect word determine whether A is Y(J),
Y(I,J), Y(I+J), or a POINTER, and whether J is a variable or an immediate
constant; for example: Y(5), Y(I,5), and Y(I+5). In the Y(I,J) and Y(I+J)
cases, the index field points to a second level of indirection, for Y(I,J,K) a
third level, and so on. All instances of Y(J) in the subroutine point to the
same indirect word at offset 316, so there is only one entry for Y(J), not a
copy for each source line at which it occurs (reducing data-area redundancy).
If the array is longer than the 16-bit length field allows (64 Kwords), or
starts at any value other than 0 or 1 (for example, DIMENSION Y(-100:100)),
the length field points to a word in page 0 containing the actual limits.
The metacode may appear daunting, but the bottom line is that it concentrates
the maximum information in as few bits as possible, thereby minimizing the
entropy. Using an 8088 PC/XT to execute it would be a joke, and a 32-bit 386
is marginal; RISC processors with their one instruction per cycle and huge banks
of registers are just what the doctor ordered. Most importantly, the mass of
information presented to the RISC processor allows it to change the order of
the logic, thereby plugging free cycles and memory-reference delays with
useful RISC instructions. (As long as the decoder produces the result
expected, the internal sequence is academic.)
--I.H.












































August, 1992
INSIDE THE WINDOWS SCHEDULER


Making a black box transparent




Matt Pietrek


Matt works for a California programming-tools vendor, specializing in
debuggers and file-format programming. He can be reached at CIS 76117,1720.


The Microsoft Windows API is full of "black boxes." The SDK documentation
tells programmers how to call the various Windows functions, but rarely
explains why you must do something. All too often, we're told, "You must do
this for your program to work," or "You can't do that in this particular
situation." But seldom is there any description of what is going on beneath
the surface.
A principal "black box" in Windows is the scheduler. In this article, I
examine how the Windows scheduler is implemented, and how it interacts with
the rest of Windows. In explaining the scheduler, I refer to data
structures internal to Windows. It will help your understanding of the
scheduler if you are at least somewhat familiar with structures such as the
task database, the module table, and the message queue. Many of these
structures have never been publicly documented, but fortunately some
information is now starting to appear in print (see "References").


The Two Schedulers


Please note that, in this article, I discuss the Windows scheduler as
implemented in the KERNEL module. (As you may recall, Windows is principally
composed of three DLLs: KERNEL, USER, and GDI.) The Windows scheduler is the
entity that controls execution of individual Windows apps. This scheduler
should not be confused with another scheduler that exists at a lower level
within the Windows environment. That other scheduler is the time-slicing
virtual machine (VM) scheduler implemented by WIN386.EXE. The WIN386 scheduler
only exists in Enhanced mode, and is used to preemptively switch from one VM
to another. A VM is a hardware-supported task that can contain a DOS session
or the entire set of Windows apps. The Windows scheduler runs within the
system VM; the system VM is the first VM to be created, and contains all the
Windows apps. The Windows scheduler is therefore only active when the WIN386
scheduler has given a timeslice to the system VM.
Because there is no time-slicing at the level of KERNEL, the Windows scheduler
is conceptually rather simple (although complicated in its implementation).
Unlike the OS/2 scheduler, which manages a hierarchy of thread classes (further
subdivided into priority levels within each class), the Windows scheduler is
fairly "flat." Because there is no preemption, a Windows app will continue to
run until it gives up the CPU to another program. If a program enters a tight
loop, or otherwise "hogs" the CPU, the system will effectively be deadlocked
until the application yields the CPU. The act of yielding is either done
explicitly, or by calling a Windows-API function that causes a yield to occur.


Scheduler Overview


The basic idea of how a Windows app gets the CPU is relatively simple, and can
be summarized as: "The first person in line who has a dime gets to use the
phone next." The "line" in this case is the linked list of Windows tasks. The
"dime" is a nonzero value in the "event-count" field (offset 6) of the task
database. Of course, the actual implementation of the scheduler is more
complex than that, but it's a good first-order approximation.
So how do tasks get these events? The event-count field is incremented by
calls to PostEvent(), an undocumented function called by routines such as
PostMessage() and SendMessage(). Additionally, it is called when there are
events in the "system queue" destined for the task, and when invalid regions
need to be repainted. The system queue is a "holding area" for input from
hardware devices such as the mouse and keyboard. In addition, each task has
its own message-queue segment for handling messages posted to it. The handle
for this segment is stored at offset 20H in the task database. It is important
to note that the event-count field in the task database does not correlate
directly to a message sitting in the task's message-queue segment, waiting to
be retrieved. Instead, the event count is an indicator to the scheduler that
the task needs to be woken up, because something is waiting to be processed.


How Do You Get into the Scheduler?


Most applications are written without a thought as to how the application will
give up control of the CPU, and how it can be a good citizen in the Windows
environment. The reason is that the basic structure of most Windows
applications is based upon the GetMessage()/DispatchMessage() loop. Inside
GetMessage(), if no messages are waiting to be processed, and if no "hardware"
or Paint messages are waiting to be synthesized, then the task is put to
"sleep." When the task wakes up, there is input for it, waiting to be
processed.
The act of sleeping consists of looping around, waiting for a particular set
of wakeup flags to be set in the message queue of the task. The waiting is
accomplished by calling the semi-documented WaitEvent() function, which calls
the core scheduling routine. We'll discuss this core routine in exhaustive
detail later. If the event count in the task database is nonzero upon entry to
WaitEvent(), then WaitEvent() doesn't call the core scheduling routine, but
instead returns without yielding. Each time it's called, WaitEvent()
decrements the event-count field in the task database by 1.
Besides GetMessage(), several other functions result in the scheduler getting
invoked:
SendMessage(). If the message to be sent is destined for a task other than the
current one, SendMessage() calls DirectedYield(), which forces the destination
task to be woken up and become the current task. By calling PostEvent(),
SendMessage() ensures that the destination task will be scheduled by the call
to DirectedYield().
PeekMessage(). Much of the code in PeekMessage() repeats that in GetMessage().
The PM_NOYIELD flag can be used with PeekMessage() to prevent a task switch
from occurring. PeekMessage() is commonly used to allow other tasks to execute
while your program is performing a lengthy chunk of work. The perfect example
of this is doing a compile within one of the Windows-based
integrated-development environments, such as Turbo Pascal for Windows, or
Microsoft QuickC for Windows.
The dialog-box functions. These include all the variations of DialogBox(), as
well as MessageBox(). These routines all use a PeekMessage()/DispatchMessage()
loop inside the USER module. When you call one of these functions, it is
entirely possible that your task may be switched away from for a while.
The Yield() family of functions. Yield() and DirectedYield() are "layers" on
top of the undocumented OldYield() function. OldYield() calls the core
scheduling routine. The key difference between DirectedYield() and Yield() is
that DirectedYield() sets a field in the current task database that specifies
which hTask the scheduler should attempt to transfer control to. Besides
SendMessage(), DirectedYield() is also used by the WINDEBUG/CVWIN/TDWIN DLLs
to switch between the debugger and the debuggee. It is also used by the
Toolhelp TaskSwitch() and TerminateApp() functions.


Inside the Core Scheduling Routine


At this point, we'll dive into details of the actual Windows scheduler. The
function is called Reschedule() and is not exported by the KERNEL module. In
this article, I will be discussing the implementation of Reschedule() as it
appears in Windows 3.1 KRNL386.EXE. Except as otherwise noted, KRNL286.EXE has
the same code. Figure 1 provides pseudocode for the scheduler. It is
reasonably close to what actually occurs in Reschedule(). I suggest you
examine it with parmesan cheese close at hand, because it's a spaghetti
coder's dream!
Figure 1: Pseudocode for Reschedule().

 //------------------STARTUP SECTION------------------
 Save registers on stack of outgoing task
 Update task profiles
 If ( TDB signature not OK)
 goto Walk_through_task_list
 Get DirectedYield field in TDB into AX; zero it out in TDB
 If ( DirectedYield hTask == 0 )

 goto Walk_through_task_list
 If ( event count in DirectedYield TDB != 0 )
 goto startup_this_task
 //---------------TOP HALF OF TASK WALK-------------
 Walk_through_task_list:
 Point to the first task in the linked list of tasks
 Try_next_task:
 if ( not pointing past the end of the task list)
 goto Does_this_task_have_an_event?
 //---------------THE IDLE LOOP--------------------
 ShrinkHeap () // In KRNL386 only
 DiscardFreeBlocks() // In KRNL386 only
 IsUserIdle()
 if ( fPokeAtSegments != 0 )
 if ( USER is idle )
 Call routine to load a boot time module segment
 goto Walk_through_task_list
 INT 28h
 INT 2Fh, AX = 1689h
 goto Walk_through_task_list
 //-------------BOTTOM HALF OF TASK WALK-------------------
 Does_this_task_have_an_event?:
 if ( event count field == 0 )
 point to next TDB
 goto Try_next_task
 //-------------TASK SWITCHING CODE---------------------
 // Found a potential task to switch to
 // Make sure it's OK to switch
 if ( found task == current task )
 goto Reschedule_done
 startup_this_task:
 if ( there is a locked task, and it's not the current task )
 goto Reschedule_done
 If ( InDOS flag )
 goto Try_next_task
 // It's OK to switch tasks now
 Increment InScheduler flag
 Delete & re-insert task into TDB list to give it proper priority.
 Lock the global heap
 Save SS:SP in current TDB
 Call function that checks the following
 if ( 80x87 present )
 save control word in outgoing TDB
 if ( Current disk not set in TDB )
 Save the current DOS disk/directory in the TDB
 Call routine that sends task switch out notification
 Set new Windows current TDB value
 Set new Windows PDB value from the PDB value in the incoming TDB
 if ( 80x87 present )
 load control word from incoming TDB
 Call routine that sends task switch in notification
 Switch to SS:SP from incoming TDB
 Decrement InScheduler flag
 Tell the display driver that we've switched // In KRNL386 only
 Unlock the global heap
 Reschedule_done:
 Restore registers from stack
 Return to caller


Reschedule() takes no parameters, and returns no values. Any routine that
calls Reschedule() has no idea of what happens between the time Reschedule()
is CALLed and when the next instruction begins executing. Reschedule() may
decide that there is no reason to switch tasks, and simply return.
Alternatively, it may switch away from a task for hours on end. To the caller,
Reschedule() is an atomic operation. Time effectively does not exist for the
task when it is in Reschedule().
To provide this transparency of operation, as well as keep the system flowing
smoothly, Reschedule() has to accomplish four things:
Find the next task that should be scheduled, and make sure no "extraordinary"
conditions prevent it from being switched to.
Save the complete state of the "outgoing" task, and restore the state of the
"incoming" task. Not doing so results in chaos.
If no task needs to be scheduled, go into an "idle loop" that allows
background actions to take place.
Update the values that Windows maintains for the current task.
Unfortunately, the code to do each of these things is not broken up into
distinct sections. Instead, the work for each of the above items is completely
intermeshed, and lies in various states of completion throughout much of
Reschedule(). In the following section, I'll cover each of the steps taken by
Reschedule(), while also trying to keep the big picture in mind.


Entering the Scheduler


Upon entry to Reschedule(), the following registers are saved on the stack of
the calling task: BP, DS, SI, DI, AX, CX, ES, BX, DX. Furthermore, the act of
calling Reschedule() causes the CS and IP registers to be placed on the stack.
The ToolHelp routines TaskGetCSIP() and TaskSetCSIP() rely upon the stack
frame created by Reschedule() to obtain and set the CS:IP values to be used
when the task starts up again. The 32-bit registers are not saved across task
switches. If your code uses the 32-bit registers, then it's important that you
be aware of when a task switch might occur and plan accordingly.
After saving the registers on the stack, a routine is called that updates the
profile (.INI) files if they have been changed since the task was last
switched to. Following that, the current hTask value maintained by Windows is
loaded into DS, and the task-database signature found at offset 0FAH is
verified to be correct. If the task database (TDB) looks incorrect, then
Reschedule() immediately jumps to the code that begins looking for a new task
to switch to. If the signature bytes in the TDB look okay, then the WORD at
offset 0AAH is zeroed out, but not before retrieving its value into AX. This
value is nonzero if DirectedYield() was called, and indicates which hTask the
caller wants scheduled next. However, Reschedule() will not blindly schedule
this new hTask; it first verifies that the event count in the potential new
TDB is not 0. If it is, then Reschedule() treats this invocation as if it were
invoked by a regular Yield() call. All this implies that the task specified in
a DirectedYield() call may not be the one that's scheduled next. As stated in
the Windows 3.1 documentation, if you really want to make sure a particular
task is scheduled, then use a PostAppMessage() to ensure that the event-count
field in the TDB is nonzero.
The next major section of code following this "startup" sequence is the heart
of Reschedule(), and is responsible for finding a suitable task to switch to.
As mentioned earlier, all the task databases in the system are kept in a
linked list. Reschedule() starts at the head of the chain, and iterates
through each TDB, looking for one with a nonzero event count in the WORD at
offset 6. Note that this field is not the same as the message-count value
stored in the message queue of each task. For the remainder of this article,
I'll refer to this walking of the task list as the "walk-through-task-list"
code. When a TDB with a nonzero event count is found, Reschedule() attempts to
switch the current task to that TDB.


The Idle Loop


If no events are waiting to be processed and all the tasks have been iterated
through, then the walk-through-task-list code falls into the idle-loop
section. The actions of this section of code depend on whether you're running
in Standard or Enhanced mode windows, and on whether you're using virtual
memory. The code described in the next two paragraphs exists only in the
KRNL386.EXE version.
Upon entering the idle loop, the code checks the values in some of the KERNEL
paging flags, and if conditions are right, calls an internal routine called
ShrinkHeap() which walks the global heap. If it finds enough free memory, it
unlinks the free blocks from the global heap.
If the paging system is in use, and if certain other paging flags are set,
then the idle loop proceeds to call another internal routine:
DiscardFreeBlocks(). The job of DiscardFreeBlocks() is to find free blocks of
paged memory and give them back to the DPMI server. During the call to
DiscardFreeBlocks(), hardware events may have occurred. To deal with this,
Reschedule() JMPs back to the walk-through-task-list code and starts anew the
process of looking for a suitable task to switch to.
Having done the previous heap housekeeping, Reschedule() makes a call to the
USER routine IsUserIdle(). In Windows 3.1, this is where USER places its
checks to see if a screen saver should be activated. The return value from
IsUserIdle() is a BOOL: False if a mouse button is held down, True
otherwise.
If the return from IsUserIdle() is True, and if a flag called fPokeAtSegments
is nonzero, then once every 20h times through the idle loop, an internal
routine called PokeAtSegments() is called. This routine walks the module list,
and loads any discardable segments that haven't been previously loaded. The
module-list walk stops when it encounters the SHELL module. This presumably
causes PokeAtSegments() to load segments only for the modules required for
Windows to boot up. Once all the segments have been loaded, the
fPokeAtSegments flag is set to 0, so that the idle loop doesn't try to call
PokeAtSegments() any more. If PokeAtSegments loaded a segment during a
particular iteration of the idle loop, then the code JMPs back to the
walk-through-task-list code.
Two things remain to be done in the idle loop. The return value from
IsUserIdle() is saved away on the stack before calling INT 28, the MS-DOS idle
interrupt. When the INT 28h returns, the IsUserIdle() return value is
retrieved from the stack and used to decide what will be in the BX register
for an INT 2Fh, AX = 1689h call. If IsUserIdle() returned True, then BX = 0,
otherwise BX = 1. The INT 2Fh, AX = 1689h call is documented in the INT2FAPI.INC
file in the DDK as the "Windows kernel idle call." When the INT 2Fh returns,
the idle loop JMPs to the walk-through-task-list code.


We've Found a Task ... Now What?


Upon finding a TDB that has a nonzero event count, Reschedule() starts the
process of saving away the "context" of the outgoing task and restoring the
context of the incoming task. Before this can be done, though, a few things
need to be checked. If the incoming task is the current task, then there's no
reason to go through the full process of saving and restoring the task
context. Instead, Reschedule() restores the registers saved on the stack upon
entry and returns.
The next important thing to check is whether there are any "locked" tasks in
the system. A locked task is the only task in the system allowed to receive
messages. All other tasks are shut out until the task is unlocked. A task can
be locked by the LockCurrentTask() function in KERNEL, or by the LockMyTask() function
in USER. A system modal message box is an example of a locked task in action.
Reschedule() checks whether any tasks in the system are locked. If the
incoming task is not the same as the locked task, then the saved registers on
the stack are restored, and Reschedule() returns without having switched
tasks.
The last thing checked before starting the process of switching tasks is a
check to see if KERNEL is currently in its INT 21 handler. This is
accomplished by checking the KERNEL_INDOS flag, which is also checked in other
"critical sections" of code elsewhere in KERNEL. If KERNEL is currently in its
INT 21 handler, the walk of the task list is resumed at the next task in the
list of tasks, rather than at the start of the task list. It's not readily
apparent why the walk doesn't start again at the head of the task list.
Once we've gotten past the above gauntlet of idle loops and other checks,
Reschedule() now starts the actual process of saving away the context of the
current task and waking up the incoming task in the same state in which it was
left. The switching code is as much of a "critical section" as any other piece
of code, so it is only fitting that a global variable InScheduler is
incremented to indicate that work is in progress. The first thing that the
switching section does is readjust the priority of the task. The task priority
is given by the WORD value at offset 8 in the TDB. Tasks with a lower priority
number will get an opportunity to run before tasks with higher priority
numbers. The priority of a typical task is 0, but it can range between -32 and
15. The priority of a task can be set by the undocumented SetPriority()
function in KERNEL. Up until now, I haven't discussed how this priority system
is implemented, at least not explicitly. Earlier, I mentioned that the list of
tasks is walked, looking for the first task that has at least one event
waiting for it. As it turns out, the task list is always kept in
priority-sorted order. The tasks with the lower-priority numbers come first in
the list, thereby causing their TDB to be checked for events before tasks with
higher-priority numbers. Reschedule() causes the task list to be kept in order
by momentarily deleting the incoming TDB from the task list and then
reinserting it. The function that inserts tasks into the list inserts the TDB
in priority order.
Because most tasks share the same priority value, it is important to give them
each a fair chance to be selected for scheduling. This is done by incrementing
the priority value in the incoming TDB before removing/reinserting it into the
task list; after reinsertion, the priority is decremented back to its previous
value. This has the net effect of placing the incoming task later in the list
than any other tasks that share the same priority value. (A lower numerical
value means a higher priority.)
Upon reprioritizing the TDB list, Reschedule() next sets a flag in the segment
that manages the information for the global heap. By setting this flag, the
global heap LRU functions are prevented from changing the global heap during
the remainder of the task-switching code.
At this point, it is time to finish saving the context of the outgoing task.
As you will recall, most of the 16-bit registers for the outgoing task were
pushed onto the stack upon entering Reschedule(). The remainder of the job
involves three steps. First, the current SS:SP values are placed into the
DWORD at offset 2 in the outgoing TDB. Then, an internal routine called
SaveState() is called. Inside SaveState(), the FSTCW instruction is used to
save the control word of the 80x87 into the outgoing TDB. Of course, this is
only done if an 80x87 is installed. I use the word "installed" to mean that
the math-coprocessor bit was set when KERNEL did an INT 11H at boot time.
Additionally, if the high bit of the current drive field in the outgoing TDB
is not set, then INT 21h, AX = 19h, and INT 21h, AX = 47h are used to store
the current drive and directory values into the outgoing TDB. The final act of
the outgoing process is to call a routine that causes the "outgoing-task"
notification to be sent. This notification can be caught by TOOLHELP.DLL as an
NFY_TASKOUT notification, or by a RegisterPtrace()/ToolhelpHook() callback
function with AX = 0Dh.
Reschedule() now turns to waking up the incoming task. As you might expect,
the steps to start it up are almost a mirror image of the steps to save away
the outgoing task. First, the global variable containing the current TDB is
set to the value of the incoming TDB. Next, the PDB (or PSP if you prefer) of
the incoming task is retrieved from the incoming TDB, and stored in the global
variable that Windows uses for its current PDB value. If an 80x87 is
installed, the control word for the incoming task is loaded from the TDB via
an FLDCW instruction. Continuing on, the corresponding "incoming notification"
is sent (NFY_TASKIN for Toolhelp, AX = 0Eh for the
RegisterPtrace()/ToolhelpHook() callbacks).
The SS:SP registers are switched to the values stored away in the incoming
TDB. The InScheduler flag is decremented. If we're running KRNL386.EXE, the
UserRepaintDisable() function in the Display module is called with a 0
parameter, indicating that hardware updates are now okay. The flag that
prevents mucking with the global heap is decremented. The final act of
Reschedule() is to pop the register values for this task off the "new" stack.


Summary


As you might have guessed, the Windows scheduler is integrally tied to many
parts of Windows. In our journey, we have glanced over many areas, including
tasks, modules, the global heap, notifications, and interrupts. All are worthy
of study in their own right. If you really wish to understand how Windows
works, you must be familiar with all of them, as well as many other things.
It's a steep learning curve, but in the end, I believe it's worth it.
Hopefully this article has given you some idea of what really goes on beneath
the hood and inspired you to do more exploration on your own.


References


Schulman, Andrew, David Maxey, and Matt Pietrek. Undocumented Windows.
Reading, MA: Addison-Wesley, 1992.


_INSIDE THE WINDOWS SCHEDULER_
by Matt Pietrek

Figure 1:


 //------------------STARTUP SECTION-------------------
 Save registers on stack of outgoing task
 Update task profiles
 If ( TDB signature not OK)
 goto Walk_through_task_list
 Get DirectedYield field in TDB into AX; zero it out in TDB
 If ( DirectedYield hTask == 0 )
 goto Walk_through_task_list
 If ( event count in DirectedYield TDB != 0 )
 goto startup_this_task
 //---------------TOP HALF OF TASK WALK-------------
Walk_through_task_list:
 Point to the first task in the linked list of tasks
Try_next_task:
 if ( another task in list )
 goto Does_this_task_have_an_event?
 //---------------THE IDLE LOOP--------------------
 ShrinkHeap() // In KRNL386 only
 DiscardFreeBlocks() // In KRNL386 only
 IsUserIdle()
 if ( fPokeAtSegments != 0 )
 if ( USER is idle )
 Call routine to load a boot time module segment
 goto Walk_through_task_list
 INT 28h
 INT 2Fh, AX = 1689h
 goto Walk_through_task_list
 //-------------BOTTOM HALF OF TASK WALK----------------------
Does_this_task_have_an_event?:
 if ( event count field == 0 )
 point to next TDB
 goto Try_next_task
 //-------------TASK SWITCHING CODE------------------------
 // Found a potential task to switch to
 // Make sure it's OK to switch
 if ( found task == current task )
 goto Reschedule_done
startup_this_task:
 if ( there is a locked task, and it's not the current task )
 goto Reschedule_done
 If ( InDOS flag )
 goto Try_next_task
 // It's OK to switch tasks now
 Increment InScheduler flag
 Delete & re-insert task into TDB list to give it proper priority.
 Lock the global heap
 Save SS:SP in current TDB
 Call function that checks the following
 if ( 80x87 present )
 save control word in outgoing TDB
 if ( Current disk not set in TDB )
 Save the current DOS disk/directory in the TDB
 Call routine that sends task switch out notification
 Set new Windows current TDB value
 Set new Windows PDB value from the PDB value in the incoming TDB
 if ( 80x87 present )
 load control word from incoming TDB
 Call routine that sends task switch in notification
 Switch to SS:SP from incoming TDB

 Update the "current TDB" global variable
 Decrement InScheduler flag
 Tell the display driver that we've switched // In KRNL386 only
 Unlock the global heap
Reschedule_done:
 Restore registers from stack




August, 1992
MOVING FROM ASSEMBLY TO C


A technical-support veteran discusses using C for embedded systems




Beth Mazur


Beth provides support for the Whitesmiths 8-bit C cross compilers at
Intermetrics Microsystems Software. She can be reached at 733 Concord Ave.,
Cambridge, MA 02138.


Most embedded systems programmers learned their craft using assembly language
programming. Why? Because for a long time, assemblers were the only tools
around for writing code for target systems. Now, however, high-level language
tools specifically designed for embedded systems development are available.
Consequently, C is increasingly becoming the language of choice for many
embedded systems projects.
What's C's attraction? For starters, C is more maintainable and portable than
assembly--and just as efficient. Still, many assembly programmers shy away from
moving to C, citing its steep learning curve or the lack of language
facilities.
This article has its roots in my two years on the front lines of Intermetrics'
technical support for Whitesmiths C cross-compilers; the issues I discuss
here stem from the most frequently asked questions. For the most part, I'll
use Motorola's M68HC11 to illustrate examples, although much of the
information applies to other 8-, 16-, and 32-bit processors as well.


Using C


C cross-compilers let you write code on a host PC, UNIX workstation, or other
computer and compile for use with any number of 8-bit target processors. Since
not all of these processors offer the same support for the C language, you
need to first understand how C handles the stack, recursion and reentrancy,
the heap, C keywords of special interest to embedded systems programmers, the
runtime startup, automatic data initialization, locating, function prototypes,
and library support. You then need to decide how important these are to your
application.
For example, most C compilers use a stack. As in the assembly world, the stack
is used for temporary storage. In C, the stack is used to pass parameters and
as storage for local variables (called "autovariables" in C). Because of the
dependency on a stack, you must consider your processor when planning your
application. If you plan on using on-chip RAM only (256 bytes on the
M68HC11A8), you'll need to minimize your use of local variables, parameter
passing, and nested subroutine calls. However, if you'll be using external
RAM, you can allocate a section of memory there for stack usage.
Alternatively, many C compilers can generate code that uses static memory
instead of a stack. This option minimizes the RAM requirements of an
application, but generally makes functions compiled this way non-recursive and
non-reentrant. If your application does not require recursion or reentrancy,
you can take advantage of compile-time models that use static memory for
parameter passing and local variables. Otherwise, you'll have to consider
stack usage when deciding how much RAM your application will need.
A program is recursive if the compiled program can call itself during run time
and execute correctly. The factorial program in Example 1 shows a recursive
routine. If the local variable temp is stored on the stack, the program will
execute correctly. Each time the program calls itself, storage for temp will
be created in a new area of the stack. However, if temp is stored in a static
area, say memory location 0x2012, each successive call to the program may
overwrite the previous value of temp (depending on the evaluation order of the
compiler). In this case, the factorial of any number would be 1.
Example 1: Factorial program that illustrates recursion.

 int factorial (int fact)
 {
 int temp;
 temp=fact;
 if (!fact)
 temp=1; /* fact = 0? */
 else
 temp*=factorial (temp-1); /* fact > 0 */
 return (temp);
 }

A program is reentrant if it can be entered by more than one task
simultaneously without loss of information. The factorial program in Example 1
is reentrant if each task that calls factorial() has its own individual stack
area. Reentrancy is important in the embedded systems world if you plan on
calling functions from both your main application code and your interrupt
service routines, or if you will be using an operating system and running
concurrent tasks.
The heap is an area of RAM used by the C runtime library routines malloc() and
calloc() to request memory dynamically. This is useful when you do not know
the size of an array or structure at compile time and would like to avoid
preallocating a large area of RAM that may not be used. The heap is an area of
memory, similar to the stack, that must be set up before your program begins
executing. Typically the heap starts a certain number of bytes away from the
stack pointer; during program execution, the stack and heap grow toward each
other, as shown in Figure 1. In this scenario, it's common for the stack and
heap to collide, which will generally cause unexpected results! If you plan on
using memory-allocation routines, you'll need to carefully consider the stack
and heap when planning the RAM requirements of your application.
One of the benefits of using C is that you will now have access to the many
data types in the language. These include integer types: char, short int, int,
and long int; and floating-point types: float and double. Combining these data
types with C's array, structure, and pointer facilities lets you define
complex data structures and manipulate them easily.
For the most part, you may define and access your program variables without
concern for how they are stored, as the compiler will treat each type
consistently. If you want to manipulate your data from outside the domain of
the C compiler, however, you'll need to know how the compiler represents the
different data types. For example, is an int a 16- or 32-bit quantity? (Some
compilers allow you to specify this at compile time.) Are floating-point
numbers stored in IEEE Standard form or in a form used only by the compiler
vendor? You'll need to know the answer to these questions if, for example, you
plan on processing data from an A/D converter which may use a format
incompatible with your program. If this is the case, you'll need to write an
assembly-language routine that will translate the data into the format used by
the compiler.
There are four keywords in C that are of special importance to the embedded
system programmer--const, extern, volatile, and register. These keywords
modify your variable declarations and notify the compiler that the particular
variable has a special property.
The const keyword declares that a variable cannot be modified during program
execution. As the name implies, this is most often used in embedded systems
programming to declare that the variable is a constant that will not be
changed and can therefore be stored in ROM. Declaring strings as const may be
helpful if you are trying to minimize RAM usage. A compiler will generate an
error message if your program tries to modify a variable that has been
declared const; see Example 2(a). The extern keyword identifies variables
defined in another module. In Example 2(a) I've defined pi in one module. If
pi is referenced in another module, you must tell the compiler that pi has been
defined elsewhere and that it is a const type. Otherwise the compiler is
free to assume that pi is an ordinary double variable that can be modified. This
is done by specifying both extern and const in the second module, as in
Example 2(b).
Example 2: (a) An error message will result if your program tries to modify a
variable that has been declared const; (b) using extern and const to tell the
compiler that pi has been defined outside this module.

 (a)

 const double pi = 3.14159;
 main()
 {
 pi = 1.234; /* will generate error at compile-time */
 }

 (b)


 extern const double pi;
 double circum(double radius)
 {
 double circ;
 circ = 2.0 * pi * radius;
 return circ;
 }

As a general rule, external declarations should be the same as the original
definition, except that you need to include the keyword extern and omit any
initialization.
The volatile keyword is used to declare that references to a variable should
not be optimized away. This is important in embedded systems when a variable
may be memory-mapped I/O or storage shared between independent tasks. Example
3 shows the result of declaring porta as volatile. Here, the assembly code
generated for C line 7 has been optimized out. Because porta was declared
volatile, the code for the second assignment to porta was not deleted.
Example 3: The result of declaring porta as volatile.

 ; 1 volatile char porta;
 ; 2 char portb;
 ; 3 test()
 ; 4 {
 _test:
 ; 5 porta=1;
 ldab #1
 stab _porta
 LL4:
 ; 6 porta=2;
 ldab #2
 stab _porta
 LL6:
 ; 7 portb=1; /* optimized out as redundant */
 ; 8 portb=2;
 ldab #2
 stab _portb
 ; 9 }
 rts
 ; 10

The register keyword is used to specify that a variable should be stored in a
register, as it will be used extensively in the program. Placing a heavily
used variable in a register can reduce code size and increase execution speed.
However, many compilers will ignore the register keyword if the target
processor has a limited number of registers.
Even if you write your entire application in C, one minimal piece of assembly
code will generally be required. This is the runtime startup program, and its
purpose is to set up the environment for your C program. At a minimum, the
startup will load the stack pointer. Your runtime startup may also include
code to manipulate target registers such as the M68HC11 EEPROM block-protect
register, BPROT, that must be zeroed within the first 64 cycles after reset to
enable EEPROM writes.
Data manipulation may also occur in the startup program. Many startup programs
will zero out any uninitialized variables you have defined, since the C
language specifies that these variables have the value 0 when the program
begins executing. The startup routine may also be the place to set up
initialized global variables. This is called automatic data initialization.
Most C compilers make a distinction between initialized and uninitialized
global variables; see Example 4. In the embedded systems world, this is
because variables that will be modified must reside in RAM, yet RAM contents
are usually erased or corrupted when power is removed. For uninitialized
global variables, this is not a problem, as you do not expect these variables
to have any meaningful value at startup time (other than 0). However, this is
not true of initialized variables. These must have the value that you
specified in your program.
Example 4: Most C compilers make a distinction between initialized and
uninitialized global variables.
 /* Initialized variables are declared with a value.
 The variable will have this value when the program begins executing.
 Uninitialized variables are not declared with a value. */
 int days_of_year = 365; /* this is an initialized variable */
 int number_of_days; /* this is uninitialized */

One solution is to store the values of these variables in ROM and then
initialize the RAM locations during the runtime startup process. In order to
minimize the ROM storage requirements, at compile time initialized global
variables are typically generated into one data section, and uninitialized
global variables are generated into a different data section (often called the
"bss" section--a carryover term from the original C compiler development on
the PDP-11). Only variables in the initialized data section need to have their
values copied into ROM. Figure 2 illustrates a typical M68HC11 memory map
before and after the runtime startup has copied data from ROM to RAM.
One key feature of ANSI C compilers is their support for function prototypes.
Function prototypes are function declarations that provide an explicit list of
a function's parameters and their types. The benefit of providing a function
prototype is that it enables the compiler to check that the arguments passed
to a function match the prototype, preventing parameter mismatch problems
between calling and called functions; see Example 5.
Example 5: Function prototypes enable the compiler to check that the arguments
passed to a function match the prototype.

 long a,b;
 int i,j,k;
 int subroutine(int first, long second);
 main()
 {
 i=subroutine(j,a); /* no action will be taken */
 i=subroutine(j,k); /* compiler will promote k to long */
 i=subroutine(a,b); /* will generate error, since long
 cannot be passed for first param */
 }

Function prototype checking means that the compiler can make its best effort
to comply with your prototype. If it can widen a parameter to match the
prototype, it will do that. However, the compiler cannot truncate a parameter
without loss of information, so it will flag those calls as errors.

Most C compilers offer an extensive C runtime library that offers a variety of
routines that you can call from your program. These include I/O functions like
printf() and scanf(), character functions like isalpha(), string functions
like strcpy(), and math functions like sqrt(). Most compilers load only those
library routines referenced by your program, so you don't have to worry about
entire libraries being copied into your application. Some compilers also
include the source to the C library so that you can modify routines if
necessary.
If you will be using the I/O functions, you should check to see whether you
will need to modify any routines. You may need to modify getchar() and
putchar() for your target hardware; the other I/O routines are usually coded
to call putchar() and getchar() and don't need to be changed. Example 6 shows
C versions of putchar and getchar for the M68HC11.
Example 6: C versions of putchar and getchar for the M68HC11.

 #define RDRF 0x20 /* Receive Data Register Full */
 #define TDRE 0x80 /* Transmit Data Register Empty */
 #define SCSR *(char *) 0x102e /* SCI Status Register */
 #define SCDR *(char *) 0x102f /* SCI Data Register */
 int putchar (int c)
 {
 if (c == '\n')
 putchar('\r'); /* put newline and carriage return */
 while (! (SCSR & TDRE)) /* wait until ready to transmit */
 ;
 SCDR = c; /* output character */
 return (c);
 }
 int getchar()
 {
 int c;
 while (! (SCSR & RDRF)) /* wait for character */
 ;
 c = SCDR; /* receive character */
 if (c == '\r')
 c = '\n'; /* translate to newline */
 return (c);
 }

Once your program has been compiled, you will need to link and locate it. In
assembly programs, you may have used an assembler directive like ORG to locate
your code and data sections in one program file. In C, compiled code is
usually relocatable--this means that the location of instructions and
variables is not known at compile-time. Locating code and data is the job of
the linker.
Compilers make this job easy by allowing you to locate code either by module
or by section. This capability means that you can change the location of your
program simply by relinking--your code does not need to be recompiled.
However, this does mean that some code will need to be in a separate module.
For example, in the M68HC11, the interrupt vector table is located at 0xFFD6.
Generating the addresses for this table usually means compiling or assembling
a separate module that includes only the addresses for the interrupt service
routines.


Combining C and Assembly


Often, the size or speed of a particular routine is critical, and you want to
use assembly language because it offers maximum control. Most C compilers
support a variety of features that let you combine the best of C and assembly.
In addition to becoming familiar with these features, you'll need to
understand the compiler's interface conventions so that you can call assembly
programs from C and vice versa.
There are three main interface conventions to be aware of when combining both
C and assembly-language routines: naming global symbols, passing parameters,
and returning values. (The following examples may or may not be valid for the
compiler you use.)
Many C compilers automatically prepend an underscore character to all global
symbols defined in C. This is done to avoid unintentional conflict with
symbols defined at the assembly level. In the C declaration in Example 7(a), an
M68HC11 C compiler will generate the code in 7(b). If your assembly routine
needs to reference a variable defined in C, you will need to generate an
external instruction in your assembly code that includes the underscore.
Example 7: (a) Using this C declaration, an M68HC11 C compiler will generate
the code in (b); (b) generated code; (c) a subroutine call to func().

 (a)

 i = 1; /* i is declared as an int */

 (b)

 ldab #1
 stab _i

 (c)

 jsr _func

Similarly, a subroutine call to func() will look like Example 7(c).
Therefore, if func() is written in assembly language, you will need to
publicly declare it as _func within your file.
If you write your entire application in C, you won't have to worry about
parameter passing since the generated code will observe the conventions
defined by your compiler. If you plan on writing some of your routines in
assembly language, however, you will need to learn what these conventions are
and write your code accordingly. Typically, C compilers will pass parameters
by pushing them on the stack in reverse order (last parameter pushed first).
This is how your assembly program will need to receive them.
If there is only one parameter being passed, it may be passed in a register
rather than on the stack. Also, many ANSI compilers will only pass parameters
as a multiple of words. This means that variables defined as type char may be
widened to type int; compilers may also widen other types, like float to
double.
One way you can discover how the compiler passes parameters is to look at the
assembly code generated. Since these conventions vary widely from one vendor to
another, writing a dummy C program like that in Listing One (page 120) and
compiling it to assembly code can save you some time in writing your own
assembly programs. Note that if optimization is turned on, the generated code
may not be what you expect. If possible, compile without optimization to keep
the generated code from being altered.
Compilers also expect functions that return values to do so according to
convention. In Listing One, subroutine returned an int passed in the D
register. Other variable types are returned either in registers or on the
stack, depending on the size of the return value and the registers available
for the specific processor.
Compilers often reserve processor registers for their own use. This is
typically done to point to global data areas or for use as an external stack
pointer. You will need to verify which registers are reserved by your compiler
and avoid using them in your assembly programs. If you must use a reserved
register, it is your responsibility to save the value of this register and
then restore it before your program returns to the calling program.

Many C compilers offer inline assembly support. This feature allows you to
insert a line or more of assembly code within your C program. For example,
suppose you wanted to disable interrupts before a call to a subroutine.
Typically this would look like Example 8.
Example 8: Using inline assembly to disable interrupts before a call to a
subroutine.

 main()
 {
 ...
 _asm("sei\n"); /* disable interrupts */
 func(); /* call function */
 _asm("cli\n"); /* enable interrupts */
 ...
 }

You can write your interrupt vector table in C or assembly language.
Generally, you will write a module that defines data locations that will
contain the addresses of your interrupt service routines. At link time, the
linker will resolve these references and fill in the correct values. Listing
Two (page 120) is a C version of an M68HC11 interrupt vector table. This module
can be compiled and then located at 0xFFD6 to serve as a generic interrupt
vector table and modified later as you write your interrupt service routines.
Finally, because many processors support memory-mapped I/O, it is useful to be
able to define variables in C whose location corresponds to a particular I/O
register or port. This can be done in C by defining a pointer that references
an absolute address. Listing Three (page 120) shows how this is done. The
absolute address specified in the #define statement will be substituted for
the symbol value.


_MOVING FROM ASSEMBLY TO C_
by Beth Mazur



[LISTING ONE]

; 1 /* Here is a method of determining how the compiler expects parameters
; 2 to be passed between C and assembly routines. For example, suppose
; 3 you need to pass 3 integers to an assembly routine, where they will
; 4 be manipulated, and an integer value returned. The trick is to write
; 5 a C routine that simulates this. Then compile to assembly and see how
; 6 the compiler treats the parameters (you can use the generated
; 7 assembly code as a template and add your code to it). */
; 8 int h,i,j,k;
; 9 main()
; 10 {
_main:
; 11 h = subroutine(i,j,k);
 ldd _k ; push k on stack first
 pshd
 ldd _j ; push j on stack next
 pshd
 ldd _i ; leave i in register
 jbsr _subroutine
 ins ; clear parameters off stack
 ins
 ins
 ins
 std _h ; result in register
; 12 }
 rts
; 13 int subroutine(int a, int b, int c)
; 14 {
_subroutine:
 pshx ; save X
 pshd ; save i on stack
 tsx ; transfer SP to X
 .set OFST=0
; 15 return(a+b+c);
 ldd OFST+0,x ; code to copy i back to D
 addd OFST+6,x ; add j
 addd OFST+8,x ; add k, result left in D

 pulx ; restore D into X from stack
 pulx ; restore X
 rts
; 16 }






[LISTING TWO]

extern void reset(); /* symbol defined in startup routine */
extern void default_isr(); /* generic routine generates a return from interrupt */
void (* const vec_tab[])() = {
 default_isr, /* SCI */
 default_isr, /* SPI */
 default_isr, /* Pulse accumulator input edge */
 default_isr, /* Pulse accumulator overflow */
 default_isr, /* Timer overflow */
 default_isr, /* Timer output compare 5 */
 default_isr, /* Timer output compare 4 */
 default_isr, /* Timer output compare 3 */
 default_isr, /* Timer output compare 2 */
 default_isr, /* Timer output compare 1 */
 default_isr, /* Timer input capture 3 */
 default_isr, /* Timer input capture 2 */
 default_isr, /* Timer input capture 1 */
 default_isr, /* Real time interrupt */
 default_isr, /* IRQ */
 default_isr, /* XIRQ */
 default_isr, /* SWI */
 default_isr, /* illegal op-code */
 default_isr, /* COP watchdog timer fail */
 default_isr, /* COP monitor clock fail */
 reset, /* RESET */
 };






[LISTING THREE]

; 1 /* Here is a method of accessing the M68HC11 I/O ports that is portable
; 2 and can be compiled by most C compilers. c is declared as an int
; 3 because that's how putchar is defined by ANSI. The (char) cast is done
; 4 to avoid "illegal assignment" warnings from strict compilers. */
; 5 #define SCSR *(char *) 0x102e /* SCI Status Register */
; 6 #define SCDR *(char *) 0x102f /* SCI Data Register */
; 7 #define TDRE 0x80 /* Transmit Data Register Empty bit */
; 8 int putchar(int c)
; 9 {
_putchar:
 pshx
 pshd
 tsx
 .set OFST=0

L1: ; line 14, offset 3
; 10 while (!(SCSR & TDRE))

 ldab 102eH
 bitb #128
 beq L1
; 11 /* loop until ready to transmit */
; 12 SCDR = (char) c;
 ldab OFST+1,x
 stab 102fH
; 13 return(c);
 ldd OFST+0,x
 pulx
 pulx
 rts
; 14 }
 .public _putchar
 .end



Example 1:

int factorial(int fact)
{
int temp;
temp=fact;
if (!fact)
 temp=1; /* fact = 0? */
else
 temp*=factorial(temp-1); /* fact > 0 */
return (temp);
}

Example 2:

(a)

const double pi = 3.14159;
main()
{
 pi = 1.234; /* will generate error at compile-time */
}


(b)

extern const double pi;
double circum(double radius)
{
 double circ;
 circ = 2.0 * pi * radius;
 return (circ); /* return the computed circumference */
}


Example 3:

; 1 volatile char porta;
; 2 char portb;

; 3 test()
; 4 {
_test:
; 5 porta=1;
 ldab #1
 stab _porta
LL4:
; 6 porta=2;
 ldab #2
 stab _porta
LL6:
; 7 portb=1; /* optimized out as redundant */
; 8 portb=2;
 ldab #2
 stab _portb
; 9 }
 rts
; 10


Example 4:

/* Initialized variables are declared with a value. The variable will have
this value when the program begins executing. Uninitialized variables are
not declared with a value. */
int days_of_year = 365; /* this is an initialized variable */
int number_of_days; /* this is uninitialized */


Example 5:

long a,b;
int i,j,k;
int subroutine(int first, long second);
main()
{
i=subroutine(j,a); /* no action will be taken */
i=subroutine(j,k); /* compiler will promote k to long */
i=subroutine(a,b); /* will generate error, since long
 cannot be passed for first param */
}


Example 6:


#define RDRF 0x20 /* Receive Data Register Full */
#define TRDE 0x80 /* Transmit Data Register Empty */
#define SCSR *(char *) 0x102e /* SCI Status Register */
#define SCDR *(char *) 0x102f /* SCI Data Register */
int putchar(int c)
 {
if (c == '\n')
 putchar('\r'); /* put newline and carriage return */
while (!(SCSR & TRDE)) /* wait until ready to transmit */
 ;
SCDR = c; /* output character */
return (c);
}

int getchar()
{
int c;
while (!(SCSR & RDRF)) /* wait for character */
 ;
c = SCDR; /* receive character */
if (c == '\r')
 c = '\n'; /* translate to newline */
return (c);
}


Example 7:

(a)

i = 1; /* i is declared as an int */

(b)

ldab #1
stab _i


(c)

jsr _func


Example 8:


main()
{
 ...
 _asm("sei\n"); /* disable interrupts */
 func(); /* call function */
 _asm("cli\n"); /* enable interrupts */
 ...
}



August, 1992
HIGH-SPEED NETWORKING


Header prediction and forward-error correction for very high-speed data
transfer




William Frederick Jolitz


Bill is the developer of 386BSD, the Berkeley UNIX Research operating system
for the 386/486 PC. He is currently writing a book on 386BSD internals. You
can reach Bill on CompuServe, 76703,4266.


Networking hardware technologies of the future promise data-transfer rates of
gigabits per second in place of today's megabits--yet many of the techniques
used in our software and hardware architectures don't scale well to these high
data rates. The basic problems we face are an order-of-magnitude increase in
processor speed, coupled with a three order-of-magnitude increase in network
bandwidth. So even if our network-protocol processing cost is reduced to 1
percent (with today's networks), and the processor speed improves as expected,
we'll still reach the saturation level, with no room left to run an
application!
Once we bring I/O and application demands into the picture, things become
quite problematic. I/O is at the bottom of the computer system's "food chain"
for processor cycles. Applications demand massive I/O and want to supervise
the demand for the processor as well. The obvious loser in this competition is
the operating system, which is being squeezed from both the top (by
applications) and bottom (by device hardware).
In this article, I'll discuss problems involved with high-speed networking,
focusing on certain software improvements in the Berkeley TCP/IP
implementation that attempt to deal with some of the challenges. You can see
this software in operation in any 386BSD system. Variations of this approach
have already been used with FDDI (100 megabits/sec) and HiPPi (880
megabits/sec).


Network Profusion Magnifies the Problem


Ironically, the very success of computer networking has magnified the scale of
these problems. While the number of host computers on networks is growing at a
more-or-less "linear" rate, the geographic span of computer networks is
growing much faster. This problem isn't restricted only to workstations; even
in the PC arena, networks are a formidable consideration. (Novell estimates
that 45 percent of all PCs sold today will be connected to a network.)
The transmission rates of new long-distance communications technologies cause
the original ARPANET 56-kilobits/second network and the current T1 (1.544
megabit/sec) service common on the Internet today to pale in comparison. Among
these technologies are T3 (44.736 megabits/second) and the Synchronous Optical
Network (SONET), which is ordered in units of 51.84 megabits/second (OC-1), up
to 2.488 gigabits/second (OC-48). (T1 has considerable historical
significance, however, since it was created as the original digital "trunk"
service between telephone exchanges--hence its name, T1.)
All these new services result in more bits on the wire per unit of time. By
increasing the data rate by three orders of magnitude, we now have literally
tens of megabytes of data in flight between any two urban centers. Since we
can't do anything to reduce the delay on a long-distance connection (unless
someone learns to travel faster than the speed of light), we have to look to
other solutions.
For example, let's send a message from San Francisco to New York over
different high-speed networks. Assuming a 70-millisecond "flight" time, we
will find 12 Kbytes in flight for a T1 connection, 385 Kbytes for a T3, and
almost 20 Mbytes for an OC-48 connection! If we discover an error in this
message upon reception, we are forced to receive a barrage of useless data
before the sender is informed of the problem and can respond. Worse yet, the
sender would need to remember the previous "in flight" data to be able to
respond. Clearly, high-speed networks require new approaches to leverage their
advantages appropriately; otherwise, the problems begin to resemble those of
the sorcerer's apprentice.


LANs Suffer Too, but Differently


Not only are long-distance networks getting faster, so are local area networks
(LANs). FDDI (made more cost-effective by a twisted-pair 100 Base 2 version)
holds the promise of 100 megabit/second technology for workstations and PCs.
(Indeed, PC LANs may exceed the PC disk-drive transfer rates in use!) Many
workstation vendors are exploring ATM (Asynchronous Transfer Mode) and SONET
to provide scalable LAN bandwidth. One method is to downsize the
telephone-company model of communications by using a simplified switch at the
center of a star network radiating to workstations. The switch allows each
"ray" of the star to be independently upgraded to faster bandwidths as they
become available.
But the transition of LANs to very high-speed networks is not strongly
affected by volume of traffic or delay; instead, high-speed LANs suffer
from a different malady. A notable characteristic of very high-bandwidth
LAN transfers is that they are "bursty"--usually, one transfer will
command all the resources of the computer's processor and network I/O device
for a brief period of time, to the exclusion of others. These "loudmouth"
transfers (such as image downloads from a server or database group-commit
operations) command the attention of client, server, and any gateways and/or
network in between.
Another characteristic of LAN technologies is the ever-present cost factor.
While 100+ MIPS on the desktop is within sight, 15-20 MIP PCs and workstations
are far more realistic. If we could streamline network-protocol processing
overhead, we might be able to leverage 0.1-gigabit/sec LAN technologies
available on current workstations/PCs, and use the same trick for the
1-gigabit/100 MIP LAN/computer combinations as they become available.
Knowing this, network and operating-system designers can implement
single-element caches at key points in the networking software implementation.
This will "short-circuit" the lookup and evaluation of data structures in
processing a protocol. This approach favors the most common case, in which the
packet being processed is tied to the program associated with the previous
packet processed. This will clear out much of the overhead in our increasingly
complicated network protocols.
None of these problems are irreconcilable. In fact, there are many competing
solutions we should be aware of as the world gradually migrates into gigabit
networks. Both simple and complex solutions can be found by "thinking
gigabit."


Example: Single-element Caches in 386BSD TCP


A simple example of thinking gigabit can be found within 386BSD's Transmission
Control Protocol (TCP) implementation, the current version of the University
of California's Internet networking implementation. This protocol corresponds
to level 4 of the famous OSI seven-level model (see Table 1), which concerns
itself with the speed, quality, and reliability of data transfer between
systems. This is the correct layer to attempt to accelerate, since its sole
concern is moving information at the fastest possible rate. (See the textbox
"TCP/IP In Brief.")
Table 1: Seven layers of the ISO model.

 Layer Description
 ----------------------------------------------------------------------

 Physical Defines how data is electrically moved between two
 physically connected machines.
 Datalink Breaks data up into manageable packets and ensures that
 they arrive intact and in order.
 Network Determines route data will take to reach target machine.
 Transport Maintains virtual connections between local and remote
 machine.
 Session Maintains communication sessions and may include ability
 to reestablish broken connections automatically.
 Presentation Specifies the format in which data will be
 represented. This is especially important when
 communicating across different platforms.
 Application The user application level which determines the contents
 and meaning of data.

Associated with each TCP "session" is a pcb, or protocol control block. On
receipt of a new TCP segment, we need to find the associated Internet pcb, or
inpcb. We must search hundreds of possible connections to find the appropriate
buffer; see in_pcblookup() in Listing One, page 122. This costly search could be
shortened by assuming bursty behavior, checking whether the last TCP segment
came from the same source, and then using the previous search, which is still
valid; see Listing Two, page 122. Figure 2 describes the TCP segment format.
You might be tempted to elaborate upon these simple kinds of improvements or
to spread the theme throughout the networking implementation.


Header Prediction: Stretching the Present into the Future


There is a header (or trailer) associated with the data of a packet that
contains the control information relevant to the protocol processing of this
and related packets. Usually, this information is processed as it is received.
If it is a connection-oriented or "virtual circuit" protocol (as opposed to a
"connectionless," or datagram protocol), the state for this connection must be
updated, and new packets might need to be scheduled for transfer. (Perhaps
either an acknowledgment or more data needs to be sent.) This represents the
computational overhead of the protocol, and in most cases it occupies the
front end of the process of accepting new data on reception, in conjunction
with receipt of a packet. This is normal when we don't know what the next
packet might contain, or when it will arrive.
Header prediction is founded on the sensible expectation that the next packet
will come along on the heels of the previous one, and that the next packet's
header will likely be "guessable" by the software. These assumptions are true
in cases associated with mass transfers in the first place; corrupted or lost
packets are low probability events, and the more elegant features of a given
transport-level protocol are not usually exercised by the applications
programmer. So, when possible, header prediction uses a good heuristic to
process the protocol with a short-circuit evaluation.
Essentially, the heuristic involves tracking expected state changes associated
with each communications "stream." Thus, a model of each stream's next
protocol header is prepared before the packet arrives. When it arrives, the
incoming header must simply be matched to its intended form. If the match is
correct, the precomputed "new" state is assigned, the packet's data is
delivered, and packet processing is complete. This process may take very few
instructions, thus reducing the critical path. However, if the match is
incorrect, all the processing we attempted to avoid must now be done, and our
speed advantage is lost. Fortunately, this occurs only a fraction of a
percent of the time, so it's of little consequence.
Listing Three (page 122) shows the code in 386BSD that attempts limited header
prediction. It searches for two common cases: receipt of a TCP segment that
can be transferred to the application immediately, without further processing;
and response to a simple acknowledgment to the last outstanding segment
transmitted. For these cases, processing the TCP segment can be collapsed down
to as little as 40-50 instructions, instead of the several thousand generally
involved.
More details: The incoming packet (ti) is compared with recorded information
about this TCP session (tp) kept in a protocol control block. We check to see
that it: uses no other features of the protocol (flags); is a consecutive,
in-sequence packet (ti->ti_seq == tp->rcv_nxt); and is not the subject of a
retransmission (tp->snd_nxt == tp->snd_max, that is, the outstanding
transmission).
Also, we verify that the session is at steady state, with both sides
proceeding in lockstep (ti->ti_win && ti->ti_win == tp->snd_wnd). If so, we
can process the packet, which is stored in a buffer (m) overlaid by a data
structure (ti) that exposes the details of the packet's TCP/IP headers.
The heuristic works best with large packets of data that arrive at a
consistent pace, without variation, as occurs most often with fast transfers.
With header prediction, a modern 100 MIP workstation might be able to handle
multigigabit transfers with little hardware assist.


New Challenges in Very High-Speed Networking


Once we understand more fully the nature of very high-speed network traffic,
we must find ways (such as new protocol processing techniques) to better use
the present arrangement of computer hardware and software. After all, people
are not going to just throw out all their equipment and start from scratch.
Demands for greater bandwidth, round-trip time exchanges, and bursty networks
are just a few of the challenges faced by very high-speed networks. But they
all illustrate this underlying problem: As you scale a base technology used to
build a complex entity, risky side effects grow as a power function, or worse
yet, as an exponential! Take, for example, the pragmatic redesign of the
Internet in response to its growth (in terms of number of hosts and additional
demand of service). Numerous times, linear scaling has provoked nonlinear
effects, thus necessitating evolution of the network standards and
redeployment of the network structure itself. (You could make a fair argument
as to why OSI has not displaced TCP/IP simply on the basis of not keeping up
with the required changes that have strained the Internet in this way.)


Forward-error Correction: When You Care Enough to Not Retransmit


At still higher data rates than header prediction can handle, we might need to
abandon the current paradigms in protocols like TCP. One such paradigm is the
network delay and processing overhead of handling retransmission of failed
data. One simple method to avoid resending garbled data is to not resend it at
all! Instead, additional information is sent in the form of an error-detection
and correction-encoding scheme. Thus data integrity is preserved without the
need for retransmissions. The cost of this technique is contained in the
additional processing of the encoding/decoding and the overhead of
transmitting redundant data-state information.
A serpentine, or "barbershop pole," algorithm is possible; see Figure 3. Since
we won't know where an error might occur, we must either transmit all the data
at least twice or transmit enough redundant information to reconstruct it. In
Figure 3, we simply retransmit all information, spacing the data at the largest
time offset allowed by the buffers allocated on the sending side. This resembles
the original retransmission scheme, except that when we exceed the memory
space of the transmitting computer system, we send the (delayed) packet. Thus
the retransmission memory is a FIFO queue of packets.
A scheme like this relies on probabilities: It assumes that the error
condition is transitory (soft) within the FIFO's lifetime. If it isn't, you
can drop all the data in flight (!), abandon the connection (chances are it's
dying anyway), and return to the original retransmission scheme, taking the
performance hit.
Forward-error correction addresses network delay, but doesn't deal with
increasing protocol-processing performance. Suppose you had a workstation that
you wanted to receive different real-time video feeds, each to a separate
window, all in software. In this case, the assumptions that allow header
prediction are not present. Can we cope with this even greater demand for
network bandwidth? Yes, we can; protocol engines are one solution.


Protocol Engines: Specialized Hardware to Bypass the Bottlenecks


At some point, processing the network protocols pushes the limits of the
average workstation too far. We might consider factoring in increasing amounts
of dedicated hardware to more lithely handle the necessary processing. This is
warping the hardware to fit the situation; we can call this a "protocol
engine."
A protocol engine departs from the traditional frame of computer
system/network integration by substituting varying degrees of special-purpose
hardware to remove bottlenecks that otherwise limit performance. In this case,
both hardware and software are mutable to the degree that the desired
performance objective needs to be achieved. If a protocol implementation's
software consumes too many hardware instructions, we rebuild the hardware--we
provide new facilities that make the software's use of hardware fall into
line. Examples of this mountain-to-Mohammed strategy are providing separate
data paths for the data and header of a packet or providing a separate
functional unit to checksum a packet as it is processed. We might also wish to
overlap routing lookup with ordinary packet processing, killing the processing
if the route found leads to another network interface.
We could conceivably process every step of protocol implementation in
hardware, potentially parallelizing every operation possible. Such an endeavor
would be extraordinarily expensive--and not necessarily justified. As
mentioned earlier, high-speed traffic is characterized by bursty behavior. An
ultimate solution such as a fully hardware-implemented protocol stack would be
used intensively only during those brief bursts. Given current expectations,
this would be expensive overkill. In fact, the virtue of protocol engines lies
in their potential to scale to ever-greater aggregate bandwidth through
increasing application of VLSI hardware.
Protocol engines are a controversial topic; many prominent critics argue that
they are either underjustified, or a continuation of the outmoded front-end
processor arrangement popular in early networks. Front-end processors would
entirely process the protocols on a separate microprocessor, usually on the
same board as the network interface. This allowed a nonnetworked computer
system to interface to a network by adding a device driver and some software
"glue." The value of this approach was that it avoided changing the operating
system to work with the network. A vague argument for offloading the protocol
processing was frequently made. However, these units were usually slower than
native-mode protocols because the main processor was considerably faster than
the front end, and the cost of communication between the main processor and
the front end was just as expensive as having the main processor run the
protocols itself! A true protocol engine is a difficult and considerable
achievement, since it must outperform main-processor technology without
shifting the burden back to the main processor.


Conclusion


Testbed networks that employ the elements described here are currently in
operation at various technology centers around the world. In the U.S. alone,
interest in the competing projects for the National Research and Educational
Network (NREN), as well as the interest in fiber-based networks like FDDI and
HiPPi, have catalyzed this activity. ATM and SONET are watchwords for scalable
network bandwidth to take us into the gigabit/sec world. Improvements like
single-element caching and header compression are already in Berkeley
networking implementations that you can obtain today (386BSD). In short,
gigabit networking is far closer than you think.


TCP/IP In Brief



The Transmission Control Protocol (TCP) and its companion Internet Protocol
(IP) form the core protocols upon which the Internet network is built. In the
'70s, TCP/IP replaced Network Control Program (NCP) on the embryonic ARPANET.
Since then, it has evolved from a university curiosity to the predominant open
systems protocol of choice, forming the largest de facto network standard in
the world today. It predates the now-standard OSI model for describing
computer network structure; see Table 1.
IP provides the "legs" to move packet traffic around; it provides a minimal
infrastructure to shoot "datagrams" across an arbitrarily connected data
network. Datagrams are just "small" packages of data with source and
destination addresses on the "catenet," or catenation of networks; see Figure
1. IP's only concern, being solely an OSI level-3 or network-layer protocol,
is to provide a fabric for the simplest possible communication of datagrams
between computers on a network. As such, datagrams going across the Internet
travel without guarantees of any kind. They can arrive out of order or be
lost, corrupted, or misaddressed.
As the "planner," or scheduler of traffic across the network that IP provides,
TCP is concerned solely with transporting data contained within IP datagrams.
The next level up in the OSI model (level 4--TCP) uses the IP datagram to send
its data messages (called "segments"), taking care to account for the chaos
possible when transiting the network via IP. As these segments arrive, they
are buffered and handed off to an application program as a part of the stream
of data between the two application programs. TCP uses a floating "window" of
virtual buffer space to flexibly handle flow control, and to avoid the logjams
that result when network delay becomes significant. While TCP is the
predominant transport protocol used on the Internet, many others (UDP,
TP4,...) are also in use, and similarly make use of IP.
Since TCP provides communication to application programs in the form of a
bidirectional data stream, it requires a connection to engage the stream and a
disconnection to terminate the stream. Thus it is a connection-oriented
protocol (or virtual circuit), the opposite of a datagram service. Header
prediction makes it possible to streamline connection-oriented protocols. This
is because data streams commonly fall into predictable patterns when
connections are in place and supply/consumption is at a steady state. In other
words, the clever transport-protocol features that ensure well-behaved
communication are infrequently used; usually, the "null" or pass-through case
occurs.
--W.F.J.



_HIGH-SPEED NETWORKING_
by William F. Jolitz


[LISTING ONE]

From file /sys/netinet/in_pcb.c:

/* Find an Internet protocol control block associated with a given session.
 * A session is determined as a match between foreign (faddr, fport) and local
 * (laddr, lport) source/destination identifiers. Allows wildcard matches
 * when some but not all of the parameters are known. */
struct inpcb *
in_pcblookup(head, faddr, fport, laddr, lport, flags)
 struct inpcb *head;
 struct in_addr faddr, laddr;
 u_short fport, lport;
 int flags;
{
 register struct inpcb *inp, *match = 0;
 int matchwild = 3, wildcard;

 for (inp = head->inp_next; inp != head; inp = inp->inp_next) {
 if (inp->inp_lport != lport)
 continue;
 wildcard = 0;
 if (inp->inp_laddr.s_addr != INADDR_ANY) {
 if (laddr.s_addr == INADDR_ANY)
 wildcard++;
 else if (inp->inp_laddr.s_addr != laddr.s_addr)
 continue;
 } else {
 if (laddr.s_addr != INADDR_ANY)
 wildcard++;
 }
 if (inp->inp_faddr.s_addr != INADDR_ANY) {
 if (faddr.s_addr == INADDR_ANY)
 wildcard++;
 else if (inp->inp_faddr.s_addr != faddr.s_addr ||
 inp->inp_fport != fport)
 continue;
 } else {
 if (faddr.s_addr != INADDR_ANY)
 wildcard++;
 }
 if (wildcard && (flags & INPLOOKUP_WILDCARD) == 0)
 continue;
 if (wildcard < matchwild) {
 match = inp;
 matchwild = wildcard;
 if (matchwild == 0)

 break;
 }
 }
 return (match);
}






[LISTING TWO]

From file:/sys/netinet/tcp_input.c

 ...
 /* Locate pcb for segment. */
findpcb:
 inp = tcp_last_inpcb;
 /* first, see if this segment looks familiar */
 if (inp->inp_lport != ti->ti_dport ||
 inp->inp_fport != ti->ti_sport ||
 inp->inp_faddr.s_addr != ti->ti_src.s_addr ||
 inp->inp_laddr.s_addr != ti->ti_dst.s_addr) {
 inp = in_pcblookup(&tcb, ti->ti_src, ti->ti_sport,
 ti->ti_dst, ti->ti_dport, INPLOOKUP_WILDCARD);
 if (inp)
 tcp_last_inpcb = inp;
 ++tcppcbcachemiss;
 }
 ...






[LISTING THREE]

From file:/sys/netinet/tcp_input.c:

 ...
/* Header prediction: check for the two common cases of a unidirectional
 * data xfer. If the packet has no control flags, is in-sequence, the window
 * didn't change and we're not retransmitting, it's a candidate. If the length
 * is zero and the ack moved forward, we're the sender side of the xfer. Just
 * free the data acked & wake any higher level process that was blocked
 * waiting for space. If length is non-zero and the ack didn't move, we're the
 * receiver side. If we get packets in-order (the reassembly queue is empty),
 * add data to the socket buffer and note that we need a delayed ack. */
 if (tp->t_state == TCPS_ESTABLISHED &&
 (tiflags & (TH_SYN|TH_FIN|TH_RST|TH_URG|TH_ACK)) == TH_ACK &&
 ti->ti_seq == tp->rcv_nxt &&
 ti->ti_win && ti->ti_win == tp->snd_wnd &&
 tp->snd_nxt == tp->snd_max) {
 if (ti->ti_len == 0) {
 if (SEQ_GT(ti->ti_ack, tp->snd_una) &&
 SEQ_LEQ(ti->ti_ack, tp->snd_max) &&
 tp->snd_cwnd >= tp->snd_wnd) {

 /* this is a pure ack for outstanding data. */
 ++tcppredack;
 /* need to update round trip timers */
 if (tp->t_rtt && SEQ_GT(ti->ti_ack,tp->t_rtseq))
 tcp_xmit_timer(tp);
 /* free sent data that was acknowledged */
 acked = ti->ti_ack - tp->snd_una;
 tcpstat.tcps_rcvackpack++;
 tcpstat.tcps_rcvackbyte += acked;
 sbdrop(&so->so_snd, acked);
 tp->snd_una = ti->ti_ack;
 /* free this packet */
 m_freem(m);
 /* If all outstanding data are acked, stop
 * retransmit timer, otherwise restart timer
 * using current (possibly backed-off) value.
 * If process is waiting for space,
 * wakeup/selwakeup/signal. If data are ready
 * to send, let tcp_output decide between more
 * output or persist. */
 if (tp->snd_una == tp->snd_max)
 tp->t_timer[TCPT_REXMT] = 0;
 else if (tp->t_timer[TCPT_PERSIST] == 0)
 tp->t_timer[TCPT_REXMT] = tp->t_rxtcur;
 if (so->so_snd.sb_flags & SB_NOTIFY)
 sowwakeup(so);
 if (so->so_snd.sb_cc)
 (void) tcp_output(tp);
 return;
 }
 } else
 /* This is a pure, in-sequence data packet with nothing on the
 * reassembly queue; we have enough buffer space to take it. */
 if (ti->ti_ack == tp->snd_una &&
 tp->seg_next == (struct tcpiphdr *)tp &&
 ti->ti_len <= sbspace(&so->so_rcv)) {
 /* update received state and statistics */
 tp->rcv_nxt += ti->ti_len;
 ++tcppreddat;
 tcpstat.tcps_rcvpack++;
 tcpstat.tcps_rcvbyte += ti->ti_len;
 /* Deliver data by dropping TCP and IP headers, then
 * add data to application's socket buffer, and wakeup
 * application if necessary. */
 m->m_data += sizeof(struct tcpiphdr);
 m->m_len -= sizeof(struct tcpiphdr);
 sbappend(&so->so_rcv, m);
 sorwakeup(so);
 /* request that an acknowledgment be sent with the next
 * data packet outbound -- we delay in hopes of
 * "piggybacking" on top of a data packet. */
 tp->t_flags |= TF_DELACK;
 return;
 }
 }
 ...




August, 1992
COMPILER-SPECIFIC C EXTENSIONS


Borland C++ for DOS systems programming


 This article contains the following executables: TSRPLUS.ARC


Al Stevens


Al is a contributing editor to DDJ and can be contacted at 411 Borel Ave., San
Mateo, CA 94402.


Extensions to the C and C++ languages take many forms, depending on what the
extenders have in mind. The ANSI C committee includes the Numerical C
Extensions Group, whose task it is to define numerical extensions to the
language beyond the ones already built in. Other language extensions support
particular development platforms. For example, Borland's Object Windows
Library uses an extension to C++ class definition that supports the
declaration of a message-response member function.
The Borland C++ compiler includes a number of other extensions to the C
language that support DOS systems programming, that branch of programming that
includes device drivers, memory-resident programs, and other low-level
activities. To illustrate the use of the extensions, I'll describe TSRPLUS, a
public-domain swapping terminate-and-stay-resident (TSR) driver that compiles
with all versions of Borland and Turbo C. It is the resident part of a TSR
application that swaps the memory of the interrupted program for the TSR
application's image, executes the TSR application, and swaps the original
program back in. The source code includes the TSRPLUS driver and a brief
application program to serve as an example. Because of its length, the source
to TSRPLUS is not printed in this issue, but is available electronically under
the filename TSRPLUS.ARC.
Note that this article is not a treatise on portable code. The code that you
write with these techniques is completely nonportable to other computer
architectures and other operating systems. It is mostly nonportable to other
compiler products. And in some rare cases, it is potentially nonportable to
past and future versions of Borland C (BC). When you do systems programming
this close to the hardware and operating system, portability is the least of
your concerns. Similarly, this discussion does not address how TSRs work.
There are a number of good works on this subject. An understanding of what
allows a TSR to pop up and what rules it must obey is helpful here but not
necessary. Where I discuss those issues, it is only to illustrate how I have
used the language extensions of Borland C++ to solve their problems. You can
learn about TSRs in several of my books and in Andrew Schulman's more recent
and comprehensive Undocumented DOS (Addison-Wesley, 1990).
Most of the BC systems-programming extensions have been in the compiler since
the first version of Turbo C. They provide the programmer with close access to
the hardware, BIOS, and operating system. The extensions include register
pseudovariables, inline functions, the interrupt function type, and inline
assembly code. With them, a programmer can avoid most of the assembly language
functions that systems programs normally must call when standard C can neither
reach the hardware nor meet the timing performance requirements of the problem
at hand.
BC and other compilers include functions that support access to BIOS and DOS
in their runtime libraries. These include such functions as int86, bioskey,
getvect, and so on. There is no standard for the types, names, and parameters
of these functions, but the BC library includes versions compatible with their
earlier compilers as well as versions compatible with Microsoft C. Why use
language extensions instead of the library functions? In most cases you can
achieve the same results by using the language extensions, and you will have
smaller, faster code. Sometimes you want to do something for which there is no
library function. Some library functions reference other functions or global
variables that force other object modules to be linked as a side effect, if
not in the latest version of the compiler, then perhaps in a future version.
This can cause the executable code to be larger than it needs to be. Using the
language extensions can bypass such side effects. Some library functions
depend on features provided by the start-up code, such as stack, heap, and
environment variable pointers. Later, we'll replace the startup code to get
the smallest possible program, and we will not use those things. Calling some
library functions from this program would result in unresolved references when
you link.


Register Pseudovariables


BC's register pseudovariables are fixed variables that directly address the
microprocessor's registers. Their names include _AX, _BX, _ES, _FLAGS, and so
on. You can assign an integral value--including address segments and
offsets--to one of these pseudovariables, which puts the value into the
corresponding hardware register. You can use a pseudovariable in an
expression, and its contents are treated as if they came from an unsigned int.
When would you want to use register pseudovariables? A common use is to send
parameters to interrupts and to read the results that interrupts return in
registers. We will do some of that later. But you must be careful. The
compiler assumes not only that you know what you are doing but that you know
what it is doing as well. The compiler itself uses registers in many different
ways. Your use of a register and the compiler's use of the same register must
not conflict. In some cases, the compiler is smart enough to see that you are
using registers and it will avoid their use. For example, the compiler can use
hardware registers for automatic variables. If you use the corresponding
register pseudovariables, the compiler will decide not to use them and will
put the automatic variables on the stack frame or in registers that you do not
use.
In other cases, the compiler is not so smart. Sometimes it seems to get less
so with successive versions of the compiler. For example, the code fragment in
Example 1(a) compiles correctly with Turbo C 2.0, but not with Borland C++
3.0. To see why, we can use the -S command-line option to look at the compiled
assembly language code from the two compilers. Turbo C 2.0 generates the code
shown in Example 1(b), and Borland C++ 3.0 generates that shown in Example
1(c).
Example 1: (a) This code fragment compiles correctly with Turbo C 2.0, but not
with Borland C++ 3.0; (b) code generated by Turbo C 2.0 using the -S
command-line option; (c) code generated by Borland C++ 3.0 using the -S
command line option; (d) code to resolve the problem of the compiler assigning
registers when there is no corresponding mov instruction.

 (a)
 _AX = 123;
 _ES = _DS;

 (b)
 mov ax,123
 push ds
 pop es

 (c)
 mov ax,123
 mov ax,ds
 mov es,ax

 (d)
 _ES = _DS;
 _AX = 123;

In the TC example, the compiler uses a push and pop to assign DS to ES. In the
BC example, the compiler moves DS to ES through AX, the same register to which
you just assigned a value. Your value is overwritten before you have a chance
to use it. The older compiler is no smarter than the newer one with respect to
which pseudovariables you used. It just uses a different technique to assign
registers where the machine language has no corresponding mov instruction. In
this case, the newer technique generates a conflict between your use of the
_AX pseudoregister and the compiler's use of the AX register. Example 1(d)
shows how you can fix the code. Now, both compilers generate correct code.
That does not mean, however, that future versions of the compiler will not
find another way to trip up your use of register pseudovariables. At all
times, use register pseudovariables only when you know exactly what their use
will lead to.


Inline Functions


BC has several macros that generate inline code. With them you can directly
access memory, interrupts, and hardware devices. The compiler generates inline
code for the machine instructions that perform the tasks of the macros.
The geninterrupt macro takes an interrupt number as an argument and executes
the corresponding int machine instruction. You normally use register
pseudovariables in conjunction with this macro. For instance, you can allocate
a block of DOS memory with the code in Example 2. This is an example of how
the BC language extensions generate code as good as you can write in assembly
language. You might suspect that the _FLAGS & 1 expression will use the AX
register, thus interfering with the AX assignment that follows, but not so.
The compiler codes a simple JC (jump if carry) opcode for the expression.
Example 2: Allocating a block of DOS memory.


 _AH = 0x48; // Allocate memory function
 _BX = 10; // # of paragraphs to allocate
 geninterrupt (0x21); // call DOS
 if ((_FLAGS & 1) == 0) // test carry bit
 segment = _AX; // segment of the allocated block
 else
 // ... error ...

The inport, inportb, outport, and outportb macros read and write hardware I/O
ports. Their most common use is to access the interrupt controller and read
the keyboard port. If you are using BC to write a device driver for a custom
hardware device, you could make extensive use of these macros.
The enable and disable macros generate the sti and cli machine instructions to
enable and disable interrupts. You will use them whenever you need to suspend
and resume interrupts for any reason. Interrupts are normally enabled when a
program is running. Sometimes you will need to disable interrupts so that you
may do something that cannot be interrupted. For example, anytime you change
the stack segment and pointer registers, you should disable interrupts so that
an interrupt does not occur while the stack integrity is compromised. In
Example 3, the interrupt-enabled condition is controlled by a bit in the FLAGS
register. When an interrupt occurs, it pushes the FLAGS and the CS:IP
registers on the stack, disables interrupts, and writes the contents of the
interrupt vector into the CS:IP registers. That starts the interrupt service
routine (ISR) running with interrupts disabled. The iret instruction from the
ISR pops the flags and the registers, which enables interrupts and returns to
the interrupted location. The result is that most ISRs execute with interrupts
disabled. If you are writing an ISR that involves extensive processing, you
will need to enable interrupts from within the ISR. Pop-up TSR programs are
examples of programs that execute as the result of an interrupt. If you did
not enable interrupts before running the TSR program, it would not be able to
use any of the system services.
Example 3: The interrupt-enabled condition is controlled by a bit in the FLAGS
register.

 disable();
 _SS = oldss; // interrupts must not occur now
 _SP = oldsp; // otherwise SS and SP will be wrong
 enable();

There are macros that allow you to retrieve and write the contents of memory
by direct access to the memory address. They are the peek, poke, peekb, and
pokeb macros. With them you specify the segment and offset of the address. The
peek and peekb macros return the contents of the memory word or byte. The poke
and pokeb macros accept a word- or byte-value parameter that they write into
the memory location. You will use these macros primarily to read and write
video memory and the BIOS data areas. There are other far locations that you
will need to read and write, such as DOS memory blocks, and you will usually
use far pointers or the movedata function to access them.
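The address arithmetic behind these macros is easy to mimic. The sketch below is a portable simulation, not the Borland macros themselves: sim_peekb, sim_pokeb, and fake_memory are hypothetical names, and an ordinary byte array stands in for real-mode memory. A segment:offset pair resolves to the linear address segment * 16 + offset, so many different pairs alias the same byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: a byte array simulates real-mode memory
 * (the real macros touch physical addresses). */
static unsigned char fake_memory[0x500];

/* A real-mode segment:offset pair resolves to segment * 16 + offset. */
static size_t linear(unsigned seg, unsigned off)
{
    return (size_t)seg * 16u + off;
}

static unsigned char sim_peekb(unsigned seg, unsigned off)
{
    return fake_memory[linear(seg, off)];
}

static void sim_pokeb(unsigned seg, unsigned off, unsigned char value)
{
    fake_memory[linear(seg, off)] = value;
}
```

For instance, 0000:0417 and 0040:0017 both resolve to linear address 0x417, the BIOS shift-state byte that the keyboard ISR reads later in this article.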
The getvect and setvect functions read and write the contents of the specified
interrupt vectors. You will use these to hook your ISR to an interrupt and to
chain your ISR to the previous holder of the interrupt vector. Example 4(a)
shows the initialization code for your program. Your ISR, which now executes
when the interrupt occurs, will chain to the old interrupt the way shown in
Example 4(b). It might do the chain first before it does its own processing of
the interrupt; it might do it last; or it might do it only if certain
conditions are satisfied. Some ISRs will not chain the interrupt at all.
Example 4: (a) Initialization code; (b) chaining to the old interrupt; (c)
returning the value of the original interrupt vector.

 (a)
 void interrupt (*oldISR)(void);
 oldISR = getvect(VECTOR);
 setvect(VECTOR, newISR);

 (b)
 (*oldISR)();

 (c)
 setvect(VECTOR, oldISR);

Before your program terminates, it must return the original value to the
interrupt vector using the call shown in Example 4(c).


The Interrupt-function Type


The interrupt-function type tells the compiler to compile the function as an
ISR. An interrupt function assumes that it is being called as the result of an
interrupt, perhaps from an unrelated process. It may assume nothing about the
values of registers. Therefore, upon entry, the interrupt function pushes all
the registers on the stack, initializes the DS register to point to the
program's DGROUP segment, and sets up the stack frame in the BP register. Upon
exit, the interrupt function pops all registers from the stack and executes
the iret machine instruction to return to the interrupted location.
Often an ISR needs to read the values of registers to get its parameters and
needs to return its results in registers. Up to a point, you can read the
parameters with the register pseudovariables. This works only as long as the
compiler does not use the same registers itself. You cannot return values from
the ISR by writing to the register pseudovariables because the interrupt
function pops the original values into the registers just before it returns.
When the interrupt function pushes all the registers, it leaves them on the
stack just as if the function had been called with those registers as
parameters. You can declare the function with the IREGS data type (see Example
5(a)) in its parameter list. The interrupt function declaration looks like
Example 5(b). You can read the value of the registers upon entry by reading
the corresponding members of the structure, as in Example 5(c). You can change
the value of registers that will be returned to the caller by changing the
contents of the structure members, as in Example 5(d).
Example 5: (a) Declaring the function with the IREGS data type in its
parameter list; (b) the interrupt-function declaration; (c) reading the value
of the registers upon entry by reading the corresponding members of the
structure; (d) changing the value of registers that will be returned to the
caller by changing the contents of the structure members.

 (a)
 typedef struct {
 int bp,di,si,ds,es,dx,cx,bx,ax,ip,cs,fl;
 } IREGS;

 (b)
 void interrupt newISR(IREGS ir)
 {
 // ...
 }

 (c)
 if (ir.ax == 5)
 // ....


 (d)
 ir.ax = 3; // return 3 in ax
 ir.fl |= 1; // return the carry bit on

When the return sequence pops the values back into the registers, these new
values will go from the structure on the stack frame into the machine
registers.
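That push-modify-pop round trip can be simulated in portable C. The sketch below is a hypothetical model, not the Borland mechanism itself: a struct of pseudo registers plays the part of both the machine registers and the IREGS copy on the stack frame, and a dispatcher stands in for the interrupt prologue and epilogue.

```c
#include <assert.h>

/* Pseudo registers: this struct plays the machine registers and the
 * IREGS copy an interrupt function receives on its stack frame. */
struct iregs { unsigned ax, bx, fl; };

static struct iregs machine = { 0, 0, 0 };   /* the "real" registers */

/* Handler in the spirit of Example 5: read the copy, rewrite the copy. */
static void handler(struct iregs *ir)
{
    if (ir->ax == 5)
        ir->ax = 3;       /* return 3 in ax */
    ir->fl |= 1;          /* return the carry bit on */
}

/* Dispatcher mimicking the interrupt prologue/epilogue: "push" all the
 * registers as a copy, run the handler, then "pop" the possibly
 * modified copy back into the registers on the simulated iret. */
static void dispatch(void)
{
    struct iregs copy = machine;
    handler(&copy);
    machine = copy;
}
```

Because the handler only ever touches the copy, nothing reaches the caller except through that final "pop," which is exactly why writing to the pseudovariables inside a real interrupt function has no lasting effect.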
If you are chaining to an old ISR, you must make sure that the registers on
entry are preserved and that the registers on exit are put into the copy of
the structure on the stack frame. If you do any processing before the chain,
you must reset the real registers from the stack frame. Example 6(a)
illustrates how you do that. If the old ISR returns some values in registers
that the caller needs to receive, you must make similar provisions upon the
old ISR's exit, as in Example 6(b).
Example 6: (a) Resetting the real registers from the stack frame; (b) making
similar provisions upon the old ISR's exit; (c) handling the case when oldISR
uses the DS register; preserving the register.

 (a)
 void interrupt newISR(IREGS ir)
 {
 // do some processing that might change registers
 _AX = ir.ax; // restore the registers that the old ISR needs
 _BX = ir.bx;
 (*oldISR)(); // chain to the old isr
 }

 (b)
 void interrupt newISR(IREGS ir)
 {
 (*oldISR)(); // chain to the old isr
 ir.ax = _AX; // caller will get ax
 ir.fl = _FLAGS; // and the flags
 }

 (c)
 void interrupt newISR(IREGS ir)
 {
 void interrupt (*tmpISR)();
 tmpISR = oldISR; // use function pointer on the stack
 _DS = ir.ds; // reset DS
 (*tmpISR)(); // chain to the old isr through tmp ptr
 }

 (d)
 void interrupt newISR(IREGS ir)
 {
 void interrupt (*tmpISR)();
 int oldds = _DS; // save ISR's DS
 tmpISR = oldISR; // use function pointer on the stack
 _DS = ir.ds; // reset DS
 (*tmpISR)(); // chain to the old isr through tmp ptr
 _DS = oldds; // reset DS
 // ... do further processing
 }

If the old ISR uses the DS register for input, you need an additional step.
The call to oldISR is through a pointer in the data segment of the ISR. If you
change the DS register, the call will fail, and the program will probably
crash. Example 6(c) shows you what you then must do. If your ISR does some
processing after the chained interrupt service returns, you must take steps to
preserve the DS register, as in Example 6(d).
There is one more tricky aspect to all this. Some ISRs, most notably the VGA
BIOS, receive parameters and return values in the BP and DS registers. You
must intercept and chain these ISRs with a separately compiled assembly
language program that maintains its function pointer in the code segment
rather than in the data segment or the stack frame. You should similarly
intercept any chained interrupts where there is no comprehensive standard for
their use. The 0x2f interrupt is an example of such an interrupt.
An interrupt function does not have to be executed as the result of an
interrupt. You can call an interrupt function from within your program. The
compiler generates a pushf instruction followed by the far call to the
function, emulating what happens when an interrupt occurs. You can use this
behavior to advantage. The swapping TSR driver stores the address of an
interrupt function in the TSR's application module. The interrupt function is
the application's pop-up entry point. After swapping the application module
into memory, the TSR driver will execute it by calling the entry code through
the interrupt-function pointer. The interrupt function's entry logic prepares
the function for execution by setting up the registers.


Inline Assembly


BC recognizes the asm keyword, which tells it that the statement is an
assembly language instruction. The compiler inserts the instruction into the
object code. You can use the names of C variables in these instructions, and
the compiler will generate the proper references. You may not use the asm
keyword to declare a variable or access things like DGROUP. As with register
pseudovariables, you must know what you are doing when you use inline assembly
code.


A Swapping TSR Driver



Now we will put these techniques to use. The accompanying program is an
example that uses the principles we have been discussing. It is a swapping TSR
driver program called TSRPLUS. I originally developed its concept in 1989 and
published it in a book called Extending Turbo C Professional. Since then, I
have modified the program several times and used it in many programs. There
are several versions of TSRPLUS, each of which handles the swapping problem
differently. This version installs a 5K TSR driver program that swaps in a
much bigger pop-up application when the user presses the hot key.
A swapping TSR consists of two pieces: the permanently resident TSR driver and
the transient pop-up application image. The resident part watches the keyboard
interrupt for the hot key and manages memory swapping. The transient part does
the job of the pop-up application. The swap file can be any mass storage
medium. If you can use EMS, XMS, or a RAM disk, the swapping operation is
faster. If you use a disk file, the swapping operation will take longer,
depending on the sizes of the pop-up application image and interrupted
program.
The pop-up application image and the swapped-out interrupted program include
copies of the interrupt vectors. This is because an interrupt vector might
have been hooked by and point into the program that you are going to
temporarily replace. If you do not restore the interrupt vectors to a
condition compatible with the pop-up program and the hooked interrupt occurs,
the system will crash.


The Pop-up Application


The pop-up application program is a normal DOS program that does not use any
of the DOS functions from 0 to 12 and does not spawn other programs by calling
DOS. It links with a module that allows it to register itself with the TSR
driver program as a pop-up program. The module supplies the program's main
function. The program must provide three functions: one for any initialization
code that the program needs and that calls the register function; one to be
executed upon pop-up from the hot key; and one to be executed if the user runs
the pop-up program from the command line without the TSR driver being
resident.
Other than those differences, the pop-up application program looks like any
other DOS program. It is an .EXE file that will execute as a command-line
program or that can be a pop-up.
You load the swapping TSR program into memory by running the TSR driver
program. It sets itself up to be a TSR and then calls DOS to execute the
pop-up program. DOS loads the pop-up program into memory just above the TSR
driver. The pop-up program does its initialization and then calls into the TSR
driver to register itself. The registration includes a far pointer to the
application's pop-up entry address. The TSR driver writes the pop-up program's
image and the current interrupt vectors to the swap file. The driver returns
to the pop-up program, which exits, returning to the TSR driver. The TSR
driver terminates, declaring itself resident.
When the user presses the hot key, the TSR driver handles all the tests and
context switching necessary to pop up a TSR. Then it swaps the interrupted
program from memory to the swap device, reads the pop-up application image
into memory, and calls the pop-up application to execute. When the pop-up
application returns to the TSR driver, the driver reads the swapped image of
the interrupted program back into memory and returns to the interrupted
location.


Swapping


There are two ways to swap memory. One way swaps just as much memory as the
pop-up program needs. This method leaves the DOS memory control block chain in
disarray as long as the pop-up program is running. The pop-up program cannot
allocate any DOS memory when you use this technique. The other way swaps the
entire memory-control-block chain out. If you are at the DOS command line when
you press the hot key, very little memory swaps out. If you are running a
large program, a lot of memory swaps out. The pop-up program loads up to and
including its terminating MCB. It can, therefore, allocate and deallocate DOS
memory while it is resident. The example program uses this strategy.
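The amount of memory that strategy moves is just a walk down the MCB chain. The sketch below is a hypothetical simulation in portable C: a real MCB is a single paragraph sitting just below the block it describes, with a type byte of 'M' (another block follows) or 'Z' (last block) and a size in paragraphs; here an array of small structs stands in for the chain.

```c
#include <assert.h>

/* Hypothetical simulation of the DOS memory-control-block chain.  A
 * real MCB is one paragraph just below the block it describes; here an
 * entry records only its type byte and block size in paragraphs. */
struct mcb {
    char     type;    /* 'M' = another block follows, 'Z' = last block */
    unsigned paras;   /* size of the owned block, in 16-byte paragraphs */
};

/* Paragraphs spanned from the first MCB through the terminating 'Z'
 * block -- the region the swap-everything strategy writes out. */
static unsigned chain_paragraphs(const struct mcb *chain, int n)
{
    unsigned total = 0;
    int i;
    for (i = 0; i < n; i++) {
        total += 1 + chain[i].paras;   /* the MCB itself plus its block */
        if (chain[i].type == 'Z')
            break;
    }
    return total;
}
```

At the DOS prompt the chain past the TSR is short, so the total is small; under a large program the same walk yields a much bigger count, which is why swap time varies with what was interrupted.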
When the pop-up program first runs, it tests to see if the TSR driver is
resident. If so, the pop-up program registers itself in the manner just
described. If not, it can run as a DOS command line program. To the user, the
only difference is the way that the program is executed. This approach allows
the same .EXE file to work in the TSR environment, from the command line, or
in a window of a multitasking environment such as Desqview or Windows.


Memory Organization


The memory organization of a C program and the memory requirements of a TSR
are incompatible. The typical C program consists of all the code, followed by
all the data, followed by the heap and stack. A TSR contains initialization
code and resident code. Ideally, you would release the memory occupied by the
initialization code to DOS when the TSR issues the terminate-and-stay-resident
function call. TSRs written in assembly language can do that with little
trouble. To get the same effect in C requires some manipulation of the startup
code and the order in which things link.
Borland C's startup code does a lot of things for you. It sets up the heap,
the stack, the global variables, the divide-by-zero handler, the pointer to
the environment variables, the arguments to main, the file-handle table, and
the external uninitialized data space. After everything is set up, the startup
code calls your main function. When the main function returns, the startup
code cleans everything up, tests for null-pointer assignments, and returns to
DOS. In the process of doing all this, the startup code declares some public
variables and procedures and refers to some others from the runtime library.
You get the source code to the startup code with the compiler. It is in a file
named C0.ASM. Most of what it does is not necessary for the TSR-driver
program. The TSR driver does not use a heap, parses its own command-line
arguments, and finds its own environment variables. Therefore, the TSR driver
uses a highly modified version of the startup code, C0T.ASM. The modified
startup code declares the program's starting address, sets up the segment
registers and the global _psp variable, sets the external uninitialized data
space to 0s, and calls the main function. That's all it does, because that is
all the TSR driver needs in the way of startup code.
The TSR executes its initialization code and retains only the resident part
when it becomes a resident program. This is not so easy because the compiler
and linker organize all the code ahead of the data. To separate the
initialization code from the resident code and retain the data segment for the
resident program, you must first change the order in which the linker builds
the segments. You need the stack and data to come ahead of the code. That way
you can truncate the initialization code without losing any of the data space.
The linker determines the order of segments based on the order of their
declaration in the first object file it encounters. Remember that we changed
the startup code. That code cannot be the first module the linker sees,
because the startup code belongs toward the end of the code segment so that
its memory can be returned to DOS.
There are two other assembly language modules in the TSR-driver program. One
of them, INT2F.ASM, contains the assembly code for the video and 0x2f ISRs.
This module must be the first one that the linker sees. The code in Example 7
at the front of the module will cause the linker to arrange the segments the
way we want them.
Example 7: The code at the front of the module will cause the linker to
arrange the segments.

 _data segment para public 'data'
 _data ends
 _bss segment word public 'bss'
 _bss ends
 _bssend segment byte public 'bss'
 _bssend ends
 _stack segment stack 'stack'
 _stack ends
 _text segment byte public 'code'
 _text ends

The next problem we run into is that the linker will link all our code
followed by the functions from the runtime library into the code segment. We
need to separate the initialization code from the resident code and get the
runtime-library code loaded between the two parts. We will handle that
operation in the makefile, as shown in Example 8.
Example 8: Using the makefile to separate the initialization code from the
resident code and get the runtime-library code loaded between the two parts.

 tsr.exe : tsr.obj emm.obj xms.obj int2f.obj tsrinit.lib
 tlink /m /s int2f emm xms tsr,tsr.exe,tsr,$(CLIB) tsrinit

 tsrinit.lib : c0t.obj tsrinit.obj emminit.obj xmsinit.obj init.obj
 tlib tsrinit +tsrinit.obj +c0t.obj +emminit.obj +xmsinit.obj +init.obj

The makefile says that the TSR.EXE program depends on the object files that
constitute the resident part of the TSR driver program and the tsrinit.lib
library file. The TSRINIT.LIB file contains the object files for the TSR
driver's initialization code. The TSRINIT.OBJ file must be first, and the
INIT.OBJ file must be last in this library. These files, besides anything else
they might contain, have the addresses of the beginning and end of the
initialization code.
The tlink command links the resident code first, then searches the C runtime
library, and finally searches the TSRINIT library. This sequence arranges the
source modules in the code segment the way we want them.


Operation



You run the TSR driver program from the command line and it loads the swapping
pop-up application. The TSR driver has several command line switches that
modify its behavior. These are: -x, don't use XMS for the swap file; -e, don't
use EMS for the swap file; and +p<path>, the DOS path for the disk swap file.
If you use -x and -e and do not specify a path, the TSR driver will write the
swap file to the subdirectory specified by the TEMP environment variable.
Because of the way the TSR driver program and the pop-up application interact
with memory and the DOS memory-control block, you cannot load the program into
high memory. If you did, the system would probably crash. Rather than allow
that to happen, the TSR driver program tests to see if it is loaded high, as
in Example 9(a). The TSR driver program hooks and chains several interrupts
with the table and macro statements in Example 9(b). This example shows how
the C preprocessor can emulate the C++ inline function to a limited extent.
The program calls the newvectors macro and passes the address of the table
using newvectors(vectors);.
Example 9: (a) Testing to see if the program is loaded high; (b) hooking and
chaining several interrupts with the table and macro statements.

 (a)
 if (_CS > 0xa000) {
 dispstr("\a\r\nCannot loadhigh");
 return;
 }

 (b)
 EXTERN struct vectors {
 int vno;
 void (interrupt **oldvect) (void);
 void (interrupt *newvect) (void);
 } vectors[] = {
 {TIMER, &oldtimer, (void interrupt (*)())newtimer},
 {INT28, &old28, (void interrupt (*)())new28},
 {KYBRD, &oldkb, (void interrupt (*)())newkb},
 {DISK, &olddisk, (void interrupt (*)())newdisk},
 {VIDEO, &oldvideo, (void interrupt (*)())newvideo},
 {TSRPLUSINT, &old2f, (void interrupt (*)())new2f},
 {0, NULL, NULL}};

 #define newvectors(vecs) \
 { \
 register struct vectors *vc = vecs; \
 while (vc->vno) { \
 *(vc->oldvect) = getvect(vc->vno); \
 setvect(vc->vno, vc->newvect); \
 vc++; \
 } \
 }

There is an equivalent oldvectors macro that restores interrupt vectors from
the table. The program uses another table to hook and restore the critical
interrupt, break, and ctrl+c interrupt vectors.
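The table-driven hook pattern ports directly to standard C, and seeing it without the DOS machinery makes the control flow clearer. In the sketch below (all names hypothetical), an array of plain function pointers stands in for the interrupt vector table, and the article's newvectors() macro becomes an ordinary loop; the replacement handler chains to the saved pointer just as the article's ISRs do.

```c
#include <assert.h>

/* Hypothetical stand-in: an array of plain function pointers simulates
 * the real-mode interrupt vector table. */
typedef void (*handler_t)(void);

#define NVECS 8
static handler_t vector_table[NVECS];

static handler_t sim_getvect(int vno)              { return vector_table[vno]; }
static void      sim_setvect(int vno, handler_t h) { vector_table[vno] = h; }

/* Hook table in the spirit of the article's vectors[] array; a negative
 * vno terminates the list here (the article uses 0). */
struct hook {
    int        vno;
    handler_t *oldvect;   /* where to save the previous handler */
    handler_t  newvect;   /* replacement handler */
};

static int old_ran, new_ran;
static handler_t old_timer;

static void original_timer(void) { old_ran = 1; }
static void hooked_timer(void)   { new_ran = 1; (*old_timer)(); /* chain */ }

/* The newvectors() macro written as a function: save each old handler,
 * then install the new one. */
static void hook_vectors(struct hook *h)
{
    for (; h->vno >= 0; h++) {
        *(h->oldvect) = sim_getvect(h->vno);
        sim_setvect(h->vno, h->newvect);
    }
}
```

An unhook pass, like the article's oldvectors macro, is the same loop with sim_setvect(vc->vno, *(vc->oldvect)) in the body, run in reverse order if the hooks can nest.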


Registering the Application


The TSR driver program executes the pop-up application program so that the
pop-up can register itself as a swapped TSR. First the TSR driver program
reduces its own size by calling the DOS 0x4a function to change the size of
the PSP's memory-control block, as shown in Example 10. This step is necessary
because DOS assigns to a program's PSP all available memory, leaving no room
to run the pop-up application. In a normal C program, the startup code takes
care of this reduction, using the sizes of the stack and heap to determine the
size of the program. The TSR driver's startup code does not, however, so the
driver does it here.
Example 10: The TSR driver program reduces its own size by calling the DOS
0x4a function to change the size of the PSP's memory-control block.

 /* ------ compute program size ------- */
 highmemory = _CS + ((unsigned)&codeend / 16);
 sizeprogram = highmemory - _psp;

 /* ------ adjust MCB for TSRPLUS ------- */
 _ES = _psp;
 _BX = sizeprogram;
 _AX = 0x4a00;
 geninterrupt(DOS);

The program computes its new size by adding the paragraph address of the
codeend label--the last entry in the code segment--to the value in the
code-segment register. That value is the segment address of the top of the
program. By subtracting the address of the PSP from the high address, the
program computes the minimum paragraph size in which it can execute. By
telling DOS to reduce the program's memory block to that size, the program
assures that the next program run will load as close to it as possible.
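All of this size arithmetic works in 16-byte paragraphs, and the one subtlety is rounding up when an offset is not a paragraph multiple (Example 11 handles that with its final increment). The sketch below is a hedged restatement of the calculation in portable C; plain parameters stand in for _CS, _psp, and the codeend or main offsets.

```c
#include <assert.h>

/* Byte offset to paragraph count, rounding up (a paragraph is 16 bytes). */
static unsigned bytes_to_paragraphs(unsigned bytes)
{
    return (bytes >> 4) + ((bytes & 0xf) != 0);
}

/* Program size in paragraphs: the segment formed by CS plus the
 * offset's paragraph count, minus the PSP segment -- the value handed
 * to the DOS 0x4a (resize) and 0x31 (keep) functions. */
static unsigned resident_paragraphs(unsigned cs, unsigned offset, unsigned psp)
{
    return cs + bytes_to_paragraphs(offset) - psp;
}
```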
Next, the TSR-driver program uses the DOS 0x4b function to execute the pop-up
application program. The pop-up application program was linked with the
tsrbuild.c code, which calls the TSR-driver program with a 0x2f interrupt to
register itself. It passes the address of a structure that contains its entry
point and PSP address. The TSR-driver program records this information and
makes a copy of the pop-up program's interrupt vectors and program image on
the swap file. It returns to the pop-up application which terminates,
returning control to the TSR-driver program.



Terminating and Staying Resident


Now the TSR-driver program computes a size for itself a second time. This
size will truncate the initialization code when the TSR driver becomes
resident. The new size is computed as the distance from the PSP address to the
main function, as shown in Example 11(a). The main function is the first
function in the initialization code and is, therefore, at the address of the
top of the resident code. With the size of the resident portion computed, the
TSR can terminate and declare itself resident, as shown in Example 11(b).
Example 11: (a) The TSR driver program computes a new size for itself as the
distance from the PSP address to the main function; (b) with the size of the
resident portion computed, the TSR can terminate and declare itself resident.

 (a)
 sizeprogram = _CS+((unsigned) main >> 4)-_psp;
 if ((unsigned) main % 16)
 sizeprogram++;

 (b)
 _DX = sizeprogram;
 _AX = 0x3100;
 geninterrupt(DOS);

Now that the TSR driver resides in memory and an executable image of the
pop-up application program is stored in the swap file, the system can return
to the DOS command line and run other programs. When the user presses the hot
key, the program can pop up. After the TSR driver is resident, its keyboard
ISR watches the keyboard-input port and the word in the BIOS data space that
stores the current shift-key state, as in Example 12. This interrupt
function uses several of the Borland C extensions. It reads the value of the
keyboard-input port by calling the inportb macro. It reads the BIOS data
area's shift-state mask by calling the peekb macro. It resets the keyboard
hardware and the interrupt controller by calling the outportb macro. And it
chains to the old keyboard ISR by calling it through an interrupt-function
pointer. This function does not pop the application program up. It simply sets
a flag that says the user pressed the hot key. The hooked timer and 0x28 ISRs
pop up the program when the hot-key indicator is set and DOS is in a safe and
stable condition for a pop-up.
Example 12: After the TSR driver is resident, its keyboard ISR watches the
keyboard-input port and the word in the BIOS data space that stores the
current shift key's state.

 /* ----- keyboard ISR ------ */
 static void interrupt newkb(void)
 {
 unsigned char kbval = inportb(0x60);
 if (!hotkeyhit && !running) {
 if (Keymask && (peekb(0, 0x417) & 0xf) == Keymask)
 if (Scancode == 0 || Scancode == kbval)
 hotkeyhit = TRUE;
 if (hotkeyhit) {
 /* --- reset the keyboard ---- */
 kbval = inportb(0x61);
 outportb(0x61, kbval | 0x80);
 outportb(0x61, kbval);
 outportb(0x20, 0x20);
 return;
 }
 }
 (*oldkb)();
 }

After it is safe to pop up, the TSR-driver program switches the DOS context
from the interrupted program to itself. First it changes the stack. The
interrupted program might not have a deep enough stack, and the swap will
probably write over that space anyway. Changing the stack is simple. The
initialization code saved the stack's segment and pointer register values
before the TSR-driver program became resident. The code in Example 13 saves
the interrupted stack location and changes it to the TSR driver's stack. It is
not necessary to disable interrupts to protect the stack's integrity.
Interrupts are currently disabled because the program is operating from within
an ISR. The TSR driver saves and resets the DTA, PSP, and Ctrl-Break setting.
Then it starts the swap sequence.
Example 13: Save the interrupted stack location and change it to the TSR
driver's stack.

 intsp = _SP;
 intss = _SS;
 _SP = tsrsp;
 _SS = tsrss;

To swap the interrupted program and the pop-up application program, the TSR
driver uses this sequence: It writes the interrupted program's memory to the
swap file. Then it writes the system-interrupt vector table to the swap file.
Next it reads the pop-up application program's interrupt vector table. Finally
it reads the pop-up application program's program memory. This sequence is
critical. The swapping input/output operations could enable interrupts. If the
new code is in memory along with the old interrupt vectors, an unexpected
interrupt could jump to the wrong place.
After the pop-up application program is swapped into memory, the TSR-driver
program executes it by calling through the far interrupt pointer that holds
the address of the application's entry location. That code is in the source
file tsrbuild.c that the application links with. It does a similar context
switch between the stacks, DTAs, and PSPs of itself and the TSR-driver
program, and then it executes its application. When the application returns,
the program switches the context back and returns to the TSR-driver program.
When the pop-up application returns to the TSR-driver program, the driver
swaps the interrupted program back in, swapping the code first and then the
interrupt vectors. Once again, this sequence is critical. The code for the
interrupted program's ISRs must be in place before any interrupt vectors point
to it. Next, the TSR driver switches the context of the PSP, DTA, and the
stack, and returns to the interrupted program.


Unloading the TSR


The pop-up application program decides when to unload itself. It calls the TSR
driver through the 0x2f interrupt with a function code that says the driver
should unload itself. The driver sets a flag. When the pop-up application
returns to pop down and after the driver has swapped the interrupted program
back in, the driver tests to see if it can unload itself. That test makes sure
that all the interrupt vectors are the same as they were when the TSR driver
declared itself resident, and that there is no program loaded above the TSR
driver in memory. In other words, the TSR-driver program can unload only when
the TSR was popped up from the DOS command line and only when no other TSR
programs are loaded above it. The test uses the peek and peekb macros to walk
the DOS memory-control block chain and uses getvect to compare the contents of
the interrupt vectors with their earlier values.


When Assembly is Needed



Sometimes you cannot get by without some assembly language. We've already
discussed the startup code. There are two other assembly language modules that
the TSR driver uses. You saw how the INT2F module controls the placement of
segments. It also provides the ISRs for the 0x10 and 0x2f interrupts. These
interrupts use registers in ways incompatible with the interrupt function. The
assembly language ISRs chain to the old ISRs without disturbing register
integrity.
The INIT module provides the codeend variable, and it provides a function
that copies the chained 0x10 and 0x2f interrupt vector contents into
interrupt-function pointers in the code segment's address space. This allows
the TSR driver's assembly language ISRs to chain without needing data-segment
references to the pointers.


Conclusion


You debug the pop-up application program as a DOS command-line program by
using the source-level debugger. I debugged the TSR driver by using Turbo
Debugger with the load and execute of the overlaid pop-up application stubbed
out. I debugged those parts in the old way--by displaying messages on the
screen at critical places in the program. It was slow, but sometimes that's
the best or only way.
Finally, the TSRPLUS source code includes an example pop-up application
program. It is a simple D-Flat application, which means that it uses the
user-interface library that I have been publishing in Dr. Dobb's Journal for
the past year. Its purpose is to show you how to use the TSRPLUS driver. It
doesn't do anything other than pop up and down and let you remove it from
memory with a menu selection. Link it with the D-Flat library, version 12 or
greater.


_COMPILER-SPECIFIC C EXTENSIONS_
by Al Stevens

Example 1:

(a)
 _AX = 123;
 _ES = _DS;

(b)
 mov ax,123
 push ds
 pop es

(c)
 mov ax,123
 mov ax,ds
 mov es,ax

(d)
 _ES = _DS;
 _AX = 123;

Example 2:

 _AH = 0x48; // Allocate memory function
 _BX = 10; // # of paragraphs to allocate
 geninterrupt(0x21); // call DOS
 if ((_FLAGS & 1) == 0) // test carry bit
 segment = _AX; // segment of the allocated block
 else
 // ... error ...

Example 3:

 disable();
 _SS = oldss; // interrupts must not occur now
 _SP = oldsp; // otherwise SS and SP will be wrong
 enable();

Example 4:

(a)
 void interrupt (*oldISR)(void);
 oldISR = getvect(VECTOR);
 setvect(VECTOR, newISR);


(b)
 (*oldISR)();

(c)
 setvect(VECTOR, oldISR);

Example 5:

(a)

 typedef struct {
 int bp,di,si,ds,es,dx,cx,bx,ax,ip,cs,fl;
 } IREGS;



(b)
 void interrupt newISR(IREGS ir)
 {
 // ...
 }


(c)

 if (ir.ax == 5)
 // ....



(d)
 ir.ax = 3; // return 3 in ax
 ir.fl = 1; // return the carry bit on


Example 6:

(a)

 void interrupt newISR(IREGS ir)
 {
 // do some processing that might change registers
 _AX = ir.ax; // restore the registers that the old ISR needs
 _BX = ir.bx;
 (*oldISR)(); // chain to the old isr
 }



(b)

 void interrupt newISR(IREGS ir)
 {
 (*oldISR)(); // chain to the old isr
 ir.ax = _AX; // caller will get ax
 ir.fl = _FLAGS; // and the flags
 }



(c)

 void interrupt newISR(IREGS ir)
 {
 void interrupt (*tmpISR)();
 tmpISR = oldISR; // use function pointer on the stack
 _DS = ir.ds; // reset DS
 (*tmpISR)(); // chain to the old isr through tmp ptr
 }



(d)

 void interrupt newISR(IREGS ir)
 {
 void interrupt (*tmpISR)();
 int oldds = _DS; // save ISR's DS
 tmpISR = oldISR; // use function pointer on the stack
 _DS = ir.ds; // reset DS
 (*tmpISR)(); // chain to the old isr through tmp ptr
 _DS = oldds; // reset DS
 // ... do further processing
 }

Example 7:

 _data segment para public 'data'
 _data ends
 _bss segment word public 'bss'
 _bss ends
 _bssend segment byte public 'bss'
 _bssend ends
 _stack segment stack 'stack'
 _stack ends
 _text segment byte public 'code'
 _text ends

Example 8:


 tsr.exe : tsr.obj emm.obj xms.obj int2f.obj tsrinit.lib
 tlink /m /s int2f emm xms tsr,tsr.exe,tsr,$(CLIB) tsrinit

 tsrinit.lib : c0t.obj tsrinit.obj emminit.obj xmsinit.obj init.obj
 tlib tsrinit +tsrinit.obj +c0t.obj +emminit.obj +xmsinit.obj +init.obj


Example 9:

(a)

 if (_CS > 0xa000) {
 dispstr("\a\r\nCannot loadhigh");
 return;
 }




(b)

 EXTERN struct vectors {
 int vno;
 void (interrupt **oldvect)(void);
 void (interrupt *newvect)(void);
 } vectors[] = {
 {TIMER, &oldtimer, (void interrupt (*)())newtimer},
 {INT28, &old28, (void interrupt (*)())new28},
 {KYBRD, &oldkb, (void interrupt (*)())newkb},
 {DISK, &olddisk, (void interrupt (*)())newdisk},
 {VIDEO, &oldvideo, (void interrupt (*)())newvideo},
 {TSRPLUSINT, &old2f, (void interrupt (*)())new2f},
 {0, NULL, NULL}};

 #define newvectors(vecs) \
 { \
 register struct vectors *vc = vecs; \
 while (vc->vno) { \
 *(vc->oldvect) = getvect(vc->vno); \
 setvect(vc->vno, vc->newvect); \
 vc++; \
 } \
 }

Example 10:

 /* ------ compute program size ------- */
 highmemory = _CS + ((unsigned)&codeend / 16);
 sizeprogram = highmemory - _psp;

 /* ------ adjust MCB for TSRPLUS ------- */
 _ES = _psp;
 _BX = sizeprogram;
 _AX = 0x4a00;
 geninterrupt(DOS);

Example 11:

(a)

 sizeprogram = _CS+((unsigned) main >> 4)-_psp;
 if ((unsigned) main % 16)
 sizeprogram++;


(b)


 _DX = sizeprogram;
 _AX = 0x3100;
 geninterrupt(DOS);


Example 12:

 /* ----- keyboard ISR ------ */
 static void interrupt newkb(void)
 {
 unsigned char kbval = inportb(0x60);
 if (!hotkeyhit && !running) {
 if (Keymask && (peekb(0, 0x417) & 0xf) == Keymask)
 if (Scancode == 0 || Scancode == kbval)
 hotkeyhit = TRUE;
 if (hotkeyhit) {
 /* --- reset the keyboard ---- */
 kbval = inportb(0x61);
 outportb(0x61, kbval | 0x80);
 outportb(0x61, kbval);
 outportb(0x20, 0x20);
 return;
 }
 }
 (*oldkb)();
 }


Example 13:

 intsp = _SP;
 intss = _SS;
 _SP = tsrsp;
 _SS = tsrss;






August, 1992
PARALLEL C EXTENSIONS


Parallelized code retains its serial structure




Barr E. Bauer


Barr uses high-performance computers to design pharmaceuticals for
Schering-Plough Research Institute. He can be reached at 60 Orange Street
B1-3-85, Bloomfield, NJ 07003.


Silicon Graphics has implemented parallel extensions to the C compiler for the
4D Power Series computers that enable both SIMD (loop) and MIMD (independent
block) parallelization. Code containing these extensions can be ported into
serial C environments, recompiled, and executed without change. In direct
contrast to older methods of parallelization that generally involved heavy use
of platform-specific system calls combined with code restructuring,
parallelized code using IRIS Power C retains its serial structure. In this
article, I will introduce you to the basics of parallelizing C code with IRIS
Power C, discuss data dependence as an important limitation, and end with two
program examples to convince you that the phrase "portable parallel C" is not
an oxymoron.


Parallelization


I'll start this discussion by describing in general terms how SGI does
parallelism on a shared-memory multiprocessor workstation. A parallelized
program is a serial program that contains regions of parallel execution. On
entering a parallel region, the serial program is transformed from a single
process to multiple processes, each running one of the parallel-execution
threads. These regions and their parallel-execution behavior are determined by
#pragma directives included in the code and interpreted by the compiler with
the -mp option; otherwise they are ignored.
At the system level, changes have been made to UNIX to support process-based
parallelism on one hand and multiple processors on the other, while
maintaining system generality. A parallel program's processes (threads) are
allowed access to shared memory. They maintain data structures that
transparently coordinate interprocess data access and process synchronization.
Otherwise, they behave like other processes in the scheduler's queue. Parallel
execution itself is done in sched, the UNIX preemptive-multitasking scheduler,
which has been modified to distribute processes over multiple processors
rather than the usual one. A parallelized program's threads timeshare with
other tasks run by the system, either parallel or serial, resulting in a
flexible execution environment. On an eight-processor system, this environment
can execute eight serial programs in the time of one, a single parallelized
program up to eight times faster, or a mix of serial and parallel programs
significantly faster.


Parallel Models


SGI implements the SIMD and MIMD parallel models. SIMD is the easiest to
understand and implement. In a for loop that processes members of an array,
all threads execute the same code on different array elements. The individual
iterations of the loop are divided between the available threads and executed
on them. For instance, if four threads are available, a 1000-iteration loop is
parallelized by executing iterations 0-249 by thread 0, 250-499 by thread 1,
and so forth. If each iteration has the same execution time, a four-thread
system can be close to four times faster than serial execution. IRIS Power C
currently limits loop parallelization to the for-loop construct because it is
the only loop construct within C which lets you define an index, a maximum
number of iterations, and an increment.
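The contiguous split described above can be sketched in ordinary serial C. The even division below is an assumption chosen for illustration; the exact schedule IRIS Power C applies is not shown here:

```c
/* First and last iteration assigned to thread 'tid' when n iterations
   are split contiguously over 'nthreads' threads (a sketch only). */
int chunk_first(int n, int nthreads, int tid)
{
    int per = (n + nthreads - 1) / nthreads;   /* ceiling division */
    return tid * per;
}

int chunk_last(int n, int nthreads, int tid)
{
    int per = (n + nthreads - 1) / nthreads;
    int last = chunk_first(n, nthreads, tid) + per - 1;
    return last < n - 1 ? last : n - 1;        /* clamp final chunk */
}
```

With n = 1000 and four threads, thread 0 gets iterations 0-249, thread 1 gets 250-499, and so forth, matching the partitioning described above.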
MIMD is implemented as independent-block parallelism. A block of code within a
function can be executed on its own thread, independent of other blocks. In
contrast to loop parallelization, the code in each block may operate
differently on global and local (private) data. Program speedups are less
easily determined, being highly dependent on the nature of the code.
Implementation of MIMD parallelization is limited by the lack of a suitable
programming landmark, such as a loop, to attract attention.
The two models can be mixed and matched in a program either in separate
parallel regions or even within a single parallel region. Two threads might
execute independent blocks of code using the MIMD model, while six threads
execute a for loop using the SIMD model. The limiting factor seems to be
keeping the program structure straight. To avoid confusion, I use one model
per parallel region.


Parallel Data Types and Data Dependence


Variables within code executed in parallel are either shared between threads
or local to a specific thread. Shared means just that: Each execution thread
points to a single location in shared memory associated with that variable.
The values within the variable persist past the parallel region of the
program. Unless access to the code containing the shared variable is
restricted, all threads can read or write to the variable. Shared array
variables are common in loop parallelism.
Local variables within the thread point only to a private copy of the
variable. The local variable exists in all threads created within the parallel
region. The values within a local variable are not defined beyond the parallel
region. Local variables are used exclusively within the parallel region and
can be of any data type. Loop indexes and temporary variables are often local.
For parallelism to work correctly, the code in each thread must not modify the
values of variables in use by other threads. If this happens, the variable
becomes dependent on more than one thread, its value becomes unpredictable,
and the results possibly incorrect. An example in SIMD parallelization is when
each iteration processes a unique array element, such that the iteration's
execution order does not affect the results: Iteration i processes shared
variable a[i], iteration i+1 does a[i+1], and so forth.
Two specific situations lead to dependencies: modification of shared scalar
variables during parallel execution, and recurrence. (There are other
situations, but these two are the most troublesome.) The shared
scalar-variable sum appearing in the sum reduction (see Example 1) might have
thread 0 reading one value of sum, while thread 1 writes a new value, and the
like. The value of sum is not systematically determined, leading to wrong
results. Controlling thread access to the statement containing the dependence
will prevent this type of dependence, which can occur in both loop and block
parallel constructs.
Example 1: Shared scalar-variable sum.

 for (sum=i=0; i<max; i++)
 sum += a[i];

Recurrence happens when one iteration of a loop depends on a previous
iteration; see Example 2. There is no good way to correct it besides rewriting
the code or running the loop in serial. Generally, the i - 1 (or i + 1)
indexing identifies a recurrence.
Example 2: Typical case of recurrence, where one iteration of a loop depends
on the previous iteration.

 for (i=1; i<max; i++)
 a[i] = b[i] * a[i-1];

To test for dependence in loop parallelization, index the for loop backwards
during serial execution. If the result is correct, there are probably no
dependencies.
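The backward-index test is easy to try in serial C. In this sketch (the array and its values are mine, chosen only for illustration), the sum reduction gives the same total in either direction, while a recurrence like Example 2 does not:

```c
static const int B[4] = {2, 3, 4, 5};

/* Sum B forward or backward; the order does not affect the total,
   so the loop has no dependence and is a parallelization candidate. */
int sum_reduce(int backward)
{
    int i, s = 0;
    if (backward)
        for (i = 3; i >= 0; i--) s += B[i];
    else
        for (i = 0; i < 4; i++)  s += B[i];
    return s;
}

/* The recurrence a[i] = B[i] * a[i-1] reads a value produced by the
   previous iteration; reversing the index order changes the result. */
int recurrence_last(int backward)
{
    int a[4] = {1, 0, 0, 0};   /* a[0] seeded, the rest zeroed */
    int i;
    if (backward)
        for (i = 3; i >= 1; i--) a[i] = B[i] * a[i - 1];
    else
        for (i = 1; i < 4; i++)  a[i] = B[i] * a[i - 1];
    return a[3];
}
```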


Fundamentals


You introduce parallelism into a program in two steps. First, define the
region in the source code in which parallelism will be active and declare the
parallel behavior of all variables for the region. Second, select for loops or
blocks located within the parallel region for parallelization. The first step
establishes the environment and the second speeds up execution. Multiple for
loops, independent blocks, or combinations can be established within a single
parallel region, and a program can have as many parallel regions as necessary.

The most time-consuming part of the program is the best place to introduce the
parallel region. Speedup of this block will generally result in a dramatic
improvement in overall program performance, even if it is a relatively small
amount of code. Parallelization of less-sensitive blocks may yield little if
any performance return. Profiling serial execution locates the desired "hot
spots."
The format of a typical parallel region is shown in Example 3. Place the
parallel-region code into a code block preceded by #pragma parallel and its
modifiers. The main directive and its modifiers can be spread out or placed on
one line. I prefer the spread-out version because it handles long variable
lists better.
Example 3: Format of a typical parallel region.

 Code Block

 becomes

 #pragma parallel
 #pragma shared (arrayvar1, arrayvar2..., grandtotal)
 #pragma local (index1, index2, subtotal1,...)
 #pragma byvalue (shared_constant1,...)
 {
 Code Block
 }
 or, the equivalent

 #pragma parallel shared (list) local (list) byvalue (list)
 {
 Code Block
 }

Declare the behavior of variables within the parallel region with the
modifiers to #pragma parallel. The lists consist of the variables of any given
type, separated by commas. The byvalue modifier is a special shared type used
for scalars that maintain constant value within the parallel region, such as a
for loop bounds limit, yielding a net savings on overhead. All values not
declared are assumed to be shared, but fully declaring all variables helps
clarify intent.
The parallel region treats code outside the loop or block parallel constructs
as local to each thread. Local code is duplicated in and executed by all
threads. Local code is useful, but can result in incorrect execution. For
instance, the for loop in Example 4 is part of the local code: each of four
threads runs the entire loop and prints sum = 499500, which may not be what you
intended.
Also, the value of i outside the parallel region is not defined.
Example 4: Local code can result in unexpected execution.

 /* strictly local code */
 #pragma parallel
 #pragma local (i, sum)
 {
 for (i=sum=0; i<1000; i++)
 sum += i;
 printf ("sum = %d\n", sum);
 }



The pfor Block


Within the parallel region, one or more for loops can be set for loop
parallelization. Only those for loops in which the exact number of iterations
is known at compile time can be parallelized. To take advantage of loop
parallelization, while and do while loops must be rewritten into for loops, if
possible.
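A while loop whose trip count is known up front rewrites mechanically into a for loop. In this small sketch (the data and function names are mine), both forms count the same thing, but only the second has the explicit index, bound, and increment that #pragma pfor needs:

```c
static const int DATA[5] = {1, -2, 3, 0, 5};

/* Count positive elements with a while loop: the trip count (5) is
   fixed, so the loop can be recast for loop parallelization. */
int count_while(void)
{
    int i = 0, hits = 0;
    while (i < 5) {
        if (DATA[i] > 0)
            hits++;
        i++;
    }
    return hits;
}

/* The same loop in pfor-eligible form: index, iteration bound, and
   increment are all explicit in the for statement. */
int count_for(void)
{
    int i, hits = 0;
    for (i = 0; i < 5; i++)
        if (DATA[i] > 0)
            hits++;
    return hits;
}
```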
The general format for parallelizing a simple for loop is shown in Example 5.
The #pragma pfor directive precedes the unchanged for loop placed within a
code block. The pfor directive takes the mandatory modifier iterate, which
identifies the index variable, the starting value of the index, the maximum
number of iterations, and the value of the increment. Note that iterate syntax
looks like, but is not the same as, the for statement syntax.
Example 5: Parallelizing a simple for loop.

 for (i=0; i<max; i++)
 b[i] = scale*a[i];

 becomes

 #pragma parallel shared (a,b) local (i) byvalue (max, scale)
 {
 #pragma pfor iterate (i=0; max; 1)
 for (i=0; i<max; i++)
 b[i] = scale*a[i];
 }

The above transformations can be ported from the Power Series environment to
other serial environments. In a serial environment, the compiler ignores the
two pragmas (typically with warning messages), as well as the additional code
block created by the added braces, and compiles the original for loop. The
resulting code executes as expected without removal of the parallelism
directives. On a Power Series machine with multiprocessing (-mp) active, the
compiler transforms the code for parallel execution.



The Independent Block


Block parallelization allows for parallelization of C constructs like
linked-list manipulation and other operations not amenable to loop
parallelization. The principal weakness is a lack of an attention-drawing
landmark, such as a for loop, to provide hints as to the parallelization
approach.
Independent blocks are created by placing the selected block code within a
code block preceded by #pragma independent. Each independent block created
executes on a single thread. Any number of blocks can be created in a single
parallel region. If the available threads outnumber the blocks, each executes
on its own thread and the remaining threads move on to the next parallel
construct in the parallel region. If blocks outnumber threads, then the
independent blocks execute as threads become available. If a pfor block
follows a group of independent blocks in a single parallel region, the
execution behavior can potentially become complex.
In Example 6, code blocks 1 and 2 will execute simultaneously, while any
additional threads, such as would be found on a four-processor machine, will
idle.
Example 6: Code blocks 1 and 2 will execute simultaneously. Any additional
threads will idle.

 code block 1
 code block 2

 becomes

 #pragma parallel shared (list) local (list) byvalue (list)
 {

 #pragma independent
 {
 code block 1
 }
 #pragma independent
 {
 code block 2
 }
 }

Independent-block parallelization is also portable. A serial compiler ignores
the pragmas (again with warning messages) and the added code blocks created by
the braces and processes the original two code blocks. The serial execution
behavior remains block 1, then block 2. Block order is important for
maintaining portability: If, for instance, block 1 adds elements to a linked
list, and block 2 removes elements off the same linked list, parallel
execution may be correct regardless of the starting order. However, serial
behavior will be wrong if block 2 tries to remove elements from a linked list
not yet filled with elements by block 1.


Critical Blocks


Shared scalar variables present a problem when used within a parallel region.
Any thread can write to the scalar in any order. This behavior can be
particularly chaotic for sum-reduction loops. For instance, the parallelized
loop in Example 7 will execute incorrectly. One thread could be reading the
value of sum while others are writing, making the value of sum unpredictable.
Example 7: Using a shared scalar variable within a parallel region.

 sum = 0.0;
 #pragma parallel shared (sum, a) local (i) byvalue (max)
 {
 #pragma pfor iterate (i=0; max; 1)
 for (i=0; i < max; i++)
 sum += a[i];
 }

Isolate data dependencies by placing the statement containing the dependence
within a code block preceded by #pragma critical. Only one thread at a time
has access to the code within the critical block; threads that arrive at the
critical block while it is occupied wait until it is free. The code within the
critical block is ultimately executed by all threads.
Portability for the critical block into the serial environment is maintained.
The serial compiler ignores both #pragma critical and the additional code
blocking and sees only the simple for loop conducting a summation. Critical
blocks make it possible to parallelize code afflicted with dependencies. This
mechanism can also be applied to independent blocks, local code, and mixed
constructs.
Applications such as a sum reduction containing a critical block may perform
poorly if more than one thread waits during execution, as is often the case
for simple code. The problem and an effective cure are presented in Listings
One and Two (page 124).


Examples


The first case is an expanded version of the sum-reduction example that
demonstrates how to both manage a dependence and use local code to improve
performance. These are "before" (Listing One) and "after" (Listing Two)
examples, both of which fill arrays that are then used in a sum reduction. The
kth loop takes the place of real code and lengthens the execution time of each
thread enough to overcome parallelization overhead. In Listing One, the
dependence is isolated within a critical block located within the deepest
(jth) loop nest. In Listing Two, the reduction was rewritten to determine a
dependence-free subtotal for each thread, followed by a dependent grand
summation in local code isolated within a critical block. Table 1 shows the
effect on performance of moving the critical block out of the innermost loop
and into local code on a four-processor 4D/240.
Table 1: Sum-reduction execution results.

 Case                       Execution times (s)   % CPU        Speedup  Sum
                            user    sys   total   utilization
 -------------------------------------------------------------------------
 1. Listing One (serial)     73.8   3.4    77        100        1.00  6039797.7653373638
 2. Listing One (parallel)  147.9   5.0    46        326        1.67  6039797.7643598625
 3. Listing Two (serial)     74.0   3.2    77         99        1.00  6039797.7653373638
 4. Listing Two (parallel)   77.7   3.4    27        295        2.85  6039797.7641029358

Each example was compiled without change into serial and parallel versions.
Both parallel versions exhibit speedups over the serial versions, with close
but not identical results (read on, please). Moving the critical block out of
the pfor block results in a substantially faster parallel version of Listing
Two. The high CPU utilization relative to the modest speedup in the parallel
version of Listing One reveals the time threads spent in contention for the
critical block. There is a small discrepancy in the computed sum between the
parallel and serial versions. This is a result of different summation
sequences, which produce different round-off error accumulations. Round-off
error is a problem for parallel reductions; precision-sensitive calculations
cannot be done in parallel if identical results between parallel and serial
calculations are required--a limitation to portability.
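The order dependence of floating-point summation is easy to demonstrate in serial C. In this sketch (the values are mine, chosen to make the round-off visible), one large term followed by many small ones accumulates differently in the two directions:

```c
/* Sum an array forward, then backward.  Adding 1.0 to 1e16 is lost to
   round-off, while accumulating the 1.0s first preserves their total,
   so the two orders produce different sums. */
int summation_order_matters(void)
{
    double a[1001], fwd = 0.0, bwd = 0.0;
    int i;

    a[0] = 1e16;                  /* large enough to absorb a 1.0 */
    for (i = 1; i <= 1000; i++)
        a[i] = 1.0;

    for (i = 0; i <= 1000; i++)   /* forward: the 1.0s vanish */
        fwd += a[i];
    for (i = 1000; i >= 0; i--)   /* backward: the 1.0s add up first */
        bwd += a[i];

    return fwd != bwd;
}
```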
Listing Three, page 124, shows a more elaborate use of independent blocks to
process a linked list in the producer-consumer problem. In this example, a
single producer fills a pipe (FIFO queue) with "products" consumed by multiple
"consumers" running in their own independent blocks. The product is a data
structure. In this case, it holds an int, but it could hold more complicated
data. Critical blocks limit access to the linked list's root pointer to one
producer or consumer at a time. The producer and consumers all begin at
the start of the simulation. The simulation ends when the consumers each
receive a stop token sent with the product. The producer loads three tokens
into the queue before quitting. Use of stop tokens instead of a null pointer
to terminate the simulation allows the pipe to occasionally run dry when, for
instance, the consumers are faster than the producer.
The execution results of Listing Three are shown in Table 2. The execution
threads were varied while retaining the four independent blocks. Execution
results of the serial version are identical to those of the single-thread
version. Note that even the two-thread case (one active consumer, the other
two wait) achieves substantial speedups relative to the serial version,
although performance substantially improves with more active consumers.
Table 2: Execution results for the producer-consumer problem.

                              Employed Threads
                             4      3      2       1
 --------------------------------------------------------
 Product       Consumer 1   3,333  3,863  5,913  10,000
 distribution* Consumer 2   3,333  3,863  4,087
               Consumer 3   3,334  2,274
 List depth*                6,001  6,826  8,178  10,000
 Time (s)                      26     31     47      94
 CPU utilization (%)          362    294    198      99
 Speedup (x)                 3.62   3.03   2.00    1.00

 *Units are "products."

Independent block ordering is critical for successful portability. The
producer (the first independent block) completely fills the queue, then quits.
The first consumer (the second independent block) then consumes all the
products in the queue, then quits. The remaining two consumers are left only
with their stop tokens.
Listing Three is an example of heteroparallelism, the most complicated form of
parallelism, portable into the serial environment. Program structure remains
clear, and parallel performance gains are substantial. Nonobviousness will be
the main barrier to implementing block parallelization.
All three examples compile in the CONVEX C200 and 386-PC environments. All
executed correctly as serial programs, except for Listings One and Two on the
PC, which did not have the 38 Mbytes of memory they require.


Additional Reading


Bauer, B.E. Practical Parallel Programming. San Diego, CA: Academic Press,
1992.


_PARALLEL C EXTENSIONS_
by Barr E. Bauer


[LISTING ONE]

/* Loop parallelism sum reduction critical block within loop nest
 * B. E. Bauer 1992 -- compiled using the following on the IRIS:
 * parallel: cc -O2 -o example1p -mp example1.c
 * serial: cc -O2 -o example1 example1.c */
#include <stdio.h>
#define MAX 1536
double a[MAX][MAX], b[MAX][MAX];
main()
{
 int i, j, k, k1;
 double sum=0.0, temp;
 /* shared arrays filled; values not important, just order */
 for (i=0; i<MAX; i++) {
 for (j=0; j<MAX; j++) {
 a[i][j] = (double)i;

 b[i][j] = (double)j;
 }
 }
 k1 = 256; /* inner loop delay factor */
 /* start of parallel region */
 #pragma parallel shared(a,b,sum) local(i,j,k,temp) byvalue(k1)
 {
 /* pfor block */
 #pragma pfor iterate(i=0; MAX; 1)
 for (i=0; i<MAX; i++) {
 for (j=0; j<MAX; j++) {
 temp = a[i][j]-b[i][j];
 for (k=0; k<k1; k++) /* lengthen execution time */
 temp += 0.01;
 #pragma critical
 {
 sum += temp; /* summation in inner loop */
 }
 }
 }
 }
 printf("\nDone! sum = %24.15f\n", sum);
}






[LISTING TWO]

/* Loop parallelism sum reduction critical block moved into local code
 * B. E. Bauer 1992 -- compiled using the following on the IRIS:
 * parallel: cc -O2 -o example2p -mp example2.c
 * serial: cc -O2 -o example2 example2.c */

#include <stdio.h>
#define MAX 1536
double a[MAX][MAX], b[MAX][MAX];
main()
{
    int i, j, k, k1;
    double sum=0.0, temp, st;          /* st = local code subtotal */
    /* shared arrays filled; values not important, just order */
    for (i=0; i<MAX; i++) {
        for (j=0; j<MAX; j++) {
            a[i][j] = (double)i;
            b[i][j] = (double)j;
        }
    }
    k1 = 256;                          /* inner loop delay factor */
    /* start of parallel region */
    #pragma parallel shared(a,b,sum) local(i,j,k,temp,st) byvalue(k1)
    {
        st = 0.0;                      /* st initialized in each thread */
        /* pfor block */
        #pragma pfor iterate(i=0; MAX; 1)
        for (i=0; i<MAX; i++) {
            for (j=0; j<MAX; j++) {
                temp = a[i][j]-b[i][j];
                for (k=0; k<k1; k++)   /* lengthens execution time */
                    temp += 0.01;
                st += temp;
            }
        }
        /* local code grand summation */
        #pragma critical
        {
            sum += st;
        }
    }
    printf("\nDone! sum = %24.15f\n", sum);
}





[LISTING THREE]

/* Block parallelism, producer-consumer problem
 * B. E. Bauer 1992 -- Taken from "Practical Parallel Programming" pp 214-225
 * (C) 1992 Academic Press, Inc. All Rights Reserved. Used with permission. */

#include <stdio.h>
#include <stdlib.h>                    /* malloc() and free() */
#define MAXCONSUMERS 3
#define MAXPRODUCTS 10000
#define STOP_TOKEN -1
#define NULL_ITEM (node_t *)NULL
/* the product is defined here */
struct node
{
    int productid;
    struct node *next;
};
typedef struct node node_t;            /* the typedef is convenient */
node_t *root = NULL_ITEM;              /* linked list root */
int max_depth = 0;
/* ----- create a new "product" -------------------------------- */
node_t *make_product(node_t *item, int prodid)
{
    item = (node_t *)malloc(sizeof(node_t));
    if (item != NULL_ITEM) {
        item->productid = prodid;
        item->next = NULL_ITEM;
    }
    else
        printf("problems with production, boss...\n");
    return (item);
}
/* ----- add a "product" to the end of the list ---------------- */
node_t *ship_product(node_t *new)
{
    int cur_depth = 0;
    node_t *list = root;

    if (root == NULL_ITEM)
        root = new;
    else
    {
        while (list->next != NULL_ITEM)
        {
            list = list->next;
            cur_depth++;
        }
        cur_depth++;
        list->next = new;
    }
    if (cur_depth > max_depth)
        max_depth = cur_depth;
    return (new);
}
/* ----- pop the product off the beginning of the list --------- */
node_t *receive_product()
{
    node_t *item = root;
    if (root != NULL_ITEM)
        root = root->next;
    return(item);
}
/* ----- consume the product by freeing its memory ------------- */
void consume_product(node_t *item)
{
    if (item != NULL_ITEM)
        free(item);
}
/* ----- the producer ------------------------------------------ */
void producer(int cnt)
{
    node_t *temp = NULL_ITEM;
    int i;
    /* make "products" cnt times (limits simulation) */
    for (i=0; i<=cnt; i++)
    {
        if ((temp = make_product(temp,i)) == NULL_ITEM)
            break;
        #pragma critical
        {
            ship_product(temp);
        }
    }
    /* load on stop tokens */
    for (i=0; i<MAXCONSUMERS; i++)
    {
        if ((temp = make_product(temp, STOP_TOKEN)) == NULL_ITEM)
            break;
        #pragma critical
        {
            ship_product(temp);
        }
    }
    printf("producer calls it quits\n");
}
/* ----- the consumer (created 3 times) ------------------------ */
long consumer(int myid)
{
    /* local variables */
    int consuming = 1, j;
    long consumed = 0;                 /* count of "products" */
    node_t *item;
    double temp = 1.0;                 /* busy-work accumulator */
    printf("consumer %d starts the shopping day\n", myid);
    while (consuming)                  /* loop until stop token seen */
    {
        for (j=0; j<32000; j++)
            temp *= (double)j;
        #pragma critical
        {
            item = receive_product();
        }
        if (item != NULL_ITEM)
        {
            if (item->productid != STOP_TOKEN)
            {
                if (item->productid)
                    ++consumed;
                printf("consumer %d consuming product %d\n",
                       myid, item->productid);
            }
            else
            {
                consuming = 0;
                printf("consumer %d ran out of products\n", myid);
            }
            consume_product(item);
        }
    }
    printf("consumer %d consumed %ld products\n", myid, consumed);
    return(consumed);
}
main()
{
    long total, sum[MAXCONSUMERS];
    int i;
    printf("business starts with %d products\n", MAXPRODUCTS);
    #pragma parallel shared(sum)
    {
        #pragma independent
        {
            producer(MAXPRODUCTS);     /* start producer */
        }
        #pragma independent
        {
            sum[0] = consumer(1);      /* start consumer 1 */
        }
        #pragma independent
        {
            sum[1] = consumer(2);      /* start consumer 2 */
        }
        #pragma independent
        {
            sum[2] = consumer(3);      /* start consumer 3 */
        }
    }
    for (i=0, total = 0L; i<MAXCONSUMERS; i++)
        total += sum[i];               /* sum up the products consumed by each */

    printf("business over, %ld products exchanged\n", total);
    printf("maximum products in list = %d\n", max_depth-MAXCONSUMERS);
}


Example 1:

(a)

for (sum=i=0; i<max; i++)
 sum += a[i];

(b)


for (i=1; i<max; i++)
 a[i] = b[i] * a[i-1];


Example 2:

(a)

Code Block becomes

#pragma parallel
#pragma shared (arrayvar1, arrayvar2..., grandtotal)
#pragma local (index1, index2, subtotal1,...)
#pragma byvalue(shared_constant1,...)
{
 Code Block
}

or, the equivalent

#pragma parallel shared(list) local(list) byvalue(list)
{
 Code Block
}


(b)

/* strictly local code */
#pragma parallel
#pragma local(i, j)
{
 for (i=sum=0; i<1000; i++)
 sum += i;
 printf("sum = %d\n", sum);
}


Example 3:

for (i=0; i<max; i++)
 b[i] = const*a[i];

becomes


#pragma parallel shared(a,b) local(i) byvalue(max,const)
{
 #pragma pfor iterate(i=0; max; 1)
 for (i=0; i<max; i++)
 b[i] = const*a[i];
}




Example 4:


code block 1
code block 2

becomes

#pragma parallel shared(list) local(list) byvalue(list)
{
 #pragma independent
 {
 code block 1
 }
 #pragma independent
 {
 code block 2
 }
}


Example 5:

(a)

sum = 0.0;
#pragma parallel shared (sum, a) local(i) byvalue(max)
{
 #pragma pfor iterate (i=0; max; 1)
 for (i=0; i<max; i++)
 sum += a[i];
}

(b)

sum = 0.0;
#pragma parallel shared (sum, a) local(i) byvalue(max)
{
 #pragma pfor iterate (i=0; max; 1)
 for (i=0; i<max; i++)
 #pragma critical
 {
 sum += a[i];
 }
}






August, 1992
PROGRAMMING PARADIGMS


The Living Room and Other Markets




Michael Swaine


Apple is starting a consumer-products division. Apple is not well known in
this market, and traditional consumer-goods vendors might say that this market
is not well known at Apple. Perhaps a word of advice would be appropriate.
Perhaps it would even be appreciated.
Rumor has it that one can see, taped to a wall here and there in Apple's
Cupertino quarters, copies of a letter from Stan Cornyn, president of Warner
New Media. Excited by the prospect of Apple getting into consumer products,
Cornyn had written to share his thoughts on how Apple could avoid getting
Betamaxed out of the market. I here reproduce the letter, which Cornyn doesn't
mind the world seeing, in full.


21 Rules for Apple Consumer Electronics


1. Once you make a box, don't change the way it works for at least ten years.
2. Start from cheap boxes and progress to expensive ones, as the Japanese
automobile industry has done.
3. Find an exposure method as powerful as top 40 radio and video arcades to
show what your box can do.
4. Hire a vice-president of CD-ROM sales. Fund the position. Insist that the
VP sell 500,000 CD-ROM drives in 1992, no matter what it takes.
5. Lowering the price of your box is not enough. You must also increase its
value for all to see.
6. Once your CD-ROM drives have been sold, help publishers learn the addresses
of buyers. Otherwise publishers can't target their marketing efforts and will
begin to sleep around.
7. Real consumers do not want tools. They want the product of tools.
8. Forego early profits for long-term market share, and don't bitch about it.
9. Court mass publishers. If they can sell a million copies, you have sold a
million boxes.
10. Stop selling to school administrators. Sell to students.
11. Make your box part of the home-information system. Make portable boxes
that can connect with telephones, stereos, and cable.
12. Present your CD-ROM players to the public as Ultra CD audio players that
can do extra things. That way people will catch on faster.
13. Fifty titles is not enough. Two thousand is more like it. Devise a plan to
obtain 2,000 titles fast.
14. Color pictures look better than black and white.
15. Having better-looking pictures isn't enough. Remember Betamax? More titles
is the answer.
16. Can you sell this box at Wal-Mart? If not, rethink it. Maybe paint it
pink.
17. Whoever makes a box that easily converts to digital video is the winner.
Whoever doesn't is planning obsolescence.
18. CD systems using TV screens will always be cheaper. Emphasize the
advantages of monitor display.
19. Don't call it a computer. Call it, for instance, a Time Machine. Not
P3-TV. 4D-TV is a better choice.
20. Start product development with a breakthrough marketing campaign. Force
your engineers to catch up with you.
21. Careful market research results in one sure thing: You will be late to
market.


Divide and Conquer


The story that this list can be seen taped to the walls at Apple is not
surprising. Apple seems to be very serious about the consumer market,
recognizing, as its legions of unsolicited advisors would like it to
recognize, that the consumer market is not just another niche, not even just
another market in the sense that video professionals are another market, but
something quite different from anything Apple is used to. Apple is going to go
after the consumer market via a new, separate division of the company. That's
a promising sign, although the initial forays into the consumer market look
suspiciously like spin-offs or repackagings of existing computer-division
products and technologies.
In any case, along with the spin-off companies Kaleida and Taligent, it looks
like Apple's new strategy is divide and conquer, in the amoebic, rather than
the Napoleonic, sense.
Here's a mid-year summary of developments in three areas holding a lot of
Apple's attention lately: multimedia technology, the home and/or consumer
market, and its more traditional hardware and operating-system efforts.
Kaleida, the multimedia company, has a CEO: Nat Goldhaber. Theoretically, the
company was launched last October and apparently some not-so-theoretical work
was getting done under its nameplate, but it's hard to take a company
seriously until it has a CEO. Goldhaber has been bouncing around the Mac
universe for a while; he was one of the founders of Centram, the developer of
TOPS.
QuickTime has caught on. One Apple competitor, Silicon Graphics, is reportedly
negotiating to license it, with the fairly obvious intent of using it to make
SGI's low-end Indigo (described here in February 1992) even more competitive
with Apple's high-end Quadra line for video professionals. Currently, the
Indigo is clearly a better deal than a Quadra on a hardware basis, but SGI has
nothing like Apple's third-party software base. SGI machines, however, have
very impressive video-editing and video-production capabilities, a point not
lost on the some 300,000 video production professionals in the U.S., among
whom SGI is highly respected. That includes the Hollywood types: SGI hardware
was used in producing the special effects for Terminator II. SGI is also
reportedly negotiating for new Apple imaging model technology.
Apple, eyeing those 300,000 video professionals and the future in-house video
professionals in business and the potential VCR home-movie producers, would
like to get some of that Hollywood credibility. The theory is that respect
trickles down. If that's the theory, then Apple's visibility at this year's
National Association of Broadcasters conference didn't hurt. Not only was
Apple there in force, but there were also a lot of third-party products that
showed off the Mac's capabilities for editing video, teleprompting, and
communicating with various entrenched formats and devices.


The Brave Little Toaster Division


Apple's PDAs, or personal digital assistants, are actually going to be called
Apple Intelligent Assistants, or at least the first ones are. These handheld
things, due out early next year but previewed at CeBIT in Germany and CES in
the U.S., are not really consumer products, but they do show Apple taking a
step in the direction of toasterization. The original Mac, which many people
thought looked like a toaster, was supposed to be an appliance. It wasn't, and
subsequent tweaks moved it farther and farther from Toasterland, jumping the
tracks at Computerville Junction. PDAs (aka AIAs) look like the first step
Apple has made back in that direction since 1984.
Here's what the weeklies think the first AIA will be like: Due to ship in
January 1993. Price under $700 and falling. RISC-based and faster than a Mac
IIfx, hardware by Sharp, multitasking OS by Apple. OS to be licensed to other
manufacturers. Pen-based, with no boxes to fit letters into, no training
required (or allowed?), and no visible file system. Infrared link, serial
port, able to dock to a Mac and maybe to a PC running Windows. Titles on SRAM
cards being developed by third parties like Random House.

Apple's approach to these things apparently didn't sit well with Bill Atkinson
when he was still with the company, and was reportedly the reason for his
founding General Magic with Andy Hertzfeld and Marc Porat, where they have
been pursuing the approach they think ought to be taken to handheld computing
devices. Although they're very secretive and deny all published reports about
what they're doing, their idea apparently has to do with an operating system
for devices that communicate with other devices: cellular phones, handheld
computers, and so on. Rather than producing a product and putting their label
on it, they are pursuing licensing agreements. How this fits with Apple's
efforts is unclear, but Apple (along with Sony and Motorola) is a big investor
in General Magic.
Meanwhile, Apple is working on the multimedia Macs that it expects to put on
the shelves for Christmas. The Apple version of the MPC, apparently. Perhaps
as evidence that it is trying to learn how to address this market, the company
recently: (1) put together a computer+printer+software bundle for under $1000
(the OS and bundled software are on ROM); (2) evangelized heavily for content
providers for its multimedia Macs; and (3) brought out new, more powerful
versions of its machines at prices lower than those of the machines they
replaced. Apple is trying, folks.


They Also Serve


And on the traditional Apple front, trying to convince corporate buyers that
the Mac is not a toy, Apple has been hearing from other advisors. Consultants
recently presented Apple with a list of priorities, high among which was,
build us a dedicated server. Mac managers generally look elsewhere than Apple
for server hardware and software, but Apple hopes to change that with its
dedicated server, expected to ship in January 1993. It's a 68040 box without
24-bit color and other frills, and with A/UX as its native OS, but with
support for Mac System 7 and System 6.
Oh, yes, A/UX. Apple's version of UNIX. One of the many Apple operating
systems. Apple will be spending more time in the future supporting various
operating systems, apparently. The Mac OS is getting reengineered and fitted
with a UNIX-like kernel, and should be around for a while. The operating
system being developed for the PowerPC architecture by the Taligent joint
venture with IBM is being described as addressing a different market, although
who knows what the story will be when it actually arrives.
When that day comes, a product from a Bell Labs spin-off, Echo Logic, should
come in handy. FlashPort lets you port Mac applications to the PowerPC in
days. This is not to be confused with the emulation option: The new operating
system is supposed to run Mac software (and non-Mac software, too) at native
680x0 speeds via emulation modules called "personalities." FlashPort is an
option between emulation and total reengineering: It involves a multistage
analysis of program logic achieved through examining registers and is supposed
to produce 90 percent of the performance of code written for the PowerPC. It
will probably be very expensive, but Echo Logic is considering setting up
porting centers where you can buy a port.
Apple, meanwhile, is making interesting moves toward another operating
environment. QuickTime is not the only piece of Mac system software being
ported to Windows; there are rumors of an Intel version of the Mac OS. Perhaps
such rumors make Microsoft nervous; perhaps that's what they're for.
Meanwhile, though, Apple's software subsidiary, Claris, is definitely porting
its products to Windows, starting with a well-received port of FileMaker.
That's all strange enough, without NeXT negotiating something or other with
Microsoft.


Is This the Pro Shop?


I want to wrap this up with a look at one Mac product, LinksWare.
LinksWare. Sounds like golfer garb, doesn't it? Little alligators on the
shirts, sun visors, bad color combinations. Actually it's both a company and a
product. The company is LinksWare Corp., 812 19th Street, Pacific Grove, CA
93950; 408-372-4155. The product is a clever idea, and that's chiefly what I
want to present here: the idea of LinksWare.
Let me underscore that. This is decidedly not a review of the product, which I
have yet to get to work quite as advertised on my system. Those last three
words are significant, and could be the whole explanation; my Mac setup is a
little nonstandard, and LinksWare is a small company with limited resources.
The many-hatted LinksWare author Tracy Valleau shouldn't spend those resources
on accommodating weird hardware and software setups. If it looks like an
actual review of LinksWare is warranted, I'll test it on another machine.
Although LinksWare the company is small, LinksWare the product, or at least
the idea behind LinksWare the product, is big. It's Ted Nelson's and Doug
Engelbart's and Vannevar Bush's idea of hypertext. LinksWare is a tool for
creating hypertext and hypermedia links between files. But it's a cleaner
implementation, conceptually, than many I've seen.
LinksWare is not a separate environment like HyperCard. It uses files created
by all the major Macintosh word processors and painting and drawing programs,
and it can also create links to video segments recorded in Apple's QuickTime
movie format. It doesn't modify the original files, so it can link to files on
CD-ROM. It doesn't require any rekeying of data. It also doesn't require any
programming or HyperCard-like manipulations.
It makes a clear distinction between the developer and the consumer of
hypertext/hypermedia products. LinksWare, the developer version, costs $189
and there are no royalties on documents produced (or rather, linked) with it,
nor any limitations on how you can distribute your linked hypertext. The
supplied LinksWare Reader, which takes up 203K on a disk, uncompressed, is
basically a full version of the product except that it does not allow editing
or adding of links. LinksWare Reader can be distributed freely, including
being bundled with your hypermedia documents.
LinksWare doesn't read every file format. It depends on Claris's XTND and
DataViz file-conversion technology to read files produced by other programs.
XTND works via translators kept in a folder. LinksWare is sold with many of
the more common translators, but more can be added by dropping them in the
folder. Translators are typically provided by interested vendors free of
charge and distributed on electronic services. Many Mac products, particularly
Claris products, come with a folder of translators.
Here's the short story on how it works, from the developer's viewpoint: To
create a link using LinksWare, you click on a word or select an area of a
graphic, then click on the file you want linked. In the current version, only
words can be used as links in text files, but multiword links are planned for
the next revision. LinksWare lets the hypertext author decide on how links
should be shown to the reader: with italics, underlining, or whatever.
And here's how it works, from the customer's viewpoint: When perusing the
hypertext created by LinksWare, the reader notices that a word is a link and
clicks on it. The associated file is opened and displayed. A keypress
dismisses it.
The full story, especially from the developer's viewpoint, is a little longer.
LinksWare has several utilities for maintaining links: There is a file-finding
routine that resurrects links when the files have been moved or renamed, and
menus give direct access to link words and linked files. A hypertext document
can span up to 795 separate files, with up to 127 links from each text file or
64 links from a graphic. There is a limit of 6000 link words and 10,000 links
altogether.
That's the description of the product. Not a terribly complicated idea, is it?
And yet, conceptually, it's closer to real hypertext than most of the products
that are called hyper-something or other. If only it were built into the
operating system; but no, not the operating system, because it ought to be
machine-independent. If only it were--universal.
Like Xanadu, for which we are still waiting, Ted.





August, 1992
C PROGRAMMING


A Pensive Look at Pens and C++


 This article contains the following executables: DFLT13.ARC D13TXT.ARC


Al Stevens


This month is my fourth anniversary as DDJ's "C Programming" columnist, and it
is also the annual C issue. For this special time, I'm going to suspend work
on D-Flat and talk about some C++ issues and a potential pen-based D-Flat
library. Both subjects are forward-looking, I hope, and appropriate for this
issue. Next month wraps up the continuing saga of D-Flat. After that, we'll
start looking at D-Flat++, the rewrite of the CUA interface library in C++.
After four years of almost exclusive coverage of C-language programming in
this column, I am thinking that the growing popularity of C++ needs more
attention from this forum. I have been using C++ in my own development for
quite some time and am happy with and confident in my personal preference for
it. The time I devote to D-Flat++ and your reaction to it will chart the
course of this column for the future. If you are predisposed to resist, I ask
that you bear with me. Let me show through the comparison of D-Flat and
D-Flat++ how the notational improvements of C++ improve the craft. I might not
be able to demonstrate it in this column, but I have learned, too, that a
design process that concentrates on the objects first rather than the
procedures is more intuitive and produces a sounder and more maintainable
software system. This is not news. Others have been saying this for years. But
now there are C++ compilers at every level, and the rest of us can find out
for ourselves.


PenD-Flat


Pen-based computers are here for a while, and we should be aware of what that
means to software development. A pen-based development environment will
resemble one for an embedded system, where you develop on something other than
the target hardware. No one would want to squint and scratch their way through
a small clipboard-sized, handheld, compile-and-debug session with Borland C++,
for example. I guess you could do your programming while you ride the exercise
bike, lounge at the beach, or take that nature walk, but you'd soon lose
patience as you grappled with the pen and pad to change code, set breakpoints,
and look at variables. Besides, not many of the little critters will have the
megabytes of disk and RAM to install the new breed of gargantuan compilers.
No, the pen-based platform is not ideal for software development. You still
need all the tools that you need for traditional programming, but you also
need something hooked to the AT that simulates the pen and pad.
This should give you a clue about what kind of applications are best served by
pen-based software and what kinds are not. The notion that pen computers will
proliferate in executive offices in the hands of keyboard-shy CEOs is silly.
Today's executives might be less than computer literate, but that is changing.
Before pen-based technology is good enough for serious word processing and
spreadsheets, it will be overtaken by two things--voice-based computing and a
new generation of technology-savvy executives. Therefore, do not plan to
launch a pen-based word processor product. No one will buy it. Pen computing
will wear a blue collar and will embrace vertical applications. It will
realize its potential in walk-around applications where mobility and freedom
from cabling are important and where there are some data to be entered that
cannot be scanned. The UPS person in my hometown uses a pen tablet to record
deliveries and recipient signatures. It's an experimental program. In some
cities, you can already get a traffic ticket written and printed by a
pen-based computer--RoboCop. Combine a pen with a bar-code reader, and you
have an excellent warehouse-inventory input device. Scan the inventory
number's bar code and write the quantity-on-hand. Aluminum siding, screen
enclosure, driveway paving, and roofing salespeople could use small pen-based
CAD applications to design and cost home improvements while sitting in the
customer's living room. Vertical applications every one. Opportunities for
programmers.
Most pen-based systems have graphical operating environments at their
foundation, and you need a GUI development environment. Microsoft PenWindows
is one example. PenDOS, from Communications Intelligence Corporation (Redwood
City, California), is a pen-based operating environment that runs in text mode
on top of MS-DOS. When used along with a cooperating digitizing tablet and
pen, PenDOS provides a pen interface to text-mode DOS applications. Typical
DOS applications need little or no modification to run with PenDOS. Most mouse
and keyboard operations are handled by the environment as if you entered them
with traditional devices. PenDOS emulates mouse actions by the movement and
tapping of the pen on the screen's surface. It emulates keyboard input by
popping up a Writing Window into which you manually write text with the pen.
The environment uses character-recognition algorithms to translate your
scribbles into keyboard characters. Your entries are displayed in a text-entry
window, and when you like what you've written on a line, you tap the Send
button. PenDOS stuffs the characters into the application as if you had typed
them. It works well, recognizing a variety of handwriting styles and requiring
little practice to become comfortable with its use.
Even a pen-based application needs a user interface. I was not sure that the
interfaces that work with the keyboard, screen, and mouse would be meaningful
on the pen/tablet platform, so I decided to use D-Flat and PenDOS to
experiment with a pen-based CUA interface. In theory, a D-Flat application
would run under PenDOS with little or no trouble. This combination would offer
the developers of pen-based applications a CUA interface that uses pen taps
and movements instead of mouse actions and hand-written pen text input instead
of keyboard input.
I used the Wacom HD-648A Handwrite Digitizer to test PenDOS with D-Flat. This
device has a VGA-compatible LCD screen and a serial pen-input device. The
drivers that come with PenDOS work with the Wacom. The pen emulates a serial
mouse. This particular Wacom device is intended for developers who would test
their programs with traditional PC hardware. Once debugged, the applications
could be embedded into specialty pen-based DOS computers. The ideal testing
configuration has the Wacom connected to the VGA port and the system console
assigned to a monochrome monitor--the typical two-monitor testing platform.
That allows you to use the keyboard and monochrome screen to debug while the
Wacom VGA device displays the application screens.
With PenDOS and the Wacom digitizer installed, the display satisfactorily
emulated the LCD VGA. I tried out some DOS commands through the Writing
Window. It works, but users will probably not like entering DOS commands that
way. You'll need to boot into the embedded application, I think. I ran
D-Flat's Memopad example program unmodified under PenDOS. The pen properly
selected menus, chose menu commands, pressed command buttons, and selected
items from list boxes, check boxes, and radio buttons. I could move and resize
windows easily with point-and-drag operations. Double-clicking is
double-tapping with the pen. If anything, the CUA mouse operations are easier
and more intuitive with a pen pointed directly at a horizontal screen than
they are with a mouse on the desk moving a cursor on a vertical screen.
(A hint. If you lose pencils, combs, car keys, eyeglasses, and your TV's
remote control with predictable regularity the way I do, then you'd better tie
the pen to the pad with a strong string. Until I did, that pen disappeared at
least daily and then showed up in the darnedest places.)
All D-Flat needs to make it a complete pen-based user interface is an edit-box
control that uses handwriting input similar to the PenDOS Writing Window, and
the PenDOS "gesture" characters for text editing with a pen. I have to return
the borrowed Wacom digitizer, so there is not time to do that, but I might get
another crack at it later, perhaps as a part of the D-Flat++ project.


C/C++ in Action


I attended the C plus C++ in Action conference in Teaneck, New Jersey during
the week of April 27. This conference, which has a London edition in June and
one in Santa Clara, California in September, is organized and operated by the
Wang Institute of Boston University. Although it aims at both C and C++, the
strong emphasis of the conference and the attention of the attendees was
definitely on C++. I sat on a panel that discussed C and C++ as a family of
languages. Christopher Skelly, the technical chairman of the conference, was
moderator. The panel included Bjarne Stroustrup, Jim Brodie, Dmitry Lenkov,
P.J. Plauger, Jim Coplien, and me. Bjarne, of course, designed C++. Brodie is
chairman of the X3J11 ANSI C Standards committee. Lenkov is chairman of the
X3J16 ANSI C++ Standards committee. Plauger is a well-known author and speaker
on C and a member of X3J11 and its ISO counterpart. Coplien is an AT&T
researcher and instructor and the author of Advanced C++ Programming Styles
and Idioms. As you might expect, this motley panel contained overlapping camps
with overlapping agendas--language designer, language standardizers, authors
about language, and language users.
The panel discussed several concerns, among them the feeling in the C/C++
community that the independent efforts to standardize and extend the two
languages could make them less compatible than they are now, creating a
watershed between C and C++ and polarizing rather than assimilating the two
groups. The panel's consensus was that C and C++ should become no less and no
more compatible than they are today.
There was a lot of attention given to what might happen with C and C++ in the
next several years, and less given to what programmers in the audience had to
deal with on the following Monday morning when they went back to work. The
questions from the audience indicated--to me, at least--that they were as
interested in the latter issue as in the former. The panel's discussion was
directed more to the former.
One member of the audience asked why, in a session he attended, most of the
other attendees held up their hands when asked who was a C++ programmer, yet
few of them admitted to understanding or using object-oriented design and
programming. Wasn't OOP, asked the programmer, the primary advantage that C++
had over C? And, if so, how come so few C++ programmers understood or
practiced it?
Most of those programmers have learned the notational extensions that C++ adds
to the C language. They know about C++ comments, references, global scope
resolution, default function arguments, inline functions, mixing declarations
and procedural statements, const, anonymous unions, unnamed function
parameters, and structs and enums as discrete types. And they agree that C++
is easy to learn and use--as long as you don't have to design and use classes.
The gray areas in C++ get darker when you get into classes and inheritance,
and that is where many programmers have the most trouble.
And that, in a nutshell, is what I believe to be the number one failure of C++ evangelism. The most powerful feature of C++ is the one that is most difficult to
learn to use, and we are not moving fast enough to correct the learning
problem. There are real advantages to using C++ over C. There are few
programming circumstances where C++ will not do a better job. One of them is
when the programmers do not understand the advantages and are not sold on
giving it a try. It is a public relations problem as much as anything else.
Programmers have been immunized against hype.
The question was asked, "When should you decide to use C and when should you
decide to use C++?" Most of us had seen or heard about projects where the
decision to use C++, and particularly object-oriented design, was made because
someone in a position of influence had been told that C++ is not only trendy
but a panacea as well. In virtually every such instance, the project failed.
If the problem domain is already supported by a mature library of C functions,
then C is a good choice. If the programming staff has neither experience with
OOP/C++ nor the inclination to learn, then C is still a good choice. If
neither condition exists, then C++ is a good choice. But the one compelling
decision point is the bent and the ability of the programmers to use C++, and
that is not always a foregone conclusion.
To further illustrate, let me retreat to an earlier conference session where
Jim Coplien described some advanced ways to make C++ do things that you might
want it to do. This was an intense, informative, and entertaining session.
Jim's voice and delivery remind me of TV news-journalist Michael Kinsley, who
baits his co-host and guests on "Crossfire" every week, although Jim gets his
point across without making you want to punch him in the nose. I'll discuss
two of his C++ techniques without trying to explain how they work; for one
thing, I might get it wrong. In any case, there is a much better source for
this information: I highly recommend Jim's book, Advanced C++ Programming
Styles and Idioms, which addresses these two methods and much more. I'll
discuss the book later in this column.
The first technique, called the "counted pointer," shares data values between
the first instantiation of an object and any copies you might make through
assignment. It overloads the -> operator so that you can use it to get at the
common representation. With the class defined according to Jim's
specification, you can write the code in Example 1.
Example 1: The counted-pointer technique.

 String s1;
 String s2 = "Hi";
 s1 = s2;
 int len = s1->Length();

Note that although s1 is not a pointer, the notation treats it like one. Never
mind why you would want to do this. I'm not sure I would, but Jim gave
examples of where this approach solves a particular problem.
The second technique laments the lack of a virtual constructor in C++ and uses
what is called the envelope/letter idiom to give the effect of virtual
constructors. The using program deals only with objects of an envelope class,
yet the objects themselves take on different characteristics, depending on the
context in which they are constructed, as defined in letter classes. This goes
a bit farther than simply overloading constructors, because you can add
contexts later without modifying the envelope class, thus giving the effect of
a virtual constructor mechanism. Example 2 shows some code that uses objects
of a class defined with the envelope/letter idiom.
Example 2: Objects of a class defined with the envelope/letter idiom.

 Number a, b, c; // 3 Number variables
 a = 1; // a is integral
 b = 2.3; // b is real
 c = (Complex) a+b; // c is complex


All three variables are of type Number, yet each has a different data
structure and behavior. Once again, don't worry about understanding why a
programmer would want to do this. The point is that you are being told that it
is possible with C++.
Now, imagine a discussion between a C++ guru and a C programmer who has been
told that C++ is the future. The guru, having attended Jim's session or read
Jim's book, foolishly begins to explain some of these methods to the C
programmer, expecting them to be a strong selling point to a C++ rookie.
Guru: You can overload operators in C++. You can even overload the addition
operator to perform subtraction on your numerical classes.
Programmer: That's good?
G: No, that's bad. It would not be intuitive and would be poor programming
practice. You might write a program that no one else could read.
P: So overloading operators that way is always bad.
G: No, sometimes it's good. There are times when you want to overload an
operator in nonintuitive ways. Consider the counted pointer. Its -> operator
is overloaded so that you use it to the right of an object that is not a
pointer.
P: I don't understand why one example is good and the other is bad.
G: Because finding a solution to the problem outweighs the need to write
intuitive code. If it offends you, don't use it.
P: Then why are you telling me about it?
G: Because you might need it some day, and to show you another
less-than-obvious way that C++ is extensible--how it lets you go beyond the
original language design.
P: What are some other ways?
G: Consider the letter/envelope idiom that gives you the effect of virtual
constructors. You can instantiate three numerical types by saying, Number a,
b, c; and then you can say a = 123, and a will be an integer. You can say b =
456.78, and b will be a float. You can cast a+b to a complex number, assign it
to c, and c will be complex. There you have three variables, all of type
Number, and each one has a different internal data representation and
different behavior, based on the context in which you assigned something to
it. Neat, huh?
P: (Getting hot) Wait a minute. I could do that 15 years ago with Basic and I
didn't have to jump through all these stupid envelope/letter-class hoops. I
want to use C++ because it has strong typing, and they tell me that's what I
need. What happened to all that strong typing?
G: (Smugly) With C++ you have the freedom to bypass it.
P: If I want freedom, I'll stick with C.
And there, dear hearts, is where you lose a convert. This is typical of the
hype that obscures C++. Bjarne observes that programming shops have a lot of
reasons not to switch to C++, and he effectively counters each one of them.
But his message is delivered to a kindred audience. The attendees of these
conferences are predisposed to like Bjarne and the language that sprang from
his labors, even when they don't fully understand it. But there is a strong C
constituency out there, and, until recently at least, that constituency was
growing. Many of them don't live, breathe, and sleep programming and every new
paradigm shift, and they don't go to the conferences to worship at the graven
C++ image. Those are the folks we need to reach, and they are mainly
skeptical.
Consider what C programmers see when they look at source code. Unless there
are some tricky preprocessor macros, what they see is what they get. The C
source code reveals everything the C compiler is doing for the programmer. If
a function is going to be called, the programmer must call it. If a variable
is going to be declared, the programmer must declare it. Not necessarily so
with C++, where a lot goes on under the surface. The simple statement that
assigns one object to another can launch a barrage of constructor functions
and hidden temporary objects. Until the programmer is comfortable with that
sort of semi-organized chaos, there will be some furrowing of beaded brows
when he or she steps through the code with a debugger.
Then there are the many hidden features in C++ that you cannot reveal up
front, lest you scare the pants off the programmer the way our C++ guru did a
few paragraphs back. There is a lot to learn after you learn the language.
First you solo. Then you get your pilot's license. Then you learn how to fly.
Learning C++ is a process of discovery. Teaching C++ is managing that process.
For example, I want to call a derived virtual function from within a base
constructor. Why not? It seems like a good idea. I'm stepping through the
program now. How come I'm executing the base's function instead of the
overriding virtual derived one? It's right there in the code. The compiler
knows about it. (Slow down. Let it sink in. The base class's constructor is
running before the derived class has been constructed. There is no derived
virtual function yet.) What's that you say? There is, too. I compiled it
myself and it's in the memory map. Don't tell me it doesn't exist yet. (Well,
yes it does, but not in the name of the object you are instantiating at this
time. Virtual functions are called through the object's pointers to the
derived class's virtual function table, and your object hasn't built those
pointers yet.) How was I supposed to know that? (Now you know.)
Here's another example. I'll just call a constructor function from another
constructor function for the same class. One of them does most of the work, so
I'll let the others do their unique processing and then call the principal
one. What's this? It doesn't work. The principal constructor never executes.
So, I step through the code with the debugger. There's the constructor. I'm
executing it. How come what it is constructing right here before my very eyes
never gets constructed in my object? Well, it's done. Now what? I'm in the
destructor function! What am I doing here? (Slow down again. When you called
that constructor function, you instantiated an unnamed, hidden, temporary
object. That's why the original object does not get constructed properly; you
are constructing one you never see. When you exited from the original
constructor function, the temporary object went out of scope, and its
destructor was called.) Why can't I just call that constructor and get it
called the way I want? (Because the syntax for calling a constructor function
is one of the several ways that you instantiate an object.)
These are just some of the traps and pitfalls that await the unwary fledgling
C++ programmer. Word about them leaks out, and programmers get intimidated.
You have to discover every one of them for yourself. They are neither
intuitive nor obvious in the language syntax, and even when an instructor or
tutorial book tells you about them, you forget until they bite you.
Programming is supposed to get easier with each new shift, not harder. Hah.


Coplien's Book


The literature of C++ and object-oriented design and programming is improving.
As promised, here are my reactions to James O. Coplien's Advanced C++
Programming Styles and Idioms (Addison-Wesley, 1992). To begin with, do not
use this as your first book on C++. Its title should tell you that. The book
assumes that you already know the syntax of the C++ language, and its emphasis
is on programming idioms, which use the features of C++ to "express
functionality outside the language proper, while giving the illusion of being
part of the language." This is extensibility to the extreme. C is extensible
in that you can add functions that modify data values and program flow. C++ is
more extensible in that you can add data types with their own behavior and
representations. The programming idioms that Coplien advances extend C++
further still, apparently changing the behavior of the language to solve
problems not addressed in the language design. The book begins by discussing
some traditional idioms that even the newest C++ programmer will
recognize--copy constructors that solve the memory-management problem and
iostreams that use familiar overloaded insertion and extraction operators, for
example. But gradually, the book gets into the more arcane, such as the
counted pointer and envelope/letter classes discussed earlier.
That you can do these tricks with C++ and that the practice gains credibility
by endorsements such as this book are what give pause to many an outside
observer. Some of this stuff is the bungee jumping of programming. Programming
on steroids. If you do not like C and you do not want to like C++, this book
will reinforce your bias. If you like C and C++, this book will reinforce
those feelings, too. Congratulations, Jim, you've written a book that everyone
will like.
Coplien's close association with C++ since its inception allows him to provide
personal insight into some of its history and how parts of the language
evolved. Even if you think you already know C++ pretty well, I'd recommend
this book. There is always more to learn, and the book is well written -- a
delight to read, in fact -- and may very well be the most important
programming book since K&R.




August, 1992
STRUCTURED PROGRAMMING


Making Patents Work




Jeff Duntemann, KG7JF


Watch close, gang (and no giggling!). The boy is about to do something he
doesn't do often: change his mind in public. Furthermore, the subject in
question is one I thought I'd fight to the death in opposition: software
patents. I think about it a lot, because software patents could still mean the
end of the American edge in software development, especially for the small
developer. I'm glad, at times, that I make my living writing about programming
rather than actually doing it, because it's entirely possible that software
patents could bring progress in programming technology to a screeching halt in
another ten years.
It doesn't have to be that way.
Software patents could actually work for the small developer and not against
him. The problem, in fact, is not with software patents at all, but with the
patent system itself, as it is currently inflicted upon both hardware and
software. This notion has been burning a hole in my pocket, so forgive me if I
step back from Turbo Vision for half a column and explain.


What are Patents For?


People who have defended software patents from the outset claim that patents
are absolutely necessary to ensure that an inventor's investment in time and
effort is rewarded and not simply appropriated by someone else. Right on,
brother. Trouble is, patents don't do that. They don't even come close.
Patents currently ensure nothing except that the lawyers will get paid, and
that the guy with the deepest pockets wins the war 85 or 90 percent of the
time.
This is crazy. An occasional inventor wins the war against big infringers,
like that chap who invented a certain kind of pliers and won big against Sears
Roebuck some years ago. We regularly hear about the rare cases like this, and
almost never about the inventors who run out of money halfway through an
infringement case and simply give up, having not only lost their invention to
infringement but frequently all their savings and credit fighting that
infringement.
The current patent system is brutally stacked against individual inventors,
who nonetheless petulantly defend it because they see it as all they've got.
What they don't see (remarkably, if they call themselves inventors) is that
the patent system could well be improved several thousand percent, simply by
refocusing it on the jobs it is supposed to do: (1) ensuring that research and
invention are rewarded, and (2) creating a business climate in which
engineering art can progress briskly.


Stealth Patents


Currently, patents work diametrically against both of those stated aims. The
biggest sin committed by our patent process against Job #1 is that patent
applications are kept secret until patents are granted. People can invest a
great deal of time and money independently inventing something, only to
discover that someone else suddenly owns the idea, and everything they've done
is lost, or else at the mercy of the new patent holder.
Such "stealth patents" rob the independent inventor of his or her research
investment, and I've never heard a good explanation of why this must be so.
Patent applications should be published immediately, so that persons
independently researching the same concept have a chance to give up the fight
before they lose their shirts, or else advance the engineering art by
innovating around the published application. Also, making patent applications
public immediately would allow prior art to be discovered and put forth by
people outside the patent office before the patent is granted. Right now,
prior art must be proven in court after a patent is granted, and all court
fights are frightfully expensive, ultimately benefiting only our smug ruling
class of trial lawyers.


Locking Out the Little Guy


The key problem with patents, however, is that they're used by large concerns
to control markets and lock out small startup companies and individual
inventors. IBM and other large companies frequently cross-license their patent
portfolios at little or no cost to one another, in a sort of you-use-my-stuff
and I'll-use-your-stuff deal. However, the small firm holding one minor patent
has little to offer the Big Guys and must accept their royalty terms, usually
a percentage for each patent licensed. Having to pay cash to license a
complicated web of patents can make a product economically impossible. The
large companies can freely use one another's patents as part of technology
exchange licenses. The little guy is locked out. None of this should be
surprising. Big companies like IBM and TI are awesomely inefficient, and they
fear nothing so much as the hungry, low-overhead technology partnership
working miracles in their jeans in a crufty part of South Phoenix.


Crazy Claims and Prior Art


I heard a talk given to hopeful inventors by an "invention consultant" some
years ago, and it was an eye-opener. The most important thing about a patent,
according to the consultant, was to draft the claims portion of the patent
such that innovating around the patent was impossible. The idea is to make
them as broad as possible without causing the examiners to reject the claims
as unrealistic. In most cases, the chap went on, the patent office will let
extremely broad claims pass--and once you have the patent, you can take
infringers to court with the legal edge on your side.
If patent claims are intended to make further innovation impossible, they're
working directly against the core of the patent idea, which is to make
innovation rewarding and encourage the advance of engineering art. Standards
for breadth of claims must be tightened, and existing patents with overly
broad claims should be rescinded as fraudulent.
Prior art has proven a serious problem in software patents, in part because
most software innovation has been protected as trade secrets rather than
patents, and thus not published at all. But simple ignorance on the part of
the patent office has allowed many absurd applications to obtain the legal
strength of a granted patent.
Paul Heckel's ridiculous Zoomracks patent is the best example I've seen. Paul
patented what amounts to a picture drawn on a text screen of something I used
many years ago as a clerk at Xerox: the Ring-King Visible Records Rack, a
steel plate covered with plastic card-holders that overlapped such that only
the bottom quarter inch of each card was exposed. You could scan the rack for
the name of the customer, then flip the card up and read the rest of the data.
Zoomracks works pretty much this way. There's some additional gobbledegook
about compressing data by yanking out vowels that looks a great deal like
1950s secretarial shorthand.
Should drawing a (marginal) screen picture of some device commonly used in the
outside world be patentable? Is shorthand patentable?
Don't be silly. Yet the patent office let it pass.


So What's Obvious?


When our current patent system was designed, virtually all inventions were
mechanical in nature, and relatively simple at that. Any reasonably educated
person could tell if an invention was "obvious" or not. Well, technology has
unfolded in countless ways, and the nature of obviousness is now something
known best to regular practitioners in the field. This is doubly true of
software, an area that until very recently was not subject to patents at all,
and one that most patent examiners know very little about.
What needs to be done is to convene volunteer examiner panels in each of
numerous technological specialties, including software, and have the panels
rule on whether applications are attempting to patent the obvious. Let the
panels' decision be final--or at best, allow one appeal to a second panel
convened of different individuals from the same field.


The Big Win



But above all else, bar nothing, the reform that could make our patent system
realize its stated goals is to remove patent-holders' ability to control
markets. Right now, if you hold a patent, you can offer it to some licensees
but not others, charge different rates to different licensees, and all manner
of market-manipulation mischief like that. My reform proposal is to create a
system of competing rights collectives that would represent inventors and
collect royalties on patents for them, taking a small fee. Inventors would be
allowed to negotiate other licenses, but they would be required by law to make
their inventions available for a standard, regulated maximum fee to anyone who
wishes to license them.
The music business works a lot like this right now. You don't have to
individually negotiate the right to perform a song in a bar with each
songwriter. Instead, you buy a license from ASCAP, which distributes the
royalties to participating songwriters. This works well, as judged by the
enthusiasm of the songwriters for the system.
ASCAP has a huge war chest for prosecuting infringers, which they do
relentlessly and remorselessly, as many small bar owners will surely attest. I
think it's a great system. I think it would work phenomenally well for
patents.
I envision it happening this way: Some portion of a product's net receipts
would be earmarked for patent royalties. I think 15 percent would be about
right, since (having done a little product developing myself) I think that no
more than 15 percent of the manufacturer's portion of the value of any given
product is pure innovation. Much of it is marketing, documentation, and simple
implementation of public-domain art.
A concern wishing to license patents would register a product with one or more
rights collectives, specifying which patents it is licensing. Those rights
collectives would split the 15 percent equally, and be constrained from
quibbling over which patent contributed "more" to the product's ultimate
value. Such questions only enrich lawyers and do nothing for either inventors
or the engineering art. A valid, paid-up license with a collective would
protect the licensee from any patent litigation of any kind from that
collective.
With the collectives working for them, doing the legal wrangling and crunching
the paperwork, inventors could do what they do best. Better still, other
inventors could build on the work of their fellows, and those with the
most-licensed patents would get the most money. Inventors would be rewarded,
and the progress of technology would go absolutely through the roof.
I would support this system. I would holler from the heights in favor of this
system. I would even support nonobvious software patents under this system. I
can foresee that IBM, TI, and Apple would oppose it, as would (of course) the
lawyers' bloc.
Could it ever happen? Who knows? The USSR is history--weirder and tougher
things have happened. We've got a long way to go. And knowing what I know of
lawyers, I'm still damned glad I'm a writer and not a programmer.


A Mortgage-ing We Go


Enough of that. We're riding the whirlpool here, trying to make sense of Turbo
Vision streams. I provided an overview of how streams work last month. This
time, we'll start having a look at a practical example: a revision of
HCALC.PAS that has the ability to write mortgage tables to a stream and read
them back again.
My first job in adding streams to HCALC was, in fact, adding streams to the
mortgage object itself. The revised MORTGAGE.PAS is given in Listing One (page
164). HCALC is a serious chunk of code and will have to wait until next month;
there's plenty to talk about in the meantime.
I had to make three general mods to MORTGAGE.PAS:
1. I added the TMortgage type to the Turbo Vision object hierarchy by making
it a child of TObject. I explained the reasons for this in detail last month:
The very first field in any streamable object must be a pointer to that
object's VMT. Since TObject is itself the child of no object and has a VMT
pointer as its first field, all objects that descend from TObject will have
the requisite VMT pointer as their first field.
2. I created a stream-registration record for TMortgage. It's called
RMortgage, and it exists primarily to let the Turbo Pascal RTL know where a
given object type's Store and Load methods exist in the code segment. Consider
it a record in a behind-the-scenes index file that the RTL keeps of all its
VMTs and stream access methods.
3. I added a Load and a Store method to the TMortgage object. The Load method
must be a constructor, because what it does is very similar to what the
quintessential constructor Init does: It builds an object on the heap. To do
this, it uses information it reads from the stream. Init, by contrast, builds
an object on the heap from information hardcoded into the constructor itself.
Store, on the other hand, is an ordinary method.


Looking Closely at TMortgage.Store


Storing an object out to a stream is relatively easy. All the data is right
there, intact and accessible. As I show in the TMortgage.Store method, you
simply blast out an object's fields one at a time, using the Write method
belonging to TStream. Make sure you specify the stream when you call Write!
That is, make sure the call is S.Write (if the name of your stream is S)
rather than just Write.
One caution: If the object for which you're writing a Store method has a
parent object with a Store method (TMortgage does not, because its parent,
TObject, has no Store method), you must call the parent's Store method before
beginning to send your own object's fields out to the stream. You'll see how
this works next month, in the Store methods belonging to HCALC's TMortgage
View objects.
The Write method needs the name of the field, and the number of bytes of data
to be stored from that field to the stream. The best way to do this is to use
the built-in SizeOf function on the field's type specifier, like so:
 S.Write(Principal,SizeOf(Real));
Since type Real is six bytes long, this statement writes six bytes of data
containing the field Principal out to stream instance S.
Only the last Write is vaguely tricky. If you remember from earlier
discussions of the TMortgage type, the mortgage amortization table itself is a
dynamically sized array on the heap. This allows you to have a 15-year,
30-year, or 247-month mortgage if you want, and not waste any heap space. The
PaymentSize field is a long integer containing the total size in bytes of the
Payments^ dynamic array. PaymentSize, passed as a parameter to S.Write,
allows you to write only the exact amount of data to the stream to embrace the
full length of the amortization table--and no more.
You'll note that although TMortgage has a pointer field named Payments, that
pointer field is not written to the stream. Pointers are 32-bit addresses of
memory locations on the heap. Writing them to disk is kind of pointless,
because there's no promise that when you bring a pointer back from disk, it's
going to point anywhere meaningful. We use the Payments pointer to help get
the amortization table out to the stream, but that done, we no longer need
Payments in the stream-writing process.


Getting Things Back from the Stream


What gets written out to the stream with Store gets read back from the stream
with Load. The Load method is conceptually similar to Store. You use the
stream's Read method to bring fields in from the stream, one by one, in the
same order that they were written out with Store. Again, you must tell the
Read method how many bytes of data to bring in from the stream with a second
parameter.
TMortgage.Load doesn't attempt to read a value for Payments from the stream.
We didn't write out anything for Payments to begin with. Instead, we allocate
(with GetMem) just enough space on the heap to contain the table and store the
address of that block of heapspace into Payments. We had previously read
PaymentSize from the stream, containing the correct size of the amortization
table. With Payments pointing to a correctly sized block of heapspace, we can
load the amortization table onto the heap directly, with S.Read:
 S.Read(Payments^,PaymentSize);
That's all it takes to get the mortgage object itself to and from the stream.
It may seem obvious, but for complicated objects with loads of fields, be
careful to read fields back from a stream in the same order that they were
written out.


More from the Confusion File


Which isn't to say that figuring it all out was easy. On page 157 of the Turbo
Vision Guide, it says, "Turbo Vision registers all the standard objects, so
you don't have to." That makes sense. Too bad that on page 163 it says, "The
rule is simple and unforgiving: It's your responsibility to register every
object type that your program will put onto a stream." Just my own? Or the
standard ones too?
The answer (in case you were wondering, and if you've tried to do anything at
all with streams then you've probably been wondering until your nose bled) is
that page 163 has it right: Turbo Vision registers nothing for you. It defines
registration records for all standard types (which is what I think the
marginal note on page 157 was trying to say) but you have to call the
RegisterType procedure for each standard type you intend to put out to a
stream.
Note that this does not mean that you have to register the object types that
you inherit from. TMortgageView inherits from TWindow, but I don't have to
register TWindow--just the exact object types that must be streamed. The
problem is that TWindow is a group, with a TFrame object attached to it
through a pointer. TWindow's own Load and Store methods know how to deal with
that frame object, so that you never have to bother worrying about writing the
TFrame to the stream. However, you must still register TFrame yourself.
I got seriously messed over on this one. While trying to put a TMortgageView
object out to a stream, I kept getting an I/O error code back from the stream.
The code showed up as -6, which equates to the stPutError constant, indicating
(see page 371 of the Turbo Vision Guide for the full st-series stream
error-code listing) that I was trying to put an unregistered type onto the
stream. I tried to register TView, TGroup, TWindow, and several other things
before I remembered that every TWindow object comes with its own TFrame.
Gakkh.


Do it Once and Then Stop!


All the standard types do, however, have predefined registration records,
using a standard naming convention: Replace the T at the beginning of the
standard type name with the letter R. Thus, the registration record for TFrame
is RFrame. All of the standard types provided by Borland have object ID codes
under 100, so any number over 100 is fair game.
Trying to register a type a second time will cause runtime error 212. This is
a catchall error message that will trigger if any of several things go wrong
while registering a type, but the two causes to watch out for are registering
a type with the same ID code as an already-registered type, and registering an
already-registered type. From TV's perspective, those two errors are
identical, because it's the unique ID code in a registration record that
defines a registration record as unique, not the name you give the record.
From your perspective, however, the first cause is generally choosing an ID
code that some other type already uses, and the second is registering types in
two or more different places in your application. To avoid confusion, gather
all your registration calls into a single procedure. And take pains to note
the registration ID codes of any third-party objects you incorporate into your
applications. And if you yourself provide streamable objects to other
programmers, complete with registration records, be sure to make it plain in
comment headers or documentation what those ID codes are.



Closing in on It


That's my word budget for this session. We're actually closing in on the
target, and it shouldn't take more than two more columns to peg streams
reasonably well. Next month I'll provide the updated HCALC.PAS listing, and
we'll speak of peer view pointers and other irritations. Month after that,
well, it might well be time to evaluate Turbo Vision on a cost-benefit basis,
and look around at other ways to skin the same cat.


_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann


[LISTING ONE]

{----------------------------------------------------------------------------}
{ MORTGAGE }
{ By Jeff Duntemann -- From DDJ for August 1992 }
{ Last Updated 5/2/92 }
{ Major update: 3/25/92: }
{ Added all the rigmarole to make the TMortgage type streamable. It now }
{ descends from TObject and uses the Objects unit. I also added the }
{ registration record and the Load and Store methods. }
{----------------------------------------------------------------------------}

UNIT Mortgage;

INTERFACE

USES Objects;

TYPE
 Payment = RECORD { One element in the amort. table. }
 PayPrincipal : Real;
 PayInterest : Real;
 PrincipalSoFar : Real;
 InterestSoFar : Real;
 ExtraPrincipal : Real;
 Balance : Real;
 END;
 PaymentArray = ARRAY[1..2] OF Payment; { Dynamic array! }
 PaymentPointer = ^PaymentArray;

 PMortgage = ^TMortgage;
 TMortgage =
 OBJECT(TObject) { Must descend from TObject to be streamable }
 Periods : Integer; { Number of periods in mortgage }
 PeriodsPerYear : Integer; { Number of periods in a year }
 Principal : Real; { Amount of principal in cents }
 Interest : Real; { Percentage of interest per *YEAR*}

 MonthlyPI : Real; { Monthly payment in cents }
 Payments : PaymentPointer; { Array holding payments }
 PaymentSize : LongInt; { Size in bytes of payments array }

 CONSTRUCTOR Init(StartPrincipal : Real;
 StartInterest : Real;
 StartPeriods : Integer;
 StartPeriodsPerYear : Integer);
 CONSTRUCTOR Load(VAR S : TStream);

 PROCEDURE SetNewInterestRate(NewRate : Real);
 PROCEDURE Recalc;
 PROCEDURE GetPayment(PaymentNumber : Integer;
 VAR ThisPayment : Payment);
 PROCEDURE ApplyExtraPrincipal(PaymentNumber : Integer;
 Extra : Real);
 PROCEDURE RemoveExtraPrincipal(PaymentNumber : Integer);
 PROCEDURE Store(VAR S : TStream);
 DESTRUCTOR Done; VIRTUAL;
 END;

CONST
 RMortgage : TStreamRec =
 (ObjType : 1200;
 VMTLink : Ofs(TypeOf(TMortgage)^);
 Load : @TMortgage.Load;
 Store : @TMortgage.Store);

IMPLEMENTATION

FUNCTION CalcPayment(Principal,InterestPerPeriod : Real;
 NumberOfPeriods : Integer) : Real;
VAR
 Factor : Real;
BEGIN
 Factor := EXP(-NumberOfPeriods * LN(1.0 + InterestPerPeriod));
 CalcPayment := Principal * InterestPerPeriod / (1.0 - Factor)
END;

CONSTRUCTOR TMortgage.Init(StartPrincipal : Real;
 StartInterest : Real;
 StartPeriods : Integer;
 StartPeriodsPerYear : Integer);
VAR
 I : Integer;
 InterestPerPeriod : Real;
BEGIN
 { Set up all the initial state values: }
 Principal := StartPrincipal;
 Interest := StartInterest;
 Periods := StartPeriods;
 PeriodsPerYear := StartPeriodsPerYear;

 { Here we calculate the size that the payment array will occupy. }
 { We retain this because the number of payments may change...and }
 { we'll need to dispose of the array when the object is ditched: }
 PaymentSize := SizeOf(Payment) * Periods;

 { Allocate payment array on the heap: }
 GetMem(Payments,PaymentSize);

 { Initialize extra principal fields of payment array: }
 FOR I := 1 TO Periods DO
 Payments^[I].ExtraPrincipal := 0;
 Recalc; { Calculate the amortization table }
END;

CONSTRUCTOR TMortgage.Load(VAR S : TStream);
BEGIN

 S.Read(Periods, Sizeof(Integer));
 S.Read(PeriodsPerYear,SizeOf(Integer));
 S.Read(Principal, SizeOf(Real));
 S.Read(Interest, SizeOf(Real));
 S.Read(MonthlyPI, SizeOf(Real));
 S.Read(PaymentSize, SizeOf(LongInt));
 { Note that we *don't* try to read a pointer in from the stream. That would }
 { be meaningless; instead, we allocate heap space for the payments array }
 { with GetMem and assign the returned pointer to Payments: }
 GetMem(Payments,PaymentSize);
 S.Read(Payments^, PaymentSize);
END;

PROCEDURE TMortgage.Store(VAR S : TStream);
BEGIN
 S.Write(Periods, Sizeof(Integer));
 S.Write(PeriodsPerYear,SizeOf(Integer));
 S.Write(Principal, SizeOf(Real));
 S.Write(Interest, SizeOf(Real));
 S.Write(MonthlyPI, SizeOf(Real));
 { Note that we *don't* store the pointer to the payments array! }
 { A pointer (i.e., a heap address) is meaningless written to disk.}
 S.Write(PaymentSize, SizeOf(LongInt));
 S.Write(Payments^, PaymentSize);
END;

PROCEDURE TMortgage.SetNewInterestRate(NewRate : Real);
BEGIN
 Interest := NewRate;
 Recalc;
END;

{ This method calculates the amortization table for the mortgage. }
{ The table is stored in the array pointed to by Payments. }

PROCEDURE TMortgage.Recalc;
VAR
 I : Integer;
 RemainingPrincipal : Real;
 PaymentCount : Integer;
 InterestThisPeriod : Real;
 InterestPerPeriod : Real;
 HypotheticalPrincipal : Real;
BEGIN
 InterestPerPeriod := Interest/PeriodsPerYear;
 MonthlyPI := CalcPayment(Principal,
 InterestPerPeriod,
 Periods);
 { Round the monthly to cents: }
 MonthlyPI := int(MonthlyPI * 100.0 + 0.5) / 100.0;

 { Now generate the amortization table: }
 RemainingPrincipal := Principal;
 PaymentCount := 0;
 FOR I := 1 TO Periods DO
 BEGIN
 Inc(PaymentCount);
 { Calculate the interest this period and round it to cents: }
 InterestThisPeriod :=
 Int((RemainingPrincipal * InterestPerPeriod) * 100 + 0.5) / 100.0;
 { Store values into payments array: }
 WITH Payments^[PaymentCount] DO
 BEGIN
 IF RemainingPrincipal = 0 THEN { Loan's been paid off! }
 BEGIN
 PayInterest := 0;
 PayPrincipal := 0;
 Balance := 0;
 END
 ELSE
 BEGIN
 HypotheticalPrincipal :=
 MonthlyPI - InterestThisPeriod + ExtraPrincipal;
 IF HypotheticalPrincipal > RemainingPrincipal THEN
 PayPrincipal := RemainingPrincipal
 ELSE
 PayPrincipal := HypotheticalPrincipal;
 PayInterest := InterestThisPeriod;
 RemainingPrincipal :=
 RemainingPrincipal - PayPrincipal; { Update running balance }
 Balance := RemainingPrincipal;
 END;
 { Update the cumulative interest and principal fields: }
 IF PaymentCount = 1 THEN
 BEGIN
 PrincipalSoFar := PayPrincipal;
 InterestSoFar := PayInterest;
 END
 ELSE
 BEGIN
 PrincipalSoFar :=
 Payments^[PaymentCount-1].PrincipalSoFar + PayPrincipal;
 InterestSoFar :=
 Payments^[PaymentCount-1].InterestSoFar + PayInterest;
 END;
 END; { WITH }
 END; { FOR }
END; { TMortgage.Recalc }

PROCEDURE TMortgage.GetPayment(PaymentNumber : Integer;
 VAR ThisPayment : Payment);
BEGIN
 ThisPayment := Payments^[PaymentNumber];
END;

PROCEDURE TMortgage.ApplyExtraPrincipal(PaymentNumber : Integer;
 Extra : Real);
BEGIN
 Payments^[PaymentNumber].ExtraPrincipal := Extra;
 Recalc;
END;

PROCEDURE TMortgage.RemoveExtraPrincipal(PaymentNumber : Integer);
BEGIN
 Payments^[PaymentNumber].ExtraPrincipal := 0.0;
 Recalc;
END;


DESTRUCTOR TMortgage.Done;
BEGIN
 FreeMem(Payments,PaymentSize);
END;

END. { MORTGAGE }



August, 1992
GRAPHICS PROGRAMMING


Color Modeling in 256-color Mode


 This article contains the following executables: XSHRP21.ZIP


Michael Abrash


Lately, my daughter has wanted some fairly sophisticated books read to her.
Wind in the Willows. Little House on the Prairie. Pretty heady stuff for a
six-year-old, and sometimes I wonder how much of it she really understands. As an
experiment, during today's reading I stopped whenever I came to a word I
thought she might not know, and asked her what it meant. One such word was
"mulling."
"Do you know what 'mulling' means?" I asked.
She thought about it for a while, then said, "Pondering."
"Very good!" I said, more than a little surprised.
She smiled and said, "But, Dad, how do you know that I know what 'pondering'
means?"
"Okay," I said, "What does 'pondering' mean?"
"Mulling," she said.
What does this anecdote tell us about the universe in which we live? Well, it
certainly indicates that this universe is inhabited by at least one comedian
and one good straight man. Beyond that, though, it can be construed as a
parable about the difficulty of defining things properly; for example,
consider the complications inherent in the definition of color on a 256-color
display adapter such as the VGA. Coincidentally, VGA color modeling just
happens to be this month's topic, and the place to start is with color
modeling in general.


A Color Model


We've been developing X-Sharp, a real-time 3-D animation package, for several
months now. Last month, we added illumination sources and shading; that
addition makes it necessary for us to have a general-purpose color model, so
that we can display the gradations of color intensity necessary to render
illuminated surfaces properly. In other words, when a bright light is shining
straight at a green surface, we need to be able to display bright green, and
as that light dims or tilts to strike the surface at a shallower angle, we
need to be able to display progressively dimmer shades of green.
The first thing to do is to select a color model in which to perform our
shading calculations, the dot product-based stuff I discussed last month. The
approach we'll take is to select an ideal representation of the full color
space and do our calculations there, as if we really could display every
possible color; only as a final step will we map each desired color into the
limited 256-color set of the VGA, or the color range of whatever adapter we
happen to be working with. There are a number of color models that we might
choose to work with, but I'm going to go with the one that's both most
familiar and, in my opinion, simplest: RGB (red, green, blue). In the RGB
model, a given color is modeled as the mix of specific fractions of full
intensities of each of the three color primaries. For example, the brightest
possible pure blue is 0.0*R, 0.0*G, 1.0*B. Half-bright cyan is 0.0*R, 0.5*G,
0.5*B. Quarter-bright gray is 0.25*R, 0.25*G, 0.25*B. You can think of RGB
color space as being a cube, as shown in Figure 1, with any particular color
lying somewhere inside or on the cube.
RGB is good for modeling colors generated by light sources, because red,
green, and blue are the additive primaries; that is, all other colors can be
generated by mixing red, green, and blue light sources. They're also the
primaries for color computer displays, and the RGB model maps beautifully onto
the display capabilities of 15- and 24-bpp display adapters, which tend to
represent pixels as RGB combinations in display memory.
How, then, are RGB colors represented in X-Sharp? Each color is represented as
an RGB triplet, with eight bits each of red, green, and blue resolution, using
the structure shown in Listing One (page 166). That is, each color is
described by three color components--one each for red, green, and blue--and
each primary color component is represented by eight bits. Zero intensity of a
color component is represented by the value 0, and full intensity is
represented by the value 255. This gives us 256 levels of each primary color
component, and a total of 16,777,216 possible colors.
Holy cow! Isn't 16,000,000-plus colors a bit of overkill?
Actually, no, it isn't. At the eighth Annual Computer Graphics Show in New
York, this past January, Sheldon Linker, of Linker Systems, related an
interesting tale about color perception research at the Jet Propulsion Lab
back in the '70s. The JPL color research folks had the capability to print
more than 50,000,000 distinct and very precise colors on paper. As a test,
they tried printing out words in various colors, with each word printed on a
background that differed by only one color index from the word's color. No one
expected the human eye to be able to differentiate between two colors, out of
50,000,000-plus, that were so similar. It turned out, though, that everyone
could read the words with no trouble at all; the human eye is surprisingly
sensitive to color gradations, and also happens to be wonderful at detecting
edges.
When the JPL team went to test the eye's sensitivity to color on the screen,
they found that only about 16,000,000 colors could be distinguished, because
the color-sensing mechanism of the human eye is more compatible with
reflective sources such as paper and ink than with emissive sources such as
CRTs. Still, the human eye can distinguish about 16,000,000 colors on the
screen. That's not so hard to believe, if you think about it; the eye senses
each primary color separately, so we're really only talking about detecting
256 levels of intensity per primary here. It's the brain that does the amazing
part; the 16,000,000-plus color capability actually comes not from
extraordinary sensitivity in the eye, but rather from the brain's ability to
distinguish between all the mixes of 256 levels of each of three primaries.
So it's perfectly reasonable to maintain 24 bits of color resolution, and
X-Sharp represents colors internally as ideal, device-independent 24-bit RGB
triplets. All shading calculations are performed on these triplets, with
24-bit color precision. It's only after the final 24-bit RGB drawing color is
calculated that the display adapter's color capabilities come into play, as
the X-Sharp function ModelColorToColorIndex() is called to map the desired RGB
color to the closest match the adapter is capable of displaying. Of course,
that mapping is adapter dependent. On a 24-bpp device, it's pretty obvious how
the internal RGB color format maps to displayed pixel colors: directly. On
VGAs with 15-bpp Sierra Hicolor DACs, the mapping is equally simple, with the
five upper bits of each color component mapping straight to display pixels.
But how on earth do we map those 16,000,000-plus RGB colors into the 256-color
space of a standard VGA?
This is the "color definition" problem I mentioned at the start of the column.
The VGA palette is arbitrarily programmable to any set of 256 colors, with
each color defined by six bits each of red, green, and blue intensity. In
X-Sharp, the function InitializePalette() can be customized to set up the
palette however we wish; this gives us nearly complete flexibility in defining
the working color set. Even with infinite flexibility, however, 256 out of
16,000,000 or so possible colors is a pretty puny selection. It's easy to set
up the palette to give yourself a good selection of just blue intensities, or
of just greens; but for general color modeling there's simply not enough
palette to go around.
One way to deal with the limited simultaneous color capabilities of the VGA is
to build an application that uses only a subset of RGB space, then bias the
VGA's palette toward that subspace. This is the approach used in the DEMO1
sample program in X-Sharp; Listings Two and Three (page 166) show the versions
of InitializePalette() and ModelColorToColorIndex() that set up and perform
the color mapping for DEMO1. In DEMO1, three-quarters of the palette is set up
with 64 intensity levels of each of the three pure primary colors (red, green,
and blue), and then most drawing is done with only pure primary colors. The
resulting rendering quality is very good, because there are so many levels of
each primary.
The downside is that this excellent quality is available for only three
colors: red, green, and blue. What about all the other colors that are mixes
of the primaries, like, say, cyan or yellow, to say nothing of gray? In the
DEMO1 color model, any RGB color that is not a pure primary is mapped into a
2-2-2 RGB space that the remaining quarter of the VGA's palette is set up to
display; that is, there are exactly two bits of precision for each color
component, or 64 general RGB colors in all. This is genuinely lousy color
resolution, being only 1/64th of the resolution we really need for each color
component. In this model, a staggering 262,144 colors from the 24-bit RGB cube
map to each color in the 2-2-2 VGA palette. The results are not impressive;
the colors of mixed-primary surfaces jump abruptly, badly damaging the
illusion of real illumination. To see how poor a 2-2-2 RGB selection can look,
run DEMO1, and press the '2' key to turn on spotlight 2, the blue spotlight.
Because the ambient lighting is green, turning on the blue spotlight causes
mixed-primary colors to be displayed--and the result looks terrible, because
there just isn't enough color resolution. Unfortunately, 2-2-2 RGB is close to
the best general color resolution the VGA can display.
Another approach would be to set up the palette with reasonably good mixes of
two primaries but no mixes of three primaries, then use only two-primary
colors in your applications (no grays or whites or other three-primary mixes).
Or you could choose to shade only selected objects, using part of the palette
for a good range of the colors of those objects, and reserving the rest of the
palette for the fixed colors of the other, nonshaded objects. Jim Kent, author
of Autodesk Animator, suggests dynamically adjusting the palette to the needs
of each frame, for example by allocating the colors for each frame on a
first-come, first-served basis. That wouldn't be trivial to do in real time,
but it would make for extremely efficient use of the palette.
The sad truth is that the VGA's 256-color palette is an inadequate resource
for general RGB shading. The good news is that clever workarounds can make VGA
graphics look nearly as good as 24-bpp graphics; the burden falls on you, the
programmer, to design your applications and color mapping to compensate for
the VGA's limitations. To experiment with a different 256-color model in
X-Sharp, just change InitializePalette() to set up the desired palette and
ModelColorToColorIndex() to map 24-bit RGB triplets into the palette you've
set up. It's that simple, and the results can be striking indeed.


Where to Get X-Sharp


The full source for X-Sharp is available in the file XSHPn.ARC in the DDJ
Forum on CompuServe, and as XSHARPn.ZIP in both the programming/graphics
conference on M&T Online and the graphic.disp conference on Bix.
Alternatively, you can send me a 360K or 720K formatted diskette and an
addressed, stamped diskette mailer, care of X-Sharp, DDJ, 411 Borel Ave., San
Mateo, CA 94402, and I'll send you the latest copy of X-Sharp. There's no
charge, but it'd be very much appreciated if you'd slip in a dollar or so to
help out the folks at the Vermont Association for the Blind and Visually
Impaired.
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
name mabrash in both cases).


Fast VGA Text


This next item comes from The BitMan. (That's how he asked to be described;
don't ask me why.) The BitMan passed along a nifty application of the VGA's
under-appreciated write mode 3 that is, under the proper circumstances, the
fastest possible way to draw text in any 16-color VGA mode.
The task at hand is illustrated by Figure 2. We want to draw what's known as
solid text, in which the effect is the same as if the cell around each
character was drawn in the background color, and then each character was drawn
on top of the background box. (This is in contrast to transparent text, where
each character is drawn in the foreground color without disturbing the
background.) Assume that each character fits in an eight-wide cell (as is the
case with the standard VGA fonts), and that we're drawing text at byte-aligned
locations in display memory.
Solid text is useful for drawing menus, text areas, and the like; basically,
it can be used whenever you want to display text on a solid-color background.
The obvious way to implement solid text is to fill the rectangle representing
the background box, then draw transparent text on top of the background box.
However, there are two problems with doing solid text this way. First, there's
some flicker, because for a little while the box is there but the text hasn't
yet arrived. More important is that the background-followed-by-foreground
approach accesses display memory three times for each byte of font data: once
to draw the background box, once to read display memory to load the latches,
and once to actually draw the font pattern. Display memory is incredibly slow,
so we'd like to reduce the number of accesses as much as possible. With The
BitMan's approach, we can reduce the number of accesses to just one per font
byte, and eliminate flicker, too.
The keys to fast solid text are the latches and write mode 3. The latches, as
you may recall from earlier discussions in this column, are four internal VGA
registers that hold the last bytes read from the VGA's four planes; every read
from VGA memory loads the latches with the values stored at that display
memory address across the four planes. Whenever a write is performed to VGA
memory, the latches can provide some, none, or all of the bits written to
memory, depending on the bit mask, which selects between the latched data and
the drawing data on a bit-by-bit basis. The latches solve half our problem; we
can fill the latches with the background color, then use them to draw the
background box. The trick now is drawing the text pixels in the foreground
color at the same time.
This is where it gets a little complicated. In write mode 3 (which
incidentally is not available on the EGA), each byte value that the CPU writes
to the VGA does not get written to display memory. Instead, it turns into the
bit mask. (Actually, it's ANDed with the Bit Mask register, and the result
becomes the bit mask, but we'll leave the Bit Mask register set to 0xFF, so
the CPU value will become the bit mask.) The bit mask selects, on a bit-by-bit
basis, between the data in the latches for each plane (the previously loaded
background color, in this case) and the foreground color. Where does the
foreground color come from, if not from the CPU? From the Set/Reset register,
as shown in Figure 3. Thus, each byte written by the CPU (font data,
presumably) selects foreground or background color for each of eight pixels,
all done with a single write to display memory.
I know this sounds pretty esoteric, but think of it this way. The latches hold
the background color in a form suitable for writing eight background pixels
(one full byte) at a pop. Write mode 3 allows each CPU byte to punch holes in
the background color provided by the latches, holes through which the
foreground color from the Set/Reset register can flow. The result is that a
single write draws exactly the combination of foreground and background pixels
described by each font byte written by the CPU. It may help to look at Listing
Four (page 167), which shows The BitMan's technique in action. And yes, this
technique is absolutely worth the trouble; it's about three times faster than
the fill-then-draw approach described above, and about twice as fast as
transparent text. So far as I know, there is no faster way to draw text on a
VGA.
It's important to note that The BitMan's technique only works on full bytes of
display memory. There's no way to clip to finer precision; the background
color will inevitably flood all of the eight destination pixels that aren't
selected as foreground pixels. This makes The BitMan's technique most suitable
for monospaced fonts with characters that are multiples of eight pixels in
width, and for drawing to byte-aligned addresses; the technique can be used in
other situations, but is considerably more difficult to apply.

At this point, some of you are no doubt nodding your heads and saying, "Yes, I
see how that would work." Others are probably muttering, "Well, heck, I knew
that; tell me something new." Then there are the rest of you, the VGA
neophytes, the ones with glazed eyes, who think this technique sounds
interesting, but understand maybe 30 percent of what you just read. Where can
you turn for help?
Read on.


Useful VGA Reading


For years, I've recommended Richard Wilton's Programmer's Guide to PC and PS/2
Video Systems (Microsoft Press, 1987, $24.95, ISBN 1-55615-103-9) as a
VGA-programming reference, not because it's perfect, but because it was the
only VGA book I knew of that was good enough to be useful. I've added another
book to my good-enough-to-get list: Programmer's Guide to the EGA and VGA
Cards, Second Edition, by Richard Ferraro (Addison-Wesley, 1990, $29.95, ISBN
0-201-57025-4). This 1,000-plus-page tome has a wide variety of valuable VGA
information, ranging from registers to BIOS functions to the specifics of
seven manufacturers' SuperVGA implementations, and it has plenty of good
figures. This is, without question, a useful book. However, it is not (sigh),
the ultimate VGA reference I've awaited for five years. The book is not
error-free, especially regrettable in a second edition; for example, the
polarity of the bits in the Color Don't Care register is reversed in the
discussion of read mode 1. Also, although write mode 3 and the latches are
covered, they're discussed in considerably less detail than I'd like to see;
you'd have a tough time figuring out why The BitMan's technique works from
this book alone. And surely Ferraro knows that you have to read display memory
to load the latches before the bit mask can do its job of protecting selected
pixels within a destination byte, because he shows line-drawing code that does
just that. Still, he keeps saying that the bit mask keeps destination pixels
from being modified, as if that happens even if you don't read display memory
first. Nonetheless, I've found this book useful when I've reached for it, and
I've found myself reaching for it increasingly often, and that's the real test
of any reference. If you're a PC graphics programmer, you should probably have
this book on your shelf.


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash


[LISTING ONE]

typedef struct _ModelColor {
 unsigned char Red; /* 255 = max red, 0 = no red */
 unsigned char Green; /* 255 = max green, 0 = no green */
 unsigned char Blue; /* 255 = max blue, 0 = no blue */
} ModelColor;






[LISTING TWO]

/* Sets up the palette in mode X, to a 2-2-2 general R-G-B organization, with
 64 separate levels each of pure red, green, and blue. This is very good
 for pure colors, but mediocre at best for mixes.

   ---------------------------------
   | 0 | 0 |  Red  | Green | Blue  |
   ---------------------------------
     7   6   5   4   3   2   1   0

   ---------------------------------
   | 0 | 1 |          Red          |
   ---------------------------------
     7   6   5   4   3   2   1   0

   ---------------------------------
   | 1 | 0 |         Green         |
   ---------------------------------
     7   6   5   4   3   2   1   0

   ---------------------------------
   | 1 | 1 |         Blue          |
   ---------------------------------
     7   6   5   4   3   2   1   0

 Colors are gamma corrected for a gamma of 2.3 to provide approximately
 even intensity steps on the screen.
*/

#include <dos.h>
#include "polygon.h"


static unsigned char Gamma4Levels[] = { 0, 39, 53, 63 };
static unsigned char Gamma64Levels[] = {
 0, 10, 14, 17, 19, 21, 23, 24, 26, 27, 28, 29, 31, 32, 33, 34,
 35, 36, 37, 37, 38, 39, 40, 41, 41, 42, 43, 44, 44, 45, 46, 46,
 47, 48, 48, 49, 49, 50, 51, 51, 52, 52, 53, 53, 54, 54, 55, 55,
 56, 56, 57, 57, 58, 58, 59, 59, 60, 60, 61, 61, 62, 62, 63, 63,
};

static unsigned char PaletteBlock[256][3]; /* 256 RGB entries */

void InitializePalette()
{
 int Red, Green, Blue, Index;
 union REGS regset;
 struct SREGS sregset;

 for (Red=0; Red<4; Red++) {
 for (Green=0; Green<4; Green++) {
 for (Blue=0; Blue<4; Blue++) {
 Index = (Red<<4)+(Green<<2)+Blue;
 PaletteBlock[Index][0] = Gamma4Levels[Red];
 PaletteBlock[Index][1] = Gamma4Levels[Green];
 PaletteBlock[Index][2] = Gamma4Levels[Blue];
 }
 }
 }

 for (Red=0; Red<64; Red++) {
 PaletteBlock[64+Red][0] = Gamma64Levels[Red];
 PaletteBlock[64+Red][1] = 0;
 PaletteBlock[64+Red][2] = 0;
 }

 for (Green=0; Green<64; Green++) {
 PaletteBlock[128+Green][0] = 0;
 PaletteBlock[128+Green][1] = Gamma64Levels[Green];
 PaletteBlock[128+Green][2] = 0;
 }

 for (Blue=0; Blue<64; Blue++) {
 PaletteBlock[192+Blue][0] = 0;
 PaletteBlock[192+Blue][1] = 0;
 PaletteBlock[192+Blue][2] = Gamma64Levels[Blue];
 }

 /* Now set up the palette */
 regset.x.ax = 0x1012; /* set block of DAC registers function */
 regset.x.bx = 0; /* first DAC location to load */
 regset.x.cx = 256; /* # of DAC locations to load */
 regset.x.dx = (unsigned int)PaletteBlock; /* offset of array from which
 to load RGB settings */
 sregset.es = _DS; /* segment of array from which to load settings */
 int86x(0x10, &regset, &regset, &sregset); /* load the palette block */
}







[LISTING THREE]

/* Converts a model color (a color in the RGB color cube, in the current
 color model) to a color index for mode X. Pure primary colors are
 special-cased, and everything else is handled by a 2-2-2 model. */
int ModelColorToColorIndex(ModelColor * Color)
{
 if (Color->Red == 0) {
 if (Color->Green == 0) {
 /* Pure blue */
 return(192+(Color->Blue >> 2));
 } else if (Color->Blue == 0) {
 /* Pure green */
 return(128+(Color->Green >> 2));
 }
 } else if ((Color->Green == 0) && (Color->Blue == 0)) {
 /* Pure red */
 return(64+(Color->Red >> 2));
 }
 /* Multi-color mix; look up the index with the two most significant bits
 of each color component */
 return(((Color->Red & 0xC0) >> 2) | ((Color->Green & 0xC0) >> 4) |
 ((Color->Blue & 0xC0) >> 6));
}






[LISTING FOUR]

; Demonstrates drawing solid text on the VGA, using The BitMan's write mode
; 3-based, one-pass technique. Tested with TASM 3.0 and MASM 5.1.

CHAR_HEIGHT equ 8 ;# of scan lines per character (must be <256)
SCREEN_HEIGHT equ 480 ;# of scan lines per screen
SCREEN_SEGMENT equ 0a000h ;where screen memory is
FG_COLOR equ 14 ;text color
BG_COLOR equ 1 ;background box color
GC_INDEX equ 3ceh ;Graphics Controller (GC) Index reg I/O port
SET_RESET equ 0 ;Set/Reset register index in GC
G_MODE equ 5 ;Graphics Mode register index in GC
BIT_MASK equ 8 ;Bit Mask register index in GC

 .model small
 .stack 200h
 .data
Line dw ? ;current line #
CharHeight dw ? ;# of scan lines in each character (must be <256)
MaxLines dw ? ;max # of scan lines of text that will fit on screen
LineWidthBytes dw ? ;offset from one scan line to the next
FontPtr dd ? ;pointer to font with which to draw
SampleString label byte
 db 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
 db 'abcdefghijklmnopqrstuvwxyz'
 db '0123456789!@#$%^&*(),<.>/?;:',0


 .code
start:
 mov ax,@data
 mov ds,ax

 mov ax,12h
 int 10h ;select 640x480 16-color mode

 mov ah,11h ;BIOS character generator function
 mov al,30h ;BIOS get font pointer subfunction
 mov bh,3 ;get 8x8 ROM font subsubfunction
 int 10h ;get the pointer to the BIOS 8x8 font
 mov word ptr [FontPtr],bp
 mov word ptr [FontPtr+2],es

 mov bx,CHAR_HEIGHT
 mov [CharHeight],bx ;# of scan lines per character
 mov ax,SCREEN_HEIGHT
 sub dx,dx
 div bx
 mul bx ;max # of full scan lines of text that
 mov [MaxLines],ax ; will fit on the screen

 mov ah,0fh ;BIOS video status function
 int 10h ;get # of columns (bytes) per row
 mov al,ah ;convert byte columns variable in
 sub ah,ah ; AH to word in AX
 mov [LineWidthBytes],ax ;width of scan line in bytes
 ;now draw the text
 sub bx,bx
 mov [Line],bx ;start at scan line 0
LineLoop:
 sub ax,ax ;start at column 0; must be a multiple of 8
 mov ch,FG_COLOR ;color in which to draw text
 mov cl,BG_COLOR ;color in which to draw background box
 mov si,offset SampleString ;text to draw
 call DrawTextString ;draw the sample text
 mov bx,[Line]
 add bx,[CharHeight] ;# of next scan line to draw on
 mov [Line],bx
 cmp bx,[MaxLines] ;done yet?
 jb LineLoop ;not yet

 mov ah,7
 int 21h ;wait for a key press, without echo

 mov ax,03h
 int 10h ;back to text mode

 mov ah,4ch
 int 21h ;exit to DOS

; Draws a text string.
; Input: AX = X coordinate at which to draw upper left corner of first char
; BX = Y coordinate at which to draw upper left corner of first char
; CH = foreground (text) color
; CL = background (box) color
; DS:SI = pointer to string to draw, zero terminated

; CharHeight must be set to the height of each character
; FontPtr must be set to the font with which to draw
; LineWidthBytes must be set to the scan line width in bytes
; Don't count on any registers other than DS, SS, and SP being preserved.
; The X coordinate is truncated to a multiple of 8. Characters are
; assumed to be 8 pixels wide.
 align 2
DrawTextString proc near
 cld
 shr ax,1 ;byte address of starting X within scan line
 shr ax,1
 shr ax,1
 mov di,ax
 mov ax,[LineWidthBytes]
 mul bx ;start offset of initial scan line
 add di,ax ;start offset of initial byte
 mov ax,SCREEN_SEGMENT
 mov es,ax ;ES:DI = offset of initial character's
 ; first scan line
 ;set up the VGA's hardware so that we can
 ; fill the latches with the background color
 mov dx,GC_INDEX
 mov ax,(0ffh SHL 8) + BIT_MASK
 out dx,ax ;set Bit Mask register to 0xFF (that's the
 ; default, but I'm doing this just to make sure
 ; you understand that Bit Mask register and
 ; CPU data are ANDed in write mode 3)
 mov ax,(003h SHL 8) + G_MODE
 out dx,ax ;select write mode 3
 mov ah,cl ;background color
 mov al,SET_RESET
 out dx,ax ;set the drawing color to background color
 mov byte ptr es:[0ffffh],0ffh ;write 8 pixels of the background
 ; color to unused offscreen memory
 mov cl,es:[0ffffh] ;read the background color back into the
 ; latches; the latches are now filled with
 ; the background color. The value in CL
 ; doesn't matter, we just needed a target
 ; for the read, so we could load the latches
 mov ah,ch ;foreground color
 out dx,ax ;set the Set/Reset (drawing) color to the
 ; foreground color
 ;we're ready to draw!
DrawTextLoop:
 lodsb ;next character to draw
 and al,al ;end of string?
 jz DrawTextDone ;yes
 push ds ;remember string's segment
 push si ;remember offset of next character in string
 push di ;remember drawing offset
 ;load these variables before we wipe out DS
 mov dx,[LineWidthBytes] ;offset from one line to next
 dec dx ;compensate for STOSB
 mov cx,[CharHeight] ;
 mul cl ;offset of character in font table
 lds si,[FontPtr] ;point to font table
 add si,ax ;point to start of character to draw
 ;the following loop should be unrolled for
 ; maximum performance!

DrawCharLoop: ;draw all lines of the character
 movsb ;get the next byte of the character and draw
 ; character; data is ANDed with Bit Mask
 ; register to become bit mask, and selects
 ; between latch (containing the background
 ; color) and Set/Reset register (containing
 ; foreground color)
 add di,dx ;point to next line of destination
 loop DrawCharLoop

 pop di ;retrieve initial drawing offset
 inc di ;drawing offset for next char
 pop si ;retrieve offset of next character in string
 pop ds ;retrieve string's segment
 jmp DrawTextLoop ;draw next character, if any

 align 2
DrawTextDone: ;restore the Graphics Mode register to its
 ; default state of write mode 0
 mov dx,GC_INDEX
 mov ax,(000h SHL 8) + G_MODE
 out dx,ax ;select write mode 0
 ret
DrawTextString endp
 end start





































August, 1992
PROGRAMMER'S BOOKSHELF


Literate Programming




Andrew Schulman


In much the same way that General Francisco Franco is still dead, volume 4
(Combinatorial Algorithms) of Donald Knuth's projected seven-volume Art of
Computer Programming is still not out. Neither are volumes 5 (Syntactical
Algorithms), 6 (Theory of Languages), or 7 (Compilers).
DDJ readers are probably familiar with the reason why: Knuth has been on a
ten-year detour from the Art of Computer Programming, working in the field of
computer typesetting (TeX) and typography (METAFONT). In addition to producing
the TeX and METAFONT software itself, Knuth has used this software to produce
an attractive five-volume series, Computers and Typesetting, that includes not
only the documentation for TeX and METAFONT, but also their source code.
What DDJ readers may not be familiar with is that this source code is written
in a programming language called WEB. But when I say "written," I really do
mean written: the source code, and the written description of it, are one and
the same. WEB is a language, quite similar to Pascal (there's also CWEB, which
is quite similar to C), which makes it possible to merge the executable source
for a system with its description. More importantly, it allows you to
construct and present the source in an order which makes "psychological"
sense. Using such a system, software "authors" really do become authors,
concerned with writing and presenting code in a way that makes sense, not so
much to the compiler, but to the reader.
Even if you're not interested in typesetting or typography, take a look some
time at Knuth's TeX: The Program (Volume B of Computers and Typesetting) and
METAFONT: The Program (Volume D). You'll not find anywhere else such detailed
presentations of the entire source code--warts and all--for a large program.
It's instructive to consider what life would be like if the source code for
your favorite system were available as a WEB.
What inspired Knuth to issue these 600-page hardcover books of source code?
Literate Programming, a recently issued collection of essays by Knuth and
others from 1974 to 1989, contains a fascinating answer:
Tony Hoare provided a special impetus for WEB when he suggested in 1978 that I
should publish my program for TeX. Since very few large-scale software systems
were documented in the literature, he had been trying to promote the
publication of well-written programs. Hoare's suggestion was actually rather
terrifying to me, and I'm sure he knew that he was posing quite a challenge.
As a professor of computer science, I was quite comfortable publishing papers
about toy problems that could be polished up nicely and presented in an
elegant manner; but I had no idea how to take a piece of real software, with
all the compromises necessary to make it useful to a large class of people on
a wide variety of systems, and to open it up to public scrutiny. How could a
supposedly respectable academic, like me, reveal the way he actually writes
large programs?
This same challenge faces anyone who has ever tried to write about software:
Only small programs seem explicable in the course of an article or even a
reasonably sized book. Yet genuine software tends not to be small, and
generally seems to consist mostly of ugly distractions from whatever major
points you're trying to make. Small programs are toys, and genuine programs
seem impossible to explain in depth. Of course, one could present only a
top-down view of a large program, ignoring the mass of details; but it is in
these details that the program's true worth (certainly its monetary worth!)
probably resides.
Knuth tackles this problem with the idea of programs as webs:
When I first began to work with the ideas that eventually became the WEB
system, I thought that I would be designing a language for "top-down"
programming, where a top-level description is given first and successively
refined. On the other hand I knew that I often created major parts of programs
in a "bottom-up" fashion, starting with the definitions of basic procedures
and data structures and gradually building more and more powerful subroutines.
I had the feeling that top-down and bottom-up were opposing methodologies: one
more suitable for program exposition and the other more suitable for program
creation.
But after gaining experience with WEB, I have come to realize that there is no
need to choose once and for all between top-down and bottom-up because a
program is best thought of as a web instead of as a tree....
When I'm writing a longish program ... I invariably have strong feelings about
what part of the whole should be tackled next. For example, I'll come to a
point where I need to define a major data structure and its conventions,
before I'll feel happy about going further. My experiences have led me to
believe that a person reading a program is, likewise, ready to comprehend it
by learning its various parts in approximately the order in which it was
written.... Sometimes the "correct" order is top-down, sometimes it is
bottom-up, and sometimes it's a mixture; but always it's an order that makes
sense on expository grounds.
Thus the WEB language allows a person to express programs in a "stream of
consciousness" order.... the fact that there's no need to be hung up on the
question of top-down versus bottom-up--since a programmer can now view a large
program as a web, to be explored in a psychologically correct order--is
perhaps the greatest lesson I have learned from my recent experiences.
In other words, there's a way to present genuine, large programs, give the
reader an understanding of how the entire system fits together, and still not
brush aside messy issues like error recovery, special cases, system
dependencies, tweaks for performance, hacks, kludges, and all the other
seemingly nonalgorithmic issues that make up the bulk of a genuine program.
Such details are crucial to one's understanding. They can't be "black boxed."
This jumping around from top-down to bottom-up and back to top-down isn't just
a matter of how source code gets presented, either. It cuts to the root of
programming itself. As Knuth notes in an amazing essay from 1974 in this
collection ("Structured Programming with go to Statements"):
I have felt for a long time that a talent for programming consists largely of
the ability to switch readily from microscopic to macroscopic views of things,
i.e., to change levels of abstraction fluently.
The name WEB of course comes straight from this notion of a program as a
tangle of high-level and low-level issues.
Above all, what comes through here is the notion of writing a program as
writing. TeX, METAFONT, and WEB all fit together as part of a grand attempt
(apparently partially inspired by Arthur Koestler's The Ghost in the Machine) to
break down the division between documents and programs, or at least between
documentation and programs.
One of the key points is the need for a program and its documentation to be
written by the same people. This sounds like an impossible luxury, but in many
cases the awful state of documentation--and the inexplicable misfeatures of a
program--are the result of the "doc" group not knowing the program they have
been hired to describe, and the programmers not thinking about how one would
actually describe or use the program.
What is missing is "the discipline of simultaneously writing and describing a
program." "Manual writing provides an ideal incentive for system improvements,
because you discover and remove glitches that you can't justify in print."
Basically, if you can't explain it in a nonembarrassing way, then it shouldn't
be in the product. "The designer of a new system must not only be the
implementor and the first large-scale user; the designer should also write the
first user manual." This is just a variation on the age-old theme that the
best way to learn is to teach, that the best way to understand something is to
have to explain it. Again, we see that writing software means writing.
Of course, where there is writing, there must be criticism. Not so much
"criticism" in the sense of a "this is good, this is bad" review, but what in
literary criticism is called "close reading." So it is fitting that one
chapter of Literate Programming, a reprint from a column by Jon Bentley on
WEB, contains first a program by Knuth and then a detailed criticism by Doug
McIlroy (who ends up calling Knuth's sample program "a sort of
industrial-strength Faberge egg").
One of the most fascinating parts of Literate Programming is a 100-page
section called "The Errors of TeX," which "describes the milieu of literate
programming, by tracing the history of all changes made to TeX as that system
evolved." This includes a complete error log for TeX from 1978 to 1991.
Interestingly,
... if you ask me whether keeping such a log has helped me learn how to reduce
the number of future errors, my answer has to be No. I kept a similar log for
errors in METAFONT, and there was no perceivable reduction. I continue to make
the same kinds of mistakes.
Oh well.
It seems unfortunate that so much of this book focuses on Pascal. However,
Chapter 12 presents the word count (wc) program from UNIX, rewritten in CWEB
to demonstrate "literate programming" in C. While wc is, of course, a very
simple program, Knuth notes that his "fondest hope is that readers who look at
Chapter 12 will realize how wonderful it would be if the entire UNIX system
and its successors were written in the style of Chapter 12 or something
similar."
Actually, it also made me wish that volumes 1-3 of the Art of Computer
Programming had been written in the style of Chapter 12 or something similar,
or at least in the style of something other than the MIX assembly language. At
any rate, Literate Programming brings back what a pleasure reading source code
can be.


The Dr. Dobb's Handprinting Recognition Contest


Last month marked the official launch of our Dr. Dobb's Handprinting
Recognition Contest, and your response has been exhilarating. Whether it's the
opportunity for you to strut your recognition stuff, or the chance to win a
Macintosh PowerBook 100, we're receiving new entries daily. But just in case
this is the first you've heard about the contest, we'll briefly recap the
details.
The contest began on June 15th, when the official version of the DDJ test
framework, test data, and contest entry form were made available
electronically and by mail. The deadline for submissions is September 15th,
and the winner will be announced in our December 1992 issue.
We've built a platform-independent test harness that, in the most general
case, allows you to plug in your recognizer and check the result. Your
recognizer can use any platform on which the DDJ test harness runs. You don't
need a pen computer or pen operating system to participate in the contest. The
DDJ harness code assumes only the C standard library. Even though you can run
the harness on any platform that has a C compiler, we can test your code only
on the Macintosh or PC platforms. Assuming your code is portably written, this
shouldn't be a problem.
The test harness is 200 lines of C code, and was written by Ron Avitzur to run
as a batch process using the standard I/O library functions. This code has
been tested on the Macintosh, SPARC, DOS, and Windows 3 platforms. On the PC,
the code compiles with both Borland and Microsoft compilers.
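For contestants wondering what a stdio-based batch harness looks like in outline, here is a hedged sketch. Every name here (recognize, score_file) and the one-sample-per-line file format are hypothetical; the real DDJ harness defines its own interface and data format, so treat this only as an illustration of the plug-in-and-score idea.

```c
#include <stdio.h>

/* Hypothetical stand-in for a contestant's recognizer; a real entry would
   analyze the stroke data and return its best guess at the character. */
static char recognize(const char *strokes)
{
    return strokes[0] ? strokes[0] : '?';   /* placeholder "recognizer" */
}

/* Illustrative batch loop: each input line holds the expected character,
   a separator, then stroke data. The recognizer's guess is compared
   against the expected answer and overall accuracy is returned. */
static double score_file(FILE *in)
{
    char line[512];
    long total = 0, correct = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        char expected = line[0];            /* first column: ground truth */
        const char *strokes = line + 2;     /* rest of line: stroke data */
        total++;
        if (recognize(strokes) == expected)
            correct++;
    }
    return total ? (double)correct / total : 0.0;
}
```

Because the loop touches only the standard I/O library, a harness of this shape compiles unchanged on the Macintosh, SPARC, DOS, and Windows targets mentioned above.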
You must send in both source code and an executable. Any other written
commentary or documentation is also welcome. Source code is for publication
and can be in C (or, on the PC, in any language that can be linked to the OBJ
files of the DDJ test harness). Submissions will be judged primarily on
recognition accuracy. Speed is a secondary consideration; third is the
conciseness and elegance of your implementation. As mentioned, first prize is
a Macintosh PowerBook 100, generously provided by Apple Computer.
A more complete description of how the harness works is included on page 60 of
our July 1992 issue as well as with the contest entry information.
--editors












August, 1992
OF INTEREST





Microsoft has announced the completion of the first version of the SDK for the
Messaging API (MAPI). MAPI is a set of messaging function calls that allow
developers to create message-enabled applications. The SDK contains the
Windows operating system DLLs that implement the MAPI calls described in the
finalized portions of the specification. Microsoft will make MAPI available on
four platforms: Windows, DOS, Macintosh, and OS/2, allowing the same basic
messaging functionality across the board.
Microsoft is making the specification for MAPI available to all interested parties.
Some areas of the specification are still open for developer comment. The
areas still under review are explicitly called out in the specification.
Developers interested in obtaining the MAPI specification can contact
Microsoft Developer Services at 800-227-4679. Reader service no. 20.
Microsoft Corp. One Microsoft Way Redmond, WA 98052-6399 206-882-8080
Introl's C compiler line now includes support for Motorola's 68HC16
microprocessor. The 68HC16 supports multiple memory models, making it more
useful in complicated embedded systems than the 68HC11. This latest offering,
coupled with Introl's C compilers for the 68HC11 and 68332, amounts to a
complete set of development tools that include the compilers, source-level
debuggers, target-specific relocating assembler, stand-alone library, and
support utilities.
Introl C is available for PCs, Macintoshes, and workstations. Prices begin at
$2000 and vary according to platform. Reader service no. 21.
Introl Corp. 9220 W. Howard Ave. Milwaukee, WI 53228 414-327-7171
The ISIcon programming language and ISIcon/SI screen-interface development
system are two new UNIX products from Iconic Software. ISIcon 1.0 is the first
commercial release of the Icon programming language for UNIX/386 platforms,
with new features designed to support a production software development
environment. The Screen Interface (SI) development system combines ISIcon with
the UNIX industry standard character-terminal optimization package, the
extended terminal interface (ETI). This system supports the rapid development
of high-performance, terminal-independent, full-screen user interfaces with
ETI character-terminal text windows, menus, and forms. Front ends and
interfaces developed with SI will operate on devices ranging from character
terminals to graphical workstations running terminal emulators. SI allows you
to both prototype new interfaces and develop transportable front ends to
C-language applications.
ISIcon and ISIcon/SI are both available for 386- and 486-based PCs running
most versions of UNIX. List prices are $395 for ISIcon and $695 for ISIcon/SI.
Reader service no. 22.
Iconic Software Inc. P.O. Box 3097 Lisle, IL 60532 800-621-4266
Magna Carta Software has announced the C Communications Toolkit/Extended DOS,
a C-language developer's library for serial and fax communications. The
toolkit provides interrupt-driven serial communications at speeds of up to
115,200 bps using standard PC hardware. For file transfer, you can choose from
XModem, YModem, ZModem, and Kermit protocols. The package supports
Hayes-compatible modems, emulation of VT100, VT52, and ANSI terminals, and fax
communications.
For 286 protected mode, the toolkit supports Borland C++ and Microsoft C using
the Phar Lap 286 DOS-Extender. Applications developed with either compiler can
use up to 16 Mbytes of memory and use 286 protected-mode instructions. For 386
protected mode, the toolkit supports the Intel C Code Builder 386/486;
MetaWare High C with Phar Lap 386 DOS-Extender; and Watcom C/386 with 386 DOS
extenders from Rational Systems, Phar Lap, or Intel.
The price is $299.95 and includes source code. Reader service no. 23.
Magna Carta Software P.O. Box 475594 Garland, TX 75047-5594 214-226-6909
Coherent 4.0 is a 32-bit version of Mark Williams' UNIX-compatible operating
system. New to this version is the ability to run COFF binaries that run on
other PC UNIX systems (such as SCO UNIX System V/386 3.2.2), allowing users
access to software libraries from other PC UNIX vendors.
The distribution consists of six floppy disks, and takes up less than 10
Mbytes of hard-disk space; the kernel is around 100K. Coherent 4.0's
development tools include an optimizing C compiler, optimizing linker, and
versions of lex, yacc, awk, make, termcap, terminfo, and curses. A new 386
macro assembler is included, as is support for conditional assembly, listings,
and assembly time variables. Text processing is done with nroff and troff;
UUCP, ckermit, and kermit handle communications. The administrative commands
include archiving utilities, a Bourne shell, a Korn shell, System V-style
cron, virtual console support, and online man pages.
Coherent 4.0 costs $99.95. Reader service no. 24.
Mark Williams Co. 60 Revere Drive Northbrook, IL 60062 708-291-6700
TeraTech announced that they have begun shipping Dazzle/VB, an image library
for Visual Basic. In addition to allowing the display of existing images,
Dazzle lets you draw lines, points, boxes, and circles in both filled and
outline versions, using any of 256 colors. Dazzle also has text-drawing
routines which can use any font and point size. Thirty different screen wipes
allow effects such as closing curtains, explosions, implosions, and more. You
can also write your own wipes using the high-speed, partial image-block copy
routines in Dazzle's DLL.
Dazzle Professional has all the same features as Dazzle/VB, plus: grey
conversion and display, full-color negative, image compression, color-image
enhancement, and extended palette control.
The list price of Dazzle is $299; Dazzle Professional costs $499. Reader
service no. 25.
TeraTech 3 Choke Cherry Road, Suite 360 Rockville, MD 20850 800-447-9120
Now shipping from Compiler Resources is Yacc++, an object-oriented rewrite of
lex and yacc. Yacc++ creates classes of lexer and parser objects that act as
callback coroutines within event-driven applications such as those found on
Windows, Presentation Manager, OpenLook, and Motif. The lexer and parser objects are
reentrant and can be active concurrently. They are created from grammars,
which can be modularized and reused, owing to multiple inheritance. The
grammar classes can be dynamically bound to lexer and parser objects.
Yacc++ directly translates regular expressions and produces minimal state
LR(1) lexers and parsers. It comes with the Language Objects Library, a class
library intended for managing lexer and parser objects and developing language
processors.
Single-user professional versions cost $495 for DOS or Windows, $695 for OS/2
or PC UNIX, and $995 for SPARC or Sun workstations. Source code included.
Reader service no. 26.
Compiler Resources Inc. 3 Proctor Street Hopkinton, MA 01748 508-435-5016
M++ QUAD is a C++ numerical integration package from Dyad. The package works
with Dyad's M++ Scientific to allow integration of one- or two-dimensional
functions over finite or infinite limits. The functions can be continuous,
discontinuous, or infinite. The user can select the integration rule, the end
points, the number of points, and the error criteria.
The integration procedure is based upon the adaptive Gauss-Kronrod quadrature
rules, but nonadaptive rules can also be selected. QUAD++ offers a set of
special-purpose software modules designed for advanced mathematical
operations, as well as modules for optimization, statistical utilities,
testing, and least squares.
QUAD++ costs $195 for DOS and $245 for UNIX and requires M++ Scientific.
Reader service no. 27.
Dyad Software Corp. 515 116th Ave. NE, Suite 120 Bellevue, WA 98004
800-366-1573 or 206-637-9426
FlashTek has released two new 32-bit DOS extenders: X-32 and X-32VM. X-32 (an
evolution of the DOSX DOS extender included with Zortech C/C++) now includes:
a debugger interface; spawn(), exec(), and system() functions; and extender
function calls. X-32 is compact, with executables either slightly larger or
smaller than equivalent large-model 16-bit programs, depending on program
size.
X-32VM combines all the features of X-32 with up to 3.5 gigabytes of virtual
memory. The package for the Zortech compiler sells for $69.00 and includes
both X-32 and X-32VM. Reader service no. 28.
FlashTek Inc. 804 Airport Way, Suite D Sandpoint, ID 83864 208-263-7311
Support for OS/2 2.0 and new 486 optimization has been included in version 9.0
of the Watcom 32-bit compilers. The compilers are supported by various add-ons
such as libraries for graphics and communications and Windows development
tools. The tools will support development of 32-bit applications for OS/2 2.0,
and use of OS/2 2.0 as a host system enabling 32-bit cross development for a
large set of target 32-bit environments, including Windows and DOS.
The new version costs $895. Upgrades are $99. Reader service no. 29.
Watcom 415 Phillip Street Waterloo, Ontario Canada N2L 3X2 800-265-4555



















August, 1992
SWAINE'S FLAMES


Thought Experiments




Michael Swaine


Assume that the relationship between citizen dissatisfaction and participation
in the electoral process can be described by a curvilinear function: mild to
strong levels of dissatisfaction lead to increases in participation, but
strong to extreme levels lead to disgust with the process and decreased
participation. Under this assumption, is it more strategic for a President
already unpopular among African Americans to attempt to build bridges or to
hire the (former) Los Angeles Police Chief as an advisor in the hope that
people who wouldn't vote for him in any case will be disgusted and stay home?
You are in a burning building with a teenager and a test tube containing a
fertilized human egg. The teenager's leg was broken by the heavy beam that is
pinning her to the floor, but she is conscious and it is just possible that
she could push off the beam and crawl to safety before the roof falls in,
without your help. The egg, on the other hand, can't move unaided no matter
how hard it tries. Just outside are a mobile hospital unit, a team of
obstetricians, and a willing surrogate mother, all ready to give the egg every
chance to grow up to be President of the United States. Also, a boy scout with
a tourniquet to set that leg. You have time to save only one, the teenager or
the egg. State which and be prepared to justify your answer in court.
You are an association of software publishers. The NSA offers you a deal,
involving expedited export approval for some software, getting around the
holdups that the State Department imposes on software that includes encryption
capabilities. All the NSA asks of you is that you not tell anyone just what
the deal is. What do you do?
These being thought experiments rather than questions on a test, there are no
right answers, but there are plenty of wrong ones. More on political
participation: If politics is the art of getting elected, the problem of the
homeless, politically speaking, is the nuisance of sidewalk traffic
disruption; the disenfranchised are not a constituency. Congressional
candidate Glenn Tenney (415-574-3420, voice or fax) is my source on
governmental efforts to keep e-mail crackable. Aside to Nelson Richardson in
New York: I can't answer your question about deconstruction and mathematics,
but I'm off to France next week to research it.
And now, here are the answers to June's quiz....


Acronym Event


RTFM (Universally ignored advice.) Read the flaming manual. (You see this a
lot online.)
RTFS (Same advice, but for engineers.) Read the flaming spec.
RTFB (Advice for novice users of Macintosh System 7.) Read the flaming
balloon. (System 7 has "help balloons" for novices.)
RWFM? TFRM, TFUM, OTFG-SM? (Choices, choices.) Read which flaming manual? The
flaming reference manual, the flaming user manual, or the flaming
getting-started manual?
IAEF, RTFM (Last-resort advice.) If all else fails, read the flaming manual.
(The full version of the original acronym.)
YWTFS/WRSN&IWTFMASAP (Said the documenter to the developer.) You write the
flaming software real soon now, and I'll write the flaming manual as soon as
possible.
IRTFM&ISCUTFS/W (Why the advice is universally ignored.) I read the flaming
manual and I still can't use the flaming software. ("U" can also mean
"understand.")
ICRTFM; T1WWTFS/WWTFM2 (Why programmers should program.) I can't read the
flaming manual; the one who wrote the flaming software wrote the flaming
manual, too. (For hardware, the acronym might add, &TIFTT, "and translated it
from the Taiwanese.")
RWFM? TJOS, TKR, OTKS? Read which flaming manual? The Joy of Sex, the Kinsey
Report, or the Kama Sutra? (I don't understand this one.)


Anagram Event


Real friend or nuts? = Userland Frontier. Magic land of lost tribes = Bill
Gates and Microsoft. Comment in jest by MIS; grad gets poorer =
Object-oriented programming systems.


Bonus Question


Why is Christmas the same as Halloween? Because Dec(imal) 25 = Oct(al) 31.



















September, 1992
EDITORIAL


Reading Between the Lines




Jonathan Erickson


I admit it. I'm a card-carrying library junkie. Consequently, you can find me
perusing my local library once or twice a week. In this respect, I'm not much
different from other Americans--two-thirds of us visit one of our 115,000
libraries at least once a year; nearly half of us use one on a monthly basis.
Yet even after countless visits to dozens of libraries, I'm still amazed at
the amount of information at our fingertips and the open access we have to it.
I doubt that any other country has made so much information so conveniently
available to so many people. From the subscription library Ben Franklin set up
in 1731 to today's vast Library of Congress, libraries have been one of the
great equalizers in our country. Whether it be by bookmobiles or modems,
everyone--urban and rural, young and old--has access to information.
Unfortunately, as we enter the information age, the times are a'changing.
What's ironic is that just as we're bombarded with more and more information
and as new technologies come online to help us sort through it all, our access
to information is being restricted. (If you don't think so, take a look at the
American Library Association's Less Access to Less Information, an 11-year
chronology that documents bureaucratic efforts to stifle the free flow of
information. The 1981 through 1991 index is available for $17.00 from the ALA,
110 Maryland Ave. NE, Washington, DC 20002.)
Restriction comes in various guises. Budget cuts, for instance, have shut down
more than half the school libraries in California, forced the Brooklyn (NY)
Public Library to close branches most of the week, and caused Massachusetts
libraries to lay off nearly a quarter of their staffs.
At one time, technology was the great hope of libraries. Database services,
BBSs, PCs, audio tapes, video cassettes, and CD-ROMs were going to make it
easier to find information, give us access to fragile documents, provide
information to the handicapped, and more. But libraries can't afford to buy
books, let alone PCs with CD-ROM drives.
With its proposal to slash the 1993 federal library budget from $148 million
to $35 million, the federal government isn't helping things. (Let me get this
straight: The government prunes the library budget by 75 percent, while
spending about $140 million a year to store the helium necessary to keep
dirigibles aloft. The legislators say we need to "provide sufficient helium
for essential government activities"--as if there isn't enough hot air in
Washington to do the job.)
Another form of restriction is the government's efforts to "privatize"
information access. In this case, information generated at taxpayers' expense
is given to private corporations who add value and sell it back to taxpayers.
This is fine--as long as that information isn't available only to private
firms and as long as "value" is truly added. Among the examples of
privatization are Hughes/General Electric's lock on LANDSAT image data,
Westlaw's copyrighting of information on the JURIS online system, and the
EDGAR project managed by Mead Data Central (owners of the LEXIS/NEXIS
service).
Beginning in 1993, the Securities and Exchange Commission will require companies
to file data electronically through the $90 million EDGAR (short for
"Electronic Data Gathering, Analysis, and Retrieval") system which taxpayers
paid for and which Mead Data will operate. As a result, EDGAR is perhaps the
most valuable database of corporate activity in the world. The cost for you
and me to access EDGAR information ranges from $340,000/year for real-time,
high-speed broadcast of SEC filings to $30,000/year for subsets of the current
day's filings on magnetic tape. In neither case will historical data,
cumulative filings, and the like be available to us. Incredibly, the SEC will
receive back from Mead microfiche versions of the data--the government won't
possess electronic versions of the data it owns and which it is requiring
companies to submit electronically.
One ray of hope is House bill HR 2772 (the Senate version is called the "Gore
Bill"). HR 2772 proposes that the Wide Information Network for Data Online
(WINDO) be set up by the Government Printing Office so that anyone--private
citizens and commercial businesses--can walk into one of the 1400 federal
depository libraries and, for a nominal fee, access hundreds of government
databases: economic statistics, federal court cases, SEC disclosure documents,
U.S. and foreign patents, congressional testimonies, and more.
Commercial database vendors aren't happy about this proposal, nor, strangely
enough, is the Office of Management and Budget which is trying to
differentiate between hardcopy and electronic versions of the same
information, proposing that the federal depository libraries receive only the
paper copies. (Ironically, this "push for paper" seems to run counter to the
Paperwork Reduction Act which seeks to make the government "paperless" by the
year 2000.) Electronic versions presumably would be given to commercial
vendors.
What's at stake is our right to know and make informed decisions on things
that affect us. Any barrier to our access of information is a step towards our
disenfranchisement as citizens; the result is a two-tiered society of the
"information haves" and "information have-nots," where the privileged get the
information and everyone else gets Geraldo.
By the way, I found out yesterday that starting next month my local library
has cut back to three days a week.




































September, 1992
LETTERS







If OOP is the Question, is Software Engineering the Answer?


Dear DDJ,
I have been following the software patent issue since you published the
November 1990 article on the subject. Even though I agree with the League for
Programming Freedom, I believe there is another threat to the American
software industry, which is currently being ignored.
The current state of software development in the U.S. can be compared to
furniture manufacturing during the Industrial Revolution. At the beginning of
the Industrial Revolution, someone would either handcraft his own piece of
furniture or go to a master craftsman and have it custom made. As the
Industrial Revolution came into its own, the consumer could buy furniture from
a factory. Of course, furniture from a factory was not custom built for each
consumer, but the consumer could purchase furniture at a lower price.
Most software-development teams can easily be described as groups of master
craftsmen, hand fashioning every part of the entire system they are building.
As master craftsmen, we will often argue that each part must be "optimized"
(or handcrafted) in order to precisely fit the customer's needs. We need to
move away from the art of software development and develop software
engineering methodologies which work in the real world of software scheduling
and development costs.
The beginnings of this methodology can be found in the object-oriented
paradigm. But OOP will fail if the software development community fails to
deal with the "Not Written Here Syndrome" (NWHS). NWHS will prevent the
software industry from gaining the reuse benefits of OOP. Many programmers
simply do not trust any software components they did not write themselves. Of
course many of these software components are inadequately designed,
implemented, tested, and documented to be of any use in the real world.
Changes in the software-development community must begin with the individual
programmers. Each of us must spend the time necessary to analyze, design,
implement, test, and document software components which can be reused. We must
not allow ourselves to fall into the trap of getting the system working today
and putting the rest of the work off until tomorrow. We all know that tomorrow
never comes.
I have not mentioned a single programming language here, since I believe good
software engineering can be accomplished in any language, even though some
languages do encourage better software engineering techniques than others.
If we don't act to correct this problem, the U.S. may lose its world dominance
in the software industry, just as it lost its dominance in the automobile
industry. We now export approximately $22 billion in software, accounting for
80-85% of the world-wide market. This is threatened by the Japanese effort to
develop a "software factory." Already, the Software Engineering Institute has
rated several Japanese companies at level five in their process-maturity
scale. Only a few American companies have been rated at level three.
We must act quickly to foster software-engineering practices, or ten years
from now we may be wondering why the U.S. does not dominate the world market
for software anymore.
David M. Tannen
Tampa, Florida


Yet Another MODman


Dear DDJ,
I must add my bit to the great MOD debate catalyzed by Jeff Duntemann
beginning in the November 1990 "Structured Programming" column.
I was in my first year at the University of Cape Town in 1980. One of our
first assignments was to write a program to return the day of week for a date
using this same algorithm (in Fortran, on a punch-card-chewing UNIVAC). I
used the following to calculate a0..n-1 style MOD:
 day_of_week = mod(mod(value,7) + 7, 7)
The professor got quite upset, as he was sure that MOD would always return a
0..n-1 value, as any sane person might expect. He only gave me 90% for the
assignment because of this. Arguing didn't help. The PhD wasn't listening to
some trumped-up 18-year-old! So I proved my point with a few test programs and
was finally awarded 109% (110 less 1% because I didn't comment the use of a
double MOD). Well, that's the way I remember it; the story has probably
improved with age.
I still maintain that this is the best way of handling the problem. In the
February 1992 "Letters" column, David Hall proposes two methods. The first
(k:=i MOD 7; if k < 0 then k:=k + 7;) I personally don't like because I feel
that IF... THEN... ELSE... should be avoided in calculations unless it is an
intuitive part of the algorithm. Here it is being used to circumvent an
illogical language feature. The second method (QuickBasic INT) relies on a
quirk of a particular language, whereas the first (bring on the MIDI orchestra
here) has been used in Fortran, Pascal, C, Lisp, and PLM. The second method is
probably more reliable because there are no IFy...THENy...ELSEy... chunks of
code that go untested until the 1997 summer solstice (or whenever).
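The double-MOD normalization translates directly into C, whose % operator also truncates toward zero (a minimal sketch; the function name is mine, not Manning's):

```c
/* Return i mod n normalized into 0..n-1, even for negative i.
   C's % (like the UNIVAC Fortran MOD the letter describes) can
   return a negative result for a negative left operand, so we
   add n back and reduce once more. */
int mod_floor(int i, int n)
{
    return ((i % n) + n) % n;
}
```

With this, mod_floor(-1, 7) yields 6 rather than -1, which is what a day-of-week calculation needs.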
Charles Manning
Cape Town, South Africa


Quadratic-root Routines


Dear DDJ,
Nicholas Wilt's informative article "Assembly Language Programming for the
80x87" unfortunately uses a bad example--calculation of both quadratic roots
from the equation we all learned in elementary algebra. One root or the other
inevitably suffers loss of precision due to subtraction of (b{2}-4ac){1/2}
from -b. If ac<<b{2}, this loss can be severe, since the two terms in the
subtraction are nearly the same size. Good practice is to determine the sign
of (-b) and then calculate one root r[1] using that same sign ahead of the
second term. The other root r[2] is then easily obtained from r[1] x r[2] =
c/a.
There is a common misconception that with high-precision arithmetic, serious
errors from this and similar sources are rare. However, the writer of a
general-purpose routine cannot know the circumstances under which it will be
used. It is easy to devise examples where b{2} might exceed 4ac by many orders
of magnitude, and a user of a routine ought not to have to perform additional
tests or to change routines in the middle of a set of data.
A second misconception is that the root that suffers subtractive precision
loss is intrinsically less well-defined by the data. This is not the case. As
the relationship between the two roots given above implies, both are often
known to the same relative precision. The precision loss has nothing to do
with the data--it is an artifact of the calculation.
Wilt's article was a good introduction to math-coprocessor programming.
Perhaps he could be persuaded to provide an amended quadratic-root routine.
Brad Thompson
Saint Peter, Minnesota
Nicholas responds: Brad is correct, of course. I was remiss not to consult
standard references before implementing the routine. To remedy my oversight,
and to comply with his request, I have written a new solve_quadratic() routine
that is more numerically stable.
The new routine is from Numerical Recipes by Press, Flannery et. al (Cambridge
University Press, 1986) and works as follows. The roots of a quadratic
polynomial ax{2}+bx+c=0 can be given in terms of an intermediate variable q,
as shown in Figure 1.
Figure 1
 q = -1/2 [b + sgn(b) sqrt(b{2} - 4ac)]

The roots are then q/a and c/q. The technique is almost identical to the one
given by Thompson, but uses a and c to arrive at the two roots separately
rather than computing the first root and deriving the second from it.
Example 1 shows the new solve_quadratic() routine. My thanks to Brad for
bringing this issue to my attention.
Example 1

 ; quad.asm: quadratic-solving function callable from Borland C++.

 ; Copyright (C) 1991, 1992 by Nicholas Wilt. All rights reserved.
 ; This routine is modified from the version published in Dr. Dobb's
 ; Journal, March 1992. It's more numerically stable per Brad Thompson's
 ; suggestion. The technique is described on pg. 145 of _Numerical
 ; Recipes_ by Press, Flannery et al.

 .MODEL LARGE,C
 .CODE
 ; int solve_quadratic (double a, double b, double c, double *x1,
 ;                      double *x2);
 ; solve_quadratic takes the coefficients of a quadratic polynomial and
 ; finds roots of that polynomial. If there are two real roots, it
 ; writes them back to x1 and x2 and returns 1. If there are no real
 ; roots it returns 0.
 PUBLIC solve_quadratic
 solve_quadratic PROC A:QWORD,B:QWORD,C:QWORD,X1:DWORD,X2:DWORD
 ; Comments show stack contents: Stack top is at left.
 sub sp,2 ; Allocate local
 mov bx,sp ; and point BX at it
 fld C ; c
 fld A ; a c
 fld B ; b a c
 fchs ; -b a c
 ; We're going to need the sign of -b later, so do the
 ; test and store the resulting status word in DX.
 ftst ; DX <- sgn (-b)
 fstsw ss:[bx]
 mov dx,ss:[bx] ;
 fld st ; -b -b a c
 fmul st,st(0) ; b^2 -b a c
 fld st(2) ; a b^2 -b a c
 fmul st,st(4) ; ac b^2 -b a c
 fadd st,st(0) ; 2ac b^2 -b a c
 fadd st,st(0) ; 4ac b^2 -b a c
 fsub ; b^2-4ac -b a c
 ftst ; Return 0 if negative
 fstsw ss:[bx] ;
 mov ax,ss:[bx] ;
 sahf ;
 jae FindRoots ;
 fstp st ; Clear FP stack and return 0.
 fstp st ;
 fstp st ;
 fstp st ;
 xor ax,ax ;
 jmp short LeaveQuadratic
 FindRoots:
 fsqrt ; sqrt(b^2-4ac) -b a c
 xchg ax,dx ; Negate if -b is negative
 sahf ;
 ja FindR1 ;
 fchs ;
 FindR1: fadd ; (2q = -b-sgn(b)sqrt(b^2-4ac)) a c
 fxch st(2) ; c a 2q
 fadd st,st(0) ; 2c a 2q
 fdiv st,st(2) ; c/q a 2q

 les bx,X1 ; Write out first root
 fstp qword ptr es:[bx]
 fadd st,st(0) ; 2a 2q
 fdiv ;
 les bx,X2 ; Write second root
 fstp qword ptr es:[bx]
 mov ax,1 ; Return 1 to say there are roots
 LeaveQuadratic:
 add sp,2 ; Deallocate local
 ret ; Return
 solve_quadratic ENDP
 END



More on the Swap Macro


Dear DDJ,
I'd never thought about a generic swap macro until I read Greg Renzelman's
letter in the April 1992 "Letters" column. While I like his macro, it's not
true that swapping requires a temporary variable; I've come across a swap
implementation that doesn't. The trick lies in using exclusive ORs to "pass
the values through each other." Based on that, I've modified Greg's swap macro
to eliminate the temporary variable; see Example 2 .
Example 2

 #define SWAP(a,b) \
 { \
 unsigned int i; \
 char *aptr = (char *)&(a), *bptr = (char *)&(b); \
 for (i=0; i<sizeof((a)); i++, aptr++, bptr++) { \
 *aptr ^= *bptr; \
 *bptr ^= *aptr; \
 *aptr ^= *bptr; \
 } \
 }

 or, more compactly,

 #define SWAP(a,b) \
 { \
 unsigned int i; \
 char *aptr = (char *)&(a), *bptr = (char *)&(b); \
 for (i=0; i<sizeof((a)); i++, aptr++, bptr++) { \
 *aptr ^= *bptr ^= *aptr ^= *bptr; \
 } \
 }

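One caveat the XOR trick carries: if the two macro arguments designate the same object, the first XOR zeroes it and the "swap" destroys the value, so a guard is needed where aliasing is possible (a minimal sketch; the helper is hypothetical, not part of the letter):

```c
/* Byte-wise XOR swap with an aliasing guard. Without the guard,
   XOR-swapping a byte with itself zeroes it: the first
   *aptr ^= *bptr makes the (shared) byte 0, and the remaining
   XORs leave it 0. */
void xor_swap_bytes(char *aptr, char *bptr, unsigned n)
{
    unsigned i;
    for (i = 0; i < n; i++, aptr++, bptr++) {
        if (aptr == bptr)
            continue;            /* same byte: nothing to do */
        *aptr ^= *bptr;
        *bptr ^= *aptr;
        *aptr ^= *bptr;
    }
}
```

SWAP(x, x) with the unguarded macro leaves x equal to zero; the guarded version leaves it unchanged.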
Rob Ewan
Whitby, Ontario


C++ String Class Update


Dear DDJ,
In my October 1991 article, "Proposing a C++ String Class Standard," I
promised that feedback on the article would be sent to the ANSI committee on
the C++ string class. As it turns out, the proposal to include the string
class in the C++ standard library was turned down, but I have passed on the
following feedback:
The proposal to have no reserved terminator character in the string class was
generally unpopular. Most critics thought that the benefit of being able to
include null bytes in a string was outweighed by the confusion which would
arise if the string of characters was not null terminated.
String classes should define a type substring, so that, among other things, a
replace function with a substring& argument could do the work of insert and
delete functions. Matching/searching functions should work in terms of
substrings, and should be capable of matching regular expressions.
My suggestions for overloaded operators were not appreciated.
Matching/scanning functions should allow specification of a starting position.
Functions should be provided to map a set of characters in a string to another
set of characters, along the lines map("nxt","NXT") which would map "next" to
"NeXT". Functions should be provided to search for characters from a specified
set. (This would be met by the regular expression suggestion.)
Functions should be available to justify strings, left, right, or center in a
specified width.
You should be able to specify fixed-length strings which could be included as
part of a struct. The context here was serial communications.
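The character-mapping suggestion is easy to picture in plain C (a hypothetical free function; a member on the proposed string class would behave the same way):

```c
#include <string.h>

/* Replace each character of s found in `from` with the character
   at the same position in `to`; e.g. mapping "nxt" to "NXT" turns
   "next" into "NeXT". from and to must be equal-length strings. */
void map_chars(char *s, const char *from, const char *to)
{
    for (; *s != '\0'; s++) {
        const char *p = strchr(from, *s);
        if (p != NULL)
            *s = to[p - from];
    }
}
```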

Steve Teale
Budd Lake, New Jersey


A Letter from the President of M&T Publishing




Ten Years After: Still Reinventing the Wheel


The computer and software industries have had a spate of tenth anniversaries
lately: Lotus, Sun, Adobe, Autodesk, Compaq, Silicon Graphics, and Symantec.
It seems the baby-boom generation of companies spurred by the personal computer
phenomenon is reaching early maturity. It would be tempting to pine for the
West Coast Computer Faire, software in Ziploc plastic baggies, Tiny Basic on
the 68XX, and the wild west days of computers and computer publishing, if
nostalgia weren't so misplaced.
I will allow myself one brief reminiscence. I remember meeting with Mike
Swaine for the first time in late 1983 at Scott's Seafood in Palo Alto. M&T
had been in business a little over a year as the U.S. sales office for Markt &
Technik, our German parent company. We were eager (if not ready) to enter the
U.S. market for computer publications. A license and subsequent purchase of
Dr. Dobb's Journal from the People's Computer Company was our immediate goal.
Mike was the prospective editor of DDJ (and continues as a key contributor and
editor-at-large to this day). I had read Fire in the Valley, which Mike
co-authored with his then-colleague at Infoworld, Paul Freiberger. In Fire,
Mike had documented the awkward beginnings, obsessive characters, and bold
sense of mission that typified the Homebrew Computer Club and other seminal
happenings in microcomputing, as we called it then. Mike and I talked warmly
about the potential for software development and for DDJ. Actually I did most
of the talking, which is typical in a pre-beer Swaine discussion, but he
nodded, raised his eyebrows enthusiastically, and stroked his beard a lot. I
felt deeply connected, not just to the historian, but to the history of the
PC. We talked about continuing the DDJ tradition of sharing information among
interested programmers, imagining the works they would achieve.
Much more history has unfolded since those days. We live in a world with more
than 100 million personal computers. More than 30 million more will be shipped
this year (Infocorp). Most office users are now hooked to networks and four
out of five will be by 1994 (says the Yankee Group). There are more than one
million professional software programmers and analysts in the U.S., says the
government. M&T's audience has grown from 18,000 to more than 300,000 paying
readers, all professionals in software development or network systems design
and integration.
Lately, it seems the bloom is off the personal computing rose. Doubt about the
productive benefit of computers is growing. Times are tough. Where is the
spreadsheet for the 90s? Why are hardware companies losing so much money? You
can sense the self-doubt in a proposed computer industry seminar topic, "Can
Technology Take Us to a New Level of Significant Opportunity?"
If we view the creative, economic, and practical achievements of the last ten
years as a basis for new innovation, a resounding "yes" to this question
follows easily. Tens of millions of graphical, networked PCs are a tremendous
potential resource for programmers and system designers in the decade ahead.
Millions of portable devices with pen interfaces and wireless communications
are also soon to come. They represent a test-bed for software that delivers a
"connected Dynabook." Embedded systems are another vast area of opportunity.
The possibilities are getting really interesting. The fact is, we have just
begun to address the world's challenges with software, because we are just
starting to have an infrastructure upon which to solve group problems.
I was speaking with a colleague recently who said, "Software is like the
wheel." She went on to mention how much of the urban, industrial development
of mankind followed from understanding and applying the power of the wheel to
the physical world. Software is like a wheel for the mind. It presages an
information, communication, and creativity-based future that is hard to
foresee. Historically speaking, software has just been invented. M&T, by
serving you with technical information, is committed to its further spread for
at least the next ten years.












































September, 1992
POSTMORTEM DEBUGGING


More accurate debugging for Windows development


 This article contains the following executables: CORONER.ARC


Matt Pietrek


Matt works for a California programming-tools vendor, specializing in
debuggers and file-format programming. He is coauthor of Undocumented Windows
(Addison-Wesley, 1992) and can be reached at CIS 767117,1720.


Although relatively new to the world of PC programming, postmortem debugging
has been around for a long time. On some systems, for example, the technique
is known as "core dumping." Whatever name it goes by, the fundamental idea is
that of simply taking a snapshot of the state of the machine at the time of
the crash.
This article focuses on postmortem debugging under Windows. It's also possible
to implement a postmortem debugger for DOS and OS/2. However, it is more
difficult to do so because of the way protected mode works (or doesn't work)
with these operating systems. For instance, because DOS applications run in
real mode, you can't generate a general protection (GP) fault when you use a
NULL pointer or access beyond the limits of a segment--these actions silently
destroy the integrity of your DOS program. Consequently, the crash may not
occur until much later, and the code that takes the snapshot might never be
invoked. Windows programs, on the other hand, run in protected mode, so a NULL
pointer reference or an access beyond a segment limit will cause a GP fault.
The postmortem facility can be set to kick in at that time.
What some programmers view as a weakness of Windows actually becomes a
strength when postmortem debugging enters the picture. For instance, Windows
has a single address space for all programs (not including virtual DOS
machines). Because of this shared address space, it's possible for one task to
handle the exceptions of another, as well as access the memory in its address
space. This makes it significantly easier to implement postmortem facilities
under Windows than under OS/2.
Although OS/2 runs in protected mode, it's not possible to trap GP faults of
another process. To allow one process to handle the exceptions of another
would seriously weaken the stability and security that OS/2 offers. In OS/2
2.0, a process can register an exception handler and handle the exception
itself. This "intrusive" approach, however, requires you to add extra code to
the application.


What You Can Get from Postmortem Debugging


The fundamental requirement for successful postmortem analysis is to make sure
there is sufficient debug information available to the analysis tools. If
debug information is not available, postmortem analysis can be tedious at best
and a pile of meaningless numbers at worst. (This is one reason why Microsoft
ships debug information with the debugging version of Windows.)
The type of debug information you can expect from a postmortem debugger
includes that for registers, stack traces, memory, as well as miscellaneous
details about the system.
Registers. The most fundamental question you can ask about a crash is, "Where
did it occur?" That question can be answered by looking at what the CS:IP was
at the time of the exception. Of course, seeing a number like "012F:2521"
isn't much help. By adding debug information and undertaking some mechanical
processing, it's possible to convert that address into something more useful,
like: "six bytes past the start of the PostMessage( ) function." Given more
detailed information, it's possible to get an even better description, like:
"Line 17, inside of FOO.C".
Other registers are often useful too. For instance, assume you know that the
faulting instruction was MOV ES:[BX],1. Good postmortem information will tell
you that the segment limit of the selector in the ES register is 01FFh. You
can then look at BX and see that it contained the value 0200h. This tells you
that the problem was due to accessing memory beyond the segment's limit.
Stack Trace. After "Where did the program crash?", the next question is
usually "How did the program get to that point?" The answer is in the stack
trace. Whenever you call a function, the return address is saved on the stack.
By exploiting this knowledge, it's possible for a postmortem debugger to walk
(or "crawl") each frame on the stack and obtain the return address. As
described earlier, these return addresses are usually meaningless as raw hex
numbers, but by applying debug information (and a little elbow grease), the
numbers can be converted into meaningful information such as function names
and line numbers.
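The walk itself is mechanically simple: each function prologue pushes the caller's BP and points BP at the saved copy, so the saved-BP words form a linked list with a return address stored beside each link. A portable sketch over a simulated small-model stack (hypothetical layout with near calls only; a real walker must also handle far frames and validate each BP):

```c
typedef unsigned short word;

/* Simulated 16-bit stack, indexed in words. Each frame stores the
   caller's BP at [BP] and the return IP in the adjacent word (the
   array index bp plays the role of BP scaled to words). A saved BP
   of 0 terminates the chain. Returns the number of frames found,
   writing each return IP into ret_ips. */
int walk_stack(const word *stack, word bp, word *ret_ips, int max_frames)
{
    int n = 0;
    while (bp != 0 && n < max_frames) {
        ret_ips[n++] = stack[bp + 1];   /* return address next to saved BP */
        bp = stack[bp];                 /* follow the saved-BP link */
    }
    return n;
}
```

Each return IP collected this way is exactly the raw number that debug information then maps to a function name and line number.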
Stack tracing is one of the reasons postmortem analysis is only as good as the
debug information. Unfortunately, many a programmer will blindly start
analysis without having produced debug information for the tools to work with.
The result is a frustrated programmer who sees nothing but a bunch of
meaningless numbers.
A similar situation can occur even with a moderate amount of debug
information. The classic example is the programmer who tries to use the
Windows SDK SYM files as debug information; they are devoid of addresses for the
undocumented internal functions in Windows. If a stack trace went through one
of these undocumented functions, the symbolic name shown for that frame will
not be the real function--instead, it will be the closest documented function
found earlier in the code segment.
Memory. By itself, a raw dump of the faulting program's memory probably isn't
going to be much help. Still, it's possible to get useful information if the
memory is shown in the proper context. For instance, a postmortem report could
show some memory around each stack frame, along with the name of the function
that it corresponds to. If you know the size and order that the passed
parameters were passed in, you can determine their values. Doing so can be
tedious, but with some coding and sophisticated debug information (like that
used by Turbo Debugger for Windows and Codeview), it's possible to get the
names, types, and values for all of the parameters and local variables in each
stack frame. This same idea applies to global data in data segments.
System Information. Oftentimes, the cause of a crash is related to the state
of the system at the time of the crash and is not reproducible under other
circumstances. In these cases, it's important to have a clear picture of the
state of the operating system as a whole. Under Windows, this includes a list
of all the modules (programs and DLLs) currently loaded (including file
dates/times), all the tasks in the system, the state of the memory manager,
the version of Windows, the sequence of Windows messages that the faulting
task processed, and so on.
A major cause of the "system-configuration dependent" bugs is incompatible
versions of programs and DLLs that don't interact properly. For instance, a
bug that manifests itself only when an old DLL is used with a newer copy of an
application can be found by looking at the file dates of the EXE/DLL involved.
A software quality assurance department can take this information and use it
to filter out the known bugs, leaving the development team free to concentrate
on new problems. Additionally, an analysis of these duplicate bugs can lead to
a prioritized list of the most common problems for the developers to tackle
first.


Postmortem Tools


In this section, I'll examine a few of the Windows postmortem tools available
to programmers. This list is by no means comprehensive, but it does include
the tools I have the most experience with.
Multiscope Debugger. At this writing, the Multiscope debugger is the most
ambitious postmortem debugging tool I've run across. At the heart of the
debugger is MED.EXE, a Windows program that's run when you start up Windows
(or at least before you know a program is going to UAE). MED lies dormant
until a UAE occurs. At that time, it writes out to disk a binary data file
containing information such as register values, contents of the memory in use
by all the running programs, and so on.
Analysis of the data is performed by a separate program Multiscope calls the
"crash analyzer" (a variation of the Multiscope runtime debugger) that cannot
run the program and has a restricted set of options. For instance, you can't
set breakpoints because it's meaningless to set breakpoints in a program that
can't be executed.
In all other respects, the crash analyzer looks and acts like the Multiscope
runtime debugger. Register contents are viewed by opening a register window,
the call stack is viewed by opening the call window, and so on. You can
examine the values of local and global variables just as if you were running
the runtime debugger at the time of the exception. The benefit is that you
don't have to learn different programs for postmortem and runtime debugging.
The downside is that the postmortem file can get quite large. Additionally,
since the postmortem information is kept in a binary file, it's difficult to
post snippets of the information when asking for help via a BBS. Instead, the
entire postmortem file must be loaded into the crash analyzer.
Dr. Watson. Dr. Watson comes with Windows 3.1 (but will work equally well in
Windows 3.0). It was originally intended as a way for Microsoft to obtain
information about where the common UAEs were occurring and who was
responsible.
Dr. Watson, like MED.EXE, is run when you start up Windows. Later, if a GP
fault occurs, Dr. Watson writes out a file (DRWATSON.LOG) to the disk. Unlike
the postmortem file from the Multiscope MED.EXE, the DRWATSON.LOG file is
ASCII text. Reports on subsequent faults are appended to the DRWATSON.LOG
file, so previous GP fault information isn't lost.
The symbolic stack-trace capabilities of Dr. Watson are provided via SYM
files. As the stack of the faulting application is walked, Dr. Watson looks
for a SYM file for the appropriate module. If a SYM file is found, Dr. Watson
reads it in, and the resulting entry in the DRWATSON.LOG file contains the
name of the function that the stack frame is for. The Windows SDK comes with
SYM files for the core Windows DLLs. You can also make your own SYM files from
Microsoft-style MAP files with the MAPSYM utility, which is also in the SDK.
Dr. Watson can hook into the "notification stream" and receive the
parameter-validation notifications. By setting an option in the DRWATSON.INI
file, you can tell it to treat parameter-validation errors as if they were
UAEs and have Dr. Watson dump out a miniature version of a UAE report,
including the all-important stack trace.
WinSpector. WinSpector comes with Borland C++ 3.1. Conceptually, it is a cross
between Dr. Watson and the Multiscope debugger. While there isn't a
full-fledged UI for postmortem analysis, you can obtain much of the
information available in the Multiscope debugger.
Like the other two programs, WinSpector lies dormant until a GP fault occurs.
When it springs into action, it writes out an ASCII text file (WINSPCTR.LOG).
It also writes out a second file, WINSPCTR.BIN, that contains the contents of
all the data segments in use by the faulting task. A post-processing utility
(DFA.EXE) can then take the WINSPCTR.LOG and WINSPCTR.BIN file and combine
them with the Turbo Debugger debug information to produce a text file that
contains source-file and line-number information for each frame on the stack
(where possible). Additionally, it outputs the names, types, and values of all
the local and global variables. If you choose not to run DFA, you may still
have symbolic stack traces, as WinSpector works with SYM files in a manner
similar to Dr. Watson.
Along with WINSPCTR.EXE and DFA.EXE, there's also a trio of programs that
allow you to make SYM files for your own programs. Additionally, it's possible
to make SYM files for any Windows EXE/DLL that has exported functions. For
instance, you can create your own SYM files for USER, KRNL286, KRNL386, and
GDI that include all the exported functions, rather than just the documented
ones. These improved SYM files are often the difference between a confusing
stack trace and one that pinpoints the problem.
Coroner. CORONER.EXE is a sample application from the TOOLHELP chapter in
Undocumented Windows (Addison-Wesley, 1992). Although nowhere near as
comprehensive as the previous programs, it does provide a demonstration of
much of the TOOLHELP API. You can take the code and customize it, adding your
own routines to provide whatever information you'd like to see in a postmortem
report. While I won't discuss the program here, it is available electronically
through Dr. Dobb's; see the book for a complete description of the program.


A Postmortem Example



Example 1 shows BAD.C, a program that intentionally causes a GP fault.
Although it doesn't do anything useful, BAD.C does illustrate some of the
concepts described above. For more general information about the types of
problems postmortem debuggers can help you track down, see the textbox
entitled, "Common Problems and How to Spot Them."
Example 1: BAD.C intentionally causes a GP fault.

 #include <windows.h>
 #include <string.h>
 #include <dos.h>

 int MeaningOfLife = 0x42; // A meaningless global,
 // for DFA demonstration

 void Foo (void far *ptr)
 {

 _fstrlen(ptr); // This call will GP fault
 }

 static void Bar(void far *ptr)
 {

 void far *b; // Local var for DFA demonstration

 b = ptr;

 Foo(b);
 }

 int PASCAL WinMain(
 HANDLE hInstance,
 HANDLE hPrevInstance,
 LPSTR lpszCmdLine,
 int nCmdShow

 )
 {

 Bar(MK_FP(1,0)); // pass bad pointer
 return 0;
}

In keeping with the rule that your analysis is only as good as your debug
information, the first step in preparing this example for postmortem debugging
is to provide as much debug information as possible. To that end, I compiled
the program with Turbo Debugger debug information, and told the linker to
generate a detailed MAP file. TMAPSYM was run on BAD.MAP to produce a BAD.SYM
in the same directory as BAD.EXE. By doing this, I provided two sources of
debug information (the Turbo Debugger information and the SYM file), neither
of which is dependent on the other.
Loading WinSpector and running BAD.EXE results in file WINSPCTR.LOG; see
Example 2. (Portions of the file are not shown, for brevity's sake.) Near the
top of the listing, you see that the module name of the task that UAEed was
"BAD". A few lines down, note that the faulting instruction was "REPNE SCASB."
SCASB is one of the instructions that implicitly references memory through the
ES:DI register combination. A quick glance at the registers section of Example
2 shows that ES contains 0. That explains why the GP fault occurred. You may
be thinking that the segment portion of the far pointer contained 1, not 0,
but it turns out that the CPU automatically converts values between 0 and 3
into a 0 when loading into DS or ES.
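The reason values 0 through 3 collapse to a null selector is visible in the selector's bit layout: bits 0-1 are the requested privilege level (RPL) and bit 2 selects the GDT or LDT, so any value below 4 has a descriptor-table index of 0. A minimal sketch in portable C (is_null_selector is a hypothetical helper, not part of any tool discussed here):

```c
#include <assert.h>

/* Decode an x86 protected-mode selector. Bits 0-1 are the RPL,
   bit 2 is the table indicator (TI), bits 3-15 the descriptor
   index. Any selector whose index and TI are both zero (i.e.,
   any value 0-3) is a "null selector": legal to load into DS or
   ES, but any memory access through it causes a GP fault. */
static int is_null_selector(unsigned short sel)
{
    return (sel & 0xFFFC) == 0;   /* ignore the RPL bits */
}
```

This is exactly why the ES value in the WINSPCTR.LOG dump shows 0 even though the program passed a segment value of 1.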
Example 2: The file WINSPCTR.LOG generated by loading WinSpector and running
BAD.EXE.

 WinSpector failure report - 5/17/1992 08:59:57
 Exception 13 at BAD 0001:082D (1117:082D) (TASK=BAD)
 Disassembly:
 1117:082D REPNE SCASB
 1117:082F XCHG AX, CX
 1117:0830 NOT AX
 1117:0832 DEC AX
 1117:0833 POP DI
 Stack Trace:
 0 BAD __fstrlen + 0017
 CS:IP 0001:082D (1117:082D) SS:BP 10DF:164A
 C:\ART4\BAD.EXE
 1 BAD _Foo + 000E
 CS:IP 0001:017E (1117:017E) SS:BP 10DF:1654

 C:\ART4\BAD.EXE
 2 BAD _Foo + 002E
 CS:IP 0001:019E (1117:019E) SS:BP 10DF:1660
 C:\ART4\BAD.EXE
 3 BAD WINMAIN + 000A
 CS:IP 0001:01AD (1117:01AD) SS:BP 10DF:1668
 C:\ART4\BAD.EXE
 4 BAD <no info>
 CS:IP 0001:00B3 (1117:00B3) SS:BP 10DF:1676
 C:\ART4\BAD.EXE
 Registers:
 AX 0000
 BX 0238
 CX FFFF
 DX 0000
 SI 0232
 DI 0000
 SP 1648
 BP 164A
 IP 082D
 FL 0297
 CS 1117 Limit: 089F execute/read
 DS 10DF Limit: 267F read/write
 ES 0000 Limit: 0000 NULL
 SS 10DF Limit: 267F read/write

In the stack-trace portion of Example 2, each stack frame contains information
about a particular function call that was executed to get to the UAE. Stack
frame 0 indicates the exact CS:IP at the time of the exception. Each
subsequent frame is one function call removed from the GP faulting function.
If the name of the function can be determined for a particular frame, it
appears at the end of the first line in each frame entry.
The CS:IP for each stack frame can be seen by examining the second line of
each stack frame. It's important to note that the CS:IP is given in terms of a
logical address, as well as the physical address. For more information, see
the text box entitled, "Logical and Physical Addresses."
Some programmers find it easier to read stack traces in reverse order,
starting at the bottom and working to the top. Here, starting from entry 4 and
working up, you read the stack trace as: "Some unknown function called
WinMain(). WinMain() in turn called Foo(), which called Foo(). Foo() called
_fstrlen(), which is where the GP fault occurred."
But wait a minute! WinMain() doesn't call Foo(). We can clearly see from BAD.C
that WinMain() calls Bar(). Bar() then calls Foo(). What's going on here?
Notice in BAD.C that Bar() is declared as a static function. This makes the
function nonpublic; hence, Bar() does not appear in the BAD.MAP file or the
BAD.SYM file. Because Foo() was the closest preceding symbol that did appear
in the MAP/SYM files, it is displayed as the function name in stack frame 2.
This illustrates why postmortem analysis is only as good as your debug
information. This same problem occurs in stack traces that thread through the
Windows DLLs. The SYM files that Microsoft provides with the SDK contain only
symbol information for the documented APIs. Thus, you can see function names
in a stack trace that you know your program isn't calling. The key to
determining the stack trace's trustworthiness involves using the offset that
appears after the function name in each stack trace. For instance:
"POSTMESSAGE+002E" means that the CS:IP for the frame was 2Eh bytes past the
start of the POSTMESSAGE function. If you look closely at the two frames that
claim to be Foo(), you'll see that the offset for frame 1 is only 000Eh past
Foo(), whereas for frame 2 it is 002Eh. The point is that the larger the
offset, the less likely you are to actually be in the named function. You have
to judge how truthful the stack trace is.
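The lookup a SYM-based stack trace performs can be sketched in portable C; the symbol table and offsets below are hypothetical, loosely modeled on what BAD.MAP would contain:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical MAP-file public symbol: segment offset plus name. */
struct Symbol { unsigned short offset; const char *name; };

/* Given a table of public symbols sorted by offset, return the
   closest preceding symbol for a faulting CS:IP -- what a SYM
   based stack trace does. A large remainder (ip - sym->offset)
   hints that the true function is a non-public one, like Bar(),
   that is missing from the table. */
static const struct Symbol *
nearest_symbol(const struct Symbol *tab, size_t n, unsigned short ip)
{
    const struct Symbol *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (tab[i].offset <= ip)
            best = &tab[i];
    return best;
}
```

With only _Foo and WINMAIN in the table, an IP inside the static Bar() resolves to "_Foo" plus a suspiciously large offset, reproducing the misleading frame 2 above.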
A more reliable way of determining which function your code is really in
involves using debug information more sophisticated than that provided by SYM
files. For instance, DFA.OUT, the more accurate report shown in Example 3, was
produced by running DFA WINSPCTR.LOG WINSPCTR.BIN. This report was generated
by crunching the information obtained from the two WINSPCTR files and adding
in the Turbo Debugger information in the BAD.EXE file. You can see that the
stack trace correctly shows WinMain() calling Bar(), which then calls Foo().
Since the Turbo Debugger information covers more than just the public
symbols, it's much more accurate. Additionally, frames 1, 2, and 3 also
contain a source file and a line number. In those lines in BAD.C, you'll see
that the report is correct in terms of where execution was inside each
function.
Example 3: The file DFA.OUT, a report generated by crunching the information
obtained from two WINSPCTR files and adding in the Turbo Debugger information
in the BAD.EXE file.

 0 BAD __fstrlen +0017
 CS:IP 0001:082D (1117:082D) SS:BP 10DF:164A
 C:\ART4\BAD.EXE

 1 BAD _Foo +000E
 CS:IP 0001:017E (1117:017E) SS:BP 10DF:1654
 C:\ART4\BAD.EXE
 BAD.C line: 10

 SS:1658 ptr
 void far * 0001:0000

 2 BAD Bar +001B
 CS:IP 0001:019E (1117:019E) SS:BP 10DF:1660
 C:\ART4\BAD.EXE
 BAD.C line: 19

 SS:165C b
 void far * 0001:0000

 SS:1664 ptr
 void far * 0001:0000

 3 BAD WINMAIN +000A

 CS:IP 0001:01AD (1117:01AD) SS:BP 10DF:1668
 C:\ART4\BAD.EXE
 BAD.C line: 29

 SS:1674 hInstance
 unsigned int 10DE

 SS:1672 hPrevInstance
 unsigned int 0000

 SS:166E lpszCmdLine
 char far * 10D7:0080

 SS:166C nCmdShow
 int 0001

 4 BAD <no info>
 CS:IP 0001:00B3 (1117:00B3) SS:BP 10DF:1676
 C:\ART4\BAD.EXE

 Module: BAD
 Filename: C:\ART4\BAD.EXE

 Segments:
 Segment 01 Selector: 1117 Length 08A0 CODE
 Segment 02 Selector: 10DF Length 2680 DATA

 Data Dumps:

 Segment: 02 Selector: 10DF Length 2680 Offset: 0176

 0002:0056 _MeaningOfLife
 int 0042

As well as providing more accurate information about where the program was
executing, the addition of more complete debug information allows DFA to show
the names, types, and values of each parameter and local variable in every
frame that has debug information. Since frames 0 and 4 are in the Borland C++
runtime library, there is no source file or variable information for those
frames.
At the end of the file, you can see that the global variable MeaningOfLife
contains 0x42, which is what we initialized it to in BAD.C. While there are
many other public global variables in the runtime library, DFA does not
display them because there's no type information.


Conclusion


As with runtime debuggers like Turbo Debugger for Windows, CodeView for
Windows, or Multiscope, postmortem debugging does not free you from having to
think. It's still critical that you understand how your program interacts with
the operating system and other programs. Although you may end up having to
spend time with a runtime debugger, going through postmortem analysis can
dramatically narrow the problem to a manageable scope.


Common Problems and How to Spot Them


In looking at many postmortem files, certain problems show up over and over
again. Here's a short list of the most common UAEs and how they'll show up in
the postmortem analysis.
NULL Pointers. The code generators of most PC compilers use the ES register to
access memory when a far pointer is used. Typically, the ES register is set up
via the "LES" instruction.
For instance, the code LES BX,[BP+6] loads ES:BX with the far pointer passed
to the function as a parameter. If a NULL far pointer was inadvertently
passed, then [BP+06] contains 0. Unfortunately, a GP fault does not occur upon
execution of this instruction. It's perfectly legal to have a value of 0 in a
segment register. You cannot use the segment register to access memory,
however. Thus, the GP fault doesn't occur on the example instruction, but on a
subsequent instruction. The key to finding this bug is to look for a value of
0 in the ES register. You can look at the disassembly to see why ES was loaded
with a 0 value. In Windows 3.1, the parameter-validation mechanism will check
for NULL pointers in many cases, and attempt to prevent the application from
UAEing.
Invalid Pointers. This is really a variation on the previous bug. Mercifully
though, if you try to load a segment register with a value that's not a legal
LDT/GDT selector, the GP fault will occur on that instruction, rather than a
subsequent instruction.
A subtle variation on this bug occurs when the value you attempt to load looks
perfectly normal, and in fact has been successfully loaded before. The prime
example of this is inside a WEP routine, when Windows has already discarded
the data segment of the DLL. I spent several hours tracking this problem down
once, so I now am always on the lookout for this situation.
Accessing Memory Beyond the Segment Limit. This bug comes in two different
incarnations. The first version typically manifests itself to beginning
Windows programmers, who have dutifully called MakeProcInstance(), but
forgotten to export their window procedure. When this happens, the DS register
is not set up to point at the data segment of the application. The faulting
instruction will look something like: MOV AX, [7452].
In this case, the use of the DS register is implicit. By checking the limit of
the segment pointed to by the DS register, you can determine whether offset
7452h is within the limits of the segment. A dead giveaway of this bug is when
the DS register contains a different value than the hInstance of the program.
This is almost always the result of forgetting to export the window procedure
properly.
The other common variation of the "segment overrun" bug is caused by accessing
memory beyond the allocated size of an array. Once again, the key is to look
at the instruction and figure out at what offset in the segment the memory
access will occur. You can then check the segment limit to see if the intended
access really was past the limit of the segment.
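The limit check described above amounts to a one-line comparison. A sketch in portable C (access_ok is a hypothetical helper; the limit value is the kind shown in a WinSpector register dump):

```c
#include <assert.h>

/* Does an access of 'size' bytes starting at 'offset' stay
   within a segment whose limit is 'limit'? The limit is the
   highest valid offset in the segment, as reported in the
   "Limit:" column of the postmortem register dump. */
static int access_ok(unsigned long offset, unsigned long size,
                     unsigned long limit)
{
    return size != 0 && offset + size - 1 <= limit;
}
```

For the MOV AX,[7452] example, checking offset 7452h against a DS limit smaller than that immediately confirms the overrun.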
Also, don't forget that string instructions such as LODSB, MOVSB, and the like
reference memory implicitly, via the SI and DI registers. If the faulting
instruction was MOVSB, for instance, you'll have to look at the DS and ES
limits, as well as the SI and DI registers, to figure out exactly what went
wrong.
--M.P.




Logical and Physical Addresses


Windows EXE and DLL files are called New EXE files because they have a
different format than traditional MS-DOS files. When New EXE files are linked,
each segment is placed in a different section of the file. (The linker can
combine multiple segments in the .OBJ file into one segment in the EXE
file. This is what the "pack" option in the linker refers to.) To quickly find
each of these segments, a segment table is created in the EXE file that allows
Windows and other programs to locate the file offset of a given segment's
code/data.
When referring to a particular segment, we use its order in the segment table.
Thus, the first segment in the table is logical segment 1, the second is
logical segment 2, and so on. You can determine the number of segments, their
size, and other information by running Borland's TDUMP or Microsoft's EXEHDR.
Alternatively, the MAP file for your program contains this same information.
The addresses given in a Windows MAP file are examples of "logical addresses."
When an EXE or DLL is loaded, Windows allocates a selector for each segment in
the module. When the segment is needed, the base address of the allocated
selector is pointed at a free block of memory, and the data from the segment
on disk is read in. This process happens at module loadtime for PRELOAD
segments. LOADONCALL segments have selectors allocated for them, but memory is
not actually "committed" until the selector is loaded into a segment register,
causing a "segment not present" fault.
For any given EXE or DLL file, the logical segment that a function or variable
is in will never change. The selector value, on the other hand, is whatever
selector was allocated to store the particular logical segment. In other
words, the selector Windows uses for a particular logical segment can change
between different invocations of the EXE or DLL.
A logical address consists of a module name, a logical segment, and an
offset. For instance, "USER 0001:65EA" means offset 65EA in the first segment
of USER.EXE. In a WINSPCTR.LOG file, where code addresses are given, the
logical address appears first.
A physical address consists of a selector and an offset. The selector
value is what the CPU loads into a segment register to access an "instance" of
the logical segment that's been loaded into memory. Thus, a typical physical
address that Windows might use would be 09CF:65EA. If a logical address is
given in a WINSPCTR.LOG file, its corresponding physical address appears
afterwards in parentheses.
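The logical-to-physical translation can be modeled as a simple table lookup, mirroring the segment table described below. This is only an illustrative sketch; the structure and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-module segment table entry: logical segment
   number -> selector Windows allocated for this invocation. */
struct SegEntry { int logical; unsigned short selector; };

/* Translate a logical address (e.g., "BAD 0001:082D") into the
   physical selector:offset form shown in parentheses in a
   WINSPCTR.LOG file. Returns 0 if the logical segment is not
   in this module's table. */
static int logical_to_physical(const struct SegEntry *tab, size_t n,
                               int logical, unsigned short offset,
                               char *out, size_t outlen)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].logical == logical) {
            snprintf(out, outlen, "%04X:%04X", tab[i].selector, offset);
            return 1;
        }
    return 0;
}
```

Running this against a table built from the earlier dump (logical segment 1 mapped to selector 1117h) reproduces the "(1117:082D)" annotation.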
The "mapping" from a logical segment to its allocated selector (and hence, its
physical address) can be found in the segment table inside a module table.
This segment table is a variation of the segment table found in the New EXE
file. The information to go from a selector to its logical segment is stored
in the "Arena header" of the global-memory block. The details of these
mappings are discussed in Undocumented Windows.
Every logical address has a corresponding physical address, but not every
physical address has a logical address. Blocks allocated by programs from the
global heap do not. To have a logical address, the segment
has to have originated in the segment table of the New EXE file.
--M.P.



_POSTMORTEM DEBUGGING_
by Matt Pietrek

[EXAMPLE 1]

#include <windows.h>
#include <string.h>
#include <dos.h>

int MeaningOfLife = 0x42; // A meaningless global,
 // for DFA demonstration

void Foo(void far *ptr)
{
 _fstrlen(ptr); // This call will GP fault
}

static void Bar(void far *ptr)
{
 void far *b; // Local var for DFA demonstration

 b = ptr;

 Foo(b);
}

int PASCAL WinMain(
 HANDLE hInstance,
 HANDLE hPrevInstance,
 LPSTR lpszCmdLine,
 int nCmdShow
)
{
 Bar(MK_FP(1,0)); // pass bad pointer
 return 0;
}



September, 1992
AN EXCEPTION HANDLER FOR WINDOWS 3
 This article contains the following executables: WIN386.ARC


Brett Salter


Brett is the founder and chief developer at The Periscope Co., a maker of
software- and hardware-assisted debuggers. He can be contacted at The
Periscope Company, 1197 Peachtree St., Atlanta, GA 30361.


I faced a number of hurdles when I began writing Periscope/32 for Windows, a
system-level debugger for Microsoft Windows 3. For one thing, I had never
written 32-bit protected-mode code before. For another, I was learning about
the vagaries of the world according to Windows. Finally, I was in new
territory, needing to implement the debugger as an Enhanced Mode Windows
virtual device driver (VxD).
Adding to the challenge was Microsoft's suggestion that you use MASM 5.10B--a
funky, 32-bit hybrid version of MASM 5.10 -- to write Windows VxDs. Since VxDs
run in 32-bit mode at Ring 0, you find yourself writing 32-bit code with a
good chance of making errors. One potential pitfall, for example, is when you
use the OFFSET directive. To load an offset into a 32-bit register, you use
MOV EDI,OFFSET32 DATUM instead of the usual MOV DI, OFFSET DATUM. Because the
assembler doesn't warn you to use a 32-bit OFFSET32 instead of the 16-bit
OFFSET, it's easy to make a typing error that will cause major problems when
the code is run, but which is hard to find when scanning the source code.
There's also the risk of causing a fault, especially the common
general-protection (GP) fault (also known as Trap 0Dh). Here the possibilities
are seemingly endless. They range from something as simple as trying to access
memory beyond the limit of a selector to subtle faults like trying to copy the
value of the code segment register into any other segment register, as you
would in real mode.
It's possible to use MASM 6.00 to write device drivers, but you'll have to
convert the include files supplied in the Windows Device Driver Kit (DDK) to
get your programs to assemble without errors. If you do much 32-bit work,
you'll find this conversion appealing, since MASM 6.00 is much more at home
with 32-bit code and the 386 instructions you'll tend to use when writing Ring
0 code. If you decide to go this route, be sure to get MASM 6.00B, since the
original 6.00 has some nasty bugs.


WINX.386 to the Rescue


In my development, I quickly found that any error in a VxD (such as a GP
fault) would drop me back to the DOS prompt without a clue of what had
happened. Clearly, I needed some sort of debugger to help me debug my
debugger. Consequently, I developed WINX.386, the Windows exception handler
presented in this article. WINX.386 is implemented as a VxD, so it can run at
protected-mode Ring 0 and have access to anything it needs. A VxD is quite
powerful--it's loaded when Enhanced Mode Windows starts up and is available
until Windows is shut down. This means that the VxD can oversee what's
happening with other VxDs, normal Windows applications and drivers, and the
DOS box, including TSRs and DOS device drivers. Still, VxDs have two problems:
They won't run with Standard-mode Windows, since they require Enhanced Mode;
and because of the way Microsoft architected the four-part initialization of a
VxD, there are blind spots in which errors in a VxD are not readily trappable.
WINX.386 works by hooking various 386 CPU exception interrupts. It determines
whether the exception is a normal, expected event that should be passed on to
the prior interrupt handler for Windows to field, or if it is something
unexpected (a bug!) about which you want more information. If it is the
latter, a register dump appears on the Windows and/or monochrome screen. With
this dump, you can quickly locate the cause of various error conditions.
The screen display is basically a classic register dump, with some extensions.
Figure 1 shows an example of a GP fault, as trapped by WINX. The first line
displays the interrupt type and a description, followed by the CPU mode
(Protect or V86) and the Ring (0-3). The next line displays the error code, as
reported by some exceptions, and eight bytes of the instruction stream at
CS:EIP. Next are the general-purpose, 32-bit registers and the segment
registers. The last two lines display the control registers CR0, CR2, CR3, the
global descriptor table (GDT) base and limit, the local descriptor table (LDT)
selector, the interrupt descriptor table (IDT) base and limit, and the task
state segment (TSS) register. Typically, you use the opcodes and the CS:EIP
values to locate the faulting code. Then you use the other register values and
the exception type to help understand and fix the problem.
Figure 1: A sample GP fault, as trapped by WINX.

 Interrupt 0Dh - General protection Mode=Protect Ring=0
 Error code=0000 B000 Opcodes=66 8E C0 FF FF 90 90 66

 eax=0000 B000 ebx=8048 1000 ecx=8001 7BEC edx=0000 0000
 ebp=8001 0D50 efl=0001 3246 fs=0030 gs=0030
 eip=8001 7BCE esp=8001 0C48 esi=8040 2074 edi=8040 50B8
 cs=0028 ss=0030 ds=0030 es=0030

 cr0=8000 001B cr2=000A 1000 cr3=003E 5000
 gdt=8003 BC0C/010F ldt=0000 idt=8000 DA70/02FF tss=0018
 Press a key to continue ...
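The two-group hex format in this dump ("eip=8001 7BCE") is produced in the driver by lookup against its hextable. A portable C sketch of equivalent formatting (fmt_reg32 is a hypothetical name, not a routine in WINX):

```c
#include <assert.h>
#include <string.h>

/* Format a 32-bit register value as two 4-digit hex groups
   separated by a space, matching the WINX register dump style
   (e.g., 0x80017BCE -> "8001 7BCE"). 'out' needs 10 bytes. */
static void fmt_reg32(unsigned long v, char out[10])
{
    static const char hex[] = "0123456789ABCDEF"; /* like hextable */
    int pos = 0;
    for (int i = 7; i >= 0; i--) {
        out[pos++] = hex[(v >> (i * 4)) & 0xF];
        if (i == 4)
            out[pos++] = ' ';   /* split into two groups of four */
    }
    out[pos] = '\0';
}
```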

The Windows DDK provides a variety of functions to hook interrupts. After
checking these out, however, I decided to go against Microsoft's suggestions
and hook the exception interrupts directly. This generally isn't a good
approach, but I needed a program with the following characteristics:
It had to be as fast as possible to minimize the performance impact on the
system.
It had to let me process faults before Windows so that WINX.386 could survive
a crash inside Windows itself.
It had to be totally nonintrusive.
It couldn't return results that could be "sanitized" by Windows.
To be sure of catching as many exceptions as possible, I got out my dogeared
Intel i486 Microprocessor Programmer's Reference Manual and looked up
exceptions and interrupts. Based on the discussion in Chapter 9, I decided to
hook these exceptions: 06h (invalid opcode), 08h (double fault), 0ah (invalid
TSS), 0bh (segment not present), 0ch (stack fault), 0dh (GP fault), and 0eh
(page fault).
From the start, I knew that some of these exceptions would be more useful than
others, and I expected most of the use to be centered on interrupts 06h and
0dh. From my experience with interrupt 6 in real-mode debugging, I knew it to
be a valuable exception condition. It's all too easy to start executing
illegal instructions by taking a bad return off the stack or by using an
uninitialized pointer. By the time I started writing WINX.386, most of my
protected-mode problems had been with GP faults, so this was a must have too.
Most of the others were thrown in for the sake of completeness.
In the beginning, I enabled one interrupt at a time and started up Windows to
see what exception conditions were "legal." Then I began the iterative process
of adding the opcodes routinely fielded by the Windows interrupt handler. I
found that under normal conditions, interrupts 08h, 0ah, or 0ch weren't
triggered by Windows, so I decided to always display the system registers if
any of these exceptions were activated. Not wanting to interfere with Windows
page handling, I decided to always pass all Trap 0ehs on through to Windows
without stopping. That left interrupts 06h, 0bh, and 0dh. As it turns out,
Trap 0bh is infrequently used, but interrupts 06h and 0dh are used constantly.
Recently I benchmarked the number of times each of these traps occur from the
time WINX is loaded during Windows startup until CLOCK.EXE (with an embedded
illegal instruction) was executed. I executed WIN CLOCK from the DOS prompt to
negate any human "mousing" time. I found the results (shown in Table 1)
surprising. Now you can see why we call Windows an exception-driven operating
environment. Of the Trap 6s, the 4612 occurrences in V86 mode are all ARPLs, a
technique Microsoft has patented to switch from V86 mode into protected mode.
The two traps that occur in protected mode are caused by execution of the
illegal instruction sequence 0Fh FFh, which debuted in Windows 3.1 and is used
to switch from one protected-mode ring to another.
Table 1: Performance results when WINX is loaded during Windows startup.

 Exception Protected- Virtual 86 Total
 Interrupt Mode Traps Mode Traps Traps
 -----------------------------------------

 6 2 4612 4614
 B 3 0 3
 D 234 3827 4061

Ninety-four percent of the Trap 0dhs occur in V86 mode. This is to be
expected, since in V86 mode numerous instructions are emulated by the Ring 0
code--Trap 0dh is the method used to switch control from V86 code to
protected-mode Ring 0 code. The majority of the Trap 0dhs were caused by IN
and OUT instructions, followed distantly by PUSHF, POPF, INT, IRET, CLI, and
STI instructions. The Trap Ds in protected mode (none of which occur at Ring
0) are mostly IN and OUT instructions, with some CLI, STI, and INT
instructions. Trap Ds are an expected part of the overhead in a protected-mode
environment, but the extensive use of them by Windows is surprising. Even more
surprising is the high frequency of shifts to and from V86 mode.



The WINX.386 Code


WINX.386 is composed of WINX.ASM (Listing One, page 102); PSEQUATE.INC, a
general include file used for equates and handy macros; and the Windows DDK
include files (DOSMGR.INC, SHELL.INC, VDD.INC, VMM.INC, and VPICD.INC). The
generation of the executable uses MASM 5.10B, Link386 1.00.058, and Addhdr
1.01. Because of space constraints, only the main file (WINX.ASM) is printed
with this article. The other files, and a make file used to generate WINX.386,
are provided electronically.
Lines 5 and 6 of WINX.ASM have equates for the flat model CS and DS used by
Ring 0 code. Using a CS value of 28h, Windows drivers can access the full
four-gigabyte address space of the system. Similarly, using a DS value of 30h,
you can read or write memory anywhere from 00000000 to FFFFFFFFh.
After the includes is the device-descriptor block, a macro which describes
various characteristics of this driver to Windows. This is followed by a
locked data segment, which contains the Ring 0 data used by the driver. Of
particular interest in this segment are the legal opcode lists for Traps 0bh
and 0dh (intblist and intdlist, respectively). These lists identify the legal
opcodes that we pass to Windows. You may modify these lists to change the
pass-through logic as needed in your environment.
Next is the locked code segment, the Ring 0 code for the driver. At the
beginning of this segment is a dispatch table used to pass control off to the
protected-mode initialization code. Procedures p006 through p00e are entry
points for the corresponding interrupts. Procedure p013 hooks interrupt 51h,
the relocated keyboard interrupt (IRQ 1). This code counts the number of key
presses and releases so that we can implement a nonintrusive method of pausing
the system after an exception until a key is pressed (and released). The next
procedure, p020, is the generalized exception handler. It saves the interrupt
type, system registers, and error code and then calls the dispatch code
(p200). The dispatch code calls the actual error-check routine (p206 through
p20e) to see if we want to display the registers or not. If the check routine
returns with the carry flag set, the registers are displayed. In all cases,
control is then passed on to the original interrupt handler.
The code at p220 gets the first byte of the instruction, ignoring any prefix
bytes, such as 0f3h, a REPZ prefix. The p300 procedure is a callback routine
for Windows. Windows calls this routine when idle so that we can display a
Sysmodal message containing the register dump on the Windows screen. If this
code is disabled using the string WinxNoWinMsg in SYSTEM.INI, then WINX will
not use any Windows services after initialization.
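What p220 does can be sketched in portable C, using the same prefix bytes as WINX's prelist table (skip_prefixes is a hypothetical stand-in for the assembly routine):

```c
#include <assert.h>
#include <string.h>

/* x86 instruction-prefix bytes, as in WINX's prelist table:
   segment overrides, operand/address size, LOCK, REPNZ, REPZ. */
static const unsigned char prefixes[] = {
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67,
    0xF0, 0xF2, 0xF3
};

/* Step over any prefix bytes at CS:EIP to find the first real
   opcode byte -- the byte checked against intblist/intdlist. */
static const unsigned char *
skip_prefixes(const unsigned char *p, size_t n)
{
    while (n--) {
        if (memchr(prefixes, *p, sizeof prefixes) == NULL)
            return p;           /* not a prefix: real opcode */
        p++;
    }
    return p;                   /* ran out of bytes */
}
```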
The rest of the resident Ring 0 code (p880 through p970) is used to format and
display the screen dump. Since we're using flat model 32-bit code with a data
selector whose base is 0, we can refer to the monochrome display using a
32-bit register with a value of 0b0000h. Look Ma, no segments!
The remainder of the code is the transient initialization logic. The first of
these is the real-mode init code at p1000, which just displays a copyright and
exits after setting a few key registers. Next is the real-mode initialization
data segment.
The first phase of the protected-mode init code, which searches for the
profile strings in SYSTEM.INI and sets flags based on what it finds, is at
p1100. At the end of this code the interrupts are hooked using procedures
p1140 and p1160. These routines save the current interrupt address and
revector the interrupt to point to WINX. Note that individual-vector hooking
can be disabled using the string WinxNoIntxx in SYSTEM.INI, where xx is
replaced by the desired interrupt number in hex. Since Windows has already
established its interrupts, we get control before Windows on all interrupts
that we hook.
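The save-and-chain pattern used by p1140 and p1160 can be modeled with C function pointers standing in for IDT entries; all names here are hypothetical, and the real driver of course manipulates interrupt gates rather than pointers:

```c
#include <assert.h>

typedef void (*handler_t)(void);

static handler_t orig_handler;  /* saved previous vector */
static int our_handler_ran;

static void windows_handler(void) { /* Windows' own handler */ }

/* Our handler inspects the fault first, then chains to the
   original handler so Windows still sees every interrupt. */
static void our_handler(void)
{
    our_handler_ran = 1;
    orig_handler();
}

/* Save the current vector and revector it to 'mine', returning
   the old handler for later chaining -- the hook pattern. */
static handler_t hook(handler_t *vector, handler_t mine)
{
    handler_t old = *vector;
    *vector = mine;
    return old;
}
```

Because Windows installs its handlers first and WINX hooks afterward, WINX's handler sits in front of Windows' on every hooked vector.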
The second phase of the protected-mode init sets up the callback to p300
unless the user has used WinxNoWinMsg in SYSTEM.INI. Finally, the third phase
is Null, returning with the carry flag clear to indicate that all is well.
To install WINX, add the statement device = winx.386 in SYSTEM.INI in the
[386enh] section. The profile options also go in SYSTEM.INI. These include:
WinxUseMono (display the register dump on a monochrome screen), WinxMonoSave
(save the contents of the mono screen before displaying the register dump,
then restore it after a key press), WinxNoWinMsg (do not show the register
dump on the Windows screen), WinxNoIntxx (do not hook the indicated interrupt,
where xx is 06, 08, 0a, 0b, 0c, 0d, or 0e), and Periscope-Port:xxxx (a
Periscope Model IV board is using port xxxx; if the port is in use, the
board's real-time trace buffer will be turned off when an exception occurs).
For the least intrusive use, use monochrome-screen output. Indeed, if you're
encountering exceptions before the Windows desktop is displayed, the Sysmodal
message is not yet usable, so you must use the monochrome-screen output.
Please note that you can use both monochrome and Windows screens if desired.
The least intrusive keypress is something like the SysReq key or any Shift key
other than the Alt key.


Conclusion


There are several possible ways to extend WINX.386 to provide additional
information about Windows. You could enhance it to profile interrupt usage by
interrupt number, trap on specific interrupt usage (such as INT 31h, which is
used by DPMI services), or trap the usage of specific hardware interrupts by
Windows.


_AN EXCEPTION HANDLER FOR WINDOWS 3_
by Brett Salter



[LISTING ONE]

; winx.asm - exception handler for windows 3.x
.386p ; 386 protect mode
wincs equ 28h ; windows ring 0 cs
winds equ 30h ; windows ring 0 ds
 include psequate.inc ; general Periscope equates
 include dosmgr.inc ; all of these
 include shell.inc ; are from the
 include vdd.inc ; windows ddk
 include vmm.inc ; ...
 include vpicd.inc ; ...
 ; device descriptor block
Declare_Virtual_Device WINX, 3, 0, VAD_Control, Undefined_Device_ID, \
        VMM_Init_Order, , VAD_PM_Svc_Call
; ****************************************************************************
VxD_Locked_Data_Seg ; data segment
datastart equ $ ; symbol for start of data
 ; global data follows
psport dw 0 ; Periscope Model IV port number
currow db 0 ; cursor row (0-24)
curcol db 0 ; cursor column (0-79)
showregs db 0 ; 1 when we have something to show
usemono db 0 ; 1 when output to mono screen
monosave db 0 ; 1 when saving mono screen
winmsg db 1 ; 1 when output to windows screen

even
hextable db '0123456789ABCDEF' ; hex conversion table
even ; gdt/idt/ldt data
gdtlimit dw 0 ; global descriptor table limit
gdtbase dd 0 ; and base

idtlimit dw 0 ; interrupt descriptor table limit
idtbase dd 0 ; and base
ldtvalue dw 0 ; value of local descriptor table
ldtlimit dw 0 ; and limit
tssvalue dw 0 ; value of task state selector
 ; original interrupt values
origint6 df 0 ; illegal opcode
origint8 df 0 ; double fault
originta df 0 ; invalid tss
origintb df 0 ; segment not present
origintc df 0 ; stack fault
origintd df 0 ; general protection fault
originte df 0 ; page fault
origirq1 df 0 ; original keyboard interrupt
align 4
jumptable dd offset32 p206,offset32 origint6 ; control table
 dd 0,0 ; dummy for interrupt 7
 dd offset32 p208,offset32 origint8
 dd 0,0 ; dummy for interrupt 9
 dd offset32 p20a,offset32 originta
 dd offset32 p20b,offset32 origintb
 dd offset32 p20c,offset32 origintc
 dd offset32 p20d,offset32 origintd
 dd offset32 p20e,offset32 originte
intlistlen equ 9 ; number of interrupts in above list
intlist db '060708090a0b0c0d0e' ; list of 9 exception vecs
 ; keep the following together
hookint06 db 1 ; all but interrupts 7 and 9
hookint07 db 0 ; get hooked by default
hookint08 db 1
hookint09 db 0
hookint0a db 1
hookint0b db 1
hookint0c db 1
hookint0d db 1
hookint0e db 1
 ; end of keep together
intgate dw 0ee00h ; value for interrupt gate, dpl=3
keystrokes dw 0 ; keystroke count
errorcode dd 0 ; error code on interrupts 8 and higher
inttype dw 0 ; interrupt type
exitaddr df 0 ; original interrupt address as 16:32 ptr
align 4
 ; original registers
saveeax dd 0 ; original eax
saveebx dd 0 ; original ebx
saveecx dd 0 ; original ecx
saveedx dd 0 ; original edx
saveesp dd 0 ; original esp
saveebp dd 0 ; original ebp
saveesi dd 0 ; original esi
saveedi dd 0 ; original edi
saveeip dd 0 ; original eip
saveefl dd 0 ; original eflags
saveds dw 0 ; original ds
savees dw 0 ; original es
savess dw 0 ; original ss
savecs dw 0 ; original cs
savefs dw 0 ; original fs

savegs dw 0 ; original gs
 ; instruction prefixes - these opcodes are ignored
 ; when searching for the start of an instruction
prelist db 026h,02eh,036h,03eh,064h,065h,066h,067h,0f0h,0f2h,0f3h
prelistlen dd 11
 ; legal opcodes for int b - pass these on thru to the original handler
intblist db 0fh ; improve later to catch only
 ; 0f/b2, 0f/b4, 0f/b5
 db 063h ; arpl
 db 09ah ; far call
 db 0c4h,0c5h ; les/lds ... used by visual basic
 db 0cah,0cbh ; retf
 db 0eah ; far jmp
 db 0f4h ; hlt
 db 0fah,0fbh ; cli/sti
 db 0ffh ; various
intblistlen dd 12
 ; legal opcodes for int d - pass these on thru to the original handler
intdlist db 06ch,06dh,06eh,06fh ; in/out
 db 09ah ; far call
 db 09ch ; pushf
 db 09dh ; popf
 db 0cch,0cdh,0ceh ; int x
 db 0cfh ; iret
 db 0e4h,0e5h,0e6h,0e7h ; in/out
 db 0ech,0edh,0eeh,0efh ; in/out
 db 0f4h ; hlt
 db 0fah,0fbh ; cli/sti
intdlistlen dd 22
even
opcode db 8 dup(0) ; save the opcode bytes here
opcodecs dw 0 ; cs for opcodes
opcodeeip dd 0 ; eip for opcodes
explcount equ 7 ; number of interrupt descriptions
expllen equ 20 ; length of each interrupt description
explanations equ $ ; 1 byte for type, 20 bytes for text
db 06h,'Invalid opcode '
db 08h,'Double fault '
db 0ah,'Invalid TSS '
db 0bh,'Segment not present '
db 0ch,'Stack exception '
db 0dh,'General protection '
db 0eh,'Page fault '
modep db 'Protect' ; mode can be Protect or V86 only - we'll
modev86 db 'V86 ' ; never see Real mode here
 ; start of display area
regline1 db cr,lf,'Interrupt '
xinttype db '..h - '
xexplain db '....................'
 db ' Mode='
xmode db '....... Ring='
xring db '.'
 db cr,lf
regline2 db 'Error code='
errorno db '0000 0000 Opcodes='
xopcode db '.. .. .. .. .. .. .. ..'
 db cr,lf
 db cr,lf
regline3 db 'eax='

regeax db '.... .... '
 db 'ebx='
regebx db '.... .... '
 db 'ecx='
regecx db '.... .... '
 db 'edx='
regedx db '.... ....'
 db cr,lf
regline4 db 'ebp='
regebp db '.... .... '
 db 'efl='
regefl db '.... .... '
 db ' fs='
regfs db '.... '
 db ' gs='
reggs db '....'
 db cr,lf
regline5 db 'eip='
regeip db '.... .... '
 db 'esp='
regesp db '.... .... '
 db 'esi='
regesi db '.... .... '
 db 'edi='
regedi db '.... ....'
 db cr,lf
regline6 db 'cs='
regcs db '.... '
 db ' ss='
regss db '.... '
 db ' ds='
regds db '.... '
 db ' es='
reges db '....'
 db cr,lf
 db cr,lf
regline7 db 'cr0='
regcr0 db '.... .... '
 db 'cr2='
regcr2 db '.... .... '
 db 'cr3='
regcr3 db '.... ....'
 db cr,lf
regline8 db 'gdt='
gdtb db '.... ..../'
gdtl db '.... '
 db 'ldt='
ldtw db '.... '
 db 'idt='
idtb db '.... ..../'
idtl db '.... '
 db 'tss='
tssw db '....'
 db cr,lf
regend db 0,'$'
 ; end of display area
 ; messages
pause db 'Press a key to continue ...',0
crlf db cr,lf,0

periscopeid db 'WINX (Windows Exception Handler)'
 db 0
align 4
monoscreen dw 80*25 dup(0) ; save the mono screen here
VxD_Locked_Data_Ends ; end of data segment
; ****************************************************************************
VxD_Locked_Code_Seg ; code segment
codestart equ $ ; symbol for start of code
 ; device control procedure
VAD_Control proc near ; control table
 Control_Dispatch Sys_Critical_Init, VAD_Sys_Crit_Init ; phase 1
 Control_Dispatch Device_Init, VAD_Device_Init ; phase 2
 Control_Dispatch Init_Complete, VAD_Init_Complete ; phase 3
 Control_Dispatch Create_VM, VAD_Create_VM
 clc ; no errors
 ret
VAD_Control endp
beginproc VAD_Get_Version, SERVICE
 mov eax,300h
 clc ; no errors
 ret
endproc VAD_Get_Version
VAD_PM_Svc_Call proc near
 ret
VAD_PM_Svc_Call endp
align 4
p006 proc near ; int 6 handler
 push eax
 mov al,6
 jmp short p020 ; to handler
p006 endp
align 4
p008 proc near ; int 8 handler
 push eax
 mov al,8
 jmp short p020 ; to handler
p008 endp
align 4
p00a proc near ; int a handler
 push eax
 mov al,0ah
 jmp short p020 ; to handler
p00a endp
align 4
p00b proc near ; int b handler
 push eax
 mov al,0bh
 jmp short p020 ; to handler
p00b endp
align 4
p00c proc near ; int c handler
 push eax
 mov al,0ch
 jmp short p020 ; to handler
p00c endp
align 4
p00d proc near ; int d handler
 push eax
 mov al,0dh

 jmp short p020 ; to handler
p00d endp
align 4
p00e proc near ; int e handler
 push eax
 mov al,0eh
 jmp short p020 ; to handler
p00e endp
align 4
p013 proc near ; irq 1 handler
 ; this routine hooks the keyboard; it is used only to count
 ; the number of keystrokes coming through
 @save eax,ds
 mov ax,winds
 mov ds,ax
 inc keystrokes ; count keystrokes
 @restore
 jmp cs:[origirq1] ; pass control onto prior handler
p013 endp
align 4
p020 proc near ; exception handler
 ; this is the common entry point for all exception handlers
 push ds ; save registers
 push es
 push ebp
 cld ; up!
 push eax
 mov ax,winds
 mov ds,ax ; set ds to windows ring 0 ds
 pop eax
 mov ah,0
 mov inttype,ax ; save interrupt type
 mov ebp,esp ; save registers
 mov eax,[ebp] ; 0=ebp,4=es,8=ds,12=eax,16=error/eip
 mov saveebp,eax ; save ebp
 mov ax,[ebp+4] ; get es from stack
 mov savees,ax ; save es
 mov ax,[ebp+8] ; get ds from stack
 mov saveds,ax ; save ds
 mov eax,[ebp+12] ; get eax from stack
 mov saveeax,eax ; save eax
 mov saveebx,ebx ; save ebx
 mov saveecx,ecx ; save ecx
 mov saveedx,edx ; save edx
 mov saveesi,esi ; save esi
 mov saveedi,edi ; save edi
 mov savefs,fs ; save fs
 mov savegs,gs ; save gs
 mov ebp,esp
 add ebp,16 ; point to eip/error
 mov errorcode,0 ; clear error code
 cmp inttype,8 ; error code?
 jb short p020a ; no
 mov eax,[ebp] ; get error code
 mov errorcode,eax ; save error code
 add ebp,4
p020a: mov eax,[ebp] ; get eip from stack
 mov saveeip,eax ; save eip
 mov ax,[ebp+4] ; get cs from stack

 mov savecs,ax ; save cs
 mov eax,[ebp+8] ; get flags from stack
 mov saveefl,eax ; save flags
 mov savess,ss
 mov eax,ebp ; get sp
 add eax,12 ; skip eip, cs, & flags
 mov saveesp,eax ; save esp
 test saveefl,bit17on ; v86 mode?
 jnz short p020d ; yes
 test savecs,3 ; ring 0 cs?
 jz short p020b ; yes
p020d:
 mov eax,[ebp+12] ; get esp from stack
 mov saveesp,eax ; save esp
 mov ax,[ebp+16] ; get ss from stack
 mov savess,ax ; save ss
 test saveefl,bit17on ; v86 mode?
 jz short p020b ; no
 mov ax,[ebp+20] ; get es from stack
 mov savees,ax ; save es
 mov ax,[ebp+24] ; get ds from stack
 mov saveds,ax ; save ds
 mov ax,[ebp+28] ; get fs from stack
 mov savefs,ax ; save fs
 mov ax,[ebp+32] ; get gs from stack
 mov savegs,ax ; save gs
p020b: pushad ; a bit redundant, but it's small
 call p200 ; check exceptions
 jnc short p020x ; ok - skip display
 mov dx,psport ; get Periscope port
 cmp dx,0 ; valid?
 jz short p020c ; no
 mov al,0dbh
 out dx,al ; stop Periscope Model IV trace
p020c: call p950 ; display registers
 mov showregs,1 ; indicate we have something to show
p020x: popad ; pop all registers
 pop ebp
 pop es
 pop ds
 pop eax
 jmp fword ptr cs:[exitaddr] ; pass control on to original handler
p020 endp
align 4
p200 proc near ; check exceptions
 mov ax,ds
 mov es,ax ; es=ds
 movzx eax,inttype ; get interrupt type
 sub eax,6 ; table starts at int 6
 shl eax,3 ; times 8
 mov esi,[jumptable+eax+4]
 mov edi,offset32 exitaddr
 movsw
 movsd ; copy original int to exitaddr
 jmp [jumptable+eax] ; handle the interrupt
p200 endp
align 4
p206 proc near ; handle int 6 - illegal instruction
 call p220 ; get opcode in al

 cmp al,63h ; arpl instruction? (ms patented technique!)
 jz short p206n ; yes - don't show registers
 mov ax,word ptr opcode
 cmp ax,0ff0fh ; 0f ff opcode? (ms special case)
 jz short p206n ; yes - don't show registers
 stc ; show registers
 ret
p206n: clc ; don't show registers
 ret
p206 endp
align 4
p208 proc near ; handle int 8 - double fault
 stc ; show registers on all int 8
 ret
p208 endp
align 4
p20a proc near ; handle int a - invalid tss
 stc ; show registers on all int a
 ret
p20a endp
align 4
p20b proc near ; handle int b - segment not present
 test saveefl,bit17on ; v86 mode?
 jnz short p20bs ; yes - show registers
 call p220 ; get opcode in al
 mov edi,offset32 intblist
 mov ecx,intblistlen
 repnz scasb ; search for opcode
 jnz short p20bs ; no match
p20bn: clc ; don't show registers
 ret
p20bs: stc ; show registers
 ret
p20b endp
align 4
p20c proc near ; handle int c - stack fault
 stc ; show registers on all int c
 ret
p20c endp
align 4
p20d proc near ; handle int d - general protection fault
 call p220 ; get opcode in al
 mov edi,offset32 intdlist
 mov ecx,intdlistlen
 cmp al,0cdh ; get an int?
 jz short p20dc ; yes
 repnz scasb ; search for opcode
 jnz short p20ds ; no match
p20dn: clc ; don't show registers
 ret
p20ds: stc ; show registers
 ret
p20dc: ; expand to handle individual interrupts as needed
 jmp p20dn
p20d endp
align 4
p20e proc near ; handle int e - page fault
 clc ; don't show registers
 ret

p20e endp
align 4
p220 proc near ; get opcode byte in register al
 test saveefl,bit17on ; v86 mode?
 jz short p220a ; no
 movzx ebx,savecs ; get opcode address for v86 mode
 shl ebx,4 ; times 16
 add ebx,saveeip ; plus eip
 mov opcodecs,ds ; use flat selector
 mov opcodeeip,ebx ; and our derived offset
 jmp short p220b
p220a:
 mov bx,savecs ; get opcode address for protect mode
 mov opcodecs,bx ; save cs
 mov ebx,saveeip
 mov opcodeeip,ebx ; and eip
p220b: mov edi,offset32 opcode
 mov ax,ds
 mov es,ax ; es=ds
 mov ecx,8
 mov esi,opcodeeip
 push ds
 mov ds,opcodecs
 rep movsb ; copy opcodes from user's cs:eip to us
 pop ds
mov esi,offset32 opcode
p220c: lodsb ; get opcode byte
 cmp esi,offset32 opcode+8 ; too far?
 jae short p220d ; yes - bail out
 mov edi,offset32 prelist ; is it a prefix byte?
 mov ecx,prelistlen
 repnz scasb ; search for prefix
 jz p220c ; got a prefix - get next byte
p220d: ret
p220 endp
align 4
p300 proc near ; display message using Windows services
 ; this routine is used as a callback unless WinxNoWinMsg is used
 cmp showregs,1 ; something to show?
 jz short p300a ; yes
 ret ; no - exit now
p300a: @save eax,ebx,ecx,esi,edi ; save registers
 VMMcall Get_Cur_VM_Handle ; get handle in ebx
 mov eax,mb_iconhand+mb_ok
 mov ecx,offset32 regline1
 xor esi,esi ; no callback
 mov edi,offset32 periscopeid
 VxDcall Shell_Sysmodal_Message ; display message
 mov showregs,0 ; nothing to show for now
 @restore
 ret
p300 endp
align 4
p880 proc near ; convert byte in al and output it to [edi]
 push ebx
 mov ah,0
 mov ebx,offset32 hextable
 shl ax,4 ; high nibble in ah
 shr al,4 ; low nibble in al

 xlat ; convert low nibble
 xchg ah,al
 xlat ; convert high nibble
 mov [edi],ax ; save the result
 inc edi
 inc edi ; point to next output address
 pop ebx
 ret
p880 endp
align 4
p885 proc near ; convert word in dx and output it to [edi]
 push eax
 mov al,dh
 call p880 ; convert high byte
 mov al,dl
 call p880 ; convert low byte
 pop eax
 ret
p885 endp
align 4
p889 proc near ; convert dword in edx and output it to [edi]
 rol edx,16
 call p885 ; convert high word
 inc edi ; a space between the high and low words
 rol edx,16
 call p885 ; convert low word
 inc edi ; a space after the low word
 ret
p889 endp
align 4
p900 proc near ; display string on mono screen
 @save eax,esi,edi,es
 ; entry: esi points to string
 mov ax,ds
 mov es,ax ; es=ds
p900d: call p905 ; calc offset
p900a: lodsb ; get next byte
 cmp al,0 ; end?
 jz short p900x ; yes
 cmp al,cr ; cr?
 jz short p900b ; yes
 cmp al,lf ; lf?
 jz short p900c ; yes
 mov ah,0fh ; high intensity
 stosw ; output it
 inc curcol ; bump column number
 cmp curcol,79 ; at end of screen?
 jbe p900a ; no
 mov curcol,0 ; line overflow
 jmp short p900c ; force an lf
p900b: ; handle cr
 mov curcol,0 ; column 0
 jmp short p900d ; force recalc
p900c: ; handle lf
 inc currow ; next row
 cmp currow,25 ; on row 25?
 jb short p900d ; no - force recalc
 call p915 ; scroll the screen
 jmp short p900d ; recalc now

p900x: @restore
 ret
p900 endp
align 4
p905 proc near ; calc offset in di using currow, curcol
 @save eax,ebx,ecx
 movzx eax,currow ; current row
 mov cl,80*2
 mul cl ; times 160
 movzx ebx,curcol
 shl ebx,1 ; plus (column times 2)
 add eax,ebx ; gives offset relative to mono screen
 add eax,0b0000h ; plus mono segment*16 gives 32-bit offset
 mov edi,eax ; return result in edi
 @restore
 ret
p905 endp
align 4
p910 proc near ; clear mono screen
 @save eax,ecx,edi,es
 mov ax,ds
 mov es,ax ; es=ds
 mov ax,720h
 mov edi,0b0000h
 mov ecx,25*80
 rep stosw ; init the screen
 mov currow,cl ; set these to zero
 mov curcol,cl
 @restore
 ret
p910 endp
align 4
p915 proc near ; scroll mono screen
 @save eax,ecx,esi,edi,es
 mov ax,ds
 mov es,ax ; es=ds
 mov esi,0b0000h
 mov edi,esi
 add esi,80*2 ; skip a line
 mov ecx,24*80/2 ; dwords to copy
 rep movsd ; scroll it
 mov ax,0720h
 mov ecx,80/2 ; dwords in a line
 rep stosd ; blank last line
 mov currow,24 ; now at row 24
 mov curcol,cl ; column 0
 @restore
 ret
p915 endp
align 4
p950 proc near ; display registers
 mov ax,inttype
 mov edi,offset32 xinttype
 call p880 ; convert byte
 mov ax,inttype
 mov ah,al ; int type in ah
 mov esi,offset32 explanations ; point to interrupt descriptions
 mov ecx,explcount ; number of descriptions
p950b: lodsb ; get the byte

 cmp al,ah ; does it match?
 jz short p950c ; yes
 add esi,expllen ; add the message length
 loop p950b ; and try again
 jmp short p950d ; no find
p950c: mov edi,offset32 xexplain ; point to output buffer
 mov ecx,expllen ; length of message
 rep movsb ; copy it across
p950d:
 mov ax,savecs
 and eax,3 ; isolate cs ring bits
 add al,'0' ; convert to ascii
 mov xring,al ; save it
 mov esi,offset32 modep ; assume protect mode
 test saveefl,bit17on ; v86 mode?
 jz short p950e ; no
 mov esi,offset32 modev86
 mov xring,'3' ; v86 is always ring 3
p950e: mov edi,offset32 xmode ; point to output buffer
 mov ecx,7
 rep movsb ; copy mode across
 mov edx,errorcode
 mov edi,offset32 errorno
 call p889 ; convert error code
 mov ecx,8
 mov esi,offset32 opcode ; point to opcode bytes
 mov edi,offset32 xopcode
p950a: lodsb ; get byte
 call p880 ; convert byte
 inc edi
 loop p950a ; continue
 mov edx,saveeax
 mov edi,offset32 regeax
 call p889 ; convert eax
 mov edx,saveebx
 mov edi,offset32 regebx
 call p889 ; convert ebx
 mov edx,saveecx
 mov edi,offset32 regecx
 call p889 ; convert ecx
 mov edx,saveedx
 mov edi,offset32 regedx
 call p889 ; convert edx
 mov edx,saveebp
 mov edi,offset32 regebp
 call p889 ; convert ebp
 mov edx,saveesp
 mov edi,offset32 regesp
 call p889 ; convert esp
 mov edx,saveesi
 mov edi,offset32 regesi
 call p889 ; convert esi
 mov edx,saveedi
 mov edi,offset32 regedi
 call p889 ; convert edi
 mov edx,saveeip
 mov edi,offset32 regeip
 call p889 ; convert eip
 mov edx,saveefl

 mov edi,offset32 regefl
 call p889 ; convert efl
 mov dx,savecs
 mov edi,offset32 regcs
 call p885 ; convert cs
 mov dx,saveds
 mov edi,offset32 regds
 call p885 ; convert ds
 mov dx,savees
 mov edi,offset32 reges
 call p885 ; convert es
 mov dx,savefs
 mov edi,offset32 regfs
 call p885 ; convert fs
 mov dx,savegs
 mov edi,offset32 reggs
 call p885 ; convert gs
 mov dx,savess
 mov edi,offset32 regss
 call p885 ; convert ss
 mov edx,cr0
 mov edi,offset32 regcr0
 call p889 ; convert cr0
 mov edx,cr2
 mov edi,offset32 regcr2
 call p889 ; convert cr2
 mov edx,cr3
 mov edi,offset32 regcr3
 call p889 ; convert cr3
 sidt fword ptr idtlimit
 mov edx,idtbase ; get idt base
 mov edi,offset32 idtb
 call p889 ; convert idt base
 mov dx,idtlimit ; get idt limit
 mov edi,offset32 idtl
 call p885 ; convert idt limit
 sgdt fword ptr gdtlimit
 mov edx,gdtbase ; get gdt base
 mov edi,offset32 gdtb
 call p889 ; convert gdt base
 mov dx,gdtlimit ; get gdt limit
 mov edi,offset32 gdtl
 call p885 ; convert gdt limit
 sldt ldtvalue
 mov dx,ldtvalue ; get ldt
 mov edi,offset32 ldtw
 call p885 ; convert ldt
 str tssvalue
 mov dx,tssvalue ; get tss
 mov edi,offset32 tssw
 call p885 ; convert tss
 cmp usemono,0 ; use mono screen?
 jz short p950x ; no
 call p960 ; save/clear mono screen if needed
 mov esi,offset32 regline1
 call p900 ; display registers
 mov esi,offset32 pause
 call p900 ; display pause msg
 mov keystrokes,0 ; clear keystrokes

 sti ; allow interrupts
p950l: cmp keystrokes,2 ; get 2 irq1s yet?
 jb p950l ; no
 mov esi,offset32 crlf
 call p900 ; display a crlf
 call p970 ; restore mono screen if needed
 cli ; no more interrupts for now
p950x: ret
p950 endp
align 4
p960 proc near ; save/clear mono screen
 cmp monosave,1 ; save it?
 jnz short p960x ; no
 mov ax,ds
 mov es,ax ; es=ds
 mov esi,0b0000h
 mov edi,offset32 monoscreen
 mov ecx,80*25/2
 rep movsd ; save the screen
 call p910 ; clear the screen
p960x: ret
p960 endp
align 4
p970 proc near ; restore mono screen
 cmp monosave,1 ; save it?
 jnz short p970x ; no
 mov ax,ds
 mov es,ax ; es=ds
 mov esi,offset32 monoscreen
 mov edi,0b0000h
 mov ecx,80*25/2
 rep movsd ; restore the screen
p970x: ret
p970 endp
VAD_Create_VM proc near
 ret
VAD_Create_VM endp
VxD_Locked_Code_Ends
; ****************************************************************************
VxD_Real_Init_Seg ; init seg (real mode)
p1000 proc near
 mov ah,9
 mov dx,offset copyr
 int 21h ; display copyright
 xor ax,ax ; don't abort load
 xor bx,bx ; don't exclude any pages
 xor si,si ; no instance data items
 xor edx,edx ; dword of reference data
 ret
p1000 endp
copyr db 'WINX (Windows Exception Handler) Version 0.88'
 db cr,lf
 db 'Copyright 1991, The Periscope Company, Inc. All rights reserved.'
 db cr,lf,'$'
VxD_Real_Init_Ends ; init seg (real mode)
; ****************************************************************************
VxD_Idata_Seg ; init data seg (protect mode)
winxusemono db 'WinxUseMono',0 ; tokens in system.ini
winxmonosave db 'WinxMonoSave',0

winxnowinmsg db 'WinxNoWinMsg',0
winxpsport db 'PeriscopePort',0
winxnoint db 'WinxNoInt'
winxnoint2 db '..',0
VxD_Idata_Ends
; ****************************************************************************
VxD_Icode_Seg ; init code seg (protect mode)
VAD_Sys_Crit_Init proc near ; init phase 1
 xor edx,edx ; pointer to default string
 xor esi,esi ; look in [386enh]
 mov edi,offset32 WinxUseMono
 VMMcall Get_Profile_String ; search for 'WinxUseMono'
 jc short p1100a ; no find
 mov usemono,1 ; use mono screen
 call p1120 ; init mono screen
p1100a:
 xor edx,edx ; pointer to default string
 xor esi,esi ; look in [386enh]
 mov edi,offset32 WinxNoWinMsg
 VMMcall Get_Profile_String ; search for 'WinxNoWinMsg'
 jc short p1100b ; no find
 mov winmsg,0 ; no windows messages
p1100b:
 xor eax,eax ; zap value
 xor esi,esi ; look in [386enh]
 mov edi,offset32 WinxPSPort
 VMMcall Get_Profile_Hex_Int ; search for 'PeriscopePort'
 jc short p1100c ; no find
 mov psport,ax ; set Periscope's port
p1100c:
 xor edx,edx ; pointer to default string
 xor esi,esi ; look in [386enh]
 mov edi,offset32 WinxMonoSave
 VMMcall Get_Profile_String ; search for 'WinxMonoSave'
 jc short p1100d ; no find
 mov monosave,1 ; save/restore mono screen
 call p1120 ; init mono screen
p1100d:
 mov eax,offset32 hookint06
 mov ebx,offset32 intlist
 mov ecx,intlistlen ; count of interrupts
p1100e: xor edx,edx ; pointer to default string
 xor esi,esi ; look in [386enh]
 mov edi,offset32 winxnoint
 push ebx
 mov bx,[ebx] ; get int name in ascii
 mov word ptr [winxnoint2],bx ; save it
 VMMcall Get_Profile_String ; search for 'WinxNoInt..'
 jc short p1100f ; no find
 mov byte ptr [eax],0 ; don't hook this int
p1100f: pop ebx
 inc ebx
 inc ebx
 inc eax
 loop p1100e ; check all of our interrupts
 call p1140 ; hook the indicated interrupts
 clc ; no error
 ret
align 4

p1120 proc near ; init mono screen
 mov ax,ds
 mov es,ax ; es=ds
 mov edi,0b0000h ; init mono screen
 mov ax,0720h
 mov ecx,25*80
 rep stosw
 ret
p1120 endp
align 4
p1140 proc near ; set interrupts in idt
 mov intgate,0ee00h ; set interrupt gate, dpl=3
 mov eax,offset32 p006 ; offset of exception handler
 mov ebx,offset32 origint6 ; offset of original cs:eip
 mov cl,hookint06 ; if 1, we hook this interrupt
 mov edi,int6*2 ; offset of interrupt in idt
 call p1160 ; set int 6
 mov eax,offset32 p008
 mov ebx,offset32 origint8
 mov cl,hookint08
 mov edi,int8*2
 call p1160 ; set int 8
 mov eax,offset32 p00a
 mov ebx,offset32 originta
 mov cl,hookint0a
 mov edi,int0a*2
 call p1160 ; set int a
 mov eax,offset32 p00b
 mov ebx,offset32 origintb
 mov cl,hookint0b
 mov edi,int0b*2
 call p1160 ; set int b
 mov eax,offset32 p00c
 mov ebx,offset32 origintc
 mov cl,hookint0c
 mov edi,int0c*2
 call p1160 ; set int c
 mov eax,offset32 p00d
 mov ebx,offset32 origintd
 mov cl,hookint0d
 mov edi,int0d*2
 call p1160 ; set int d
 mov eax,offset32 p00e
 mov ebx,offset32 originte
 mov cl,hookint0e
 mov edi,int0e*2
 call p1160 ; set int e
 mov intgate,08e00h ; interrupt gate, dpl=0
 mov eax,offset32 p013
 mov ebx,offset32 origirq1
 mov cl,1 ; always hook this interrupt
 mov edi,51h*8
 call p1160 ; set int 51h (keyboard)
 ret
p1140 endp
p1160 proc near ; set interrupt in idt
 ; on entry, eax has offset of new handler,
 ; ebx points to save area for current handler's address,
 ; cl contains a 1 if the interrupt is to be hooked, and

 ; edi has the offset of the interrupt's gate in the idt
 cmp cl,1 ; hook it?
 jnz short p1160x ; no
 push eax
 mov ax,ds
 mov es,ax ; es=ds
 sidt fword ptr idtlimit
 add edi,idtbase
 mov ax,[edi] ; get low offset
 mov word ptr [ebx],ax
 mov ax,[edi+6] ; get high offset
 mov word ptr [ebx+2],ax
 mov ax,[edi+2] ; get segment
 mov word ptr [ebx+4],ax
 pop eax
 stosw ; save low offset
 mov ax,cs
 stosw ; save cs
 mov ax,intgate ; value for interrupt gate
 stosw ; save misc bytes
 shr eax,16
 stosw ; save high offset
p1160x: ret
p1160 endp
VAD_Sys_Crit_Init endp
; ****************************************************************************
VAD_Device_Init proc near ; init phase 2
 cmp winmsg,0 ; skip windows msg?
 jz short p1200a ; yes
 mov esi,offset32 p300
 VMMcall Call_When_Idle ; setup callback
p1200a:
 clc ; no error
 ret
VAD_Device_Init endp
; ****************************************************************************
VAD_Init_Complete proc near ; init phase 3
 clc ; no error
 ret
VAD_Init_Complete endp
VxD_Icode_Ends
end

September, 1992
YOUR OWN PROTECTED-MODE DEBUGGER


A resident debugger for 80386/486 platforms


 This article contains the following executables: DB.ARC


Rick Knoblaugh


Rick is a software engineer specializing in systems programming and is the
coauthor of Screen Machine, a screen design/prototyping/code generation
utility. He can be reached at P.O. Box 1109, Half Moon Bay, CA 94019.


The 80386's debug registers and capabilities for virtualizing an 8086
real-mode environment have fostered some powerful software debuggers.
Designing and implementing such debuggers is a good vehicle for exploring
these 80386 features. This article describes a debugger I recently developed,
DB.EXE, that utilizes the 80386 debug registers and protected and virtual 8086
modes. Note that because of their length, the full source code listings for DB
are only available electronically.
DB enables breakpoints to be generated on code execution (including ROM code),
interrupts, data accesses, and I/O accesses. Since DB does not use DOS or the
BIOS, it facilitates system-level debugging. For example, you can debug a
program which takes over INT 16h. In addition, DB can coexist and work with a
real-mode debugger such as Debug or Codeview.


DB Architecture


DB consists of two layers which work together to utilize protected-mode
features, provide interaction with the user, and process commands. DBISR.ASM
contains all protected-mode code that handles exceptions and manipulates areas
restricted to privilege-level 0 code. All other resident DB code operates in
V86 mode at privilege level 3. DB was designed in this fashion to simplify
debugging. It is very difficult to debug protected-mode code because most
stand-alone software debuggers cannot be used. Thus, by placing most of the
debugger's logic in the privilege-level 3 layer, that code could be debugged
using a software debugger. It may also facilitate using a high-level language
for that portion of the code, although in this implementation, DB has been
written entirely in assembler.
In cases where privilege-level 3 code needs to manipulate an area (such as the
debug registers) that can only be accessed by more privileged code, the
privilege-level 3 layer of DB issues a user software interrupt (such as INT
60h). This causes a general-protection exception. The protected-mode exception
handler in the privilege-level 0 layer of DB recognizes the user software
interrupt as a request for privileged service and dispatches it to the
appropriate routine.
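The triage the #GP handler performs can be sketched in C. This is illustrative only: the 0x60 vector is one of the possible service interrupts the article mentions, the names are invented, and the real handler inspects the faulting V86 CS:IP rather than a flat pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal sketch of the #GP triage described above. An INT n instruction
 * encodes as 0xCD followed by the vector byte; if the vector matches the
 * debugger's service interrupt (0x60 here, an assumed value), the fault is
 * a ring-3 request for ring-0 service rather than an ordinary interrupt. */
enum gp_action { REFLECT_TO_REAL_MODE, PRIVILEGED_SERVICE };

static enum gp_action classify_gp(const uint8_t *fault_ip, uint8_t svc_vector)
{
    if (fault_ip[0] == 0xCD && fault_ip[1] == svc_vector)
        return PRIVILEGED_SERVICE;   /* dispatch to a ring-0 routine */
    return REFLECT_TO_REAL_MODE;     /* pass on to the real-mode ISR */
}
```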


Initialization


Initialization begins with the reprogramming of the master PIC, so that
interrupts start at 20h rather than at 08h. This eliminates the possibility of
collision between protected-mode exceptions and standard PC-hardware
interrupts. For example, the protected-mode general-protection exception 0dh
(one of the areas at the heart of DB) will no longer conflict with IRQ5, which
normally generates interrupt 0dh.
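The arithmetic behind the remap is simple: IRQ n arrives at vector base+n, so with the PC's default base of 08h, IRQ5 lands on 0dh, the general-protection vector; a base of 20h moves all eight master-PIC IRQs clear of the reserved exception range. A minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Vector raised by IRQ n for a given master-PIC base. With the PC default
 * base of 08h, IRQ5 collides with exception 0dh (#GP); with the base moved
 * to 20h, the IRQs occupy 20h-27h, clear of the reserved vectors 0-1fh. */
static uint8_t irq_vector(uint8_t pic_base, uint8_t irq)
{
    return (uint8_t)(pic_base + irq);
}

static int collides_with_exception(uint8_t vector)
{
    return vector <= 0x1F;   /* Intel reserves vectors 0-1fh for exceptions */
}
```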
The global descriptor table (GDT) is the next area to be initialized. Entries
are created for all segments to be used by DB, including the interrupt
descriptor table (IDT) and task state segment (TSS). After all these data
areas are established, virtual 8086 mode is entered by creating a
protected-mode exception stack frame, setting the VM bit in the EFLAGS, and
executing an IRETD. DB then terminates and stays resident in your system.
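The IRETD trick amounts to hand-building the stack frame the processor would have pushed had it just interrupted a running V86 task. The layout below follows the 80386 frame for a return to V86 mode (EIP, CS, EFLAGS, ESP, SS, then ES/DS/FS/GS); the field values are illustrative, not DB's actual initialization.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_VM (1UL << 17)   /* bit 17 of EFLAGS selects virtual-8086 mode */

/* The frame IRETD pops when returning to V86 mode, lowest address first. */
struct v86_frame {
    uint32_t eip, cs;        /* where V86 execution resumes */
    uint32_t eflags;         /* must have the VM bit set */
    uint32_t esp, ss;        /* the V86 task's stack */
    uint32_t es, ds, fs, gs; /* segment registers for the V86 task */
};

static struct v86_frame make_v86_frame(uint16_t seg, uint16_t ip, uint16_t sp)
{
    struct v86_frame f = {0};
    f.cs = seg;  f.eip = ip;
    f.ss = seg;  f.esp = sp;        /* assumption: stack in the same segment */
    f.eflags = EFLAGS_VM | 0x202;   /* VM=1, IF=1, reserved bit 1 always set */
    return f;
}
```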


Breakpoints on Interrupts


Since, in protected mode, interrupts pass through gates in the IDT rather than
through the interrupt vectors, DB can get control as each interrupt occurs. In
fact, DB must route all interrupts to their respective real-mode interrupt
service routines per the real-mode interrupt vectors. (See pass_thru in
Listing One, page 108.)
DB conveniently forces all software interrupts to take a slightly more
indirect route. All IDT entries for software interrupts are defined with
descriptor privilege level (DPL) equal to 0. When this is compared against the
current privilege level (CPL) of the V86 code (CPL=3) attempting to execute an
INT n instruction, a general-protection exception (interrupt 0dh) occurs.
Using the CS:IP of the faulting instruction, DB detects the attempted software
interrupt (see gen_prot_isr in Listing Two, page 108) and can generate a
breakpoint before it routes the interrupt to the appropriate real-mode ISR.
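A breakpoint on a software interrupt then reduces to matching the decoded vector, plus an optional register condition of the kind BPINT accepts. The structure below is a hypothetical reconstruction for illustration, not DB's actual internal layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shape of an interrupt breakpoint: break on INT 'vector',
 * optionally only when AH equals a given value (as in BPINT 21 AH EQ 4C). */
struct int_bp {
    uint8_t vector;
    int     check_ah;   /* nonzero: also require AH == ah_value */
    uint8_t ah_value;
};

static int int_bp_hit(const struct int_bp *bp, uint8_t vector, uint16_t ax)
{
    if (vector != bp->vector)
        return 0;
    if (bp->check_ah && (uint8_t)(ax >> 8) != bp->ah_value)
        return 0;
    return 1;
}
```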


Breakpoints on I/O


The key to generating breakpoints on I/O is the I/O permission bitmap located
at the end of the TSS. This bitmap specifies which I/O addresses a task may
access. (Refer to the 80386 Programmer's Reference Manual for more details.)
When code in the V86 task tries to execute an I/O instruction, the processor
consults the I/O permission bitmap to see if the task has been allowed access
to the particular I/O address. If the corresponding bit in the bitmap is set,
a general-protection exception is generated.
By setting such bits, DB generates a breakpoint via gen_prot_isr when an I/O
access occurs at a particular address. In such cases, DB temporarily clears
the bitmap entry and allows the instruction to continue, with one slight
difference--it sets the trap flag causing an INT 1h just after the I/O
instruction completes. Upon receiving INT 1h, DB recognizes that it is
single-stepping through an I/O instruction. At that point, it checks for
further user-specified conditions and either activates the debugger (if the
conditions are met) or simply clears the trap flag and exits. When the INT 1h
service routine is exited, the bits corresponding to the I/O address are again
set in the I/O permission bitmap.
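The bitmap itself is one bit per port: port p maps to bit p mod 8 of byte p/8 from the start of the map. The helpers below sketch the set/clear/test cycle just described; the bitmap's offset within the TSS is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per I/O port; a set bit makes the access fault with #GP.
 * A full bitmap covering ports 0-FFFFh is 64K/8 = 8192 bytes. */
static void iopb_trap(uint8_t *iopb, uint16_t port)
{
    iopb[port >> 3] |= (uint8_t)(1u << (port & 7));
}

static void iopb_allow(uint8_t *iopb, uint16_t port)
{
    iopb[port >> 3] &= (uint8_t)~(1u << (port & 7));
}

static int iopb_traps(const uint8_t *iopb, uint16_t port)
{
    return (iopb[port >> 3] >> (port & 7)) & 1;
}
```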


Breakpoints on Data Accesses


DB uses the 80386 debug registers to generate breakpoints on data accesses and
code execution. DB supports all of the debug-register data-access options,
including break-on-write or read/write accesses to bytes, words, or dwords.
Since DB utilizes debug registers for execution breakpoints (as opposed to the
INT 3h method), breakpoints can be set in ROM as well as RAM. The do_debug_reg
routine (see Listing Three, page 109) performs all of the debug-register
manipulation.
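For reference, the DR7 fields for hardware breakpoint slot n are: a local-enable bit Ln at bit 2n, a two-bit R/W field at bit 16+4n (00 execute, 01 write, 11 read/write), and a two-bit LEN field at bit 18+4n (00 byte, 01 word, 11 dword). The sketch below shows the kind of encoding such manipulation involves; it is not DB's do_debug_reg code.

```c
#include <assert.h>
#include <stdint.h>

/* DR7 field encodings, per the 80386 architecture. */
enum bp_rw  { BP_EXEC = 0, BP_WRITE = 1, BP_RDWR = 3 };
enum bp_len { BP_BYTE = 0, BP_WORD = 1, BP_DWORD = 3 };

/* Bits to OR into DR7 to arm breakpoint slot 0-3 locally. The linear
 * address being watched goes into the matching DR0-DR3 register. */
static uint32_t dr7_arm(int slot, enum bp_rw rw, enum bp_len len)
{
    return (1u << (slot * 2))               /* Ln: local enable */
         | ((uint32_t)rw  << (16 + slot * 4))
         | ((uint32_t)len << (18 + slot * 4));
}
```

Note that execution breakpoints must use LEN=00, which is why BPX needs no size argument.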


Using DB


Once DB is resident in your system, it can be activated either by
simultaneously pressing the left and right Shift keys or by an INT 1h software
interrupt.

All of the currently supported DB commands are shown in Table 1. Note that the
XUD (exit to user debugger) command can be used to pass control to another
debugger. For example, you can load your application under Debug, hotkey into
DB to define a breakpoint, quit back to Debug, and execute a Go command. When
the specified breakpoint occurs, DB gets control. You can then return to Debug
by executing XUD. This is accomplished by generating an INT 1h, which will
activate most debuggers. Optionally, an alternative interrupt can be specified
with XUD. For example, issuing the command XUD 3 causes an INT 3 to be
generated. Once specified, the alternative interrupt will be generated by
subsequent XUD commands until it is specifically overridden. INT 1h, the
default, seems to work well with Debug. For Codeview, XUD 2 (causing the INT
2h associated with NMI) also works well.
Table 1: DB commands.

 Command Description
 ----------------------------------------------------------------

 R [register name] Display or change register.
 E [address] Edit memory.
 D [address] Dump memory.
 T [number] Trace.
 G [address] Go.
 BPX address Break on code execution.
 BP[B] address [RW W] Break on byte data access.
 BPW address [RW W] Break on word data access.
 BPD address [RW W] Break on dword data access.
 BPIO port address [R W RW] Break on I/O port access.
 BPINT int number [AX AH AL Break on interrupt.
 EQ =hex value]
 BL List breakpoints.
 BC [* breakpoint number] Clear breakpoints.
 XUD [int number] Exit to user debugger.
 Q Quit.

The XUD command is useful when you want to supplement your debugger's
capabilities by utilizing DB's more specialized features such as breaking on
I/O accesses.


Limitations and Enhancements


Since the purpose of the debugger code provided with this article is to
highlight the 80386 capabilities, I've not dealt with all the video aspects of
the debugger. To keep the code simple, DB always assumes text mode (25 line)
and makes no attempt to handle other video modes. (There are comments in the
code which indicate where this could be added.)
Although DB supports setting breakpoints on interrupts, there is one
limitation. Currently, DB does not operate correctly if you specify a
condition which causes a breakpoint within an interrupt service routine for an
interrupt with equal or higher priority than the keyboard. Therefore, don't
specify conditions which cause breakpoints within INT 8h or 9h code. The
reason is that DB requires IRQ1 interrupts to get its own keyboard input.
Unfortunately, interrupts cannot occur if breakpoints are reached within INT
8h or 9h code before the ISRs issue EOI to the PIC.
One solution may be to have DB provide the EOI, and then trap and ignore the
EOI issued by the interrupt service routine. This would necessitate trapping
I/O on PIC accesses and keeping track of which interrupt is currently being
serviced.
Currently, DB doesn't employ the 80386 paging feature. If this were
implemented, DB could provide more robust breakpoint capabilities on memory
accesses: You could set breakpoints on ranges of memory accesses, as opposed
to the simple debug-register memory-access breakpoints used here.
Also, via paging, DB could further verify memory accesses to prevent "runaway"
programs from overwriting the debugger.
Note that an errant program could potentially clear interrupts and then "go
out to lunch," rendering DB virtually (no pun intended) useless. This is
because DB initializes the V86 task to run with I/O privilege level (IOPL)
equal to 3, thereby allowing V86 mode "sensitive" instructions (CLI, STI,
PUSHF, POPF, INT n, and IRET) to affect the interrupt flag.
An option could be added to DB in which the IOPL of the V86 task would be set
to a more privileged level than 3, thus causing general-protection exceptions
when the V86 mode sensitive instructions are attempted. These instructions
would need to be detected and emulated.


Conclusion


As you can see, these 80386 features provide enormous benefits when utilized
for debugger applications. The techniques presented here show how to apply
these processor capabilities. There's room for enhancements, but this is the
foundation.


References


80386 Programmer's Reference Manual. Santa Clara, CA: Intel Corp., 1986.
Green, Thomas. "80386 Protected Mode and Multitasking." Dr. Dobb's Journal
(September, 1989).
Margulis, Neil. "Advanced 80386 Memory Management." Dr. Dobb's Journal (April,
1989).
Turley, James L. Advanced 80386 Programming Techniques. Berkeley, CA:
Osborne/McGraw-Hill, 1988.
Williams, Al. "Homegrown Debugging--386 Style!" Dr. Dobb's Journal (March,
1990).
Williams, Al. "Roll Your Own DOS Extender: Part II." Dr. Dobb's Journal
(November, 1990).


_YOUR OWN PROTECTED-MODE DEBUGGER_
by Rick Knoblaugh


[LISTING ONE]


;-----------------------------------------------------------------------------
;pass_thru - This procedure is JMPed to by any interrupt handler which wishes
; to pass control to the original ISR per the interrupt vector table. Also,
; it checks to see if there are any breakpoints set on the int. If there are,
; the int being passed through is checked to see if it matches the condition
; for the break point. If the condition for the break point is met, DR0 is
; used to cause a break at the ISR. Enter: See stack_area struc for stack
; layout. Any error code has been removed from stack. EIP on stack has been
; adjusted if necessary.
;-----------------------------------------------------------------------------
pass_thru proc near
 mov bp, sp
 pushad
 call adjust_ustack ;adjust user stack
;returns with [esi][edx] pointing to user stack area
 mov cx, [bp].s_cs ;put on user cs
 mov [esi][edx].user_cs, cx
 mov ecx, [bp].s_eip ;put on ip
 mov [esi][edx].user_ip, cx
 movzx ebx, [bp].s_pushed_int ;get int number
 movzx ecx, [ebx * 4].d_offset ;offset portion
 mov [bp].s_eip, ecx
 mov cx, [ebx * 4].d_seg ;segment portion
 mov [bp].s_cs, cx

 mov cx, offset gdt_seg:sel_data
 mov fs, cx
 assume fs:data
 push fs
 cmp fs:trap_clear, TRUE ;tracing through an int?
 jne short pass_thru500
 mov fs:trap_clear, FALSE ;reset it
 mov fs:int1_active, TRUE ;debugger active
;If

 mov [esi][edx].user_cs, cx
 mov ecx, [bp].s_eip
 mov [esi][edx].user_ip, cx

 mov cx, ZCODE
 mov [bp].s_cs, cx
 mov cx, offset int_1_isr
 movzx ecx, cx
 mov [bp].s_eip, ecx
pass_thru500:
 pop ds ;get data seg (was fs)
 assume ds:data

 cmp int1_active, TRUE ;is debugger active?
 je short pass_thru999 ;if so, don't even think
 ;of breaking
 mov cx, num_int_bp ;number of defined int breaks
 jcxz pass_thru999 ;if no int type break points
 mov si, offset int_bpdat
pass_thru700:
 cmp [si].int_stat, ACTIVE ;is break on int enabled?
 jne short pass_thru800
 dec cx

 cmp [si].int_num, bl ;is this the int specified?
 jne short pass_thru800
 cmp [si].int_reg, NO_CONDITION ;no conditions?
 je short pass_thru750 ;if none go ahead and set break
 mov dx, [si].int_val ;get data for comparison
 cmp [si].int_reg, INT_AL_COMP ;condition compare on al?
 jne short pass_thru730
 cmp al, dl ;condition met?
 je pass_thru750 ;if so, go ahead and set break
 jmp short pass_thru800 ;if != look for more conditions
pass_thru730:
 cmp [si].int_reg, INT_AH_COMP ;condition compare on ah?
 jne short pass_thru740
 cmp ah, dl ;condition met
 je short pass_thru750 ;if so, go ahead and set break
 jmp short pass_thru800 ;if != look for more conditions
pass_thru740: ;condition compare on ax
 cmp ax, dx
 jne short pass_thru800 ;if != look for more conditions
pass_thru750:
 mov ebx, [bp].s_eip ;get offset and
 movzx edx, [bp].s_cs ;segment of ISR
 shl edx, 4 ;convert to linear
 add edx, ebx

 mov ch, 1 ;set debug register
 mov al, DEB_DAT_LEN1 ;exec breaks use 1 byte length
 mov ah, DEB_TYPE_EXEC
 sub cl, cl ;debug reg zero
 call do_debug_reg
 jmp short pass_thru999
pass_thru800:
 add si, size info_int ;advance to next int break
 or cl, cl ;all int breaks checked?
 jnz short pass_thru700 ;if not, check the next one
pass_thru999:
 popad
 add sp, 2 ;get rid of int number
 pop bp
 iretd
pass_thru endp





[LISTING TWO]

;-----------------------------------------------------------------------------
;gen_prot_isr - JMP here if int 0dh. Look for software int. If a software int
; caused the exception then: If debugger is active, look for user software
; interrupts issued by PL3 layer of debugger. If int 15h function 89h deny.
; If int 15h function 87h, emulate it. If none of these, simply route the
; interrupt per the real mode interrupt vector table. If exception was not
; caused by a software int and there are breakpoints defined on I/O accesses,
; look for I/O instruction. If it is an I/O instruction, temporarily clear
; the corresponding TSS I/O permission bit map bit and set trap flag to
; single step through the instruction. If other than software int or I/O,
; display cs:ip, 0dh and then halt.
;-----------------------------------------------------------------------------
gen_prot_isr proc near
 pushad
;Note: Don't use DX or AX below as DX may contain an I/O port address; in the
; case of a software interrupt, AH will have a function code. Also, don't use
; SI or CX as they are inputs for extended memory block move function
 mov bx, offset gdt_seg:sel_databs
 mov ds, bx
 movzx ebx, [bp].e_cs ;get cs of user instruction
 shl ebx, 4 ;make linear
 add ebx, [bp].e_eip ;add ip
 mov bx, [ebx] ;get bytes at cs:ip

 mov di, offset gdt_seg:sel_data
 mov ds, di ;debugger's data

 cmp bl, INT_OPCODE
 je short gen_prot020

 cmp bl, INT3_OPCODE
 jne gen_prot150 ;go look for I/O instruction
 mov bh, 3 ;interrupt 3
gen_prot020:
 cmp trace_count, 0 ;is debugger tracing?
 je short gen_prot040 ;if not, skip test below
;See if this software interrupt is the instruction through which the user
;is tracing. If it is, set flag.
 mov di, [bp].e_cs
 cmp di, tuser_cs
 jne short gen_prot040
 mov edi, [bp].e_eip
 cmp di, tuser_ip
 jne short gen_prot040
; Clear trap bit so that it will not be set on user stack. Note: If user is
; doing a "trace n" where n is a number of instructions exceeding the number of
; instructions in the ISR, instructions executing upon return from ISR will
; still be trapped through as the int 1 code will again set the trap flag.
 btr [bp].e_eflags, trapf
 mov trap_clear, TRUE
gen_prot040:
 inc [bp].e_eip ;get past the 0cdh (or 0cch)
 cmp bh, 3 ;int 3?
 je short gen_prot060 ;if so, only 1 byte
gen_prot050:
 inc [bp].e_eip
gen_prot060:

;See if the debugger is active and if this software interrupt is one of the
;ones used by the PL3 portion of the debugger to get PL0 services.
 cmp int1_active, TRUE ;is debugger active?
 jne short gen_prot085
;Note: In the event that an interrupt occurring while debugger is active
; (e.g. timer) actually uses these user software interrupts,
; code to verify caller would need to be added here.
 cmp bh, 60h ;do debug registers?
 jne short gen_prot080

 popad
 call do_debug_reg

 jmp gen_prot299
gen_prot080:
 cmp bh, 61h ;do I/O bit map?
 jne short gen_prot085

 popad
; Unlike accessing of debug registers, PL3 code could actually manipulate TSS
; I/O bit map directly. However, this interface keeps this in one location.
 call do_bit_map
 jmp gen_prot299
gen_prot085:
 cmp bh, 15h ;int 15?
 jne short gen_prot100
 cmp ah, 89h ;request for protected mode?
 jne short gen_prot090
 ;if so, can't allow
 bts [bp].e_eflags, carry ;set carry
 popad
 jmp gen_prot299 ;and return
gen_prot090:
 cmp ah, 87h ;request for extended move?
 jne short gen_prot100
 call emulate_blk_mov ;if so, we must do it
 popad
 mov ah, 0 ;default to success
 jnz gen_prot299 ;exit if success
 mov ah, 3 ;indicate a20 gate failed
 jmp gen_prot299 ;and return
gen_prot100:
;Adjust stack so that error code goes away and int number retrieved from
;instruction goes in spot on stack where pushed int number is (for stacks
;with no error code). Stack will be the way pass_thru routine likes it.
 mov ax, bx
 mov bx, [bp].e_pushed_bp
 shl ebx, 16 ;get into high word
 mov bl, ah ;interrupt number
 mov [bp].e_errcode, ebx

 cmp bl, 1 ;software int 1?
 jne short gen_prot140
; Check to see if we are already in debugger. This is to handle the unlikely
; case where there is an actual INT 1 instruction inside of an interrupt
; handler. If there is and debugger is active, instruction will be ignored.
 popad
 cmp int1_active, TRUE ;already in debugger?
 jne short gen_prot130 ;if not, go enter int 1

 ;else ignore it by returning
 add sp, 2 ;get rid of int number
 pop bp
 iretd
gen_prot130:
 add sp, 4 ;error code gone
 mov bp, sp
 pushad
 jmp int_1_210 ;go enter int 1
gen_prot140:
 popad
 add sp, 4 ;error code gone

 jmp pass_thru ;route the int via vectors
gen_prot150:
 cmp num_io_bp, 0 ;any I/O break points defined?
 je short gen_prot400 ;if not, don't look for I/O

 xor ah, ah ;use as string flag
 cmp bl, REP_PREFIX ;rep ?
 jne short gen_prot190
 mov ah, STRING ;only string type use rep
 mov bl, bh ;get 2nd byte
gen_prot190:
; If repeat prefix was found, ah now has a flag indicating only string type
; I/O instructions should be expected and bl now contains the byte of object
; code past the repeat prefix. Note: To be complete, this code should also
; look for the operand-size prefix and segment overrides.
 mov si, offset io_table
 mov cx, IO_TAB_ENTRIES
gen_prot200:
 or ah, ah ;strings only?
 jz short gen_prot225 ;if not, go test

 test [si].io_info, ah ;if table entry is not a string
 jz short gen_prot300 ;type I/O, go try next one
gen_prot225:
 cmp bl, [si].io_opcode
 jne short gen_prot300
 mov io_instrucf, TRUE ;instruction found
 mov cl, [si].io_info ;get info about instruction

 mov io_inst_info, cl
 test cl, CONSTANT ;port number in instruction?
 jz short gen_prot250 ;if not, we have it
 movzx dx, bh ;get port
gen_prot250:
 mov io_inst_port, dx ;save port
 mov cx, 1 ;number of bits
 sub ah, ah ;indicate clear
 call do_bit_map
gen_prot260:
 bts [bp].e_eflags, trapf ;single step i/o
 popad
gen_prot299:
 add sp, 2 ;int number pushed
 pop bp
 add sp, 4 ;error code
 iretd
gen_prot300:
 add si, size io_struc ;advance to next table entry
 loop gen_prot200
gen_prot400:
;Also need to add code here to check




[LISTING THREE]

;-----------------------------------------------------------------------------
;do_debug_reg - Enter: ch = function; if ch = 3, get bn portion of debug
; status register into ax. If clearing and eax != 0, eax holds other bits to
; be cleared (used for also clearing ge or le bits). cl = debug register
; number (0-3). If setting, also have: al = length (0=1 byte, 1=2 bytes,
; 3=4 bytes); ah = type (0=execution, 1=write, 3=read/write); edx = linear
; address for break. Also, if al='*' simply reactivate the breakpoint keeping
; the existing type and address. Exit: if disabling, specified debug register
; breakpoint is disabled. If enabling, specified debug register is loaded and
; breakpoint is enabled. If getting debug status register, bn portion of
; DR6 is returned in AX. Saves ebx.
;-----------------------------------------------------------------------------
do_debug_reg proc near
 cmp ch, 3 ;requesting status?
 jne short do_deb050 ;if not
 mov eax, dr6 ;debug status register
 and ax, 0fh ;isolate bn status
 ret ;and return
do_deb050:
 push ebx
 mov ebx, dr7 ;get debug control reg
 cmp ch, 1 ;determine function
 jb short do_deb850 ;if clear function go do it
 ja short do_deb100 ;setup, but not enable
 cmp al, '*' ;simply reset?
 je short do_deb850
do_deb100:
 push cx ;save function/reg #
 push edx ;save linear address
 mov edx, 0fh ;4 on bits
 shl cl, 2 ;reg # * bits associated
 add cl, 16 ;upper portion of 32 bit reg
 shl edx, cl
 not edx ;associated bits off
 and ebx, edx ;in the dr7 value
 shl al, 2 ;length bits to len position
 or al, ah ;put in the type
 mov dl, ah ;save type
 sub ah, ah
 shl eax, cl ;move len/rw to position
 or ebx, eax
 or dl, dl ;execution type?
 jz short do_deb500 ;if so, don't need ge
 bts bx, ge_bit
do_deb500:
 pop edx ;restore linear address
 pop cx ;and debug register #
 cmp cl, 1
 je short do_deb600
 ja short do_deb700
 mov dr0, edx
 jmp short do_deb800
do_deb600:
 mov dr1, edx
 jmp short do_deb800
do_deb700:
 cmp cl, 3
 je short do_deb750
 mov dr2, edx
 jmp short do_deb800
do_deb750:

 mov dr3, edx
do_deb800:
 cmp ch, 2 ;setup, but not enable?
 je short do_deb900 ;if so, skip enable
do_deb850:
 shl cl, 1 ;get to global enable for #
 inc cl
 movzx dx, cl ;bit number to turn on
 bts bx, dx ;set on in dr7 value
 or ch, ch ;set function?
 jnz short do_deb900
 btr bx, dx ;if not, disable break
 or ax, ax ;clear ge or le?
 jz short do_deb900 ;if not continue
 btr bx, ax ;if so, clear ge or le bit
do_deb900:
 mov dr7, ebx ;put adjusted value back
 pop ebx
do_deb999:
 ret
do_debug_reg endp

isrcode ends
 end
End Listings





































September, 1992
HIGH-RESOLUTION TIMING


A fast, tight timer for PCs


 This article contains the following executables: HRT.ARC


Thomas Roden


Thomas is a senior software engineer at Advanced Logic Research, 9401
Jeronimo, Irvine, CA 92718 or at rodentia@alr.com.


IBM-compatible PCs have always incorporated high-resolution timers (the Intel
8254 and equivalents) that govern short-duration functions like RAM refresh
and speaker-tone generation, and longer-duration functions such as time-of-day
determination. Timers operate by counting pulses at a given frequency.
With PCs, the timer input frequency is 1.19318 MHz--going into a 16-bit timer.
Frequencies that are integer quotients (up to 1/65536) of the base frequency
can be generated. This allows for a minimum frequency of approximately 18.2 Hz
(corresponding to a period of about 55 milliseconds).
In PC parlance, the 55-millisecond period is often called a tick; much of the
rest of the world, notably Macintosh programmers, uses the term to refer to
1/60 second. Since the PC tick consists of 65,536 periods of the timer input
(approximately 838 nanoseconds), it is natural to name this period. In the
interest of cuteness, I'll refer to it as the "ticklet."
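The division arithmetic above can be checked with a few lines of C. The base clock (1,193,180 Hz, i.e., 14.31818 MHz / 12) is from the article; the helper names are mine:

```c
/* Derived timing constants for the PC's 8254 input clock.
 * Hypothetical helpers for checking the article's arithmetic. */
double ticklet_ns(void)  { return 1e9 / 1193180.0; }           /* ~838 ns per pulse */
double tick_ms(void)     { return 65536.0 * 1e3 / 1193180.0; } /* ~55 ms per tick */
double min_freq_hz(void) { return 1193180.0 / 65536.0; }       /* ~18.2 Hz minimum */
```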
To allow longer durations to be timed, the output of one of the timers (Timer
0) is connected to an interrupt line (IRQ 0). The PC is programmed to respond
to this interrupt by simulating a 21-bit timer cascaded from the 16-bit
hardware timer. This 21-bit timer is large enough to hold time values up to 24
hours. Under DOS, its overflow is fed to the system date value, which should
be good at least until the year 2000.
Time-of-day is normally needed only to the second resolution or slightly
better, so Timer 0 is set to the minimum frequency. This allows the PC to
spend as little time as possible simulating the additional timer bits, leaving
more time for the user's program. Normal time-of-day functions read only the
cascaded software timer (also known as the tick count), offering resolutions
down to one tick.
Some applications require higher timer resolution, to which there are two
popular approaches. For periods under one tick, the speaker timer (Timer 2)
can be programmed to start and stop on cue (much like a stopwatch). This
supplies resolution down to one ticklet. This ideal limit is degraded by the
time it takes to access the timer gate and any other latencies in the control
software. Three-ticklet resolution is generally attainable. The main side
effect of this technique is that the speaker is unavailable during timing.
For periods over one tick, Timer 0 can be accelerated, causing the tick count
to increase more quickly. Typically, a separate tick count is maintained, and
the original IRQ 0 interrupt is occasionally simulated to preserve the
integrity of the system tick count. The main side effect of this technique is
increased system overhead. Increasing the IRQ 0 frequency to the point of
supporting millisecond resolution seriously impacts system performance.
A technique outlined by Jerry Jongerius in his article "Accurately Timing
Windows Events Without Timer Reprogramming" (Microsoft Systems Journal, July
1991) demonstrates that Timer-0 state can be read and combined with the normal
system tick count to supply enhanced timer resolution without additional
system overhead. Jongerius realizes 1-millisecond resolution with
100-microsecond resolution on faster machines.
A slightly different approach that streamlines Timer-0 reading and integrates
8259 Programmable Interrupt Controller (PIC) data allows resolution under 16
microseconds on most machines. Here's how it's done: The essence of
high-resolution, nonintrusive timing is to combine the ticklet portion of the
current time with a tick count. This is accomplished by reading Timer 0 to
determine the number of ticklets that have passed since the last tick
interrupt, using that value as the low word of the result, and using the
low-order portions of the system tick count as the higher-order portions of
the result.
Timer 0 is typically loaded with the terminal count of 0 (which corresponds to
65,536 in this instance) and set to mode 3 (square wave generator). In this
mode, the timer is loaded with 0 and the output goes high. It then counts
down, by twos, to two. The output goes low, the timer reloads with 0, and
counts down again. The sequence repeats for each tick; see Figure 1(a) and
Figure 1(b).
To convert the available timer-state information to a ticklet count, it is
first necessary to translate the count down to a count up. The fact that a
count of 0 is in fact the maximum value must also be handled. Since counting
is by twos, this value is halved, producing a number from 0 to 32,767 which
repeats once within the tick.
To resolve the ambiguity of a repeating count, the value of the timer output
is polled to determine if this is the first or second count down. If the
output is high, it is the first count down. This supplies the 15th bit to
produce the ticklet count of 0 to 65,535.
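The decoding in the last three paragraphs can be sketched in C. This is a hypothetical helper for illustration; the article's working implementation is the neg/cmc/rcr sequence in the assembly listing:

```c
#include <stdint.h>

/* Turn raw mode-3 Timer 0 state into a 16-bit ticklet count.
 * raw:      latched 16-bit count (counts down by twos; 0 means 65,536)
 * out_high: nonzero while the timer OUT pin is high (first count-down) */
uint16_t ticklets_from_timer(uint16_t raw, int out_high)
{
    uint16_t up   = (uint16_t)(0u - raw); /* count-down -> count-up; 0 wraps to max */
    uint16_t half = up >> 1;              /* counting is by twos: 0..32,767 */

    /* OUT high = first count-down = low half of the tick;
     * OUT low supplies the 15th bit for the full 0..65,535 range. */
    return out_high ? half : (uint16_t)(half | 0x8000u);
}
```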
This sounds like enough to produce the desired effect, but there are some
caveats. Interrupts must be disabled while the timer is polled to avoid
inconsistent ticklet and tick values. If the ticklet count was near maximum
when interrupts were disabled, it could overflow before read and be combined
with a tick count that is one too low (producing an error on the order of
65,536 ticklets!).
Fortunately, this condition can be detected. After the timer is read, the
Programmable Interrupt Controller is interrogated to see if it has an IRQ 0
pending; see Figure 1(c). If so, the calculated ticklet value may correspond
to a tick count of one greater than that in memory. It is not safe to simply
add to the tick count when combining, because it is also possible to read the
timer just before overflow and err in the other direction. The optimal
solution is to assume that the ticklet count is about to overflow (has a value
of 65,535) and combine it with the current tick count.
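In C, the merge plus the pending-interrupt clamp might look like this (a sketch with invented names, not the article's code):

```c
#include <stdint.h>

/* Combine the ticklet count with the low 16 bits of the tick count.
 * If the PIC reports IRQ 0 pending, the in-memory tick count may be one
 * behind, so pin the ticklet count at its maximum rather than guess. */
uint32_t merge_time(uint16_t ticklets, uint16_t ticks, int irq0_pending)
{
    if (irq0_pending)
        ticklets = 0xFFFFu;
    return ((uint32_t)ticks << 16) | ticklets;
}
```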
There is also a "feature" on some platforms that the timer output will in fact
change early when the timer count is still two and not yet reloaded. This is
not entirely compatible with the original Intel part, but it is out there, so
it must be accommodated. The best approach is to reject the current values and
read the hardware once more. It is almost impossible to read unusable values
twice in a row, and the retry delays no longer than 55 milliseconds at worst.
It is also possible to read the timer during the "null count" phase when the
count is invalid due to reloading. Again, the best approach is to reject the
values and reread the hardware. Doing so yields a ticklet count suitable for
direct merging with a tick count; see Figure 1(d), Figure 1(e), and Figure
1(f). Even where there are inaccuracies, they are never glitchy: Time is
always read as an increasing value, and discontinuities are kept to a minimum.


It's Only as Good as DOS


Unfortunately, the system tick count is not entirely stable. When the time
reaches midnight, the count is cleared by DOS. Since it doesn't transition on
a power of two, its error can propagate down to any timing interval. Thus, any
routine using the system tick count must be prepared for such an error.
It is possible to add checking to determine if a day overflow has occurred,
although this tends to add some overhead to every timer interrogation. There
are other ways to handle such cases. These typically involve range-checking
intervals and supplying defaults when they are out of range.


How to Do Better


The midnight-reset problem can be remedied by keeping a parallel tick count
that rolls over on a power-of-two boundary. This can be accomplished by
hooking an interrupt service routine (ISR) to IRQ 0. This does increase system
overhead, but only slightly. Sixteen bits of count allow for a one-hour range.
Thirty-two bits of count allow for a whopping seven-year range.
The ISR should increment its count before transferring to the original
interrupt code. This ensures total consistency between the count and the
ticklet state by not issuing an end of interrupt (EOI) until after the count
is incremented. For the same reason, IRQ 0 (INT 08h) and not INT 1Ch should be
the interrupt that is intercepted.
It may be safe to intercept at a different point (such as INT 1Ch) so long as
it is not possible to call the ticklet reading code between the times that the
IRQ 0 is acknowledged and the IRQ 0 ISR is complete. This would depend on the
environment in which this technique is used.


How Fast Do You Want to Go?


Combining the ticklet count with a parallel tick count can yield 48 bits of
information spanning a seven-year period. The lower bits are of questionable
accuracy, and the upper bits may correspond to uninteresting intervals. The
number of bits to use in a count is the count's dynamic range. Where that
window of bits is positioned is the resolution.
It is very inconvenient to deal with variables of greater than 32 bits in most
environments. Even variables of greater than 16 bits are slower to process
under DOS. This makes it useful to tailor the range and resolution of the
count to one's application. Table 1 provides some examples.
Table 1: Tailoring the range and resolution of the count to one's application.
(Numbers are approximate, so before you use this information in a programmable
pacemaker, do your own timing analysis.)

 Minimum Maximum Variable Shift
 Interval Interval Size
 ------------------------------------


 838 ns 1 hr 32 bits 0
 1676 ns 2 hrs 32 bits 1
 3352 ns 4 hrs 32 bits 2
 3352 ns 0.2 sec 16 bits 2
 13.4 us 0.88 sec 16 bits 4
 26.8 us 32 hrs 32 bits 5
 215 us 14 sec 16 bits 8

To avoid difficult situations, use maximum intervals of twice the durations
expected. It is feasible to time intervals near the maximum, but they require
sufficiently frequent polling to avoid undetected overflow.
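Each row of Table 1 amounts to a right shift of the full ticklet-resolution time before truncating to the variable size. A sketch, using a 64-bit intermediate for clarity (the DOS-era code would shift in 16-bit pieces):

```c
#include <stdint.h>

/* Scale the (tick : ticklet) pair down by `shift` bits, then truncate
 * to 32 bits, trading resolution for range as in Table 1. */
uint32_t scaled_time32(uint64_t ticks, uint16_t ticklets, unsigned shift)
{
    uint64_t t = (ticks << 16) | ticklets; /* full-resolution ticklet time */
    return (uint32_t)(t >> shift);         /* drop `shift` low-order bits */
}
```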


Conditioning the Results


Timer output is an unsigned value. The general case of timing is the interval
from one event to another. At the first event, the timer is sampled and its
value is stored. At the second event, the timer is sampled again, and the
first value is subtracted from the second. This yields a correct interval
even across a single timer rollover, so long as the range of the timer is not
exceeded.
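The subtraction works because of modular unsigned arithmetic; for a 16-bit timer:

```c
#include <stdint.h>

/* Interval between two samples of a 16-bit timer. Unsigned subtraction
 * is modulo 65,536, so the result is correct across one rollover. */
uint16_t interval16(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}
```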
Timer overflow can be handled by monitoring that the timer in question has
overflowed and setting a bit and/or ceasing to increment the timer. This
requires additional IRQ-0 handling and is generally more difficult than simply
using a wider counter.


The Right Tool for the Job


There are two major uses for a timer: to record the interval between two
events, and to wait until such an interval has elapsed. These uses are
analogous to a stopwatch and an alarm clock, respectively. Each task suggests
different strategies for dealing with the inaccuracies inherent in
DOS-dependent timing systems.
Interval recording is usually for the purpose of statistics gathering. Under
these circumstances, it is usually acceptable to clip erroneous values to a
"reasonable" range or drop them altogether. A value can be considered
erroneous when it falls outside of an expected range.
Elapsed time checking is useful for real-time control systems (and games),
where it should be guaranteed that a minimum amount of time passes between two
events. The approach for this situation depends on which (hopefully
infrequent) error behavior is more acceptable: delaying not long enough or
delaying up to twice the requested duration.
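For the alarm-clock case, the same unsigned arithmetic gives an elapsed-time test that a polling loop can call (a sketch assuming 32-bit counts; the helper name is mine):

```c
#include <stdint.h>

/* Nonzero once at least `duration` counts have passed since `start`.
 * Correct across a single rollover of the 32-bit counter, provided the
 * caller polls often enough that the difference cannot overflow. */
int elapsed(uint32_t start, uint32_t now, uint32_t duration)
{
    return (uint32_t)(now - start) >= duration;
}
```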


Does Anybody Really Know What Time It Is?


Working in units of ticklets is all well and good for our discussion, but it is
certainly not a unit recognized by the National Institute of Standards and
Technology. Doing all the work in a common unit would add complexity,
overhead, and loss of precision to every step. A more effective approach is to
supply routines to convert between ticklets and whatever fraction of second
you use.
The conversion from ticklet to microsecond can be accomplished in integer by
multiplying by 88/105. This introduces an error of roughly one part in seven
million. Microseconds can be converted to ticklets with the inverse (105/88)
with the same error. Ticklet/millisecond conversions can be accomplished with
the fractions 11/13,125 and 13,125/11.
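The 88/105 conversions in C, with 64-bit intermediates so any 32-bit input survives the multiply (the 16-bit code of the era had tighter limits); the helper names are hypothetical:

```c
#include <stdint.h>

/* Ticklet/microsecond conversion via the 88/105 approximation
 * (error roughly one part in seven million). */
uint32_t ticklets_to_us(uint32_t t)  { return (uint32_t)((uint64_t)t  * 88  / 105); }
uint32_t us_to_ticklets(uint32_t us) { return (uint32_t)((uint64_t)us * 105 / 88);  }
```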
There are fractions that convert more accurately, but they handle only smaller
values, due to intermediate overflow. The precise ticklet/millisecond
fractions involved are based on 12,000,000/14,318,180 and its inverse.
Floating point will give better accuracy and less chance of overflow, but it
is so slow as to risk affecting the timing. The selection of the suggested
fractions was a bit more than pure guess work. It involved weighing the merits
of accuracy, range, and compute time. (In fact, this topic is interesting to
the point of deserving its own article.)


I Don't Do Windows


The supplied implementation is designed for DOS, and may not work properly
under other operating environments. In particular, if the Timer-0 frequency is
changed, even by one, the assumptions made herein become invalid. One reason
is that the behavior of a timer in mode 3 is different for even and odd
initial counts. There would also be increased difficulty scaling the count for
merging with a tick count.
Environments that virtualize I/O, such as Windows 386 Enhanced Mode, also have
a problem. Not only is there a severe performance degradation, but there are
also additional requirements to the environment. Operations that change the
state of the processor interrupt-enable bit must be honored. There is also a
relationship between the state of the timer and interrupt controller that must
be simulated or preserved.
As Jongerius points out, Virtual Device Driver (VxD) programming techniques
can alleviate these problems. Actually, this technique could be used to
improve the performance of the Windows GetTickCount call if it were
implemented by Microsoft.
Unfortunately, this issue doesn't affect Windows programs exclusively. If this
technique is used in a DOS program running in a 386 Enhanced DOS box, its
values will be highly suspect. Discontinuities and glitchy results become
common.


The Prize at the Bottom of the Box


HRTIME.ASM (Listing One, page 110) contains three externally referenced
functions. hrt_open() is an initialization function. When built to use a
parallel tick count, it installs the interrupt routine and clears the tick
count. hrt_close() unhooks any interrupt routine installed by hrt_open().
hrtime() is the function that does the high-resolution time determination. It
returns an unsigned long (assumed to be 32 bits) value of the number of
ticklets either from the time hrt_open() was called (if built for parallel
tick count) or since midnight.
I've also written a sample C program that uses the hrtime functions. Due to
space considerations, this program and programmer's notes are available
electronically. This program verifies that the technique I've described here
works on your system. Please report any platforms on which the unmodified form
of the technique fails. The program also illustrates how the major forms of
timings are used and demonstrates conversion between ticklets and microseconds
or milliseconds.


_HIGH-RESOLUTION TIMING_
by Thomas Roden


[LISTING ONE]


; hrtime.asm - Hi Resolution TIMEr for dos. Copyright (c) 1991 Thomas A. Roden
; All Rights Reserved (with one exception). The right to freely distribute this
; source and any executable code it creates is granted, provided that this
; copyright notice is included in the source. It is requested that the author's
; name (Thomas A. Roden) be included in the acknowledgements of any product
; including this code, but this request is in no way legally binding.

IO_DELAY_NEEDED equ 0 ; doesn't seem to need recovery time here
PRVT_TICKS_USED equ 1 ; if a private tick count is to be used
IRET_IF_RESTORE equ 0 ; if the flag is to be restored with an IRET,
 ; or with the windows suggested method
; io_delay - delay just a bit
io_delay macro
if IO_DELAY_NEEDED
 jmp short $+2
endif ; IO_DELAY_NEEDED
endm

PIC0_ADDR equ 020h ; I/O address of PIC 0
TIMER0 equ 040h ; I/O address of Timer 0
TIMER_STAT equ 043h ; I/O address for status/control of timers 0-2

PIC_RIRR equ 00Ah ; Read Interrupt Request Register of a PIC
T0_R_S_C equ 0C2h ; Timer 0 Read Status and Count

DOS_GLOBALS equ 040h ; segment for 40:xx variables
SYS_TIMER_CNT equ 06Ch ; offset for system count
 .model large
 .code
assume ds:nothing, es:nothing

if PRVT_TICKS_USED
hrt_ticks dd 0
old_int8 dd 0

; hrt_isr - interrupt routine for private tick counting for hi-res timer
 public _hrt_isr
_hrt_isr proc far
 add word ptr cs:[hrt_ticks], 1
 adc word ptr cs:[hrt_ticks+2], 0

 jmp dword ptr cs:[old_int8]
_hrt_isr endp
endif ; PRVT_TICKS_USED

; hrt_open - the init function for the hi-res timer
 public _hrt_open
_hrt_open proc far
 push ds ; save this stuff to avoid bad crashes
 push es ; save this stuff just to be paranoid
 push bx
 push cx
 push dx
if PRVT_TICKS_USED
 xor ax, ax
 mov word ptr cs:[hrt_ticks], ax
 mov word ptr cs:[hrt_ticks+2], ax
 mov ax, 03508h ; get int vector for int8 (irq0)
 int 21h

 mov word ptr cs:[old_int8], bx
 mov ax, es
 mov word ptr cs:[old_int8+2], ax
 mov dx, seg _hrt_isr
 mov ds, dx
 mov dx, offset _hrt_isr
 mov ax, 02508h ; set int vector for int8 (irq0)
 int 21h
endif ; PRVT_TICKS_USED
 pop dx
 pop cx
 pop bx
 pop es
 pop ds
 xor ax, ax
 ret
_hrt_open endp
; hrt_close - the un-init function for the hi-res timer
 public _hrt_close
_hrt_close proc far
 push ds ; save this stuff to avoid bad crashes
 push es ; save this stuff just to be paranoid
 push bx
 push cx
 push dx
if PRVT_TICKS_USED
 mov dx, word ptr cs:[old_int8+2]
 mov ds, dx
 mov dx, word ptr cs:[old_int8]
 mov ax, 02508h ; set int vector for int8 (irq0)
 int 21h
endif ; PRVT_TICKS_USED
 pop dx
 pop cx
 pop bx
 pop es
 pop ds
 xor ax, ax
 ret
_hrt_close endp
; hrtime - the hi-res time reader
 public _hrtime
_hrtime proc far
if IRET_IF_RESTORE
 pop bx ; return ip - to make iret return frame
 pop ax ; return cs
 pushf ; flags
 push ax ; cs
 push bx ; ip
else ; IRET_IF_RESTORE
 pushf ; store flags on stack to check at end
endif ; IRET_IF_RESTORE
 cli ; freeze ticker while checking
hrt_readhw:
 mov al, T0_R_S_C ; timer 0 read stat and count
 out TIMER_STAT, al
 io_delay
 in al, TIMER0 ; store stat in ch
 mov ch, al

 io_delay
 in al, TIMER0 ; store count in bx
 mov bl, al
 io_delay
 in al, TIMER0
 mov bh, al
 io_delay ; delay between timer and pic access
 mov al, PIC_RIRR
 out PIC0_ADDR, al
 io_delay
 in al, PIC0_ADDR
 test al, 001h ; check if system time is stale
 jz short hrt_timegood
 mov ax, 0ffffh ; force max in lower part
 jmp short hrt_gotlo
hrt_timegood:
 test ch, 040h
 jnz short hrt_readhw ; timer invalid, retry
 mov ax, bx ; move count to more convenient reg
 neg ax ; convert to count up
 shl ch, 1 ; bash ch to get high bit of status
 ; (output state) into high bit of
 ; count via carry
 cmc ; invert carry to match count negation
 rcr ax, 1 ; mode 3 counts down twice by twos,
 ; first with output high, then low
hrt_gotlo:
; This would be the place to shift ax down if less resolution but greater
; range were needed. The merging with system ticks or parallel ticks would
; have to involve the same shifting.
if PRVT_TICKS_USED ; if a parallel tick count is to be used
 mov dx, word ptr cs:[hrt_ticks]
else ; PRVT_TICKS_USED
 mov dx, DOS_GLOBALS
 mov es, dx
 mov dx, es:[SYS_TIMER_CNT] ; add system time
endif ; PRVT_TICKS_USED
if IRET_IF_RESTORE
 iret
else ; IRET_IF_RESTORE
 pop bx ; get flags down to restore interrupt flag
 test bx, 0200h ; was IF set? (jump if no)
 jz short hrt_if_done
 sti ; restore interrupts to ON
hrt_if_done:
 ret
endif ; IRET_IF_RESTORE
_hrtime endp

end












September, 1992
AN IMPROVED LISP-STYLE LIBRARY FOR C


More robust, more complete, still very convenient


 This article contains the following executables: CLISP.ARC


Douglas Chubb


Douglas is a mathematician for the U.S. Army. He can be reached through the
DDJ offices.


I was first exposed to Lisp in 1975, while working as a research mathematician
for the Army. Since then, I've been fortunate enough to develop most of my
algorithms using Lisp. I've found it to be a wonderful language, because it
allows me to express abstract ideas quickly and naturally, while ignoring
mundane considerations such as memory allocation and type declarations. So I
was very pleased to see Daniel Ozick's article, "A Lisp-Style Library for C,"
in the August 1991 issue of Dr. Dobb's Journal. At the time, I had just begun
translating a large program from Common Lisp to C.
However, once the translation process was underway, I discovered that some of
the predicates in Daniel's library did not function properly. Also, the
library was missing some "standard" Lisp predicates and structures to which
I'd become accustomed. Fortunately, Daniel's approach is modular enough to
encourage serious tinkering, so in the course of a few months, I developed a
more complete library of predicates that behave like their Common Lisp
equivalents.
Unfortunately, I then discovered that Ozick's garbage collector prevented my
program from executing like its Lisp counterpart. By redesigning the
garbage-collection process, I've now succeeded in making the system efficient,
nearly automatic, and easy to use. The result is an improved Lisp library in C
which I call "C-lisp." This library should facilitate the translation of Lisp
programs to C.
I've tested the library on a variety of platforms, using different C and C++
compilers. The platforms include an IBM 486, a Macintosh FX, and a Sun
SPARCII. The compilers used were Borland C, Symantec's ThinkC,
Saber/CenterLine C++, and Sun C++ V2.1.


The Missing Predicates


Certain familiar Lisp predicates and structures were missing from Ozick's
original implementation. These include Lisp atoms and associated property
structures, the eq and equal predicates, and the list and append functions.
A key strength of Lisp is the language's ability to represent highly abstract
data structures--sets, trees, and lattices--using Lisp atoms and their
associated indicator-property structures. As you may recall, an
indicator-property structure (sometimes called an "attribute/value pair") is a
way of associating a set of features and corresponding values with a single
atom. You can use indicator-property pairs to represent a real-world object in
terms of an associated descriptor set. For example, consider Tweety--a small
yellow bird, one year old, which cannot fly. In Lisp, the atom which is the
pointer to Tweety would have the set of indicators and associated properties
shown in Example 1.
Example 1: A partial list of descriptors for Tweety.

 Indicator Property
 --------------------------------

 LISP ATOM Name Tweety

 Animal T

 Type bird

 Age one year

 Color yellow

 Fly nil

 Size small

You might argue that Tweety could also have been represented using a list.
However, although the list would contain Tweety's descriptors, you would have
a problem referencing it. The only referencing requirement is that Tweety's
label be unique, since the entity Tweety is assumed to be unique. (Note that
another Tweety with an identical set of Lisp descriptors may exist, but it is
not the original Tweety.) Lisp atoms with unique symbol names solve this
problem.
You can use the gensym (generate symbol) function from the C-lisp library to
generate unique atoms. Then, you use the put_prop (put property) function to
add/update indicators and their associated properties. Likewise, given an
indicator, you can use the get_prop function to retrieve the corresponding
property value. The remprop function removes both the indicator and its
associated property from an atom. The code in Example 2 shows how this is
done.
Example 2: Using gensym, put_prop, get_prop, and remprop.

 Object foo = gensym(); /* generate a new Lisp atom */

 put_prop(foo, "name", "Tweety"); /* give it name = Tweety */
 write_object (get_prop (foo, "name")); /* print name ("Tweety") */
 remprop(foo, "name"); /* remove name */
 write_object (get_prop(foo,"name")); /* print name (prints "Nil") */




EQ is not Equal


In Common Lisp, (eq x y) returns True if and only if x and y address the same
memory location. However, x and y can be considered "equal" if they are
structurally similar (isomorphic) objects. In the case of Lisp predicates such
as member, using eq to test for equality is frequently inappropriate. For
example, consider this Lisp fragment: (setf foo '(d e f)) (setf foo2 '(a b (d
e f))) (member foo foo2 :test #'eq). As written, the member predicate returns
Nil; by contrast, (member foo foo2 :test #'equal) returns the desired true
(non-Nil) value.
I implemented a C-lisp lisp_equal function which recursively examines complex
structures, testing for C equality (==) only as appropriate. The lisp_equal
predicate is used in is_member, which returns True or Nil; in member, which
returns the list whose car is lisp_equal to the first function argument; and
in remove_item, which behaves like its Common Lisp counterpart. The C-lisp
functions remove_duplicates, intersection, and set_difference use is_member,
member, and/or remove_item as well.
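The recursive test can be sketched in C over a minimal cons-cell model. The Node layout and field names below are invented for illustration and are not the C-lisp library's actual object representation; the point is that pointer identity (eq) is used only as a shortcut, with structural recursion doing the real work.

```c
#include <stddef.h>

/* A minimal cons-cell model -- illustrative only, not the article's
   actual C-lisp object layout. */
typedef struct Node {
    int is_pair;              /* 1 = pair, 0 = integer atom */
    int value;                /* used when is_pair == 0 */
    struct Node *car, *cdr;   /* used when is_pair == 1 */
} Node;

/* lisp_equal-style test: same address implies equal (eq => equal);
   otherwise recurse on structure and compare atom contents. */
int lisp_equal(const Node *x, const Node *y)
{
    if (x == y)                      /* same object: trivially equal */
        return 1;
    if (x == NULL || y == NULL)      /* one Nil, one not */
        return 0;
    if (x->is_pair != y->is_pair)
        return 0;
    if (!x->is_pair)                 /* atoms: compare contents */
        return x->value == y->value;
    return lisp_equal(x->car, y->car) &&   /* pairs: recurse */
           lisp_equal(x->cdr, y->cdr);
}
```

Two isomorphic lists at different addresses then test equal even though an eq-style pointer comparison would say otherwise.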


List and Append


Frequently in a Lisp program you want to create a list of lists. Occasionally,
some of the member lists may be empty. The original implementation, which used
a NULL (the empty set) as an argument terminator in calling the list function,
didn't allow this. The C statement foo = list(make_integer(33), NULL) results
in foo being equal to (33). This means that foo = list(NULL, NULL) returns
NULL rather than (Nil).
To make this function work properly, I changed the calling protocol to use the
special symbol T_EOF as an argument terminator. Then the C statement foo =
list(NULL,NULL,T_EOF) produces the desired result of (Nil Nil).
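The sentinel-terminated varargs idiom can be sketched as follows. This is not the library's implementation: the Pair type is a stand-in, and modeling T_EOF as the address of a unique object is an assumption. It does show why NULL can then appear as an ordinary list element.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pair type; T_EOF is modeled as the address of a unique
   sentinel object, so NULL remains a legal list element. */
typedef struct Pair { void *first; struct Pair *rest; } Pair;

static int t_eof_marker;
#define T_EOF ((void *)&t_eof_marker)

/* Build a list of the arguments up to (not including) T_EOF. */
Pair *list(void *first, ...)
{
    Pair head = { NULL, NULL }, *tail = &head;
    va_list ap;
    void *arg = first;

    va_start(ap, first);
    while (arg != T_EOF) {           /* NULL no longer ends the list */
        Pair *p = malloc(sizeof *p);
        p->first = arg;
        p->rest = NULL;
        tail->rest = p;
        tail = p;
        arg = va_arg(ap, void *);
    }
    va_end(ap);
    return head.rest;                /* NULL means the empty list */
}
```

With this protocol, list(NULL, NULL, T_EOF) yields a two-element list whose members are both Nil, as the text requires.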
The append function was also anomalous. As originally implemented, C-lisp
append behaved like Lisp nconc, a smash function that destroys its first
argument. For example, the C statement foo = append(foo_bar,foo2) results in
foo_bar being destroyed. I modified append to copy foo_bar and to cons each
element of (reverse foo_bar) to the front of list foo2, which is modified as
expected. The function nconc is now available as well.
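The non-destructive behavior can be sketched in C. Straightforward recursion gives the same result as the reverse-and-cons approach described above: the cells of the first list are copied, while the second list is shared, as in Common Lisp. The Cell type here is a stand-in, not the library's representation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cons-cell type for illustration. */
typedef struct Cell { int value; struct Cell *rest; } Cell;

static Cell *cons(int v, Cell *rest)
{
    Cell *c = malloc(sizeof *c);
    c->value = v;
    c->rest = rest;
    return c;
}

/* Non-destructive append: copy the cells of 'a', splice the copy onto
   'b'.  'a' is left untouched; 'b' is shared (so later changes to 'b'
   show through the result, as Lisp append allows). */
Cell *append_copy(const Cell *a, Cell *b)
{
    if (a == NULL)
        return b;
    return cons(a->value, append_copy(a->rest, b));
}
```

Contrast with nconc, which would simply redirect the last cell of 'a' to point at 'b', smashing the first argument.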


Improving Garbage Collection


The improvements described earlier are incremental changes to the original
implementation. After these were complete, I discovered a problem which
necessitated a redesign of the original memory manager and garbage collector
(GC).
The original design was admittedly simple in both concept and implementation.
Although "real" Lisp systems use a variety of sophisticated GC algorithms
(such as mark-and-sweep and generation scavenging), Daniel's implementation
did not use a GC as such. Instead, memory was partitioned into two disjoint
sets: temporary and persistent, and the mark and mark_persistent functions tag
objects as one or the other. When memory needs to be reclaimed, the
application program calls free_to_mark, which uses the free() function of the
C standard library to deallocate memory. Objects are normally stored in
temporary memory, where they get reclaimed by the GC. To protect an object
from the GC, you must copy it into persistent memory. Eventually, this results
in running out of temporary memory, because dynamic changes to a C-lisp data
structure require many copies of the original object. A typical sequence is
shown in Example 3(a). This is a result of having only two memory types.
Example 3: (a) Using the original garbage collector; (b) using the improved
garbage collector, (c) how to protect a symbol's plist.

 (a)

 mark(); /* set memory type = temporary */
 ...
 mark_persistent(); /* set memory type = persistent */
 foo = copy_object (foo_bar); /* foo gets copied into persistent
 memory */
 unmark_persistent(); /* set memory type = temporary */
 ...
 free_to_mark(); /* free temporary-tagged memory */

 (b)

 foo = list (make_integer(33),NULL,T_EOF); /* make object foo */
 mark_object (foo); /* mark the object (set protect bit) */
 collect_garbage(); /* invoke the garbage collector */
 write_object(foo); /* display object (prints "(33 Nil)") */
 unmark_object (foo); /* clear protect bit */
 collect_garbage(); /* invoke the garbage collector again */
 write_object(foo); /* object no longer exists ("error") */

 (c)

 foo_bar = gensym("girl");
 put_prop (foo_bar,"age", make_integer(25));
 mark_object(foo_bar); /* foo_bar gets marked */
 foo=gensym("boy");
 mark_object(foo);
 put_prop(foo,"age",make_integer(30)); /* C-bit set on foo */
 put_prop (foo, "friend", foo_bar);
 mark_object(foo); /* second marking of foo
 needed */
 collect_garbage();
 write_object(get_prop (foo, "friend")); /* displays girl-1 */

A better approach is to tag C-lisp objects, not memory. Listing One (page 112)
is my memory-manager implementation. This more closely resembles the GCs in
actual Lisp implementations, while still remaining rudimentary and
nonautomatic. Each C-lisp object has a tag which indicates that object's
type--symbol, pair, or integer. Depending upon the computer architecture, the
minimal tag size is 1 byte. I use three bits of the type byte to specify the
object type. Two of the remaining five bits are used during garbage collection
to classify an object as "protected" (P bit) or "changed" (C bit). Normally,
an object's P bit is 0, and the object is considered "unmarked." Unmarked
objects may be reclaimed by the GC; objects with P bit equal to 1 are
protected from the GC. You set an object's P bit with the function mark_object
and clear it with unmark_object. Calls to mark_object are efficient because
previously marked objects are ignored. Example 3(b) shows a typical code
fragment.
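One possible encoding of that tag byte uses the same octal constants that appear in Listing One: '\100' (0x40) for the protect bit and '\040' (0x20) for the changed bit, with the three low bits carrying the object type. The macro names below are mine, not the library's.

```c
/* Assumed layout of the tag byte: three low bits for the object type,
   plus the protect (P) and changed (C) bits from Listing One. */
enum { TYPE_MASK = 0x07, P_BIT = 0x40, C_BIT = 0x20 };

#define obj_type(t)     ((t) & TYPE_MASK)
#define is_marked(t)    (((t) & P_BIT) != 0)
#define mark(t)         ((t) | P_BIT)
#define unmark(t)       ((t) & ~(P_BIT | C_BIT))
#define set_changed(t)  (is_marked(t) ? ((t) | C_BIT) : (t))  /* C iff P */
```

Note that set_changed follows the rule in the text: put_prop and remprop set the C bit if and only if the P bit is already set.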

Now suppose foo is an object of type symbol, and foo is marked. Foo's property
list (plist) can be changed by using either put_prop or remprop, as described
previously. If foo is not protected, the changed portion of foo's plist will
be reclaimed during garbage collection with possibly disastrous results. To
avoid this, both put_prop and remprop set the C bit if and only if the P bit
is set. The mark_object function reexamines objects whose C bit is set,
searching for any unmarked portion of the plist. To make this process memory
efficient, a special unmark algorithm, free_structure, is used to unmark the
portion of foo's plist being removed or replaced by put_prop or remprop. The
free_structure function differs from unmark_object in that the value of the P
bit for objects of type symbol is left unchanged. This strategy, illustrated
in Example 3(c), prevents the unwanted reclamation of memory by the GC for any
symbol objects which may be associated with foo. Example 4 shows the
pseudocode for the key routines in the garbage collector.
Example 4: Pseudocode for memory allocation and garbage collection.

 At Start-Up:
 Initialize the GC stack pointer to NULL.
 Return.

 When a C-lisp object is created:
 Obtain memory from the system by using malloc().
 Push GC stack pointer to location of last GC stack location.
 Return.

 Upon a call to collect_garbage():
 CG: Pop pointer from GC_Stack.
 If pointer is NULL {
 Pop pointer from Protect_Stack.
 If pointer is NULL
 Push pointer onto GC_Stack
 Else
 Return
 }
 Else {
 If object type > 7
 Pass object location to free().
 else
 Push pointer onto Protect_Stack.
 Goto CG.
 }
 Upon calls to put_prop() or remprop():
 If object type > 7 {
 Remove/post information on Object's plist.
 }
 else {
 Set object's C bit to 1
 If there is data to remove from plist {
 Call free-structure(data).
 }
 Post information.
 }

To test the garbage collector, I wrote a program that first creates a lattice
structure, and then cycles through the following steps:
Increase the size of the lattice.
Mark-and-copy the lattice.
Collect garbage.
Unmark the lattice.
Repeat.
In one test run using an O(n^2) algorithm, the original implementation
quickly depleted available memory in less than 12 iterations. The revised
implementation was still running at 1035 iterations, at which time I halted
the process.


_AN IMPROVED LISP-STYLE LIBRARY FOR C_
by Douglas Chubb



[LISTING ONE]

/*
 File MEMORY.C, part of C-LISP Library written by Douglas Chubb, 1991-92.
 Memory management using pointers and two marking bits as part of Object
 "type" declaration.
*/

/** Memory Allocation and Deallocation Functions **/

/* Include Files */
#include <stdio.h>
#include <stdlib.h>
#include "lisp-header.h"
#include "int-lisp-syms.h"

/** Variables **/
/* memory_pointer_list -- pointer to linked list of memory storage blocks */
Pointer memory_pointer_list = NULL;

/* temp_pointer_list -- pointer to linked list of temporarily allocated blocks */
Pointer temp_pointer_list = NULL;

/** Functions **/
void initialize_garbage_collector (void)
{
    memory_pointer_list = NULL;
    temp_pointer_list = NULL;
}

/* push_memory_pointer -- push pointer to block on 'memory_pointer_list' */
void push_memory_pointer (Pointer p)
{
    * (Pointer *) p = memory_pointer_list;
    memory_pointer_list = p;
}

/* pop_memory_pointer -- pop pointer to block from 'memory_pointer_list' */
Pointer pop_memory_pointer (void)
{
    Pointer p;

    p = memory_pointer_list;
    if (p != NULL) {
        memory_pointer_list = * (Pointer *) p;
        return (p);
    }
    else
        error ("pop_memory_pointer: 'memory_pointer_list' is empty");
}

/* push_temp_pointer -- push pointer to block on 'temp_pointer_list' */
void push_temp_pointer (Pointer p)
{
    * (Pointer *) p = temp_pointer_list;
    temp_pointer_list = p;
}

/* pop_temp_pointer -- pop pointer to block from 'temp_pointer_list' */
Pointer pop_temp_pointer (void)
{
    Pointer p;

    p = temp_pointer_list;
    if (p != NULL) {
        temp_pointer_list = * (Pointer *) p;
        return (p);
    }
    else
        error ("pop_temp_pointer: 'temp_pointer_list' is empty");
}

/* collect_garbage -- 'safe_free' all malloc'ed data */
void collect_garbage (void)
{
    Pointer p, pp;

    if (memory_pointer_list == NULL)
        error ("collect_garbage: 'memory_pointer_list' is empty");
    else {
        temp_pointer_list = NULL;
        while (memory_pointer_list != NULL) {
            p = pop_memory_pointer();
            pp = (char *) p + sizeof (Pointer);
            safe_free (pp);
        }
        /* fill marked_block stack */
        while (temp_pointer_list != NULL)
            push_memory_pointer (pop_temp_pointer());
    }
}

/* safe_free -- "C" 'free' with first byte of block set to zero */
void safe_free (void *p)
{
    if (type((char *) p) <= 7) {
        * (char *) p = (char) 0;
        /* free block, including header, for link in memory_pointer_list */
        free ((char *) p - sizeof (Pointer));
    }
    else
        /* marked: store data temporarily on 'temp_pointer_list' */
        push_temp_pointer ((char *) p - sizeof (Pointer));
}

/* safe_malloc -- Unix 'malloc' wrapped inside test for sufficient memory */
Pointer safe_malloc (size_t size)
{
    Pointer memory;
    static long num_calls = 0;

    /* allocate block, including header for link in 'memory_pointer_list' */
    memory = malloc (size + sizeof (Pointer));
    num_calls++;
    /* total_space += size; */
    if (memory != NULL) {
        push_memory_pointer (memory);
        /* return beginning of user data block */
        return ((char *) memory + sizeof (Pointer));
    }
    else
        error ("safe_malloc: out of memory (number malloc calls = %ld) \n",
               num_calls);
}

/* mark_object -- recursively marks object "type" negative to save object
   iff object is either "unmarked" or, if "marked", object has not been
   changed by 'put_prop' or 'remprop' functions. */
void mark_object (Object obj)
{
    if (obj == NULL || (type(obj) > 7 && (type(obj) & '\040') == 0))
        return;     /* 'obj' marked, but NOT changed => return */
    else {
        type(obj) = ntype(obj);
        mark2_object (obj);
        type(obj) = '\100' | ntype(obj);   /* remove "changed = 040" tag */
    }
}

/* mark2_object -- recursively marks the object "type" negative */
void mark2_object (Object obj)
{
    if (obj == NULL)
        return;
    else switch (ntype(obj)) {
        case SYMBOL:
            if (type(obj) > 7 && (type(obj) & '\040') == 0)
                return;
            else {
                type(obj) = '\100' | ntype(obj);
                if (get_prop(obj, "pn") == NULL)
                    symbol_plist(obj) =
                        first_put (list (make_string("pn"),
                                         make_string(symbol(obj)->print_name),
                                         T_EOF),
                                   symbol_plist(obj));
                mark2_object (symbol_plist(obj));
                mark2_object (symbol(obj)->value);
            }
            break;
        case STRING: case INTEGER: case FUNCTION:
            break;
        case PAIR:
            type(obj) = type(obj) | '\100';   /* mark type negative */
            mark2_object (first(obj));
            mark2_object (but_first(obj));
            break;
        default:
            error ("\nmark2_object: not standard object: %d", type(obj));
            break;
    }
    type(obj) = type(obj) | '\100';   /* mark type negative */
}

/* unmark_object -- recursively marks Object-type positive to free Object */
void unmark_object (Object obj)
{
    if (obj == NULL || type(obj) <= 7)
        return;
    else switch (ntype(obj)) {
        case SYMBOL:
            if (type(obj) == ntype(obj))
                return;
            else {
                type(obj) = ntype(obj);
                unmark_object (symbol_plist(obj));
                unmark_object (symbol(obj)->value);
                symbol(obj)->print_name = string (get_prop(obj, "pn"));
            }
            break;
        case STRING: case INTEGER: case FUNCTION:
            break;
        case PAIR:
            type(obj) = ntype(obj);   /* remove protect bit */
            unmark_object (first(obj));
            unmark_object (but_first(obj));
            break;
        default:
            error ("unmark_object: not standard object");
            break;
    }
    type(obj) = ntype(obj);   /* remove protect bit */
}


























September, 1992
THE UNIVERSAL DEBUGGER INTERFACE


A processor-independent specification that enables greater tool
configurability




Daniel Mann


Daniel is an engineer for Advanced Micro Devices and can be contacted at 5204
East Ben White Blvd., Austin, TX 78741.


It is generally more costly to develop programs for embedded processors than
for equivalent applications on engineering workstations. For one thing,
embedded-application code usually can't take advantage of underlying
operating-system support. Consequently, embedded systems developers might
choose to first install in their code a small debug-support monitor or
third-party executive. And, in the process of getting an embedded support
monitor running or developing application code to run directly on the
processor, emulation hardware might also be used. Thus, the availability and
configurability of debug tools is an important factor when selecting a
processor for an embedded project.
In this article, I'll discuss a processor-independent specification called the
Universal Debug Interface (UDI) that enables greater debug-tool
configurability. A number of emulator and embedded-monitor suppliers, as well
as high-level language debug-tool suppliers, are currently configuring their
tools to comply with a proposed UDI standard. Current implementations are
targeted for RISC-based code development. UDI should ease the selection of
tools and ultimately help developers move to RISC. Here I'll examine issues
involved in integrating UDI with GDB, the Free Software Foundation's
C-language source-level debugger.
GDB conforms to the UDI specification and is an example of a debugger
front-end (DFE) process. As an example of a target interface process (TIP),
I'll look at the MiniMon monitor for the Am29000 RISC processor.


The Host/Target Problem


When developing code to run on engineering workstations, the host processor
that supports debugger execution is the same as the target. This means the
debugger can use operating-system services such as the ptrace() UNIX system
call to examine and control the program being debugged. (I'll go into more
detail about ptrace a little further on.) When developing code for an embedded
application, however, the target program and processor are usually different
from the host processor and debugger. The host and target processors do not
communicate via ptrace(), but through whatever hardware communication path
links the two. The portion of the debugger which controls communication with
the target processor is known as the "target interface module." Each time a
change or addition is required in the communications mechanism, the debugger
must be recompiled to produce a binary executable specific to the target
processor and target communications requirements.
The goal of the UDI is to provide a standard interface between the debugger
developer and the target communications module, so the two can be developed
and supplied separately. In fact, application developers could construct their
own communications module for some special hardware communications link, as
long as it complied with the standard. Additionally, debuggers from one
company could operate with emulators from another.


UDI


If UDI were a specification at the procedural level, debugger and
communications-module developers would have to supply linkable images of their
code so the debug-tool combination could be linked by the user. This isn't
desirable because a linked image would be needed for every tool combination.
Additionally, the final linked program would be required to run on a single
debug host. UDI actually relies on an interprocess communication (IPC)
mechanism to connect two different processes: the debugger front end (DFE)
process, whereby the debugger is linked into an executable program to run on
the host processor; and the target interface process (TIP), in which the
communications module is linked as a separate process which runs on the same
or a different host processor. The two processes communicate via the UDI
interprocess-communication specification.
Two IPC mechanisms have been specified by the UDI committee: One uses shared
memory and is intended for DOS developers; and the other uses sockets and is
intended for UNIX and VMS developers. Of course, when the shared-memory IPC
implementation is used, the DFE and TIP processes must both execute on the
same host processor. Using sockets with Internet domain communication enables
the DFE and TIP to execute on separate hosts on a computer network. Thus, a
target processor connected to a network node located in a remote hardware lab
can be debugged from the application developer's workstation. Using sockets
with UNIX domain addresses (the method used to implement UNIX pipes) enables
both processes to run on the same host.
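On a UNIX host, the same-host case can be sketched with a UNIX-domain socket pair. A real DFE and TIP would run as separate processes connected through named socket addresses; socketpair() keeps the sketch to one process, and the echoing "TIP" is of course a stand-in for real UDI request handling.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same-host IPC sketch: one endpoint plays the DFE, the other the TIP.
   The "TIP" just echoes the request back as its reply. */
int ipc_roundtrip(const char *request, char *reply, size_t replylen)
{
    int fds[2];
    char buf[128];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* "DFE" endpoint sends the request (including the NUL)... */
    if (write(fds[0], request, strlen(request) + 1) < 0)
        return -1;

    /* ...the "TIP" endpoint reads it and echoes a reply... */
    n = read(fds[1], buf, sizeof buf);
    if (n < 0 || write(fds[1], buf, (size_t)n) < 0)
        return -1;

    /* ...and the "DFE" reads the reply back. */
    n = read(fds[0], reply, replylen);
    close(fds[0]);
    close(fds[1]);
    return (int)n;
}
```

Switching the same code to Internet-domain sockets is what lets the two endpoints sit on different network nodes.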
Some currently available UDI-conforming debug tools are presented in Figure 1.
The interprocess-communications layer defined by UDI enables the application
developer to select any DFE front-end tool with any TIP target-control tool.
Because developers of UDI-conforming tools must each have code which
interfaces with the IPC mechanism according to the UDI protocol, the UDI
community freely shares a library of code known as the UDI-p library; see Table
1. This code presents a procedural layer which hides the IPC implementation.
In Example 1, for instance, the DFE code calls the UDIRead function, which
transports the function call to the TIP process. The TIP programmer must
resolve the function request by adding code specific to controlling the
particular target. Since the IPC layer is effectively transparent, the TIP
programmer is unaware that the procedure caller is from a different process,
possibly on a different host machine.
Table 1: UDI-p procedures.

 Procedure Operation
 --------------------------------------------------------------------

 UDIConnect Connect to selected TIP.
 UDIDisconnect Disconnect from TIP.
 UDISetCurrentConnection For multiple TIP selection.
 UDICapabilities Obtain DFE and TIP capability information.
 UDIEnumerateTIPs List multiple TIPs available.
 UDICreateProcess Load a program for debugging.
 UDISetCurrentProcess Select from multiple loaded programs.
 UDIDestroyProcess Discontinue program debugging.
 UDIInitializeProcess Prepare runtime environment.
 UDIRead Read data from target-processor memory.
 UDIWrite Write data to target-processor memory.
 UDICopy Duplicate a block of data in target memory.
 UDIExecute Start/continue target-processor execution.
 UDIStep Execute the next instruction.
 UDIStop Request the target to stop execution.
 UDIWait Inquire about target status.
 UDISetBreakpoint Insert a breakpoint.
 UDIQueryBreakpoint Inquire about a breakpoint.
 UDIClearBreakpoint Remove a breakpoint.


Example 1: Code layer hides IPC implementation.

 UDIRead( /* a UDI-p procedure */

 UDIResource from, /* source address on target */
 UDIHOSTMemPtr to, /* destination address on DFE host */
 UDICount count, /* count of objects */
 size_t size, /* size of each object */
 UDICount *count_done, /* count actual transferred */
 UDIBool host_endian); /* endian conversion flag */

Because the DFE and TIP processes may be running on different machines, you
must be careful when moving data objects between hosts. An int-sized object on
the DFE-supporting machine may be a different size from an int on the
TIP-supporting machine. Further, the machines may have different endian
formats. The UDI-p procedures use a machine-independent data-description
technique similar to the UNIX XDR library. Data is converted into a universal
data representation (UDR) format before being transferred via sockets. When
received, the data is converted from UDR format into data structures
appropriate for the receiving machine. The UDI-p procedures keep the UDR
activity hidden from the UDI user.
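The idea can be sketched for a single 32-bit value: always serialize in one fixed byte order, so native endianness and int size drop out. UDI's actual UDR wire format is not reproduced here; this simply mirrors the XDR-style approach the article describes.

```c
#include <stdint.h>

/* Encode a 32-bit value big-endian, byte by byte, regardless of the
   sending machine's native byte order. */
void udr_put32(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Decode on the receiving machine; again independent of byte order. */
uint32_t udr_get32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because both sides go through the fixed wire form, a little-endian DFE host and a big-endian TIP host reconstruct the same value.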


ptrace()


ptrace() is a UNIX system call that provides a way for one process to control
the execution of another executing on the same processor. The process being
debugged is said to be "traced." However, this does not mean that the
execution path of a process is recorded in a "trace buffer," as with many
processor emulators. Debugging with ptrace relies on the use of instruction
breakpoints and other hardware- or processor-generated signals that cause
execution to stop; for example, ptrace(request, pid, addr, data).
There are four arguments, the interpretation of which depends on the request
argument. Generally, pid is the process id of the traced process. A process
being debugged behaves normally until it encounters some signal--an internally
(processor) generated illegal instruction or externally generated interrupt,
for example. The traced process then enters a stopped state and the tracing
process is notified using the wait() system call. When the traced process is
in the stopped state, you can examine and modify its core image with ptrace().
If necessary, you can use another ptrace() request to terminate or to continue
the process. Table 2 lists ptrace() request services.
Table 2: ptrace() services.

 Request Operation
 -------------------------------------------------------------

 TraceMe Declare that the process is to be traced.
 PeekText Read one word in process's instruction space.
 PeekData Read one word in process's data space.
 PeekUser Examine the process-control data structure.
 PokeText Write one word in process's instruction space.
 PokeData Write one word in process's data space.
 PokeUser Write to the process-control data structure.
 Cont Startup process execution.
 Kill Terminate the process being debugged.
 SingleStep Execute the next instruction.
 GetRegs Read processor registers.
 SetRegs Write processor registers.
 ReadText Read data from process's instruction space.
 ReadData Read data from process's data space.
 WriteText Write data into process's instruction space.
 WriteData Write data into process's data space.
 SysCall Continue execution until system call.
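The stop-and-examine cycle above can be sketched with the Linux ptrace interface (whose request names differ slightly from the classic UNIX ones in Table 2). The child declares itself traced and stops; the parent waits for the stopped state, peeks one word of the child's data space, then terminates it. The function returns -1 where tracing is unavailable, as in some restricted environments.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

static long traced_word = 42;    /* datum in the (forked) child's data space */

/* Returns the word peeked from the stopped child, or -1 if tracing
   is unavailable. */
long ptrace_peek_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                              /* traced process */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(1);
        raise(SIGSTOP);                          /* enter the stopped state */
        _exit(0);
    }
    /* tracing process: wait for the stop, then examine the core image */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, &traced_word, NULL);
    if (errno != 0)
        word = -1;
    kill(pid, SIGKILL);                          /* terminate the tracee */
    waitpid(pid, &status, 0);
    return word;
}
```

Because fork() gives the child an identical address space, the parent can pass &traced_word as the addr argument and read the child's copy.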

However, you can't use ptrace to debug embedded-application software because
the process with the user-interface that controls debugging and the
application being debugged may not be executing on the same processor. DFE
must run on a separate processor and communicate with the processor supporting
execution of the application code.


The GDB-UDI Connection


In place of ptrace(), GDB can use a procedural interface which allows
communication with a remote-target processor. The procedures implement the
necessary protocols to control the hardware connecting the remote processor to
the "host" debug processor. By this means, GDB can be used to debug
embedded-application software running on application-specific hardware.
GDB 3.98 (and up) achieves this via procedure pointers which are members of a
target_ops structure. The procedures currently available are listed in Table
3. According to GDB-configuration convention, the file remote-udi.c must be
used to implement the remote-interface procedures. In the case of interfacing
to the IPC mechanism used by UDI, the procedures in Table 3 are mapped into
the UDI-p procedures in Table 1. With the UDI-p library, it is simple to map
the GDB remote-interface procedures for socket communication with a remote
target processor.
Table 3: GDB remote-target operations.

 Function Operation
 ------------------------------------------------------------------------

 to_open() Open communication connection to remote target.
 to_close() Close connection to remote target.
 to_attach() Attach to a loaded and running program.

 to_detach() Detach for multitarget debugging.
 to_start() Load program into target-system memory.
 to_wait() Wait until target-system execution stops.
 to_resume() Startup/continue target-system execution.
 to_fetch_register() Read target-system processor register(s).
 to_store_register() Write register(s) in target-system processor.
 to_xfer_memory() Read/write data to target-system memory.
 to_insert_breakpoint() Establish an instruction break address.
 to_remove_breakpoint() Remove a breakpoint.
 to_load() Load a program into the target-processor memory.
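The dispatch mechanism amounts to a structure of function pointers that a UDI back end fills in with wrappers over UDI-p calls. The struct and signatures below are simplified stand-ins, not GDB's actual target_ops declarations, and the wrappers are stubs rather than real UDIConnect/UDIRead forwarding.

```c
/* Simplified stand-in for GDB's target-vector idea: each remote
   operation is a slot a back end fills in. */
struct target_ops {
    int (*to_open)(const char *args);
    int (*to_xfer_memory)(unsigned long addr, char *buf,
                          int len, int write);
    void (*to_close)(void);
};

/* A "UDI" back end would forward these to UDIConnect, UDIRead,
   UDIWrite, and UDIDisconnect; here they are stubs. */
static int opened;
static int udi_open(const char *args) { (void)args; opened = 1; return 0; }
static int udi_xfer(unsigned long addr, char *buf, int len, int write)
{
    (void)addr; (void)write;
    for (int i = 0; i < len; i++)
        buf[i] = 0;              /* stub target memory "reads" zeros */
    return len;                  /* bytes transferred */
}
static void udi_close(void) { opened = 0; }

struct target_ops udi_ops = { udi_open, udi_xfer, udi_close };
```

The rest of the debugger calls only through the slots, so swapping in a different TIP never requires recompiling the front end.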



The UDI-MiniMon Monitor Connection


MiniMon is not intended to be a stand-alone monitor. That is, it requires the
support of a software module located in a support processor--the TIP software
module. The Am29000 target processor communicates with the processor running
the TIP process via a serial link or other higher-performance channel. This
link supports a message system private to the MiniMon monitor--that is,
completely independent of the UDI protocol; see Figure 2.
Embedded systems developers are used to working with emulators that enable
code to be downloaded to application memory or installed in substitute overlay
memory. This avoids the development delays associated with running code from
EPROM. However, although emulators are the first stage in getting the target
hardware functional, once the processor can execute out of target-system
memory and a communications channel (like serial link) is available, the need
for an emulator decreases. Emulators are expensive, and it is not always
possible to make one available to each team member. The use of a debug monitor
(like MiniMon) during the software-debug stage of a project is an economical
alternative.
MiniMon must be installed in target-system ROM memory or downloaded by the
host via a shared-memory interface. The target-application code and additional
operating-system code can then be downloaded via the message system. If
changes to the code are required, the message system can quickly download new
code without changing any ROM devices.
Most monitors do not offer high-level language support. For example,
assembly-code instructions must be debugged rather than the original C code.
Using GDB in conjunction with MiniMon enables source-level debugging, which is
far more productive and necessary for large software projects.


Conclusion


Debug-tool developers are beginning to offer UDI-compliant tools. Typically
the DFEs are C source-level debuggers. This isn't surprising, as the increased
use of RISC-processor designs has resulted in a corresponding increase in
software complexity. The use of a high-level language such as C is more
productive than developing code at machine-instruction level. And further, the
use of C enables much greater portability of code among current and future
projects. The low cost of GDB makes it an attractive choice for developers.
Target processors and their control mechanisms are much more varied than DFEs.
In the MiniMon TIP I described here, for instance, a small amount of code
known as the "debugcore" is placed in processor ROM memory, enabling
examination of the processor state. The MiniMon TIP communicates with the
debugcore via a hardware link specific to the embedded-application hardware.
Additional TIPs already exist and others are under development. I know of an
Am29000 simulator (ISS) which runs on UNIX hosts. The DFE communicating with
the simulator TIP is unaware that the Am29000 processor is not present but is
instead simulated by a process executing on, say, a UNIX workstation. Tool
developers are also constructing TIP programs to control processor emulators,
which will make a top-of-the-line debug environment possible.
UDI broadens the choice of embedded-application development tools and
configurability options. Debugger front-end tools are supplied separately from
target-control programs. The user can consider cost, availability, and
functionality when selecting the debug environment.
Because debuggers like GDB are available in source form, developers can add
additional debug commands, such as examination of real-time operating-system
performance. This would require adding OS structural information into GDB.
When the debugger front end and, for example, emulator-interface module are
supplied as a single executable, adding new commands is not possible. Via the
use of Internet sockets, the debugger may execute on a different networked
host than the node supporting the emulator-control process.
































September, 1992
DEBUGGING REAL-TIME SYSTEMS


Modular and incremental development is the key


 This article contains the following executables: REAL.ARC


Gurjot Singh, Moses Joseph, and Dave Barnett


The authors all work for Lynx Real-Time Systems, where Gurjot is director of
product planning, Moses is vice president of marketing, and David is manager
of technical support. They can be contacted at 16780 Lark Ave., Los Gatos, CA
95030.


With some systems, missing 1 out of 10,000 samples may be quite acceptable.
When real-time, mission-critical data acquisition is involved, however,
missing this seemingly statistically insignificant sample renders the system
unreliable. The question that arises is how you build real-time systems that
logically perform the functions they're designed to and still meet the
deadlines of the time-critical tasks.
Modular and incremental development and debugging is one solution to this
problem. In this article, we'll use a simulated data-acquisition system to
describe this process. While the example has been simplified, the general
principles extend to more complex systems.
In the example, we used the LynxOS operating system, a 33-MHz 80486 PC clone,
and the DCC20A timer card from Industrial Computer Source (San Diego,
California) to create a self-profiling, real-time system. The system includes
a driver and a test application. We chose the DCC20A timer card because it
allowed us to generate programmable synchronous interrupts to simulate a
continuous synchronous real-time application, and it enabled us to profile our
real-time system.
The application was written to conform to real-time POSIX specifications
(POSIX 1003.4 and POSIX 1003.4a) and can be ported to any real-time
POSIX-compliant operating system. (POSIX.1, POSIX.4, and POSIX.4a provide a
standard Application Programming Interface (API) for real-time
implementations. POSIX.4 is at draft 12 and POSIX.4a is at draft 6. Both are
likely to be ratified in the near future.) The device driver is the only part
of the system specific to LynxOS.
A diagram of the data-acquisition system is shown in Figure 1. It has a
synchronous interrupting device, one producer task, and two consumer tasks. We
used two timers from the DCC20A timer card: Timer 5 to generate the
synchronous interrupts and Timer 1 as a count-down timer. Timer 5 is
programmed to reset Timer 1 at every interrupt. This allowed us to use Timer 1
to profile the various tasks. The interrupt service routine (ISR) for Timer 5
signals the producer task when the interrupt occurs. The producer task records
the task response-time data and signals the consumer tasks to read the times
out of the shared data. The consumer tasks execute and signal the producer
when they are done. The cycle then starts all over again at the next
interrupt. The objective is to ensure that all processing for a single pass be
done before the next interrupt arrives and that the data integrity is
preserved for the shared data. Mutexes and condition variables are used to
ensure data integrity and to synchronize access to the data. The overall flow
is shown in Figure 2.
In an ideal situation with no other interrupting devices, you'd expect to see
the sequence of events shown in Figure 3. In this case, when Timer 5 generates
an interrupt, the operating system invokes the ISR. After the ISR has
executed, the producer task is scheduled and run. Finally the consumer tasks
execute and the system waits for the next interrupt to occur. However, in the
real world you can have delays or "blocks" due to interrupts from other
devices or temporarily disabled preemption. These delays can impact the
system's overall performance.


Phase 1: Writing the Skeleton Application


During this initial phase, we created the skeleton application (see Listing
One, page 116) which includes the producer task and the two consumer tasks.
The detailed implementation of the producer and consumer modules (used for the
final testing and profiling) was completed later. Because of space
restrictions, this code is only available electronically. The C routines for
each task are simple and have profiling points built into them. The emphasis
is on debugging concepts rather than solving a specific problem.
The application consists of four modules: simulate.c, the main module;
producer.c, the producer; display.c, one of the consumers; and synch.c, the
synchronization module.
The simulate module sets up an array of consumers and then creates a thread
for each one. Each consumer runs at a priority 1 less than the producer's
priority. The main program takes the name of the timer_device as an argument.
It then sets the priority of the producer to 17 and initializes the
synchronization routines. At this point the main or root thread becomes the
producer thread.
The producer module waits for a signal from the ISR. For the debugging phase,
this is #defined to sleep(5). Notice that the record_trt() and record_tct()
are not used during the first phase. When an interrupt is received, the
producer signals the consumers that the data is present and then waits for
them to finish.
Consumer1 and Consumer2 are identical in functionality (although only
Consumer1 is shown in the display module). They wait for a signal from the
producer, complete their tasks, and then signal the producer when they are
done. We've left out the actual details for each consumer for the initial
testing.
The synch module is the most important module in the first phase. This is
where the synchronization between the producer and consumers is done.
Init_data allocates a mutex and two condition variables. It also initializes
an array of integers (used as Boolean values) to ensure a single access to
each filled buffer. A 0 value signifies that the consumer has not accessed the
buffer. Wait_for_consumers (called by the producer) simply blocks until all
the readers have finished. Signal_consumers resets the done_flags and
broadcasts to the consumers to perform their tasks. Signal_producer increments
the readers_done count when another consumer is done. If all the consumers are
done, it signals the producer with the write_cond. It also sets the done_flag
to 1 to prevent this consumer from reading again. Wait_for_producer prevents
the consumer from getting the same buffer by checking the value of the
done_flag.


Phase 2: Check Initialization and Synchronization


During this phase, we compiled and linked the application modules using Gcc
(GNU's C compiler) with the -g option that creates debug information for the
debugger. We used Ldb (Lynx's debugger) to load and debug the resulting
executable code. Our goal in debugging these modules was to ensure that the
initialization was working correctly and that the synchronization mechanism we
implemented worked as desired.
We set the following break points in Ldb: producer.c:8 (BP1)-start of
producer; display.c:7 (BP2)-start of consumer 1; and record.c:7 (BP3)-start of
consumer 2. We stepped through the simulate module and stopped at BP1. At this
point you see that the main thread (producer) is running, and the two
consumers are created but not active. Since Lynx's Ldb is a multithreaded
debugger, we could see that three threads had been created but only the main
thread, P1, was active; see Figure 4. If you click on GO again, the two
consumers generate a race condition trying to get CPU time, but we don't
really care which runs first. In this case we hit BP3 before BP2.


Slow-speed Simulation


We had #defined wait_for_interrupt to sleep(5). At this point, we had not yet
written the device driver to generate the interrupts, so we used sleep(5) to
simulate the generation of the interrupts. Ldb has a feature that informs us
whenever the SIGALRM occurs at the completion of the sleep; see Figure 5,
where we enable this condition. This allowed us to make sure that our
application ran periodically and that for every interrupt, the producer ran
first, followed by the two consumers, each of which ran only once. In fact, we
discovered a bug in the synch.c module when we ran the program using Ldb. We
found that for the first interrupt only, the consumers ran before the producer
had signaled them. This was because we had initialized the done_flag to 0
instead of 1. Ldb helped us track that down within the first few minutes of
our debugging process. Notice that we first debugged the skeleton application.
Trivial as it may sound, it's a good practice to do so before completing the
final application. If you wait to debug until the final code is complete, a
synchronization or initialization bug is much harder to track down.
The kinds of bugs you encounter may be different, but the debugging process is
similar. If you don't have a multithreaded debugger, then you'll probably use
the age-old method of embedding printf statements to debug the application.


Phase 3: Driver and Application Completion


Once we had the synchronization mechanism debugged, we completed the rest of
our application and the driver that generates the interrupts. Some application
modules were modified so we could record the minimum and maximum values for
the parameters shown in Figure 3. As previously mentioned, the code for the
complete system is available electronically. Also provided electronically are
detailed programmer's notes that describe the programs and relevant modules.


Phase 4: Integration, Testing, and Profiling



After completing the application, we used Ldb to test it with the
LynxOS-specific driver we had written. Again, we set up the timer to generate
interrupts every minute so we could step through our producer and consumers to
guarantee that everything ran as expected. We also simulated a failure by
deliberately missing an interrupt.
An important goal in the debugging process of a real-time application is to
make sure that the system performs to its specifications and meets the various
response times described in the first part of this article. After we
integrated the device driver and application, we ran the application with many
different interrupt periods. The display module output for two of these
periods is shown in Figures 6(a) and 6(b) (page 116), which list the minimum
and maximum values during five-second intervals. (The column headings for
Figure 6 are: IRT, interrupt response time; IST, interrupt service time; INT,
interrupt handling time [IRT+IST]--these are the best- and worst-case [IRT+
IST] numbers combined; that is, the best-case INT is not the sum of the best
case of IRT and the best case of IST individually; SYS, system overhead
(scheduling, context switches, any other interrupts); TRT, task response time.) In
each category, the left-hand column is the minimum and the right-hand column
is the maximum time in microseconds. The results were the minimums and
maximums for 8333 iterations for the 600-microsecond interval interrupt and
11,111 iterations for the 450-microsecond interval interrupt.
Figure 6(a) shows no overruns while 6(b), with its 450-microsecond period,
shows several. The overruns indicate that we had not completed the consumer
tasks before the next interrupt occurred--the period is less than the total
completion time (TRT+TCT). To find out whether the problem was in the
application code, driver routine, or in kernel overhead, we modified the
application to print each iteration instead of accumulating results for five
seconds; Figure 6(c) shows the results, in this case a small sample of the
results accumulated. We were looking for the overrun and the case prior to it.
In our system, we have three other interrupting devices: the disk interrupt,
keyboard interrupt, and system clock. The single-iteration listing shows that
SYS (case 1), IRT (case 2), or IST (case 3) durations are longer than the
minimum times for each category in Figure 6(a). This is probably due to a
higher-priority interrupt from one of the three aforementioned devices, but
the overall variations are within an expected range. This implies that we
probably have a long task completion time (TCT), which is causing us to miss
the next interrupt. When we raised the interrupt cycle to 600 microseconds, we
found that the overruns dropped to 0. In fact, if we were to do a rigorous
worst-case analysis, we would probably discover that the worst cycle is
between 450 and 600 microseconds.


Summary


We have stepped through a typical development and debugging cycle of a simple,
but representative, real-time application. The example demonstrates how to
develop and debug the application in a systematic and incremental way, taking
advantage of a powerful but user-friendly multithreaded debugger like Ldb.
Since deterministic performance is the most critical issue in real time, we
focused on profiling the real-time application and verifying correctness in
terms of worst-case performance. In the real world, you can expect to find
more complex situations, but the principles demonstrated here can be extended,
and in fact are even more important. Since every major real-time vendor is
promising to support the real-time POSIX programming interfaces, we avoided
using any proprietary LynxOS calls so that the example should be applicable on
any system where the POSIX-compatible versions are available.


_DEBUGGING REAL-TIME SYSTEMS_
by Gurjot Singh, Moses Joseph, and Dave Barnett


[LISTING ONE]

/* simulate.c -- Simulate a real-time system and measure and/or record.
** 1. Device interrupt response time; 2. Device driver interrupt service
** time; 3. Task response time
** Each measured time is exaggerated by a constant amount of time equal to the
** length of time it takes to make the measurement.
*/

#include <stdio.h>
#include <pthread.h>

#define PRODUCER_PRIO 17

extern void producer();
extern void display();
extern void record();

struct {
 void (*f)(); /* task entry point */
 int p_bias; /* priority relative to producer (always negative) */
} consumers[] = {
 { display, -1 },
 { record, -1 }
};
#define NCONSUMERS (sizeof consumers / sizeof consumers[0])
main(argc, argv)
int argc;
char *argv[];
{
 int i;
 if (argc != 2) {
 fprintf(stderr, "Usage: %s timer_device\n", argv[0]);
 exit(1);
 }
 init_data(NCONSUMERS);
 for (i = 0; i < NCONSUMERS; i++) {
 pthread_attr_t attr;
 pthread_t tid;

 pthread_attr_create(&attr);
 pthread_attr_setinheritsched(&attr, PTHREAD_DEFAULT_SCHED);
 pthread_attr_setprio(&attr, PRODUCER_PRIO + consumers[i].p_bias);
 if (pthread_create(&tid, attr, consumers[i].f, i) == -1) {

 perror("pthread_create");
 exit(1);
 }
 }
 init_timer(argv[1]);
 producer();
 exit(0);
}
/* producer.c */
#define wait_for_interrupt() sleep(5)
#define record_trt()
#define record_tct()

void producer()
{
 for (;;) {
 wait_for_interrupt();
 record_trt();
 signal_consumers();
 wait_for_consumers();
 record_tct();
 }
}
/* consumer1 */
void display(id)
int id;
{
 for (;;) {
 wait_for_producer(id);
 signal_producer(id);
 }
}
/* synch.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

static pthread_mutex_t mutex;
static pthread_cond_t write_cond; /* O.K. to write */
static pthread_cond_t read_cond; /* O.K. to read */
static int readers_done;
int *done;
static int num_readers;

void init_data(readers)
int readers;
{
 int i;
 if (pthread_mutex_init(&mutex, pthread_mutexattr_default) == -1) {
 perror("pthread_mutex_init");
 exit(1);
 }
 if (pthread_cond_init(&write_cond, pthread_condattr_default) == -1) {
 perror("pthread_cond_init");
 exit(1);
 }
 if (pthread_cond_init(&read_cond, pthread_condattr_default) == -1) {
 perror("pthread_cond_init");
 exit(1);

 }
 num_readers = readers;
 readers_done = num_readers;
 if (!(done = (int *)malloc(num_readers * sizeof(int)))) {
 perror("malloc");
 exit(1);
 }
 for (i = 0; i < num_readers; i++) done[i] = 1;
}
void wait_for_consumers()
{
 pthread_mutex_lock(&mutex);
 if (readers_done != num_readers) {
 pthread_cond_wait(&write_cond, &mutex);
 }
 pthread_mutex_unlock(&mutex);
}
void signal_consumers()
{
 int i;
 pthread_mutex_lock(&mutex);
 readers_done = 0;
 for (i = 0; i < num_readers; i++) done[i] = 0;
 pthread_cond_broadcast(&read_cond);
 pthread_mutex_unlock(&mutex);
}
void signal_producer(id)
int id;
{
 pthread_mutex_lock(&mutex);
 readers_done++;
 if (readers_done == num_readers) {
 pthread_cond_signal(&write_cond);
 }
 done[id] = 1;
 pthread_mutex_unlock(&mutex);
}
void wait_for_producer(id)
int id;
{
 pthread_mutex_lock(&mutex);
 if (done[id]) pthread_cond_wait(&read_cond, &mutex);
 pthread_mutex_unlock(&mutex);
}


















September, 1992
CRASH TRACEBACKS IN UNIX


Let your program tell you where it crashed


 This article contains the following executables: TRACE.ARC


Alan Dunham


Alan is manager of graphics development at Landmark/ITA, a Calgary,
Alberta-based division of Landmark Graphics. He can be reached at adunham@ita
.lgc.com.


Some time ago, when I developed code on VAX/VMS systems, programs that crashed
would give a stack traceback, a list of subroutines that told me what part of
the program was currently being executed (see Figure 1). Subroutines compiled
with debug information had a line number associated with them. This traceback
often enabled me to find the cause of the crash without using a debugger.
Figure 1: List of subroutines identifying parts of a program being executed
(VAX/VMS).

 %FOR-E-OUTCONERR, output conversion error
 unit -5 file
 user PC 00AC9E6F
 %TRACE-E-TRACEBACK, symbolic stack dump follows

 module name routine name line rel PC abs PC
 SIO_OPEN SIO_OPEN 90 0000018A 00AC023E
 RFE_INFO RFE_INFO 87 11 00AB7AE1
 REFED REFED 210 6F 00AB766F

When we ported our code from VAX to UNIX, program crashes no longer gave a
traceback. Instead, we got informative error messages like "Segmentation
Violation." Since our Fortran programs were huge and didn't have dynamic
memory allocation, we had to disable the generation of the resulting huge core
files. To make matters worse, we had a couple of numerical programmers who had
never used a debugger! It was clear we needed some kind of crash traceback--so
we wrote one.
In addition to subroutine names and line numbers, our traceback gives
parameter values and local symbol values. This is quite an asset, even though
the current implementation doesn't dump structures. Figure 2 is an example of
the traceback format.
Figure 2: An example of the traceback format.

 user=alan host=vader date=Thu Apr 16 12:08:25 1992
 program=./sparc
 SIGSEGV: segmentation violation signal=11(3)
 no mapping at the fault address
 ---------- traceback ----------
 file test.c line 21 function tst_segv()
 file test.c line 104 function tst_lv1()
 file test.c line 166 function main()
 -------------------------------
 tst_segv()
 -- local symbols for tst_segv --
 laa = -19088744 0xfedcba98 (int)
 prt = 0x0 (pointer to char)
 tst_lv1(paramtext)
 paramtext = "call 1"
 -- local symbols for tst_lv1 --
 l1 = -554829090 (0xdeedfade) (int)
 ttext = "level1"
 main(argc,argv)
 argc = 1 (0x1) (int)
 argv = 0xf7fffb04 (pointer to *char)
 *argv = "sparc"
 -- local symbols for main --
 b = 32 (0x20) (int)
 pi = 3.14159 (float)

 ss = 0 (0x0) (short)
 random = 5 (0x5) (int)

The traceback has been implemented for the SUN OS and IBM AIX versions of
UNIX. Some of the code is similar, while some is system dependent. Since I
developed most of the code by exploring, I'll pass on information that will
enable you to extend the method to other hardware. Because of space
constraints, the entire system, which I've implemented for SPARC and the IBM
RS/6000, is available electronically. It includes: stackdump.c, a program to
give a stack dump; a side-by-side listing of two SPARC stack dumps showing
changes when a different function is called (comments and frame boundaries
have been added to the listing); a side-by-side listing of two IBM RS/6000
stack dumps showing changes when a different function is called (again, I've
added comments and frame boundaries); a sample frame dump; a set of tracebacks
for the seven different values of the variable "random" in file test.c;
test.c, a program to demonstrate and test the traceback; sparc.c, the
traceback code for the SPARC systems; and ibm.c, the traceback code for
RS/6000.


Overview


A simple traceback requires code that installs a signal handler, traces back
each stack frame, and converts addresses to function-line numbers. A more
informative traceback also has code that prints subroutine parameters and
local symbols.
When a program crashes, control is transferred to our traceback subroutine by
the UNIX signal-handling mechanism. Near the start of a program, we call the
system function SIGNAL with the signals we wish to catch and the name of the
traceback subroutine.
Tracing back the stack is done by finding all the stack frames, each of which
corresponds to one function call. Each stack frame contains a return address
and a stack pointer. The stack pointer is used to find the next stack frame.
"Walking the stack" is continued until there are no more stack frames. We
store the return addresses for the stack frames and then translate them to
function-line numbers.
When an executable file is generated by compiling and linking with -g, that
executable contains debugging information. Part of this information is a
memory address for each line of each function. Each function is also
identified by name. We scan the relevant portion of the executable file to see
if each line-number address matches any of our return addresses.


Catching Signals


The SIGNAL function takes two parameters: the name of the signal to be caught
and the name of the subroutine to do the catching. Example 1 is a typical call.
Example 1: A typical call to the SIGNAL function.

 signal ( SIGSEGV,
 signal_handler_routine );

 The important signals we wish to
 catch are:

 SIGSEGV segmentation violation
 SIGABRT sent by system abort
 SIGFPE floating point exception
 SIGBUS bus error

Other signals are defined in signal.h (or sys/signal.h). Most crashes are
caught by the signals in subroutine trb_signal in the signal handling and
stack traceback code; see sparc.c and ibm.c, available electronically. On the
SPARC, I couldn't catch integer divide at the correct function without
SIGABRT. Instead, the traceback would start at the function that called the
crashing function. The SPARC also needs a call to ieee_handler to catch
floating-point errors. On the RS/6000, I'm currently only catching SIGSEGV.
Floating-point errors are usually turned off to speed up the pipelining. The
function fp_enable_all is supposed to turn on error catching, but I haven't
had any success with it so far.


The Signal Handler


When a program crashes and issues a signal we're interested in, control is
transferred to the signal-handler function. This function walks the stack,
storing a return address for each function. One of the arguments passed to the
signal handler is a structure which contains an initial stack pointer and an
initial program counter (return address). This initial stack pointer does two
things: It tells us where to find the stack, and it tells us what stack
pointers look like in terms of a string of hex digits. When we examine the
stack to find where in a stack frame the stack pointer is, we will be looking
for a string of similar hex digits. Similarly, the initial return address
tells us what other return addresses look like.
The signal handler has several tasks to do. It should describe the cause of
the crash, trace back the stack, get information from the executable, and
print out the traceback. It should also decide whether to stop program
execution or to continue. It can also help us build the traceback itself by
printing out a stack dump and frame dumps.
To find the arguments to the signal handler, type man signal at the UNIX
prompt; see Example 2. To find the fields in the scp structure, look for a
definition in signal.h or look at it via dbx. Hopefully, these fields
correspond to the stack pointer and the return address.
Example 2: Signal-handler arguments.

 void trb_handle (sig, code, scp)
 int sig, code;
 struct sigcontext *scp;



More Detail on the Stack


The skeleton program in Figure 3 illustrates the stack for a simple calling
sequence. Each function that has had its execution suspended as a result of
calling another function will have a stack frame on the stack. The stack grows
towards low addresses. Since we are starting at the bottom function, we start
at the low-address end of the stack and walk back to the high-address end.
Figure 3: Skeleton program that illustrates the stack for a simple calling
sequence.

 main()
 { fun1 ();
 }
 void fun1 ()

 { fun2();
 }
 void fun2()
 {float a,b,c;
  c=0.0;
  a=b/c;
 }

                __________
 Low address    Stack frame
                for fun2
                __________
                Stack frame
                for fun1
                __________
 High address   Stack frame
                for main
                __________

The stack pointer in one stack frame points to the stack frame of the calling
function. In Figure 3, the stack frame for fun2 contains a stack pointer that
points to the stack frame for fun1. Similarly, the stack frame for fun1
contains a stack pointer that points to the stack frame for main. The stack
frame for main contains a stack pointer that points to the end of main's stack
frame. The contents of this next frame are 0, indicating that main is the top
level function.
The return address provides us with the address inside the calling function
where the function is called. To continue our example, the return address in
the stack frame of fun2 gives the address in fun1 where fun2 was called. When
we translate the address to a source-code file-line number, we get the line
number in fun1 in which fun2 was called.
If we are looking at a new hardware architecture, we may know only the initial
stack pointer and the initial return address. We need to know where in a stack
frame the stack pointer and the return address are.


Using a Stack Dump


The locations of the return address and the stack pointer within a stack frame
are found by examining a dump of the stack. To maximize the information
contained in the dump, it's best to nest several subroutine calls, the bottom
of which prints out the dump in hex and ASCII. Finding the middle of each
subroutine's stack frame is simplified if each subroutine contains a unique
text string. Stackdump.c (Listing One, page 113) generates a stack dump.
Example 3 shows an abbreviated version of the sample output generated by
stackdump.c. (Complete versions of both SPARC and RS/6000 stack dumps are
available electronically.) The first column gives an address in the stack,
while the second column gives the value at that address. If you circle all
values in the second column that start with Ohf7ff, you will have circled all
potential stack pointers. If you can find these values in the first column,
you should circle them there as well. This will give all possible starting
positions for stack frames. Now you must eyeball the stack dump and look for
circles on the right that are a constant position from a circle on the left.
For the SPARC, this difference is 14 longwords.
Example 3: Sample output generated by stackdump.c (Listing One).
 f7fff9c8 7efefeff #start of frame
 ...
 f7fff9ec f7fffa40 #stack pointer
 ...
 f7fffa2c 66756e32 fun2 #local variable
 f7fffa30 20746578 text
 ...
 f7fffa40 7efefeff #start of next frame
 ...



Dissection of a Stack Frame


Assuming that we've been successful in finding the offset of the stack pointer
from the start of a stack frame, we can now explore the stack frame. If the
value of FRAMEDUMP is changed from 0 to 1 in sparc.c, we'll get a stackdump
broken up into frames. (An example dump of the stack when FRAMEDUMP = 1 is
available electronically.) Stackdump.c (Listing One) prints addresses for the
start of all functions. Hopefully, we'll find similar addresses in each stack
frame which are a fixed offset away from the start of each frame. If we're
still having difficulty finding them, we can substitute the call to fun1a()
in main with a call to fun1b(); the stack frame for fun1b should be nearly
identical to the stack frame for fun1a, the major difference being the two
functions' return addresses.
The output of the stackdump, summarized in Example 4, is in two pairs of
columns. The left pair is the output when main calls function fun1a while the
right pair is the output when main calls function fun1b. The important
difference is at address 7ffffa04, the return address from stack frame fun2.
The return address 22e0 corresponds to function fun1a while address 2310
corresponds to fun1b. These addresses are between the appropriate function
starting addresses.
Example 4: SPARC stackdumps showing the difference between calling function
fun1a vs. function fun1b from main. Note that the return address in function
fun2 is different. This file has been edited to remove lines that are the same
for both runs; this saves space and emphasizes the difference.

 main address=2290 main address=2290
 fun1a address=22c0 fun1a address=22c0
 fun1b address=22f0 fun1b address=22f0
 fun2 address=2320 fun2 address=2320
 /*_______________________________________________________*/
 /* stack frame for function fun2 */
 /*_______________________________________________________*/
 f7fff9c8 7efefeff ~ f7fff9c8 7efefeff ~
 ...
 f7fffa00 f7fffa40 @ f7fffa00 f7fffa40 @ /* stack pointer */
 f7fffa04 000022e0 " f7fffa04 00002310 # /* return address DIFFERS */
 ...
 f7fffa2c 66756e32 fun2 f7fffa2c 66756e32 fun2

 f7fffa30 20746578 tex f7fffa30 20746578 tex
 f7fffa34 74000000 t f7fffa34 74000000 t
 /*_______________________________________________________*/
 /* stack frame for functions fun1a(left) & fun1b (right) */
 /*_______________________________________________________*/
 f7fffa40 7efefeff ~ f7fffa40 7efefeff ~
 ...
 f7fffa78 f7fffab0 f7fffa78 f7fffab0 /* stack pointer */
 f7fffa7c 000022b0 " f7fffa7c 000022b0 " /* return address */
 ...
 f7fffaa0 66756e31 fun1 f7fffaa0 66756e31 fun1
 f7fffaa4 61207465 a te f7fffaa4 62207465 b te
 f7fffaa8 787400b0 xt f7fffaa8 787400b0 xt
 /*_______________________________________________________*/
 /* stack frame for function main */
 /*_______________________________________________________*/
 f7fffab0 11400086 @ f7fffab0 11400081 @
 ...
 f7fffae8 f7fffb20 f7fffae8 f7fffb20 /* stack pointer */
 f7fffaec 00002064 d f7fffaec 00002064 d /* return address */
 ...
 f7fffb10 6d61696e main f7fffb10 6d61696e main
 f7fffb14 20746578 tex f7fffb14 20746578 tex
 f7fffb18 74000020 t f7fffb18 74000020 t

Except for the changed text string, the only difference between fun1a and
fun1b in the IBM output is the return address. There, the function starting
addresses turn out to be pointers to the real addresses.
Figure 4 illustrates the concept of a stack frame. I'll refer to the low
address end of a stack frame as the "stack pointer," and the high address end
of a stack frame as the "frame end." On the SPARC, local variables are
referenced via a negative offset relative to the frame end. This means that we
must find the next stack pointer to find the locals for a given stack frame.
Parameters coming into a function are referenced via a positive offset from
the calling function's stack pointer, because the space for the parameters was
allocated in the calling function.
Figure 4: A stack frame.

 _________
 Low <--Frame start
 address _________ (Stack pointer)
 
 _________
 
 _________
 
 _________
 SP +n Next SP Next stack
 bytes _________ pointer
 PC Program
 _________ counter
 
 _________
 
 _________
 Last declared
 _________ local variable
 
 _________
 First declared
 _________ local variable
 
 _________ <--Frame end
 
 _________ Parameters
 High 
 address _________
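
The chained layout in Figure 4 can be modeled in portable C. The sketch below
is a hypothetical illustration (the struct layout and field names are invented
for this example), not the actual SPARC frame format: each frame stores the
next (caller's) stack pointer and a return address at fixed offsets, and a
traceback simply follows the chain.

```c
#include <stddef.h>

/* Hypothetical frame model: each frame stores, at fixed offsets, the
   caller's stack pointer (the "next" frame) and a return address. */
struct frame {
    struct frame *next_sp;   /* saved stack pointer -> caller's frame */
    unsigned long ret_addr;  /* saved return address */
};

/* Walk the chained frames, recording return addresses until the chain
   ends. Returns the number of frames visited. */
int walk_frames(const struct frame *fp, unsigned long *out, int max)
{
    int n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->ret_addr;
        fp = fp->next_sp;
    }
    return n;
}
```

The list of return addresses this produces is exactly the traceback list that
is later matched against the symbol table.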




The Executable File


The executable files on a SPARC are similar to the System V COFF executables.
An executable file contains a header structure whose fields contain the
location of the symbol table, the location of the string table, and the number
of symbols. An executable that is linked with the -g flag contains a symbol
table and a string table. The symbol table contains information about all
symbols in the source file, including source-file names, function names,
source-line numbers, subroutine parameters, local symbols, and more. The
symbol table does not contain text information; instead it has a pointer into
the string table. Each symbol in the symbol table is read into a structure
which contains the symbol's type (and other fields).
The symbol type determines the meaning of the rest of the structure. For a
source-line symbol the structure contains the line number and the address. For
a source filename, the structure contains an offset into the string table,
which contains the name of the source file (ditto for subroutine names).
Parameter symbols contain a parameter type, a stack offset value, and a
pointer to the string table. Local symbols are the same as parameters.
A traceback conversion consists of reading the symbols sequentially and
storing the last file and subroutine names found until an address is found
that matches one of the addresses in the traceback list. At this point we
store the filename, subroutine name, and line number and continue reading
symbols in order to match the remaining addresses.
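
That single sequential pass might look like the following simplified sketch.
The symbol record here is hypothetical and much reduced from a real COFF
symbol; the point is the protocol of remembering the last file and function
names until a line symbol's address matches.

```c
#include <string.h>

/* Hypothetical, simplified symbol records; real COFF symbols carry more. */
enum sym_type { SYM_FILE, SYM_FUNC, SYM_LINE };
struct sym {
    enum sym_type type;
    const char *name;        /* file or function name (SYM_FILE/SYM_FUNC) */
    int line;                /* source line (SYM_LINE) */
    unsigned long addr;      /* code address (SYM_LINE) */
};

struct loc { const char *file; const char *func; int line; };

/* Scan symbols once, remembering the last file/function seen; when a line
   symbol's address matches addr, record the location. Returns 1 on match. */
int resolve(const struct sym *syms, int nsyms, unsigned long addr,
            struct loc *out)
{
    const char *file = "?", *func = "?";
    int i;
    for (i = 0; i < nsyms; i++) {
        switch (syms[i].type) {
        case SYM_FILE: file = syms[i].name; break;
        case SYM_FUNC: func = syms[i].name; break;
        case SYM_LINE:
            if (syms[i].addr == addr) {
                out->file = file; out->func = func;
                out->line = syms[i].line;
                return 1;
            }
            break;
        }
    }
    return 0;
}
```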


Parameters


To print out subroutine parameter values, each level of the traceback needs to
know its associated filename and function name. If we also know the position
in the symbol table for the start of the function, we can quickly find the
parameters for the function by seeking to the beginning of the symbols for the
function, then reading symbols. Any symbols of type parameter are printed,
until a new filename or subroutine name is found. The string table contains
the name of the parameter (as it is named in the source file). The stack
offset value lets us find the parameter's location in the stack frame. The
parameter type tells us how the parameter was declared, enabling us to print
its value as an int, float, and so on. Local symbols are found using the same
steps used for parameters.
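
As a simplified sketch, fetching a parameter by its stack offset and
formatting it according to its declared type might look like this. The type
tags are hypothetical (real symbol tables encode types differently), and the
frame is modeled as a plain byte buffer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type tags for illustration only. */
enum par_type { PAR_INT, PAR_FLOAT };

/* Given the base of a stack frame and a parameter's offset and type,
   format the value the way a traceback would print it. */
void format_param(const unsigned char *frame, int offset,
                  enum par_type type, char *buf, size_t len)
{
    if (type == PAR_INT) {
        int v;
        memcpy(&v, frame + offset, sizeof v);   /* fetch from the frame */
        snprintf(buf, len, "%d", v);
    } else {
        float v;
        memcpy(&v, frame + offset, sizeof v);
        snprintf(buf, len, "%g", v);
    }
}
```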


Speed Considerations and Code Limitations


Because you must compile and link the system with -g, compiler optimization is
disabled. As such, tracebacks may only be appropriate for in-house usage. This
is still quite helpful, as it gives a user something concrete to report,
especially if the traceback is written to a file. Speed is not a major factor
after the crash. For a large executable, there could be 100K symbols, but we
only need to scan the file once. Example 5 shows how to call the traceback
start-up routine from a user program.
Example 5: Calling the traceback startup routine from a user program.

 main(argc,argv)
 int argc;
 char *argv[];
 {
 /* declarations */

 /* enable crash tracebacks */
 trb_signalinit(argc,argv);

 /* the program
 ...
 */
 }

Note that neither the SPARC nor the IBM system prints structures yet, nor does
the IBM yet catch floating-point errors or have a data dictionary. (The SPARC
data-dictionary code has yet to be optimized.) Also, the RS/6000 version has a
hardwired define called PCADJUST that converts stack addresses to COFF file
addresses. There should be a function to make this conversion, as it may not
be constant. The code does not handle executables not linked with the -g flag,
and not all kinds of arrays are printed out.


Conclusion


Not only is the exploration of program stacks and COFF files interesting, it
is also very practical. Stack traceback can often be used to find the cause of
a program crash very quickly. A stack traceback that prints values of
parameters and local symbols is often as informative as dbx and is quicker.



[LISTING ONE]

/* stackdump.c -- a program to dump the stack */

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define SPARC 1
#define IBM 0

void fun1a();
void fun1b();
void fun2();
void stackdump();

main() /* call function fun1a or function fun1b */

{
 char text[16];
 strcpy(text,"main text");
 fun1a();
}
void fun1a()
{
 char text[16];
 strcpy(text,"fun1a text");
 fun2();
}
void fun1b()
{
 char text[16];
 strcpy(text,"fun1b text");
 fun2();
}
void fun2()
{ int jj;
 char text[16];
 strcpy(text,"fun2 text");
#if SPARC
 printf("main address=%x\n",main);
 printf("fun1a address=%x\n",fun1a);
 printf("fun1b address=%x\n",fun1b);
 printf("fun2 address=%x\n\n",fun2);
#elif IBM
 printf("main address=%x -> %x\n",main, *(unsigned long *)main);
 printf("fun1a address=%x -> %x\n",fun1a, *(unsigned long *)fun1a);
 printf("fun1b address=%x -> %x\n",fun1b, *(unsigned long *)fun1b);
 printf("fun2 address=%x -> %x\n",fun2, *(unsigned long *)fun2);
#endif
 stackdump(&jj-32); /* the 32 gives us the stack before variable jj */
}
void stackdump(start)
unsigned long start;
{
 int i,j;
 for (i=0;i<128;i++)
 {
 printf("%08x ", (long)start);
 printf("%08x ", *(unsigned long *)(start));
 for (j=0;j<4;j++,start++)
 printf("%c", isprint( *(unsigned char *)(start)) ?
 *(unsigned char *)(start) : ' ');
 printf("\n");
 }
}














September, 1992
WINDOWS TOOLHELP


Presenting a ToolHelp-based programmer's tool for unloading apps and DLLs


 This article contains the following executables: UNLOAD.ARC


Mike Sax


Mike is a Windows programming and development consultant. He has been
programming in Windows since the advent of version 1.02 and can be reached via
CompuServe at 75740, 1403.


ToolHelp is a dynamic link library (DLL) that provides functions to peek and
poke into the internals of Windows 3.1. You can avoid reengineering your
application every time a new version of Windows is released by using the
documented ToolHelp API, instead of relying on undocumented Windows features.
Microsoft intends to include a new version of TOOLHELP.DLL for every new
release of Windows.
This article provides an overview of ToolHelp functions and how they can be
used in your applications. I then present a programming utility that lets you
remove DLLs and programs stuck in memory without having to restart Windows
before recompiling.


ToolHelp Functions


The first category of ToolHelp functions lets you obtain information about the
local or global heap, and about current tasks, modules, and window classes in
the system. These ToolHelp functions use a calling protocol similar to the DOS
FindFirst and FindNext functions for getting a directory of files.
For example, to get a list of all currently loaded modules, you first call the
ModuleFirst function with a long pointer to a MODULEENTRY structure, which is
declared in TOOLHELP.H. In addition to various fields that contain information
about the module you retrieved, the MODULEENTRY structure also contains a
dwSize field.
It is important to initialize this field to sizeof(MODULEENTRY) before calling
ModuleFirst. ToolHelp checks the dwSize field to determine how much
information about the module your MODULEENTRY structure can contain. Future
versions of ToolHelp will be able to support applications written for
different versions of ToolHelp by examining the dwSize field.
If the ModuleFirst function returns True, you can call ModuleNext. You can
keep calling ModuleNext with a long pointer to the same data structure as long
as the return value is True; all ToolHelp enumeration functions work this way.
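The First/Next calling protocol can be mimicked in plain, portable C. The
mock below is not the ToolHelp API, just an illustration of the convention,
including the dwSize version check performed before the first call succeeds
(all names here are invented for the example):

```c
#include <string.h>

/* Mock of the First/Next enumeration protocol (not the real ToolHelp API):
   the caller initializes dwSize, then loops while the calls return nonzero. */
struct mock_entry {
    unsigned long dwSize;    /* must be set by the caller before First */
    char szModule[16];
    int index;               /* iterator state kept inside the entry */
};

static const char *mock_modules[] = { "KERNEL", "GDI", "USER" };
#define MOCK_COUNT 3

int MockFirst(struct mock_entry *e)
{
    if (e->dwSize != sizeof(struct mock_entry))
        return 0;            /* reject callers built for another version */
    e->index = 0;
    strcpy(e->szModule, mock_modules[0]);
    return 1;
}

int MockNext(struct mock_entry *e)
{
    if (++e->index >= MOCK_COUNT)
        return 0;            /* no more entries: enumeration is finished */
    strcpy(e->szModule, mock_modules[e->index]);
    return 1;
}
```

The real ModuleFirst/ModuleNext pair is driven the same way: one structure,
one First call, then Next calls until a False return.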
Although these enumeration functions don't let you modify the system, some
interesting uses are nevertheless possible. For example, you can use the
ToolHelp enumeration functions to verify certain parameters passed to a
function. If a function in your program is passed an hDC parameter (a handle
to a Windows device context), you can walk the local heap of the GDI to
determine whether the hDC is valid.
The InterruptRegister and InterruptUnregister functions let you install a
function to intercept special events: processor-generated events like division
by 0; parity error; invalid opcode and protection faults; debug interrupts;
and even a Ctrl+Alt+SysRq key press. Utilities like Microsoft's Dr. Watson and
Borland's WinSpector use these ToolHelp functions to produce postmortem
information when an application encounters a serious error. These utilities
dump a stack trace and a disassembled listing of the code that produced the
error.
Say you have an extensive beta program; you can incorporate your own interrupt
handler to dump global (or even local) variables and other
application-specific status information that general-purpose utilities like
Dr. Watson or WinSpector don't provide. Installing such handlers on beta
testers' PCs gives you invaluable information and saves hours of development
time.
To intercept less critical events you can use the NotifyRegister and
NotifyUnregister functions. Your notification handler can be called when a DLL
or program is loaded or freed, when a segment is loaded or discarded, or when
a task switch occurs. Windows can also call your notification handler function
when it wants to show a debug message or get debug input from the programmer.
ToolHelp contains a number of functions that don't really fit into any
category. These include MemoryRead() and MemoryWrite(), which let you read
from and write to any memory location, even if the memory doesn't belong to
your program. They also work if you want to write to a code segment. The TimerCount
function lets you retrieve information about timer resolution. Other, more
specialized functions let you trace the stack of a sleeping task, instantly
terminate an application, or give control directly to a certain task.


Unload


Unload is a programmer's utility that lets you remove any Windows program or
DLL from memory. It is useful when developing a Windows application or DLL
that has a problem getting unloaded. Instead of exiting Windows to remove the
module from memory, you can just unload and recompile. Listing One (page 118)
shows UNLOAD.C. The files UNLOAD.H, UNLOAD.ICO, UNLOAD.RC, and UNLOAD.EXE are
available electronically.
You only need Unload when you can't remove a library or program using
"standard" methods like selecting the End Task option in the task list or
closing the application's window. This can occur if a library's usage count is
higher than 0, but the application using it was terminated because of an
unrecoverable application error (UAE), or if an application does not have any
visible windows.
Unload displays a combo box of all modules (programs and DLLs) loaded in your
system. Since fonts and drivers are also DLLs, they are in the list. Programs
are displayed in upper case, DLLs in lower case. To fill up this combo box,
Unload calls the ToolHelp ModuleFirst and ModuleNext functions (see
FillUpComboBox() in Listing One) and sends a CB_ADDSTRING message to add the
module name of every module to the combo box. Immediately after each module
name, CB_SETITEMDATA attaches the module handle and usage count.
When you select a module from that combo box, Unload displays information
about it: the usage count, whether it's a library or a program, the module's
filename, and the module handle. As soon as the combo box drops down or
selects an item, it sends a WM_COMMAND notification message to its parent, the
dialog box. When the dialog function has been notified that the combo box's
list box is dropped down, we refresh the combo box's contents just before it's
displayed. When a new item is selected, we display the corresponding
information.
Almost all the information displayed about the module--usage count, module
filename, and module name--is available in the ToolHelp MODULEENTRY data
structure, which is filled up with a call to ModuleFindName. To get the module
handle of the currently selected item, we send a CB_GETITEMDATA message to
the combo box. When filling up the combo box, we use CB_SETITEMDATA to attach
this value to every item.
The only information we need that is not in the MODULEENTRY structure is the
type of the module: library or program. How can we check this? A module handle
is really the selector of a block of memory that contains the module file
header; the header is in the "new EXE" format and has been documented by
Microsoft. If you take the word at offset 0xC and AND it with 0x8000, you
have the bit that indicates the type of module. If this bit is set, the module
is a DLL; otherwise, it's a program.
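Given a copy of the header bytes (obtained, say, via MemoryRead), the check is
a simple mask. The helper below is a hypothetical sketch that assumes it is
handed at least the first 14 bytes of a little-endian new-EXE header:

```c
/* Check the DLL bit of a new-EXE header image. The flag word lives at
   offset 0xC, stored little-endian; bit 0x8000 marks a library. */
int header_is_dll(const unsigned char *hdr)
{
    unsigned int flags = hdr[0xC] | (hdr[0xD] << 8);
    return (flags & 0x8000) != 0;
}
```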
Although Windows 3 lets you get away with creating a long pointer from the
selector and the offset and reading the bit directly, future versions of
Windows might impose a stricter separation between different applications.
Therefore, always use ToolHelp's MemoryRead function to read any part in
memory.
After selecting a module, you can press the Unload button to remove it from
memory. If you've selected a DLL, there is only one way to unload it: Decrease
the reference count using the Windows FreeLibrary function.
In the case of a program, the situation is a little more complicated. There
can be several reasons why your application won't terminate: The application's
main window might be hidden so you can't close it; your message loop might
continue forever because your program didn't process a WM_QUIT message; or you
forgot to call PostQuitMessage() when your main window was closed.
To accommodate these cases, Unload offers three methods to remove an
application: You can post a WM_CLOSE message to the application's main
windows, even if they are hidden; post a WM_QUIT message to the application
(this is the equivalent of calling PostQuitMessage() inside your application);
or instantly terminate the application, as if a UAE had occurred.


Conclusion


Unload is one of those utilities that you won't need very often, but when you
need it, you need it badly. Writing a utility like Unload without ToolHelp
would require a lot of inside information and a very clear understanding of
Windows internals. Thanks to ToolHelp, creating your own tools has become
easier and safer.



[LISTING ONE]


////////////////////////////////////////////////////////////////////////////
// UNLOAD.C Copyright (c) 1992 by Mike Sax
// Unload is a small programmer's utility that lets you remove any program
// or DLL that is stuck in memory.
////////////////////////////////////////////////////////////////////////////
#define STRICT 1
#include <windows.h>
#include <string.h>
#include <toolhelp.h>
#include "unload.h"

// Global variables:
static HINSTANCE ghInstance;

// Exported functions:
BOOL FAR PASCAL _export MainDlgProc(HWND hDlg, unsigned message, WORD wParam,
 LONG lParam);
BOOL FAR PASCAL _export WarningDlgProc(HWND hDlg, WORD wMessage, WORD wParam,
 LONG lParam);
BOOL FAR PASCAL _export EnumTaskWindowsFunc(HWND hWnd, DWORD lParam);

// Internal functions:
void static FillupComboBox(HANDLE hComboBox);
void static ShowItemInfo(HWND hDlg, HWND hComboBox);
void static KillTask(HTASK hTask, int nMethod);
BOOL static IsDLL(HMODULE hModule);

// The WinMain function is called at the beginning of our program
int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow)
 {
 ghInstance = hInstance;
 (void)lpCmdLine;
 if (!hPrevInstance)
 {
 WNDCLASS wc;
 // Register private dialog class
 wc.style = 0l;
 wc.lpfnWndProc = DefDlgProc;
 wc.cbClsExtra = 0;
 wc.cbWndExtra = DLGWINDOWEXTRA;
 wc.hInstance = ghInstance;
 wc.hIcon = LoadIcon(hInstance, "Unload");
 wc.hCursor = LoadCursor(NULL, IDC_ARROW);
 wc.hbrBackground = GetStockObject(WHITE_BRUSH);
 wc.lpszMenuName = NULL;
 wc.lpszClassName = "Unload";
 if (!RegisterClass(&wc))
 return -1;
 }
 // Use a dialog box as our main window and pass the nCmdShow parameter
 // in lParam of MainDlgProc's WM_INITDIALOG.
 return DialogBoxParam(ghInstance, "Unload", NULL, (DLGPROC)
 MakeProcInstance((FARPROC)MainDlgProc, ghInstance),
 (LONG)nCmdShow);
 // Proc Instance will be automatically cleaned up by Windows.
 }
// MainDlgProc handles all messages for our main window.
BOOL FAR PASCAL _export MainDlgProc(HWND hDlg, unsigned message, WORD wParam,

 LONG lParam)
 {
 switch (message)
 {
 case WM_INITDIALOG:
 // We passed the nCmdShow parameter of WinMain in lParam
 ShowWindow(hDlg, LOWORD(lParam));
 break;
 case WM_CLOSE:
 EndDialog(hDlg, FALSE);
 break;
 case WM_COMMAND:
 switch(wParam)
 {
 // Combobox notification message
 case IDD_COMBOBOX:
 if (HIWORD(lParam) == CBN_DROPDOWN)
 FillupComboBox((HANDLE)LOWORD(lParam));
 else if (HIWORD(lParam) == CBN_SELCHANGE)
 {
 static BOOL bFirst = TRUE;
 if (bFirst)
 {
 SetDlgItemText(hDlg, IDD_MESSAGE, "Compuserve: 75470,1403");
 bFirst = FALSE;
 }
 ShowItemInfo(hDlg, (HANDLE)LOWORD(lParam));
 }
 break;
 // User pressed the "Unload" button
 case IDD_UNLOAD:
 {
 FARPROC lpProc;
 int nCurSel = (int)SendDlgItemMessage(hDlg, IDD_COMBOBOX,
 CB_GETCURSEL, 0, 0l);
 // If no item selected, do nothing
 if (nCurSel == CB_ERR)
 {
 MessageBeep(0);
 break;
 }
 lpProc = MakeProcInstance((FARPROC)WarningDlgProc,
 ghInstance);
 // Call "Unload" dialog box and pass the module handle
 // in lParam of WarningDlgProc's WM_INITDIALOG
 if (DialogBoxParam(ghInstance, "WARNING", hDlg,
 (DLGPROC)lpProc, SendDlgItemMessage(hDlg,
 IDD_COMBOBOX, CB_GETITEMDATA, nCurSel, 0l)))
 {
 // Give Windows a chance to process the WM_QUIT
 // or WM_CLOSE messages we might have posted
 Yield();
 FillupComboBox(GetDlgItem(hDlg, IDD_COMBOBOX));
 }
 FreeProcInstance(lpProc);
 }
 }
 break;
 default:

 return FALSE; // We did not process the message
 }
 return TRUE; // We processed the message
 }
// FillupComboBox fills up the combo box with a list of all modules
// that are currently loaded. Every item in the list box also contains
// a "long" data item (attached using CB_SETITEMDATA) that is a combination
// of the module handle and the usage count.
void static FillupComboBox(HANDLE hComboBox)
 {
 int nIndex;
 BOOL bSucces;
 HMODULE hSelectedModule;
 MODULEENTRY ModuleEntry;
 // Keep the module handle of the item that is currently selected
 nIndex = SendMessage(hComboBox, CB_GETCURSEL, 0, 0l);
 hSelectedModule = (HMODULE) ((CB_ERR == nIndex) ? -1 :
 HIWORD(SendMessage(hComboBox, CB_GETITEMDATA, nIndex, 0l)));
 SendMessage(hComboBox, CB_RESETCONTENT, 0, 0l);
 ModuleEntry.dwSize = sizeof (MODULEENTRY);
 bSucces = ModuleFirst(&ModuleEntry);
 while(bSucces)
 {
 if (IsDLL(ModuleEntry.hModule))
 AnsiLower(ModuleEntry.szModule);
 nIndex = (int)SendMessage(hComboBox, CB_ADDSTRING, 0,
 (LONG) (LPSTR) ModuleEntry.szModule);
 if ((nIndex != CB_ERR) && (nIndex != CB_ERRSPACE))
 {
 SendMessage(hComboBox, CB_SETITEMDATA, nIndex,
 MAKELONG(ModuleEntry.wcUsage, ModuleEntry.hModule));
 bSucces = ModuleNext(&ModuleEntry);
 }
 else
 bSucces = FALSE;
 }
 // Check if the previously selected module is still in the list and
 // if so, reselect it.
 for (nIndex = SendMessage(hComboBox, CB_GETCOUNT, 0, 0l) - 1;
 nIndex >= 0 ; --nIndex)
 if ((HMODULE) HIWORD(SendMessage(hComboBox, CB_GETITEMDATA,
 nIndex, 0)) == hSelectedModule)
 {
 SendMessage(hComboBox, CB_SETCURSEL, nIndex, 0l);
 break;
 }
 ShowItemInfo(GetParent(hComboBox), hComboBox);
 }
// Show information about the currently selected item in the combobox.
void static ShowItemInfo(HWND hDlg, HWND hComboBox)
 {
 int nCurSel;
 nCurSel = (int)SendMessage(hComboBox, CB_GETCURSEL, 0, 0l);
 if (CB_ERR == nCurSel)
 {
 SetDlgItemText(hDlg, IDD_FILENAME, "");
 SetDlgItemText(hDlg, IDD_MODULE, "");
 SetDlgItemText(hDlg, IDD_KIND, "");
 SetDlgItemText(hDlg, IDD_USAGE, "");

 EnableWindow(GetDlgItem(hDlg, IDD_UNLOAD), FALSE);
 }
 else
 {
 char szScrap[MAX_PATH + 1];
 char *pcFilename;
 DWORD dwData;
 dwData = SendMessage(hComboBox, CB_GETITEMDATA, nCurSel, 0l);
 GetModuleFileName((HMODULE)HIWORD(dwData), szScrap, MAX_PATH);
 // Remove the path from the filename
 pcFilename = strrchr(szScrap, '\\');
 pcFilename = (pcFilename == NULL) ? szScrap : pcFilename + 1;
 SetDlgItemText(hDlg, IDD_KIND, (IsDLL((HMODULE)HIWORD(dwData)) ?
 "Library" : "Program"));
 SetDlgItemText(hDlg, IDD_FILENAME, pcFilename);
 SetDlgItemInt(hDlg, IDD_USAGE, LOWORD(dwData), FALSE);
 wsprintf(szScrap, "%04x", HIWORD(dwData));
 SetDlgItemText(hDlg, IDD_MODULE, (LPSTR) szScrap);
 EnableWindow(GetDlgItem(hDlg, IDD_UNLOAD), LOWORD(dwData));
 }
 }
// When the user pressed the Unload button, the "warning dialog" appears
BOOL FAR PASCAL _export WarningDlgProc(HWND hDlg, WORD wMessage, WORD wParam,
 LONG lParam)
 {
 static HMODULE hModule; // Only one dialog can be active!
 switch(wMessage)
 {
 case WM_INITDIALOG:
 // The handle of the module to be freed is in the hiword of iParam,
 // passed on using DialogBoxParam. Since we use the same dialog box for
 // both programs and libraries, we have to adjust our dialog a little,
 // depending on the type of module.
 if (IsDLL((HMODULE)HIWORD(lParam)))
 {
 EnableWindow(GetDlgItem(hDlg, IDD_TERMINATE), FALSE);
 EnableWindow(GetDlgItem(hDlg, IDD_DESTROY), FALSE);
 }
 else
 SetDlgItemText(hDlg, IDD_REFERENCEZERO, "Post WM_QUIT message");
 CheckDlgButton(hDlg, IDD_REFERENCEZERO, 1);
 hModule = (HMODULE)HIWORD(lParam);
 break;
 case WM_COMMAND:
 switch(wParam)
 {
 case IDOK:
 {
 if (IsDLL(hModule))
 {
 int nUsage = GetModuleUsage(hModule);
 while (nUsage--)
 FreeLibrary(hModule);
 }
 else
 {
 BOOL bSucces;
 TASKENTRY TaskEntry;
 int nMethod =

 (IsDlgButtonChecked(hDlg, IDD_REFERENCEZERO)) ? 0 :
 (IsDlgButtonChecked(hDlg, IDD_TERMINATE)) ? 1 : 2;
 TaskEntry.dwSize = sizeof(TASKENTRY);
 bSucces = TaskFirst(&TaskEntry);
 while(bSucces)
 {
 if (TaskEntry.hModule == hModule)
 KillTask(TaskEntry.hTask, nMethod);
 bSucces = TaskNext(&TaskEntry);
 }
 }
 EndDialog(hDlg, TRUE);
 }
 break;
 case IDCANCEL:
 EndDialog(hDlg, FALSE);
 break;
 }
 break;
 case WM_CLOSE:
 EndDialog(hDlg, FALSE);
 break;
 default:
 return FALSE;
 }
 return TRUE;
 }
// KillTask kills a task using a method of your choice. It is called
// from the "Warning" dialog box when the user presses Ok.
void static KillTask(HTASK hTask, int nMethod)
 {
 switch(nMethod)
 {
 case 0: // Post WM_QUIT message
 PostAppMessage(hTask, WM_QUIT, 0, 0l);
 break;
 case 1: // Terminate application
 TerminateApp(hTask, NO_UAE_BOX);
 break;
 case 2: // Close all the task's windows
 {
 FARPROC lpProc = MakeProcInstance((FARPROC)EnumTaskWindowsFunc,
 ghInstance);
 EnumTaskWindows(hTask,(WNDENUMPROC)lpProc, 0l);
 FreeProcInstance(lpProc);
 }
 break;
 }
 }
// EnumTaskWindowsFunc is called for every toplevel window that belongs to
// a task. It simply posts a WM_CLOSE message to this window.
BOOL FAR PASCAL _export EnumTaskWindowsFunc(HWND hWnd, DWORD lParam)
 {
 (void)lParam; // Avoid compiler warnings
 if (GetParent(hWnd) == NULL)
 PostMessage(hWnd, WM_CLOSE, 0, 0l);
 return TRUE;
 }
// IsDLL returns TRUE if the specified module is a Dynamic Link Library, or

// FALSE if it is a program.
BOOL static IsDLL(HMODULE hModule)
 {
 int i;
 // The module handle is really the selector of a far pointer to
 // the new-style .EXE header of the module. The bit at 0x8000 of
 // the word at offset 0xC in this structure is set if it's a DLL.
 MemoryRead((WORD)hModule, 0xCL, &i, sizeof(i));
 return (i & 0x8000) ? TRUE : FALSE;
 }




















































September, 1992
A VIDEO COMPATIBILITY INTERFACE FOR TURBO DEBUGGER


High-resolution graphics support for TDW




Danny Thorpe


Danny is a quality assurance engineer in the Turbo Pascal development team at
Borland International. He can be reached on CompuServe at 76646,1035.


The Video Compatibility Interface (VCI) of Borland's Turbo Debugger for
Windows 3.0 (TDW) allows an external DLL to handle all the video-mode
switching for a particular video card or chipset. This means that, among other
benefits, if TDW's default VGA support can't handle your favorite
high-resolution Windows graphics modes, VCI lets you create or install a
custom video driver that can.
TDW ships with DLLs to support the ATI and Tseng chip sets and has built-in
support for standard VGA mode (640x480 with 16 colors). If you're familiar
with your VGA card's modes and memory layout, you can write TDW video DLLs to
suit specific needs. This article describes services your video DLL must
provide for TDW, showing code that supports the S3 graphics accelerator of the
Orchid Fahrenheit 1280 card.


A Little History


TDW is a Windows application that runs in text mode. As you debug Windows
applications, TDW switches between text and graphics modes, preserving the
video state each time. Text mode normally uses the same video memory as
graphics mode, so preserving the state of the video system frequently requires
copying some or all of the graphics screen to an off-screen buffer. This chore
requires knowledge of the video system's internal architecture.
Saving the state of a VGA system running in a standard resolution with the
stock Windows VGA driver is a reliable operation--the video registers, color
palette, and video memory layout are all accepted industry standards. But with
Super-VGA, all bets are off from one video card to the next. At super-VGA
resolutions (800x600 or higher), each video card's memory can be laid out
differently, or different numbers can be used to represent the various video
modes that the card supports. The Video Electronics Standards Association
(VESA) Super-VGA standards have helped this situation, but those standards
were not well established at the time TDW was in development.
As a result, TDW 2.5 officially supported only the standard VGA resolution on
the stock Windows VGA driver. Some video cards' 800x600 modes work in TDW 2.5,
but not all, so little was said about them. TDW 3.0 addresses this by
providing an external DLL that TDW calls to perform mode switches and other
video-related chores. As a file separate from the TDW executable, the video
DLL can be removed or replaced with a version specific to a particular brand
of video hardware.


Using a Video DLL


When TDW 3.0 loads, it looks in the TDW directory and on the path for a file
called TDVIDEO.DLL. If it finds such a file, TDW loads it into memory and uses
that file for all its video-mode switching. Users can install a video DLL by
simply renaming a distribution file to TDVIDEO.DLL. For example, if you have
an Orchid Prodesigner IIs and would like to run Windows in 800x600 mode with
TDW, you rename the TSENG.DLL shipped with TDW to TDVIDEO.DLL. (Note that TDW
3.01 reads the video DLL name from the TDW.INI file.) The Tseng DLL should
allow TDW to work with the high-resolution modes of most video cards built
around the Tseng Labs chip set, such as the Orchid Prodesigner IIs.


The Video DLL Interface


Table 1 lists the nine functions that TDW requires of a video DLL. The example
code is in Turbo Pascal. For conversions to C, just remember that an integer
is a 16-bit signed int, a word is a 16-bit unsigned int, a procedure is a void
function, and Pascal is case insensitive.
Table 1: The TDW video-interface functions.

 function VideoInit: Word;
 function VideoDone: Word;
 procedure VideoDebuggerScreen;
 procedure VideoWindowsScreen;
 function VideoGetTextSelector
 (Display: Integer): Word;
 procedure VideoSetCursor(X, Y: Word);
 function VideoBigSize: Word;
 procedure VideoSetSize(BigFlag: Word);
 function VideoIsColor: Word;
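
Following the conversion rules above, the nine entry points might be declared
in C as shown below. The word16 typedef and the stub bodies are placeholders
for illustration only, not a working driver; a real video DLL would supply the
hardware-specific logic and the appropriate Win16 export attributes.

```c
/* Possible C equivalents of the Pascal prototypes from Table 1, using
   16-bit conventions (Pascal Word -> 16-bit unsigned int). */
typedef unsigned int word16;

word16 VideoInit(void)                    { return 0; }  /* 0 = Success */
word16 VideoDone(void)                    { return 1; }  /* 1 = success */
void   VideoDebuggerScreen(void)          { }
void   VideoWindowsScreen(void)           { }
word16 VideoGetTextSelector(int display)  { (void)display; return 0; }
void   VideoSetCursor(word16 x, word16 y) { (void)x; (void)y; }
word16 VideoBigSize(void)                 { return 0; }
void   VideoSetSize(word16 bigflag)       { (void)bigflag; }
word16 VideoIsColor(void)                 { return 1; }
```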

The VideoInit function is called by TDW to initialize the video DLL and to
determine whether the DLL can support the video hardware and current graphics
mode. Note that this is not the DLL initialization code called by Windows when
the DLL is loaded into memory. (The DLL initialization code is the Begin..End
block at the end of a Turbo Pascal library source file, or LibMain in C.) TDW
will call VideoInit some time after the DLL has been loaded. It is recommended
that all long-term memory allocations and initializations occur in VideoInit
and not in the DLL initialization code. Remember that the system will already
be in graphics mode--an interesting reversal from most DOS programs.
The return value of VideoInit should indicate whether an error occurred during
initialization of the DLL, or if the video system or current mode is not
supported by the DLL. The return values and their meanings are listed in Table
2. Any nonzero return value is considered an error and will cause TDW to
unload the video DLL. Except for the result code NoNeed, errors will be
reported to the user, and TDW will terminate. VideoDone will not be called if
VideoInit fails, so clean up any memory allocations before failing VideoInit.
Table 2: VideoInit return codes.

 Success = 0; { Success }
 BadCard = 1; { Incorrect video hardware was detected }
 BadMode = 2; { Unsupported video mode of supported card was detected }
 NoMemory = 3; { Could not allocate the memory needed from Windows }

 NoNeed = 4; { Standard VGA detected (TDVIDEO.DLL not required) }
 Error = 5; { Miscellaneous error }

You will probably wind up using BIOS services to query the video card and set
modes. Keep in mind that TDW and your video DLL run in protected mode, but
most BIOS execution requires real mode. Windows does a pretty good job of
handling the protected-to-real mode switching, but sometimes you have to
manually set up the BIOS request using special DPMI service requests.
BIOS functions that require a real-mode buffer address to be passed in a
segment register (such as ES or DS) will most likely require a DPMI Simulate
Real Mode Interrupt call, because real-mode addresses cannot be placed in a
segment register in protected mode. This is demonstrated in the Fahrenheit DLL
code shown in Listing One, page 120.
The VideoDone function seen in Table 1 is called by TDW to shut down the video
DLL before TDW exits back to Windows. It is not called if VideoInit has
failed. The return value should be 1 to indicate success or 0 to indicate
failure. All memory allocated by the DLL, presumably in VideoInit, should be
disposed of in VideoDone. Do not rely on the DLL's Windows Exit Procedure
(WEP) as a means of freeing up resources. VideoWindowsScreen will be called
before VideoDone, so VideoDone doesn't need to worry about screen modes or
states.
The VideoDebuggerScreen procedure is responsible for switching the video
system from graphics mode to text mode. Any copying of video memory or
registers should be done here before the switch to text mode. For many
systems, only a simple BIOS interrupt (Int $10, function $83) is needed to
switch to text mode. The preservation beforehand is the hardware-specific
part.
The VideoWindowsScreen procedure is responsible for switching from text mode
back to graphics mode and restoring anything saved away earlier. If you're on
a VESA-compatible video system, you should be able to use Int $10, function
$4F02 to switch to your graphics mode. Again, restoring the video state is the
challenging part, not the switch itself.
Your graphics mode shouldn't change during a TDW session, so the graphics-mode
number should be noted in VideoInit. If the graphics mode does change in a
debugging session (probably because of an error in your video DLL), bail out
fast!


Call Sequences


VideoInit will be the first DLL function called by TDW; VideoDone will be the
last function TDW calls before unloading the DLL. The last five functions
listed in Table 1 may be called as TDW sets up for text-mode operation or at
other times, but their call order is not particularly significant or
interesting.
It is important to note that the VideoDebuggerScreen and VideoWindowsScreen
procedures will be called frequently. Except for lightweight instructions in the program
under test (such as variable assignments or register operations), TDW will
first call VideoWindowsScreen, execute the test code, and then call
VideoDebuggerScreen to switch back to the debugger's text mode. This sequence
of switch-to-graphics, execute, switch-to-text will be performed every time a
CALL instruction is stepped over, or a "Run to breakpoint" is requested.
You should keep VideoDebuggerScreen and VideoWindowsScreen as lean and fast as
possible. If you have to copy graphics video memory to a main memory buffer,
try to move only what's absolutely necessary. For example, text mode certainly
doesn't use the entire megabyte of video memory found in most Super-VGA cards.
You might be able to get by with saving just the first few kilobytes of each
graphics color plane, depending upon the layout of your video card's memory.
You'll probably have to save the VGA registers and color palette, but those
are relatively small and painless tasks. Between calls to VideoWindowsScreen
and VideoDebuggerScreen, of course, the graphics screen image may change,
because Windows and the program under test are given an opportunity to
execute. VideoDebuggerScreen will have to save the current graphics screen
every time it is called.
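As a rough sketch of the partial-save idea (the 4K-per-plane figure, the SaveBuf layout, and the PlaneSel selector are my own illustrative assumptions, not part of the TDW interface or of Listing One), the copy might look like this in Turbo Pascal:

```pascal
{ Sketch: save only the first 4K of each of the four VGA planes before
  switching to text mode. PlaneSel is assumed to be a selector mapped to
  the video segment at $A000; SaveBuf is an assumed 16K global buffer. }
const
  PlaneSave = 4096;
var
  Plane: Integer;
begin
  for Plane := 0 to 3 do
  begin
    Port[$3CE] := 4;       { Graphics Controller: Read Map Select register }
    Port[$3CF] := Plane;   { select which plane subsequent reads come from }
    Move(Mem[PlaneSel:0], SaveBuf[Plane * PlaneSave], PlaneSave);
  end;
end;
```

Restoring would walk the same loop in the other direction, with the Sequencer's Map Mask register ($3C4/$3C5, index 2) selecting the write plane.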


The Other Support Functions


The VideoGetTextSelector function is called when TDW needs the selector
(protected-mode segment handle) of the requested text-mode video memory. If
Display is 0, return the selector for the physical address 0xB800 (color). If
Display is 1, return the selector for the physical address 0xB000
(monochrome). Use the Windows predefined selectors _B800H and _B000H.
The VideoSetCursor function is responsible for setting the cursor position on
the text-mode screen. The text-cursor position is controlled by a standard VGA
register, so most VGA cards can use the sample code accompanying this article.
TDW will call this function to make the text cursor disappear by placing the
cursor at an off-screen position.
The VideoBigSize function should return the highest number of text lines that
the video hardware and video DLL support. This will usually be 43 lines for
EGA or 50 lines for VGA. TDW does not currently make use of modes higher than
80x50.
The VideoSetSize function is used to change the debugger text screen from
25-line mode to 43/50-line mode (BigFlag=1) or from 43/50-line mode to 25-line
mode (BigFlag=0). An appropriate text font for the indicated text mode must be
loaded by this function. This would be an 8x8 pixel font for 43/50-line mode,
or the standard 8x14 pixel font for 25-line mode.
Finally, the VideoIsColor function should return 1 if the video system is
running in color mode, or 0 if running in monochrome mode.


Name Requirements and Indexes


The functions exported by the video DLL must have the names shown in this
article, in upper case. Pascal is internally case insensitive, but all
exported DLL function names are forced into upper case when the compiler
generates the DLL. The DLL function indexes are not important, as TDW locates
the DLL functions by name rather than index.
The module name of the video DLL must be TDVIDEO. Therefore, in Turbo Pascal
for Windows, the filename of the DLL source module should be TDVIDEO.PAS, so
that the compiler will make TDVIDEO the DLL's internal module name. As
mentioned earlier, the video DLL must be named TDVIDEO.DLL in order for TDW
to find and load the file, but for distribution your video DLL should have a
human-readable filename indicating what type of card or chipset the DLL
supports. TDW ships with ATI.DLL and TSENG.DLL; I called my driver
FAHR1280.DLL.


Debugging a Debugger Driver


It doesn't make sense to sic the debugger on a DLL that the debugger is using,
and you can't use TDW to debug a test program of your video routines, because
TDW doesn't support the video modes that your video routines require. However,
you can certainly write a small Windows program to load your prototype video
DLL and call a sequence of the DLL functions with "tight" delay loops between
the calls. You don't want Windows to get control while you're staring at the
text-mode screen, so just have the test program spin in a loop for a few
seconds before moving on to the next video-DLL function call.
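As a hedged sketch of such a harness (the procedure types, entry-point handling, and spin count below are my own illustrative assumptions, not part of the TDW specification), a Turbo Pascal for Windows test program might look like:

```pascal
{ Throwaway harness: load the prototype video DLL, call a few entry points
  with busy-wait pauses in between so Windows can't repaint the screen. }
program TestVid;
uses WinTypes, WinProcs;
type
  TVideoInit   = function: Word;
  TVideoScreen = procedure;
  TVideoDone   = function: Word;
var
  DLL: THandle;
  VInit: TVideoInit;
  VDbg, VWin: TVideoScreen;
  VDone: TVideoDone;
  T: Longint;
begin
  DLL := LoadLibrary('TDVIDEO.DLL');
  if DLL < 32 then Exit;                     { load failed }
  @VInit := GetProcAddress(DLL, 'VIDEOINIT');
  @VDbg  := GetProcAddress(DLL, 'VIDEODEBUGGERSCREEN');
  @VWin  := GetProcAddress(DLL, 'VIDEOWINDOWSSCREEN');
  @VDone := GetProcAddress(DLL, 'VIDEODONE');
  if VInit <> 0 then                         { nonzero = init error }
  begin
    FreeLibrary(DLL);
    Exit;
  end;
  VDbg;                                      { switch to text mode }
  for T := 1 to 2000000 do ;                 { spin: time to eyeball the screen }
  VWin;                                      { back to graphics mode }
  VDone;
  FreeLibrary(DLL);
end.
```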
If your video DLL doesn't switch modes 100 percent correctly, it will probably
corrupt the Windows graphics system beyond salvation or lock up the CPU. Be
prepared for frequent reboots and power-downs while experimenting with your
video DLL. You'll probably apologize to your monitor before it's all over,
too.
It's fairly safe to use Windows' MessageBeep function to find out if a section
of code is being executed. You can also write to a log file without
interfering with the video system. Be careful about writing anything out to
the display, however. When you're in text mode, your compiler's text output
library routines should work, but don't be surprised if they don't. In Turbo
Pascal, Writeln works fine on the text-mode screen that the Fahrenheit DLL
sets up, but I discovered that the screen-position index is reset to 0,0 each
time the system reenters text mode. I had my program Writelning text to the
screen after each mode switch to verify that the text-mode memory was not
being erased, but I found that a call to GotoXY was needed to keep the Writeln
statements from overwriting each other. Similar caveats may exist in other
languages or standard output libraries.
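In Turbo Pascal terms, the workaround amounts to repositioning explicitly before each write; the Line counter in this fragment is my own illustrative variable, not something from the Fahrenheit DLL:

```pascal
{ Sketch: after each return to text mode the screen-position index is back
  at 0,0, so reposition before writing or successive Writelns overwrite
  each other. }
GotoXY(1, Line);               { Line: a counter kept by the test program }
Writeln('text mode entered, pass ', Line);
Inc(Line);
```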


Sample Source Code


The sample code (see Listing One, page 120) for this article is the complete
TPW source code for a TDW video DLL to support the Orchid Fahrenheit 1280
video card. The Pascal code contains blocks of inline assembly language where
machine instructions or register access is needed or convenient.
I chose the Fahrenheit because TDW doesn't support the Fahrenheit's S3
graphics accelerator architecture, and it has a special dual text/graphics
mode which reduces the amount of effort required to switch between graphics
and text modes. The lazy approach works quite well in the Fahrenheit's
640x480x256 and 800x600x256 graphics modes. The dual text/graphics mode of the
card sets aside the top 64K of video memory for text mode and remaps all
text-mode memory accesses into that high block. As long as the graphics system
doesn't write in that top area of video memory (a big assumption, as noted
later), and you're careful not to clear the graphics screen or the text screen
when switching modes, the dual mode is a quick and useful solution. I did have
to copy and save the VGA color palette.
The higher modes (1024 and 1280) of the Fahrenheit card use the entire 1 Mbyte
of the card's video memory, so I don't expect the dual text/graphics mode
feature to work in those modes without some memory preservation and swapping.
To support the higher modes, the DLL would probably have to revert to copying
a portion of the graphics memory to a main-memory buffer.
Orchid warns that all the Fahrenheit Windows drivers may use the top 64K of
video memory as a scratch pad or off-screen storage area. The Windows drivers
are not aware of the dual text/graphics mode, so the video DLL should take
precautions and save the graphics image from the top memory area before each
switch to text mode. However, in the months since I created this video DLL, I
haven't encountered a memory overwrite problem in the higher modes, so I
haven't gone to the extra trouble of copying the image back and forth.
This Fahrenheit video DLL may work with other manufacturers' S3 86C911
accelerator-based video cards, but because this code looks for the Orchid
signature in the VESA description field, its VideoInit will fail on them with
an "incompatible hardware" return code. If you disable the signature checking,
you should be able to at least experiment with this code on your S3-based
video card.
You are now armed with a functional specification of the Turbo Debugger for
Windows 3.0 Video Compatibility Interface. You also have source code for a
Turbo Pascal for Windows DLL performing the low-level video manipulations
required for that specification, supporting the fast and affordable new S3
graphics-accelerator architecture at the heart of the Orchid Fahrenheit 1280.
We took advantage of a fairly obscure dual-mode feature of the S3 chip to make
the video DLL an almost trivial exercise. Armed with this specification and
sample code, and with some additional information about your particular video
hardware, you can make TDW debug Windows applications on your video hardware
running in your favorite graphics modes.


Acknowledgments


Thanks to Dave Wilhelm and Jeff Peters at Borland, and to Marc Warden at
Orchid.



References


86C911 GUI Accelerator Technical Reference. Santa Clara, CA: S3 Inc.
DOS Protected Mode Interface (DPMI) Specification 1.0. Intel Corp., 1991.
Ferraro, Richard F. Programmer's Guide to the EGA and VGA Cards, second
edition. Reading, MA: Addison-Wesley, 1990.


_A VIDEO-COMPATIBILITY INTERFACE FOR TURBO DEBUGGER_
by Danny Thorpe



[LISTING ONE]
{------------------------------------------------------------

 Windows video interface DLL for Borland's Turbo Debugger for Windows 3.0

 Orchid Fahrenheit 1280 graphics accelerator card driver version 1.0

 ------------------------------------------------------------}

{ Copyright (C) 1992 by Danny Thorpe }

{$D TDW 3.0 Video DLL 1.0 for Orchid Fahrenheit 1280 }

{$A+,B-,D-,F-,G+,L-,N-,R-,S+,V+,W-,X+}



library fahr1280;

{ This driver is for the Orchid Fahrenheit 1280 video card. The modes
  supported are 640x480x256 and 800x600x256.

  Borland does not provide technical support on this file or the TDW video
  DLL interface, and reserves the right to change the DLL requirements in
  future versions of TDW.

  The Fahrenheit's graphics coprocessor is an 86C911 by S3 Inc. The S3
  Technical Reference was the source for all enhanced mode video information
  used by this driver. VESA bios calls are used to identify the Orchid card.

  -Danny Thorpe }

uses Wintypes, Winprocs, strings;

Type
  PWordArray = ^WordArray;
  WordArray  = array [0..($FFF0 div sizeof(Word))] of Word;

  VesaInfoBlock = record
    Signature     : array [1..4] of char;  { not a PChar, = 'VESA' }
    VersionMinor  : byte;
    VersionMajor  : byte;
    OEMString     : PChar;      { points into BIOS }
    Capabilities  : longint;
    VideoModes    : PWordArray; { points to string of supported modes, $FFFF terminated }
    OrchidCardName: array [0..255-18] of char; { a copy of OEMString }
    { a copy of the VideoModes array follows the OEMString null terminator }
  end;

  RealRegs = record  { for DPMI simulated real mode interrupts }
    rDI, rSI, rBP, Reserved, rBX, rDX, rCX, rAX: Longint;
    rFLAGS, rES, rDS, rFS, rGS, rIP, rCS, rSP, rSS: Word;
  end;



const
  { the Fahrenheit 1280 graphics modes }
  m640x480x256  = $201; { Works fine }
  m800x600x16   = $202; { not supported in this driver version }
  m800x600x256  = $203; { Works fine }
  m1024x768x16  = $204; { not supported in this driver version }
  m1024x768x256 = $205; { not supported in this driver version }
  m1280x960x16  = $206; { not supported in this driver version }
  m1280x1024x16 = $208; { not supported in this driver version }

  HighestVGAMode = $13;

  RegisterLock1 = $48;
  RegisterLock2 = $A0;

  VesaDontErase = $8000;





var { global variables }
  GraphicsMode: Word;
  VGAPalette: array [0..255, 0..2] of byte;
  LockStates: array [0..2] of byte;
  ColorScreen: Boolean;
  VGA50LineMode: Boolean;

{ Utility functions }

procedure SavePalette; near;
var i, j: byte;
begin
  Port[$3C7] := 0;
  for j := 0 to 255 do
    for i := 0 to 2 do
      VGAPalette[j,i] := Port[$3C9];
end;



procedure RestorePalette; near;
var i, j: byte;
begin
  Port[$3C8] := 0;
  for j := 0 to 255 do
    for i := 0 to 2 do
      Port[$3C9] := VGAPalette[j,i];
end;



procedure LockEnhancedRegisters; near;
{ Disable coprocessor commands and register access }
begin
  Port[$3d4] := $40;
  Port[$3d5] := Port[$3d5] and not 1;
  Port[$3d4] := $40;
  Port[$3d4] := $38;
  Port[$3d5] := $0;   { lock 1 }
  Port[$3d4] := $39;
  Port[$3d5] := $0;   { lock 2 }
end;



procedure UnlockEnhancedRegisters; near;
{ Give ourselves access to the graphics coprocessor commands & registers }
begin
  Port[$3d4] := $38;
  Port[$3d5] := $48;  { Enhanced mode unlock 1 }
  Port[$3d4] := $39;
  Port[$3d5] := $a0;  { Enhanced mode unlock 2 }
  Port[$3d4] := $40;
  Port[$3d5] := Port[$3d5] or 1;  { Enable Enhanced commands & registers }
end;



procedure SaveLockStates; near;
begin
  Port[$3d4] := $38;
  LockStates[0] := Port[$3D5];          { Enhanced mode lock 1 = $48 if unlocked }
  Port[$3d4] := $39;
  LockStates[1] := Port[$3d5];          { Enhanced mode lock 2 = $A0 if unlocked }
  Port[$3d4] := $40;
  LockStates[2] := Port[$3d5] and 1;    { Enable Enhanced commands & registers }
end;



procedure RestoreLockStates; near;
begin
  UnlockEnhancedRegisters;
  Port[$3d4] := $40;
  Port[$3d5] := Port[$3d5] or LockStates[2];
  Port[$3d4] := $38;
  Port[$3d5] := LockStates[0];
  Port[$3d4] := $39;
  Port[$3d5] := LockStates[1];
end;



function VESAGetVideoMode: Word; near; assembler;
asm
  mov ax, $4F03
  int $10
  mov ax, bx  { move video mode from bx to ax - our function result has to be in ax }
end;



function VESAGetInfo(var InfoBlock: VesaInfoBlock): Boolean; near;
var
  Regs: RealRegs;
  Twins: Longint;
{ Use VESA bios calls to verify that this is an Orchid Fahrenheit 1280 card }
{ Since we have to pass a pointer to a buffer to this VESA call in the es:di
  registers, and BIOS requires real mode addresses, and real mode addresses
  can't be loaded into es:di in protected mode, we have to use a DPMI
  Simulate Real Mode Interrupt function. }
begin
  Twins := GlobalDosAlloc(sizeof(InfoBlock));
  asm
    mov di, ss
    mov es, di
    lea di, Regs
    push di
    mov cx, 21                     { size of Regs }
    xor ax, ax
    cld
    rep stosw                      { zero out Regs data }
    pop di

    mov ax, $300                   { DPMI Simulate Real Mode Interrupt function }
    mov word ptr Regs.rAX, $4F00   { vesa get info }
    mov bx, word ptr Twins[2]
    mov word ptr Regs.rES, bx      { info block (for bios) is at es:0000, real mode }
    mov cx, 0
    mov bx, $10
    int $31

    jc @@1                         { jump if there was a DPMI error }
    mov ax, word ptr Regs.rAX
    push ds
    mov ds, word ptr Twins[0]
    xor si, si                     { info block (for us) is at ds:si, protected mode }
    les di, InfoBlock
    mov cx, 128
    rep movsw                      { copy the data from the local block to the parameter }
    pop ds

    cmp al, $4F                    { Was vesa call accepted? }
    jne @@1                        { If not, jump to error section }

    or ah, ah                      { Did the function execute successfully? }
    jnz @@1                        { If not, jump to error section }

    mov @Result, 1                 { vesa info successfully retrieved }
    jmp @@2

  @@1:
    mov @Result, 0                 { fail the initialization - incorrect video card }

  @@2:
  end;
  GlobalDosFree(LoWord(Twins));
end;



{****************************************************************}
function VideoInit: Word; export;
{ Called when TDW first loads up. All dynamic allocation and chip and video
  mode detection should be performed here. Any failure will cause TDW to
  unload TDVIDEO.DLL.

  Return codes are: }
const
  Success  = 0; { success }
  BadCard  = 1; { Incorrect video card was detected }
  BadMode  = 2; { Unsupported video mode of correct card was detected }
  NoMemory = 3; { Could not allocate the memory needed from Windows }
  NoNeed   = 4; { Regular video mode detected (TDVIDEO.DLL not required) }
  Error    = 5; { Misc. error }
var
  CardInfo: VESAInfoBlock;
begin
  { Verify that we're running an Orchid Fahrenheit 1280 card. }
  if (not VESAGetInfo(CardInfo)) or
     (stricomp(CardInfo.OrchidCardName, 'Orchid Technology Fahrenheit 1280') <> 0) then
  begin
    VideoInit := BadCard;
    Exit;
  end;

  GraphicsMode := VESAGetVideoMode;
  case GraphicsMode of
    m640x480x256, m800x600x256: ; { do nothing }
  else
    begin
      if GraphicsMode < HighestVGAMode then
        VideoInit := NoNeed
      else
        VideoInit := BadMode;
      Exit;
    end;
  end;

  VGA50LineMode := False;  { 80x25 standard text mode }

  SaveLockStates;
  UnlockEnhancedRegisters;

  { Put the Fahrenheit card into dual-page (graphics and text) mode }
  asm
    mov ax, 4fffh
    mov bx, 2
    int 10h
  end;

  ColorScreen := True;
  VideoInit := Success;
end;

{****************************************************************}
function VideoDone: Word; export;
{ Called when TDW exits back to Windows. All memory allocated must be freed
  by the end of this function (Don't rely on ExitProc). 1 means success,
  0 means it failed. }
{ All memory for this DLL is statically stored in the data segment. }
begin
  RestoreLockStates; { Put the enhanced mode locks back the way we found them. }
  VideoDone := 1;
end;



{ Magic functions to get the selectors of these areas of physical memory }
procedure __B000; far; external 'KERNEL' index 181;
procedure __B800; far; external 'KERNEL' index 182;

{****************************************************************}
function VideoGetTextSelector(Display: Integer): Word; export;
{ Called when TDW needs the selector (protected mode segment) value of the
  text mode screen requested. If display is 0, return the selector for
  0xB800 (color). If display is 1, return the selector for 0xB000 (mono).
  This can be done with the Windows pre-defined selectors: _B800H and _B000H. }
begin
  ColorScreen := Display = 0;
  if ColorScreen then
    VideoGetTextSelector := Ofs(__B800)
  else
    VideoGetTextSelector := Ofs(__B000);
end;

{****************************************************************}
procedure VideoSetCursor(X, Y: Word); export;
{ Called when TDW needs to set the cursor position on the text mode screen.
  Most VGA cards can use the code here (since it's a non SuperVga register
  that controls the cursor position). TDW will call this function when it
  needs to make the cursor disappear (by placing it at an off-screen
  position). }
var
  P: word;
begin
  if ColorScreen then P := $3D4 else P := $3B4;

  X := X + Y * 80;
  Port[P] := $E;        { cursor location high byte reg. }
  PortW[P+1] := Hi(X);
  Port[P] := $F;        { cursor location low byte reg. }
  PortW[P+1] := Lo(X);
end;

{****************************************************************}
procedure VideoSetSize(BigFlag: Word); export; assembler;
{ Called when TDW wants to switch the resolution of the text mode screen.
  Bigflag will be 1 if high res, 0 if low res. }
asm
  lea bx, BigFlag
  mov ax, ss:[bx]
  or ax, ax
  jz @@1                         { BigFlag = 1, so do 50 line text mode }
  mov byte ptr VGA50LineMode, 1
  mov ax, $1112                  { load 8x8 font }
  xor bx, bx
  int $10
  jmp @@2
@@1:                             { else BigFlag = 0, so do 25 line mode }
  mov byte ptr VGA50LineMode, 0
  mov ax, $1111                  { load 8x14 font }
  xor bx, bx
  int $10
  mov ax, $83
  int $10
@@2:
end;

{****************************************************************}
procedure VideoDebuggerScreen; export;
{ Called when TDW wants to switch to the text mode screen. This function
  must save the appropriate memory locations, save the VGA palette, and
  switch to text mode. }
begin
  if ColorScreen then { We're assuming that a mono screen will be dual monitor }
  begin
    SavePalette;
    asm
      mov ax, $83  { select text mode $03, don't clear the screen ($80) }
      int $10
    end;
    VideoSetSize(word(VGA50LineMode));
  end;
end;

{****************************************************************}
procedure VideoWindowsScreen; export;
{ Called when TDW wants to switch back to the Windows screen. This function
  must switch back to the original graphics mode, restore the palette, and
  restore the SuperVGA graphics memory planes that were blown away by text
  mode. }
begin
  if ColorScreen then
  begin
    asm
      mov ax, $4f02
      mov bx, word ptr GraphicsMode
      or bx, VesaDontErase  { don't erase the graphics or text screens }
      int 10h
    end;
    RestorePalette;
  end;
end;

{****************************************************************}
function VideoBigSize: Word; export; assembler;
{ Called when TDW needs to determine if there is a higher resolution text
  mode available (usually 43 or 50 lines). The maximum number of lines that
  you are able to support should be returned here. }
asm
  mov ax, 50
end;

{****************************************************************}
function VideoIsColor: Word; export; assembler;
{ Returns 1 for color, and 0 for monochrome. }
asm
  xor ax, ax
  mov al, byte ptr ColorScreen
end;



exports
  VideoDone, VideoInit, VideoGetTextSelector, VideoDebuggerScreen,
  VideoWindowsScreen, VideoSetCursor, VideoSetSize, VideoBigSize, VideoIsColor;



begin end.

































September, 1992
PROGRAMMING PARADIGMS


Visitors from the East




Michael Swaine


Each of them was a Visitor from the East, as they used to say on the "Tonight
Show" before Leno. Saying that, I've said nothing, since "Visitor from the
East" describes just about anyone who drops in here at The Prose Lab where I
live and work. Between me and the Pacific Ocean ten miles to the west and half
a mile downhill there's nobody but a few mountainfolk who don't get out any
more than I do, so any visitor here who hasn't come from the east must have
come from the East.
These two were actually visitors from the east, as opposed to the East. The
difference between the east and the East is the difference between a relative
and an absolute frame of reference. I know, Albert Einstein demonstrated 90
years ago that all frames of reference are relative. But there are still
relatively absolute frames of reference, like East, and we need them.
The driveway up which my visitors from the east drove comes in from the west,
but that's a misleading local detail, of the kind that led early astronomers
into erroneous geocentric and epicyclic theories. It's awfully hard to
interpret the motions of the planets when you're on one. And since pedagogy
recapitulates discovery, students continue to be faced with that old problem:
It's hard to make sense of your senses unless you can observe from a
(relatively) absolute frame of reference.


The Simulation Paradigm


One of my visitors from the east had with him a product that helps one see
things from different frames of reference. Greg Baszucki of Knowledge
Revolution had brought Version 2 of Interactive Physics, hot off the
production line. He even brought some of his own productions.
I have to admit up front that Interactive Physics is not a programming tool.
It has a scripting language, but so does any good spreadsheet program, and
the comparison is apt. It's that kind of language. What Interactive Physics
is, above all else, is a teaching tool. It's apparently a very good tool for
teaching physics: Version 1 (now being sold for $99 as Fun Physics) earned a
MacUser Eddy award for best educational or exploratory product for the Mac,
and Version 2 is definitely better. Also, Prentice-Hall and other book
publishers are including a modified version of it with their physics texts.
But Version 2 is useful for other purposes. I can imagine it being used by
animators, engineers, physicists, and anyone who needs to visualize moderately
complex systems under natural or artificial constraints. It's a visualization
tool, and that's my justification for describing it here.
The basic idea is pretty obvious: You build two-dimensional objects from
simple forms as with any CAD or drawing program. You then associate physical
properties like mass, charge, elasticity, and friction with these objects; set
environmental properties like gravity, air resistance, and electrostatics; tie
in predefined simulation elements like ropes, pulleys, actuators, pin joints,
springs, dampers, motors, and forces.
At this point you've defined a physical system, and you can set it running and
watch what it does. A row of dominoes nudged at one end will fall naturally,
with all the rebounds and ricochets that complicate the so-called "domino
effect." A mass on the end of a bungee cord will do its damped dance.
The objects can be complex, built from simple shapes. When you build, say, a
car from elements, IP knows how to find the center of mass of the grouped
object. You can also quickly set all the attributes of an object by picking
from a list of generic object types, like plastic. You can actually build
working models of backhoes and rocketships using IP, as long as you don't want
to model every last detail and can live with its two dimensions. Baszucki
demonstrated a two-stage rocket he built, showing how it speeds up as the
second stage drops off. The rocket also speeds up as its weight decreases from
burning its fuel.
There's a lot of pedagogical value in such simulations, but IP also has other
ways of displaying its results. You can display any variables in tables or
figures, which can be dynamically drawn, filled, or adjusted. This is nice in
the case of the rocket, where it may not be apparent that you're looking at an
increase in acceleration, not velocity. And you can export the results of
experiments to other programs or publish them for other programs to subscribe
to dynamically, or export the animation as a QuickTime movie for later
playback.
The user interface is highly customizable. Any menu command can be turned into
a button, there are sliders and other controls to choose from, and the program
can be put into player mode to limit the student's actions.
The program runs on most existing Macs, so you might expect it to be
low-powered, but the author, David Baszucki, has built in some scaling
abilities that make it take advantage of whatever machine it's on. You can
zoom the screen in or out or adjust the time slice. It even gives you a choice
of integration methods.
And you can model the solar system, then decide from what point of view you
want to view it. If Galileo had had Interactive Physics, he would probably
have had better luck convincing the church that the Earth moved.


The User-programming Paradigm


Another visitor from the east brought a real development tool, albeit not a
real programming tool.
WindowCraft is a program for developing Windows applications. It would have to
be called a scripting, rather than a programming tool, but it does provide all
the expected Windows UI objects; has nice control over patterns, colors, drop
shadows, line styles, and icon styles; and lets one create real Windows
applications. These applications can manage multiple windows, call DLLs, and
function as DDE clients or servers. Except for one thing, you'd say that this
is a Visual Basic-like product.
The one thing is that the scripting language of WindowCraft is HyperTalk. Plus
some extensions. That's right; this is another tool for letting HyperCard
stack developers port their work to Windows in the hope of finally earning
back their investment.
WindowCraft includes a conversion utility (two, actually: one for the Mac and
one for the PC) for turning stacks into WindowCraft format. Once in this format,
they can be tweaked to better fit the Windows environment, and compiled to
.EXE form. The conversion is not hitchless: Some resources will have to be
re-created and some Mac user-interface conventions won't make sense in
Windows. But in this it's no different from ConvertIt!, the stack-conversion
tool available from Heizer Software and now, with Version 1.5, supporting
HyperCard 2.0. ConvertIt! uses a published conversion format that should allow
people to develop converters for all sorts of HyperCard-like products, but
currently it only converts from HyperCard to Asymetrix ToolBook format.
The WindowCraft folks seem to look upon the HyperCard connection as a
jumpstart for their business, but emphasize the Windows features of the
product more. It is a quick way to get a Windows app written: You draw some
objects (it includes a drawing mode that lets you generate fully rotatable,
scalable objects), script the objects (with a script editor similar to
HyperCard's), add menus (there's a menu editor), test the result (in a fast
run mode), and build the EXE. WindowCraft requires no royalties or license
fees for your applications.
But it is a good HyperCard clone. In fact, Claris thought enough of
WindowCraft to consider acquiring the company, although the deal didn't go
through. Perhaps it was just a hedge: Claris surely wants its own in-house,
deep expertise in Windows development, which it wouldn't get by buying a
Windows version of HyperCard.
In fact, support for independent third parties ought to be more strategic for
an extensible product like HyperCard than for most products. Third parties are
where HyperCard really gets loose. (That metaphor is problematic: At certain
parties, anyway, people tend to get loose in direct proportion to the extent
to which they get tight. I guess it's an irony of English idiom.) At one of
these HyperCard third parties, over at Ray Heizer's place, the standard
HyperCard stack gets so loose it dresses up like a real Mac application and
struts around in drop-and-drag.
WindowScript, written by Leonard Buck, basically lets you create and manage
windows in HyperCard. Sounds trivial, but Buck puts a lot of user-interface
control behind this model of development. You can, for example, put a
scrolling list of icons in a window. Many of the feature limitations of
HyperCard are addressed by WindowScript. WindowScript pretty much lets you
work with the entire Mac user interface from within HyperCard: pop-up windows,
whatever.
And it does this while maintaining HyperCard's ease of use and interactivity.
User-programmers can learn to use WindowScript if they can use HyperTalk. More
advanced developers will benefit from the fact that WindowScript is
interactive, letting them put together a quick program and test it
interactively. Also, depending on the application, licensing is either free or
quite reasonable. I can't think of another product that gives this much
control over the interface at this level of programming (scripting).
Claris is plugging some of the holes in HyperCard, as well. According to
MacWeek's Raines Cohen, Version 2.5, due out this year, will have integral
color support, significantly improved performance, object buttons, and
system-style and 3-D icons.


The Pencentric Paradigm


Just for the record: The word "paradigm" was overused before I got hold of it.
Philosopher of science Thomas Kuhn was taken to task back in the '60s for
using the word umpteen different ways in a single book. Its rhythmic
invocation is one of the stylistic traits that make film critics and pop
psychologists sound like they all went to the same prep school. In 1974,
self-described "process futurist" Joel Barker introduced management
consultants to the term, just as Popular Electronics was introducing
electronics hobbyists to the personal computation paradigm. For decades now
it's been a favorite weasel word of academics and marketers of all stripes.
So you see, there's tradition to uphold.
And John Sculley is doing his part for tradition.
Sculley claims that the pen-based Apple Newton machine represents a new
technological paradigm. He's talking through his advertising hat, of course,
so we must give him a lot of latitude. Advertising people have this term "high
concept," meaning--well, I'm not sure what it means, if anything. But I
propose for them a term that sounds like it might be related: "high
definition," which I define as "meaning a lot of different things." For short,
"HD." Maybe Sculley was speaking in HD mode when he called Newton a new
paradigm.
Newton and its progeny definitely represent a foray into some new markets for
Apple Computer. These are the kinds of things you expect to buy in the places
where you expect to buy Sony. But there is certainly some question as to
whether it's a new technological paradigm.
The question is not just a definitional one, despite my crack about
advertising and high-definition mode. For example, the very fact that these
products will address a new market could lead to distinctly different styles
of use, which could feed back to changes in the programming model or hardware
architecture that could legitimately be called a new technological paradigm.
For example, greater-than-expected storage demands could fuel new storage
technology developments. So we could see a market-driven, new technological
paradigm. But that doesn't make Newton today a new technological paradigm.
Of course it has nice handprinting recognition and a no-keyboard interface,
and if you want to call these a new paradigm, fine. But they're not
exclusively Newton's: These technologies are pervasive, if not yet ready for
market everywhere. Newton does represent an interesting step away from file
systems, but it's not a pure, fileless storage paradigm as in Xanadu.
The most interesting notion about the Newton paradigm is that it is a new
model for desktop personal computers. This notion comes from Jean-Louis
Gassee, who argues that attempts to scale system-software technology down
never work (No OS Lite), but that scaling up can and does. In Newton, Apple
has a new operating system and user-interface model, unencumbered by
compatibility with past mistakes, and, Apple hopes, soon to become familiar to
godzillions of people through low pricing and aggressive, consumer-product
marketing. What if Apple then scales it up to the desktop? Newton could be, as
its early-development code-name Wedge suggests, a wedge to bring a new
operating system into the personal-computer market. Certainly it would be a
step in the direction of simplifying things for the user, which, by itself,
looks like a smart move.

So was John Sculley's use of the word "paradigm" correct? Maybe. And maybe we
can stipulate that "HD mode" also stands for "Humpty Dumpty mode," in honor of
Humpty Dumpty's assertion that a word means "what I want it to mean." And
stipulate that any use of the word "paradigm" is automatically in HD mode.


Paradigmatic Miscellany


Regarding some other paradigms: Greg Panos is trying to become the ultimate
source on virtual-reality work. In addition to his Virtual Reality Sourcebook,
he is looking into providing info by phone as well. The book is available for
$129 from Sophistech Research, 6936 Seaborn Street, Lakewood, CA 90713-2832;
310-421-7295 (voice) or 310-425-0890 (fax). And Craig LaGrow, onetime managing
editor of this magazine, has a series of books out on multimedia. The
Multimedia PhoneBook is a directory of over 2000 companies involved in
multimedia development; he also has a calendar of events and a Multimedia
Dictionary. There's also an electronic edition of the PhoneBook. All three
books can be had for a total price of $94.95, and the electronic version is
$199.95 from Global Intermedia, 7 Fourth Street, Suite 51, Petaluma, CA 94952;
707-778-7488 (voice) or 707-778-7564 (fax).
I need to acknowledge Hal Hardenbergh for yet more data on the
high-definitioning of the word "paradigm," which he picked up from some
obscure source. Hal must be one of the ten people in America who subscribe to
more magazines than I do. But I suppose I'm exaggerating. Eight people.
And finally, I need to answer a letter and mention a couple of History of
Computing items. John Johnson of Seattle wrote to ask a question inspired by
my column on Ada Lovelace. Yes, as I understand it, a model of Babbage's
Difference Engine was eventually built, though not by him, and did work. His
Analytical Engine was never built. And as long as I'm recommending sources and
educational material, let me put in a word for the Deutsches Museum in Munich,
which I visited recently. Anyone planning to be in Munich ought to allot some
time for its exhibits on microelectronics, telecommunications, and digital
computers. There are many working machines (although not the Difference or
Analytical Engine) and clever demos, and some of the accompanying explanations
are in English. Even better were the--what would you call them--construction
toys. Ancestors of Lego blocks and Erector Sets. Neat stuff. Finally, for
those interested in the history of computation, the ACM/SIGPLAN second annual
History of Programming Languages conference (HOPL-II) will be held in
Cambridge, Massachusetts, April 20-23. Dennis Ritchie, Niklaus Wirth, Alain
Colmerauer, Alan Kay, and Bill Whitaker are among the featured speakers. For
more information, contact Dan Halbert, DEC Cambridge Research Lab, One Kendall
Square, Bldg. 700, Cambridge, MA 02139; 617-621-6616 (voice) or 617-621-6650
(fax).



September, 1992
C PROGRAMMING


Help and the Installation Blues


 This article contains the following executables: DFLT14.ARC D14TXT.ARC


Al Stevens


This month I cover the D-Flat help system, look at the new Microsoft C/C++
compiler, and review some books. D-Flat is winding down; DF++ is heating up.
The Brevard County Food Bank still needs all the help they can get, so DF++
will continue the Careware tradition.


Help for D-Flat


The D-Flat help system consists of four parts: a text help database; a
compression/decompression algorithm; the hooks in the dialog boxes, menus, and
program code that make a particular text display the current message; and the
HELPBOX window class that displays the help text and allows the user to
navigate it. We'll discuss compression next month.
D-Flat supports context-sensitive help: Pressing the F1 key or choosing the
Help command button on a dialog box displays the appropriate help text. If the
user has a menu popped down, the help text describes the current menu
selection. If the user is working in a dialog box, the help text describes the
currently selected control. Usually, the selection of the text is automatic.
The application code does not need to do anything to support it. Each help
text in the database has a name that resembles a C identifier. A help text's
name is the same as a menu selection's or control's command code. So if you
are poised on the File menu's New command, the associated help text is named
ID_NEW. The names are surrounded by angle brackets in the text, so it is
really <ID_NEW>. Each item on the menu bar has a help text. Its name is the
same as the menu's identifier. The help text for the File menu is named
<File>. Each dialog box has a help text. Its name is the same as the DIALOGBOX
entry. The help text for the File Open dialog box is named <FileOpen>.
Each help text can reference the one that logically precedes it and the one
that logically follows it. These references follow the text's name and are
identified by [<<] and [>>] tokens. The tokens are followed by the name of the
help text that is prior or next.
The first line of text that follows the name and reference token lines is the
title of the help window.


HyperHelp


The D-Flat help system includes a limited form of hypertext. You can encode a
word or phrase in the text to display in a highlighted font. The user can
select that highlighted text, and the help system will switch to the help text
that you associated with the phrase. A hypertext reference is indicated by
this sequence: [..key phrase]<helpname>. The words "key phrase" represent the
highlighted text, and <helpname> is the name of the help text displayed when
the user chooses the key phrase.
Alternatively, you can supply a brief definition window for the word or
phrase. The definition window will display momentarily, only as long as the
user holds down the Enter key or mouse button after selecting the phrase. A
definition reference is indicated by this sequence: [**key phrase]<helpname>.
A definition help text does not have a title. Its first line is the first line
of its text and is displayed in the definition window's body. The hypertext
and definition references may be anywhere in the text. The help text's own
name and its forward and backward references must be at the beginning of the
help text. They must each start on a new line with the name coming first.
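As a rough illustration of the reference syntax (a sketch of my own, not D-Flat's actual parser; the function name scan_refs and the output format are mine), a scanner for the [..] and [**] tokens in one line of help text might look like this:

```c
/* Scan one line of help text for hypertext ([..phrase]<name>) and
   definition ([**phrase]<name>) references; print each one and return
   the number found. A sketch only -- not D-Flat's parser. */
#include <stdio.h>
#include <string.h>

static int scan_refs(const char *line)
{
    const char *cp = line;
    int count = 0;
    while ((cp = strchr(cp, '[')) != NULL) {
        int isdef;
        const char *end, *lt, *gt;
        if (cp[1] == '.' && cp[2] == '.')
            isdef = 0;                 /* hypertext reference */
        else if (cp[1] == '*' && cp[2] == '*')
            isdef = 1;                 /* definition reference */
        else {
            cp++;                      /* an ordinary '[': keep going */
            continue;
        }
        end = strchr(cp, ']');         /* end of the key phrase */
        lt = end ? strchr(end, '<') : NULL;
        gt = lt ? strchr(lt, '>') : NULL;
        if (gt == NULL)
            break;                     /* malformed reference; stop */
        printf("%s \"%.*s\" -> %.*s\n",
               isdef ? "definition" : "hypertext",
               (int)(end - cp - 3), cp + 3,   /* the key phrase */
               (int)(gt - lt - 1), lt + 1);   /* the help-text name */
        count++;
        cp = gt + 1;
    }
    return count;
}
```

D-Flat itself does more than this: it rewrites the line in place to insert highlight color codes, as the listing shows.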
Any line in the database that begins with a semicolon is a comment and does
not display in the help window. Example 1 shows a sample help text with all
the components.
Example 1: Sample D-Flat Help text.

 <File>
 [<<]<Pulldowns>
 [>>]<Edit>
 The File Menu
The File menu contains commands that open, save, and print files. The menu
also has the command that exits the program. Following are the commands and
associated [**function keys]<shortcut>.

 [..New]<ID_NEW>
 [..Open]<ID_OPEN>
 [..Save]<ID_SAVE>
 [..Save as]<ID_SAVEAS>
 [..Print]<ID_PRINT>
 [..Exit]<ID_EXIT>



Building a Help Database


You prepare the help database for the application with a text editor. The
memopad help database, MEMOPAD.TXT, provides the format and a large example of
an almost complete help database. It also includes all the texts that describe
the generic D-Flat operations, such as what a button is and how a dialog box
works. You might want to use them in your application. The file is big, and it
would serve no purpose to publish it here. When you get the source code for
D-Flat, you get the entire help database. See the end of this discussion for
instructions about where to get the D-Flat source code.


Using Help


When the user presses F1 or chooses a Help command button, D-Flat opens a
HELPBOX window and displays the appropriate help text. The window has four
command buttons at the bottom: The Close button closes the window and exits
from the help system back to the application; the Back button displays the
window that was displayed immediately prior to the current one; and the Prev
and Next buttons display the previous and next help texts as defined in the
help database. If there is a highlighted definition phrase, the user can tab
to it and press Enter or click it with the mouse to see the definition.
Releasing the key or button erases the definition. If a hypertext phrase is
highlighted, the user can select it in the same way to display the associated
help window.



The HELPBOX


Listing One, page 152, is helpbox.c, the code that implements the HELPBOX
window class. The application program calls the LoadHelpFile function once to
build the table of help texts. The menu and dialog-box control windows call
the DisplayHelp function to tell it to display a help window and take over.
The application program can do the same thing. If the user presses F1 or
clicks a Help command button from somewhere, whatever help text is current
gets displayed. If no help text is current, the help system defaults to a text
that describes the class of window that currently has the focus. In a
well-designed help database, these texts are all available to the user through
the navigation process.
A help window is a dialog box with some text and the command buttons we
already discussed. The buttons permit the navigation. The table built by
LoadHelpFile includes the names of the help texts that precede and follow the
current one. The HelpStack structure forms a history of help windows that the
user has selected since pressing F1 or choosing a Help command. Each Next or
Prev command pushes an entry on the stack. Each Back command pops an entry and
makes that help text the current one.
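The history mechanism boils down to a simple linked stack of help-text names. Here is a minimal sketch of the idea (PushHelp and PopHelp are hypothetical names of mine; D-Flat manages the same HelpStack structure inside helpbox.c):

```c
/* A linked stack of help-text names, sketching the Back-button
   history. Not D-Flat's code; function names are hypothetical. */
#include <stdlib.h>
#include <string.h>

struct HelpStack {
    char *hname;
    struct HelpStack *PrevStack;
};
static struct HelpStack *LastStack;

/* Next/Prev (and the first F1) push the newly displayed text's name. */
static void PushHelp(const char *name)
{
    struct HelpStack *s = malloc(sizeof *s);
    s->hname = malloc(strlen(name) + 1);
    strcpy(s->hname, name);
    s->PrevStack = LastStack;
    LastStack = s;
}

/* Back pops the current entry and returns the name now on top --
   the text to redisplay -- or NULL when the history is exhausted. */
static const char *PopHelp(void)
{
    struct HelpStack *s = LastStack;
    if (s == NULL)
        return NULL;
    LastStack = s->PrevStack;
    free(s->hname);
    free(s);
    return LastStack ? LastStack->hname : NULL;
}
```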
When the user selects a help text, the program finds its entry in the table
and reads it in a line at a time by calling the GetHelpLine function. If the
line of text has a hypertext or definition reference, the program modifies the
text to display the keyword in a highlighted color.
After a page of help text is displayed in the help dialog box's edit-box
control, the user uses the command buttons to navigate or clicks on or tabs to
one of the highlighted reference words or phrases. Clicking a reference
selects it for the next display. Tabbing to the reference displays it in
reverse colors so that the user can tab past several to get to the desired
one. The Enter key selects a tabbed reference. If the reference is hypertext,
the program selects the associated help text to display. If the reference is a
definition, the program opens another, smaller window and displays the
definition in it, leaving the window on the screen until the user releases the
mouse button or the Enter key.
The program always tries to position the help window so that it does not
obscure whatever had the focus when the user chose Help. The BestFit function
performs that task, trying to position the help window completely outside of
the other window, or, if that is not possible, where it least obscures the
application.
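The placement logic can be sketched as a preference ladder: try below the focused window, then above, then beside, and fall back to a centered overlap only when nothing else fits. This is my own simplification, not D-Flat's actual BestFit, and the names and coordinate conventions are assumptions:

```c
/* A sketch of best-fit help-window placement: pick a top-left corner
   that keeps the help window off the focused window's rectangle,
   preferring below, then above, then either side. Hypothetical code. */
typedef struct { int left, top, right, bottom; } RECT;

static void BestFitSketch(int scrw, int scrh, RECT focus,
                          int hw, int hh, int *x, int *y)
{
    if (focus.bottom + 1 + hh <= scrh) {        /* room below */
        *x = focus.left;  *y = focus.bottom + 1;
    } else if (focus.top - hh >= 0) {           /* room above */
        *x = focus.left;  *y = focus.top - hh;
    } else if (focus.right + 1 + hw <= scrw) {  /* room to the right */
        *x = focus.right + 1;  *y = 0;
    } else if (focus.left - hw >= 0) {          /* room to the left */
        *x = focus.left - hw;  *y = 0;
    } else {                                    /* overlap unavoidable */
        *x = (scrw - hw) / 2;  *y = (scrh - hh) / 2;
    }
    if (*x + hw > scrw) *x = scrw - hw;         /* clamp horizontally */
    if (*x < 0) *x = 0;
}
```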
Next month we'll discuss the help system's text-compression algorithms and the
File Open and Save dialog boxes.


How to Get D-Flat Now


The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of
D-Flat, Dr. Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you
the latest version of D-Flat. The software is free, but if you'd care to,
stuff a dollar bill in the mailer for the Brevard County Food Bank. They help
the homeless and hungry. We call it DDJ's program of "careware." If you want
to discuss D-Flat with me, use CompuServe. My ID is 71101,1262, and I monitor
the DDJ Forum daily.


Microsoft C/C++ 7


I had beta versions of the new Microsoft C++ compiler early on, but they would
never install. Our development system holds all the Goliath compiler products
on a Netware file server, which stores the newest of every new C product
without strangling my hard disk and allows Judy and me to work separately and
share a laser printer without strangling each other. The beta versions of
Microsoft C++ got confused about file-server drives and would not install.
Being in no hurry, I waited for the released version, which arrived not long
ago. I want to compile D-Flat with Microsoft C 7 and launch a first version of
D-Flat++ that works with Microsoft C++. So, to that end, I opened the package
and ran the Setup program. If Setup finds Windows installed, it runs itself
under Windows, opening the Notepad application so you can browse the README
file while it occasionally interrupts to ask for diskettes to install.
Now, here's a situation where Windows' shine fades to a dull haze. If you ever
want true preemptive multitasking, it is while scrolling a document--not
exactly a CPU-intensive task--at the same time the computer reads and writes
disk files in the background--also not a heavy drain on the old time slicer.
Nonetheless, when you are doing these two things with Windows, the cursor
spasmodically twitches around the screen as you try to scroll and page. I gaze
longingly at that unopened copy of IBM OS/2 2.0, but that's another story.
Back to the installation.
After about four disks worth of C/C++ 7 installation, the process stopped
abruptly with a dialog box telling me that Setup failed and that I should call
Microsoft Product Support immediately. No clue in that error message about
what is wrong. It's 8 o'clock in the morning here on the Florida Space Coast.
Nobody's riding the tech support desk in Redmond, Washington, at
this hour. I wish I had the phone number of the Gates mansion. Wake up, Bill,
there's a crotchety columnist on the line. If I have time later today, I'll
give it another try.


Later That Same Day...


Before trying the Microsoft C/C++ installation a second time, I looked at my
own setup. Every piece of software that is running in my computer comes
directly from Redmond: DOS 5.0, Windows 3.1, EMM386.EXE, SMARTDRV, DOSKEY,
SETVER, HIMEM, RAMDRIVE, and LMOUSE.COM. Even the network shell came with
Windows 3.1. If the stupid error message had said something about what
caused the program to quit, I might have a clue about what to remove.
But, persevere and try again. I use several different operating
configurations, depending on what I am going to do. Borland C++ 3.0 and Brief
3.1, my principal development environment for DOS, do not get along with the
EMM386.EXE expanded memory manager that comes with Windows 3.1. They simply
blow up. To keep them running, I need to use the EMM386.EXE that comes with
DOS 5.0. When I want to use Desqview, there are different memory managers to
install. So, depending on whether I am going to run Windows or DOS, I use
different CONFIG.SYS and AUTOEXEC.BAT files. Well, the C/C++ 7 setup program
just hauls off and starts Windows without asking about such things, so it was
running Windows under the DOS expanded-memory manager. That's never been a
problem before. Why should it be? They both come from Microsoft and are
currently supported programs. But, just to be sure, I removed all traces of
the earlier failed installation, started up under my normal Windows
configuration with the correct memory manager for Windows, and the
installation went much better.
Up to a point, that is. When it came time for the setup program to make its
modifications to some "system" files, it told me that it could not do that. It
did not tell me why it could not, what system files it was trying to change,
or what it was trying to change them to. It just said that it couldn't. How
helpful. Then it finished the installation by building a nice Windows Program
Manager group with all these pretty new icons that are supposed to run the
newly installed software.
Guess what? Nothing works. Most of the programs tell me that some necessary
driver is not installed. The Programmer's Workbench causes Windows to report
that an application "has violated system integrity due to execution of an
invalid instruction..." and that I had better quit Windows and start over
again, or else. Don't you just love that? I now have 20 Mbytes of useless
software installed on my file server and no clue as to how to get it going.
Well, maybe not so useless. It's the Windows stuff that doesn't work. D-Flat
and D-Flat++ are DOS text-mode libraries. To use C/C++ 7 under DOS, you need a
DPMI manager. Windows provides that support, but DOS does not, so Microsoft
bundles 386MAX for that purpose, and I installed it. However, when compiling
D-Flat with C 7, the make utility runs the CL.EXE file, which gives an error
message saying that the DOS extender cannot find the CL.EXE file. The error
message includes the path where the program is looking for that file, and when
I look in that subdirectory, there it is. DOS found the file. Twice. Why can't
the extender? The so-called Comprehensive Index and Errors Reference is no
help at all.
I left several messages to the Microsoft sysop on CompuServe. Because I write
this column and they know me, I'll probably get a lot of attention and get
things running real soon. That's one of the perks of being an international
media superstar: Vendors jump through hoops to keep us happy. But how about
the rest of you -- those without all this media clout that I enjoy? I wonder
what you do. Next time I'll flame incognito to find out.


Clout? Not!


Forget media clout. It took two days to get an answer. No need to wonder how
they might treat the huddled masses. I figured that I was as likely to get
Microsoft C/C++ 7 installed and running as a Haitian boat person was to be
invited to dinner at the White House. But, patience. The answer came, and I
have to map the network drive where I installed C/C++ 7 in a different way.
Use MAP ROOT instead of MAP INS. I don't really like that, because Netware
eats the first entry in the DOS path when I use MAP or MAP ROOT for a Search
drive. Oh, well, if that's what it takes.


Redmond Catchup


The suggestion worked. I now have Microsoft C/C++ 7 installed and compiling
D-Flat and DF++. Here are my initial observations. No modifications to the
D-Flat code or the makefile were necessary to get a compiled copy that works.
MSC 6.0 always did give me more warnings than BC, and the new version adds
more still. I might look into what is needed to kill the warnings, but they do
not hurt the running program. DF++ was not so easy to port because there was
no prior version of MS C++ to port from. I had been using BC++ exclusively,
and there are some things in the program that are compiler-specific, such as
interrupt functions for the timers. It took about a half hour to build a
portability layer, mainly because I had done it before for D-Flat. My test
DF++ program compiles and runs with MS C++ without any problems.


Benchmark Lite


For the first time in the years-long war between Borland and Microsoft, the
youngsters at Redmond seem to have gotten the word. Programmers prefer fast
compilers. It took MSC 6.0 about 12 minutes to compile all of D-Flat using the
default compiler settings. MSC 7 does it in about five minutes. That seems to
be a significant improvement, and you might conclude that the programmers at
Microsoft had finally caught up. But when you look under the surface, you see
that what they did was change the defaults. MSC 6 optimizes by default, and
MSC 7 defaults to no optimizations. If you turn them both around, you'll see
the real picture. Table 1 shows how it looks with Borland and Watcom thrown in
for good measure.
Table 1: Compiler comparison.

 Compiler       Optimized     Compile/Link   Code size
                for size
 -----------------------------------------------------
 MSC 6.0        yes           11:58          167193
                no             6:52          195641
 MSC 7.0        yes           10:25          169653
                no             5:17          217957
 BC 3.0         yes            2:53          170082
                no             2:40          177362
 WATCOM C 8.0   yes            9:21          146078
                no             8:59          165726

Version 6 did a better job on code size in both configurations, and version 7
compiles about a minute-and-a-half faster for both, so there has been an
improvement. But in the overall picture, Borland is the runaway winner for
compile time, while Watcom takes the gold for code size.
Many factors can affect the speed performance of a C compiler, which makes
objective benchmarks difficult to run. Where and how the compiler
is installed will alter its performance. Table 1 shows that a compiler's own
options affect its performance. For example, the Borland compiler has a
feature that saves precompiled header files. If a subsequent compile uses the
same header files, BC will use the precompiled version, speeding up the
compile. I use that feature and store the precompiled headers on a RAM disk. A
compiler's Make program and the make-file can have features that optimize
performance. Borland's Make program lets you tell it to make the same number
of target files as the number of source filenames you can fit on the command
line. You can speed up the other compilers by using Borland's make utility.
You can improve the performance of all those compilers by using the Windows
3.1 smartdrv.exe cache program instead of the 386MAX qcache program. And you
can bias a benchmark by tweaking the environment of each compiler to show the
results you want to show. The benchmark itself can be biased to favor the
features of a particular product. As a rule I do not pay much attention to
compiler benchmarks, particularly if the vendor of one of the compilers
conducts the benchmark.


Book Reports


Effective C++ by Scott Meyers (Addison-Wesley, 1992) is one of several books
essential to every C++ programmer's library. It begins with the obligatory
discussions of going from C to C++ and then slides into some advanced
treatments of C++ program design and code. The book is organized into "items,"
where each item addresses an area of concern to the C++ programmer. You can
read the table of contents and go directly to an item without reading
everything up to that point. This practice assumes that you are conversant in
C++, of course. The items are detailed and complete, and they frequently refer
to one another for discussions of related issues. This book is very well
organized in this respect. The writing style occasionally gets a little chummy
for my taste, which is forgivable in an author's first book. However, the
style doesn't overwhelm the book. Effective C++ is readable and educational.
But beyond that, it is the only reference work I have seen where all these
specific issues are addressed and explained. The book is as comprehensive in
this respect as an advanced C++ book can be -- in light of the current state
of the C++ language. It stays on my desk within easy reach.
By the way, the three other essential C++ books are Bjarne Stroustrup's second
edition of The C++ Programming Language, The Annotated C++ Reference Manual by
Stroustrup and Ellis (also called the ARM), and Advanced C++ Programming
Styles and Idioms by James O. Coplien, which I reviewed last month. All three
are from Addison-Wesley.


_C PROGRAMMING COLUMN_
by Al Stevens




[LISTING ONE]

/* ------------ helpbox.c ----------- */
#include "dflat.h"
#include "htree.h"

extern DBOX HelpBox;

/* -- strings of D-Flat classes for calling default help text collections --
*/
char *ClassNames[] = {
 #undef ClassDef
 #define ClassDef(c,b,p,a) #c,
 #include "classes.h"
 NULL
};
#define MAXHEIGHT (SCREENHEIGHT-10)

/* --------- linked list of help text collections -------- */
struct helps {
 char *hname;
 char *NextName;
 char *PrevName;
 long hptr;
 int bit;
 int hheight;
 int hwidth;
 WINDOW hwnd;
 struct helps *NextHelp;
};
static struct helps *FirstHelp;
static struct helps *LastHelp;
static struct helps *ThisHelp;


/* --- linked stack of help windows that have been used --- */
struct HelpStack {
 char *hname;
 struct HelpStack *PrevStack;
};
static struct HelpStack *LastStack;
static struct HelpStack *ThisStack;

/* -- linked list of keywords in current help text collection -- */
struct keywords {
 char *hname;
 int lineno;
 int off1, off2, off3;
 int isDefinition;
 struct keywords *nextword;
 struct keywords *prevword;
};

static FILE *helpfp;
static char hline [160];
static BOOL Helping;

static void SelectHelp(WINDOW, char *);
static void ReadHelp(WINDOW);
static void FindHelp(char *);
static void FindHelpWindow(WINDOW);
static void DisplayDefinition(WINDOW, char *);
static void BestFit(WINDOW, DIALOGWINDOW *);

/* ------------- CREATE_WINDOW message ------------ */
static void CreateWindowMsg(WINDOW wnd)
{
 Helping = TRUE;
 GetClass(wnd) = HELPBOX;
 ClearAttribute(wnd, SHADOW);
 InitWindowColors(wnd);
 if (ThisHelp != NULL)
 ThisHelp->hwnd = wnd;
}

/* ------------- COMMAND message ------------ */
static BOOL CommandMsg(WINDOW wnd, PARAM p1)
{
 switch ((int)p1) {
 case ID_CANCEL:
 ThisStack = LastStack;
 while (ThisStack != NULL) {
 LastStack = ThisStack->PrevStack;
 if (ThisStack->hname != NULL)
 free(ThisStack->hname);
 free(ThisStack);
 ThisStack = LastStack;
 }
 break;
 case ID_PREV:
 FindHelpWindow(wnd);
 if (ThisHelp != NULL)
 SelectHelp(wnd, ThisHelp->PrevName);
 return TRUE;

 case ID_NEXT:
 FindHelpWindow(wnd);
 if (ThisHelp != NULL)
 SelectHelp(wnd, ThisHelp->NextName);
 return TRUE;
 case ID_BACK:
 if (LastStack != NULL) {
 if (LastStack->PrevStack != NULL) {
 ThisStack = LastStack->PrevStack;
 if (LastStack->hname != NULL)
 free(LastStack->hname);
 free(LastStack);
 LastStack = ThisStack;
 SelectHelp(wnd, ThisStack->hname);
 }
 }
 return TRUE;
 default:
 break;
 }
 return FALSE;
}

/* ------------- KEYBOARD message ------------ */
static BOOL KeyboardMsg(WINDOW wnd, PARAM p1)
{
 WINDOW cwnd;
 struct keywords *thisword;
 static char HelpName[50];

 cwnd = ControlWindow(wnd->extension, ID_HELPTEXT);
 if (cwnd == NULL || inFocus != cwnd)
 return FALSE;
 thisword = cwnd->thisword;
 switch ((int)p1) {
 case '\r':
 if (thisword != NULL) {
 if (thisword->isDefinition)
 DisplayDefinition(GetParent(wnd), thisword->hname);
 else {
 strncpy(HelpName, thisword->hname,
 sizeof HelpName);
 SelectHelp(wnd, HelpName);
 }
 }
 return TRUE;
 case '\t':
 if (thisword == NULL)
 thisword = cwnd->firstword;
 else {
 if (thisword->nextword == NULL)
 thisword = cwnd->firstword;
 else
 thisword = thisword->nextword;
 }
 break;
 case SHIFT_HT:
 if (thisword == NULL)
 thisword = cwnd->lastword;

 else {
 if (thisword->prevword == NULL)
 thisword = cwnd->lastword;
 else
 thisword = thisword->prevword;
 }
 break;
 default:
 thisword = NULL;
 break;
 }
 if (thisword != NULL) {
 cwnd->thisword = thisword;
 if (thisword->lineno < cwnd->wtop ||
 thisword->lineno >=
 cwnd->wtop + ClientHeight(cwnd)) {
 int distance = ClientHeight(cwnd)/2;
 do {
 cwnd->wtop = thisword->lineno-distance;
 distance /= 2;
 }
 while (cwnd->wtop < 0);
 }
 SendMessage(cwnd, PAINT, 0, 0);
 return TRUE;
 }
 return FALSE;
}

/* ---- window processing module for the HELPBOX ------- */
int HelpBoxProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 DBOX *db = wnd->extension;

 switch (msg) {
 case CREATE_WINDOW:
 CreateWindowMsg(wnd);
 break;
 case INITIATE_DIALOG:
 ReadHelp(wnd);
 break;
 case COMMAND:
 if (p2 != 0)
 break;
 if (CommandMsg(wnd, p1))
 return TRUE;
 break;
 case KEYBOARD:
 if (WindowMoving)
 break;
 if (KeyboardMsg(wnd, p1))
 return TRUE;
 break;
 case CLOSE_WINDOW:
 if (db != NULL) {
 if (db->dwnd.title != NULL) {
 free(db->dwnd.title);
 db->dwnd.title = NULL;
 }

 }
 FindHelpWindow(wnd);
 if (ThisHelp != NULL)
 ThisHelp->hwnd = NULL;
 Helping = FALSE;
 break;
 default:
 break;
 }
 return BaseWndProc(HELPBOX, wnd, msg, p1, p2);
}

/* ----- select a new help window from its name ----- */
static void SelectHelp(WINDOW wnd, char *hname)
{
 if (hname != NULL) {
 WINDOW pwnd = GetParent(wnd);
 PostMessage(wnd, ENDDIALOG, 0, 0);
 PostMessage(pwnd, DISPLAY_HELP, (PARAM) hname, 0);
 }
}

/* ---- PAINT message for the helpbox text editbox ---- */
static int PaintMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 struct keywords *thisword;
 int rtn;
 if (wnd->thisword != NULL) {
 WINDOW pwnd = GetParent(wnd);
 char *cp;
 thisword = wnd->thisword;
 cp = TextLine(wnd, thisword->lineno);
 cp += thisword->off1;
 *(cp+1) =
 (pwnd->WindowColors[SELECT_COLOR][FG] & 255) | 0x80;
 *(cp+2) =
 (pwnd->WindowColors[SELECT_COLOR][BG] & 255) | 0x80;
 rtn = DefaultWndProc(wnd, PAINT, p1, p2);
 *(cp+1) =
 (pwnd->WindowColors[HILITE_COLOR][FG] & 255) | 0x80;
 *(cp+2) =
 (pwnd->WindowColors[HILITE_COLOR][BG] & 255) | 0x80;
 return rtn;
 }
 return DefaultWndProc(wnd, PAINT, p1, p2);
}

/* ---- LEFT_BUTTON message for the helpbox text editbox ---- */
static int LeftButtonMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 struct keywords *thisword;
 int rtn, mx, my;

 rtn = DefaultWndProc(wnd, LEFT_BUTTON, p1, p2);
 mx = (int)p1 - GetClientLeft(wnd);
 my = (int)p2 - GetClientTop(wnd);
 my += wnd->wtop;
 thisword = wnd->firstword;
 while (thisword != NULL) {

 if (my == thisword->lineno) {
 if (mx >= thisword->off2 &&
 mx < thisword->off3) {
 wnd->thisword = thisword;
 SendMessage(wnd, PAINT, 0, 0);
 if (thisword->isDefinition) {
 WINDOW pwnd = GetParent(wnd);
 if (pwnd != NULL)
 DisplayDefinition(GetParent(pwnd),
 thisword->hname);
 }
 break;
 }
 }
 thisword = thisword->nextword;
 }
 return rtn;
}

/* --- window processing module for HELPBOX's text EDITBOX -- */
int HelpTextProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 struct keywords *thisword;
 int rtn, mx, my;
 switch (msg) {
 case PAINT:
 return PaintMsg(wnd, p1, p2);
 case LEFT_BUTTON:
 return LeftButtonMsg(wnd, p1, p2);
 case DOUBLE_CLICK:
 PostMessage(wnd, KEYBOARD, '\r', 0);
 break;
 case CLOSE_WINDOW:
 thisword = wnd->firstword;
 while (thisword != NULL) {
 struct keywords *nextword = thisword->nextword;
 if (thisword->hname != NULL)
 free(thisword->hname);
 free(thisword);
 thisword = nextword;
 }
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}

/* -------- read the help text into the editbox ------- */
static void ReadHelp(WINDOW wnd)
{
 WINDOW cwnd = ControlWindow(wnd->extension, ID_HELPTEXT);
 int linectr = 0;
 if (cwnd == NULL)
 return;
 cwnd->wndproc = HelpTextProc;
 /* ----- read the help text ------- */
 while (TRUE) {
 unsigned char *cp = hline, *cp1;

 int colorct = 0;
 if (GetHelpLine(hline) == NULL)
 break;
 if (*hline == '<')
 break;
 hline[strlen(hline)-1] = '\0';
 /* --- add help text to the help window --- */
 while (cp != NULL) {
 if ((cp = strchr(cp, '[')) != NULL) {
 /* ----- hit a new key word ----- */
 struct keywords *thisword;
 if (*(cp+1) != '.' && *(cp+1) != '*') {
 cp++;
 continue;
 }
 thisword = DFcalloc(1, sizeof(struct keywords));
 if (cwnd->firstword == NULL)
 cwnd->firstword = thisword;
 if (cwnd->lastword != NULL) {
 ((struct keywords *)
 (cwnd->lastword))->nextword = thisword;
 thisword->prevword = cwnd->lastword;
 }
 cwnd->lastword = thisword;
 thisword->lineno = cwnd->wlines;
 thisword->off1 = (int) (cp - hline);
 thisword->off2 = thisword->off1 - colorct * 4;
 thisword->isDefinition = *(cp+1) == '*';
 colorct++;
 *cp++ = CHANGECOLOR;
 *cp++ =
 (wnd->WindowColors [HILITE_COLOR] [FG] & 255) | 0x80;
 *cp++ =
 (wnd->WindowColors [HILITE_COLOR] [BG] & 255) | 0x80;
 cp1 = cp;
 if ((cp = strchr(cp, ']')) != NULL) {
 if (thisword != NULL)
 thisword->off3 =
 thisword->off2 + (int) (cp - cp1);
 *cp++ = RESETCOLOR;
 }
 if ((cp = strchr(cp, '<')) != NULL) {
 char *cp1 = strchr(cp, '>');
 if (cp1 != NULL) {
 int len = (int) (cp1 - cp);
 thisword->hname = DFcalloc(1, len);
 strncpy(thisword->hname, cp+1, len-1);
 memmove(cp, cp1+1, strlen(cp1));
 }
 }
 }
 }
 PutItemText(wnd, ID_HELPTEXT, hline);
 /* -- display help text as soon as window is full -- */
 if (++linectr == ClientHeight(cwnd))
 SendMessage(cwnd, PAINT, 0, 0);
 if (linectr > ClientHeight(cwnd) &&
 !TestAttribute(cwnd, VSCROLLBAR)) {
 AddAttribute(cwnd, VSCROLLBAR);

 SendMessage(cwnd, BORDER, 0, 0);
 }
 }
}

/* ---- compute the displayed length of a help text line --- */
static int HelpLength(char *s)
{
 int len = strlen(s);
 char *cp = strchr(s, '[');
 while (cp != NULL) {
 len -= 4;
 cp = strchr(cp+1, '[');
 }
 cp = strchr(s, '<');
 while (cp != NULL) {
 char *cp1 = strchr(cp, '>');
 if (cp1 != NULL)
 len -= (int) (cp1-cp)+1;
 cp = strchr(cp1, '<');
 }
 return len;
}

/* ----------- load the help text file ------------ */
void LoadHelpFile()
{
 char *cp;

 if (Helping)
 return;
 UnLoadHelpFile();
 if ((helpfp = OpenHelpFile()) == NULL)
 return;
 *hline = '\0';
 while (*hline != '<') {
 if (GetHelpLine(hline) == NULL) {
 fclose(helpfp);
 return;
 }
 }
 while (*hline == '<') {
 if (strncmp(hline, "<end>", 5) == 0)
 break;

 /* -------- parse the help window's text name ----- */
 if ((cp = strchr(hline, '>')) != NULL) {
 ThisHelp = DFcalloc(1, sizeof(struct helps));
 if (FirstHelp == NULL)
 FirstHelp = ThisHelp;
 *cp = '\0';
 ThisHelp->hname=DFmalloc(strlen(hline+1)+1);
 strcpy(ThisHelp->hname, hline+1);

 HelpFilePosition(&ThisHelp->hptr, &ThisHelp->bit);

 if (GetHelpLine(hline) == NULL)
 break;


 /* ------- build the help linked list entry --- */
 while (*hline == '[') {
 HelpFilePosition(&ThisHelp->hptr, &ThisHelp->bit);
 /* ---- parse the <<prev button pointer ---- */
 if (strncmp(hline, "[<<]", 4) == 0) {
 char *cp = strchr(hline+4, '<');
 if (cp != NULL) {
 char *cp1 = strchr(cp, '>');
 if (cp1 != NULL) {
 int len = (int) (cp1-cp);
 ThisHelp->PrevName=DFcalloc(1,len);
 strncpy(ThisHelp->PrevName, cp+1,len-1);
 }
 }
 if (GetHelpLine(hline) == NULL)
 break;
 continue;
 }
 /* ---- parse the next>> button pointer ---- */
 else if (strncmp(hline, "[>>]", 4) == 0) {
 char *cp = strchr(hline+4, '<');
 if (cp != NULL) {
 char *cp1 = strchr(cp, '>');
 if (cp1 != NULL) {
 int len = (int) (cp1-cp);
 ThisHelp->NextName=DFcalloc(1,len);
 strncpy(ThisHelp->NextName, cp+1,len-1);
 }
 }
 if (GetHelpLine(hline) == NULL)
 break;
 continue;
 }
 else
 break;
 }
 ThisHelp->hheight = 0;
 ThisHelp->hwidth = 0;
 ThisHelp->NextHelp = NULL;

 /* ------ append entry to the linked list ------ */
 if (LastHelp != NULL)
 LastHelp->NextHelp = ThisHelp;
 LastHelp = ThisHelp;
 }
 /* -------- move to the next <helpname> token ------ */
 if (GetHelpLine(hline) == NULL)
 strcpy(hline, "<end>");
 while (*hline != '<') {
 ThisHelp->hwidth = max(ThisHelp->hwidth, HelpLength(hline));
 ThisHelp->hheight++;
 if (GetHelpLine(hline) == NULL)
 strcpy(hline, "<end>");
 }
 }
 fclose(helpfp);
}

/* ------ free the memory used by the help file table ------ */

void UnLoadHelpFile(void)
{
 while (FirstHelp != NULL) {
 ThisHelp = FirstHelp;
 if (ThisHelp->hname != NULL)
 free(ThisHelp->hname);
 if (ThisHelp->PrevName != NULL)
 free(ThisHelp->PrevName);
 if (ThisHelp->NextName != NULL)
 free(ThisHelp->NextName);
 FirstHelp = ThisHelp->NextHelp;
 free(ThisHelp);
 }
 ThisHelp = LastHelp = NULL;
 free(HelpTree);
}

/* ---------- display a specified help text ----------- */
BOOL DisplayHelp(WINDOW wnd, char *Help)
{
 if (Helping)
 return TRUE;
 FindHelp(Help);
 if (ThisHelp != NULL) {
 if (LastStack == NULL ||
 stricmp(Help, LastStack->hname)) {
 /* ---- add the window to the history stack ---- */
 ThisStack = DFcalloc(1,sizeof(struct HelpStack));
 ThisStack->hname = DFmalloc(strlen(Help)+1);
 if (ThisStack->hname != NULL)
 strcpy(ThisStack->hname, Help);
 ThisStack->PrevStack = LastStack;
 LastStack = ThisStack;
 }
 if ((helpfp = OpenHelpFile()) != NULL) {
 DBOX *db;
 int offset, i;

 db = DFcalloc(1,sizeof HelpBox);
 memcpy(db, &HelpBox, sizeof HelpBox);
 /* -- seek to the first line of the help text -- */
 SeekHelpLine(ThisHelp->hptr, ThisHelp->bit);
 /* ----- read the title ----- */
 GetHelpLine(hline);
 hline[strlen(hline)-1] = '\0';
 db->dwnd.title = DFmalloc(strlen(hline)+1);
 strcpy(db->dwnd.title, hline);
 /* ----- set the height and width ----- */
 db->dwnd.h = min(ThisHelp->hheight, MAXHEIGHT)+7;
 db->dwnd.w = max(45, ThisHelp->hwidth+6);
 /* ------ position the help window ----- */
 BestFit(wnd, &db->dwnd);
 /* ------- position the command buttons ------ */
 db->ctl[0].dwnd.w = max(40, ThisHelp->hwidth+2);
 db->ctl[0].dwnd.h = min(ThisHelp->hheight, MAXHEIGHT)+2;
 offset = (db->dwnd.w-40) / 2;
 for (i = 1; i < 5; i++) {
 db->ctl[i].dwnd.y =
 min(ThisHelp->hheight, MAXHEIGHT)+3;

 db->ctl[i].dwnd.x += offset;
 }
 /* ---- disable ineffective buttons ---- */
 if (ThisStack != NULL)
 if (ThisStack->PrevStack == NULL)
 DisableButton(db, ID_BACK);
 if (ThisHelp->NextName == NULL)
 DisableButton(db, ID_NEXT);
 if (ThisHelp->PrevName == NULL)
 DisableButton(db, ID_PREV);
 /* ------- display the help window ----- */
 DialogBox(wnd, db, TRUE, HelpBoxProc);
 free(db);
 fclose(helpfp);
 return TRUE;
 }
 }
 return FALSE;
}

/* ------- display a definition window --------- */
static void DisplayDefinition(WINDOW wnd, char *def)
{
 WINDOW dwnd;
 WINDOW hwnd = wnd;
 int y;

 if (GetClass(wnd) == POPDOWNMENU)
 hwnd = GetParent(wnd);
 y = GetClass(hwnd) == MENUBAR ? 2 : 1;
 FindHelp(def);
 if (ThisHelp != NULL) {
 clearBIOSbuffer();
 if ((helpfp = OpenHelpFile()) != NULL) {
 clearBIOSbuffer();
 dwnd = CreateWindow(
 TEXTBOX,
 NULL,
 GetClientLeft(hwnd),
 GetClientTop(hwnd)+y,
 min(ThisHelp->hheight, MAXHEIGHT)+3,
 ThisHelp->hwidth+2,
 NULL,
 wnd,
 NULL,
 HASBORDER | NOCLIP | SAVESELF);
 if (dwnd != NULL) {
 clearBIOSbuffer();
 /* ----- read the help text ------- */
 SeekHelpLine(ThisHelp->hptr, ThisHelp->bit);
 while (TRUE) {
 clearBIOSbuffer();
 if (GetHelpLine(hline) == NULL)
 break;
 if (*hline == '<')
 break;
 hline[strlen(hline)-1] = '\0';
 SendMessage(dwnd,ADDTEXT,(PARAM)hline,0);
 }

 SendMessage(dwnd, SHOW_WINDOW, 0, 0);
 SendMessage(NULL, WAITKEYBOARD, 0, 0);
 SendMessage(NULL, WAITMOUSE, 0, 0);
 SendMessage(dwnd, CLOSE_WINDOW, 0, 0);
 }
 fclose(helpfp);
 }
 }
}

/* ------ compare help names with wild cards ----- */
static BOOL wildcmp(char *s1, char *s2)
{
 while (*s1 || *s2) {
 if (tolower(*s1) != tolower(*s2))
 if (*s1 != '?' && *s2 != '?')
 return TRUE;
 s1++, s2++;
 }
 return FALSE;
}

/* --- ThisHelp = the help window matching specified name --- */
static void FindHelp(char *Help)
{
 ThisHelp = FirstHelp;
 while (ThisHelp != NULL) {
 if (wildcmp(Help, ThisHelp->hname) == FALSE)
 break;
 ThisHelp = ThisHelp->NextHelp;
 }
}

/* --- ThisHelp = the help window matching specified wnd --- */
static void FindHelpWindow(WINDOW wnd)
{
 ThisHelp = FirstHelp;
 while (ThisHelp != NULL) {
 if (wnd == ThisHelp->hwnd)
 break;
 ThisHelp = ThisHelp->NextHelp;
 }
}

static int OverLap(int a, int b)
{
 int ov = a - b;
 if (ov < 0)
 ov = 0;
 return ov;
}

/* ----- compute the best location for a help dialogbox ----- */
static void BestFit(WINDOW wnd, DIALOGWINDOW *dwnd)
{
 int above, below, right, left;
 if (GetClass(wnd) == MENUBAR ||
 GetClass(wnd) == APPLICATION) {
 dwnd->x = dwnd->y = -1;

 return;
 }
 /* --- compute above overlap ---- */
 above = OverLap(dwnd->h, GetTop(wnd));
 /* --- compute below overlap ---- */
 below = OverLap(GetBottom(wnd), SCREENHEIGHT-dwnd->h);
 /* --- compute right overlap ---- */
 right = OverLap(GetRight(wnd), SCREENWIDTH-dwnd->w);
 /* --- compute left overlap ---- */
 left = OverLap(dwnd->w, GetLeft(wnd));

 if (above < below)
 dwnd->y = max(0, GetTop(wnd)-dwnd->h-2);
 else
 dwnd->y = min(SCREENHEIGHT-dwnd->h, GetBottom(wnd)+2);
 if (right < left)
 dwnd->x = min(GetRight(wnd)+2, SCREENWIDTH-dwnd->w);
 else
 dwnd->x = max(0, GetLeft(wnd)-dwnd->w-2);

 if (dwnd->x == GetRight(wnd)+2 ||
 dwnd->x == GetLeft(wnd)-dwnd->w-2)
 dwnd->y = -1;
 if (dwnd->y == GetTop(wnd)-dwnd->h-2 ||
 dwnd->y == GetBottom(wnd)+2)
 dwnd->x = -1;
}



September, 1992
STRUCTURED PROGRAMMING


Downstream




Jeff Duntemann, KG7JF


I'll not be telling any tales this issue, out of courtesy to poor Jon Erickson,
who has to figure out how to fit nearly 600 lines of listing in back of the
mag this month. Brevity is a necessity this time.
The listing is the final, streamable version of HCALC. From the Mortgage menu
you can save the top mortgage window as a stream file, and load a named
mortgage file back from disk into a window. It works beautifully: In late
April I refinanced our house and used HCALC mercilessly to compare payments
and accelerated payoff schedules. It's great fun to add a few hundred dollars
to your next payment and see five or ten payments vanish off the back end of
the mortgage--and a powerful incentive to pay the damned thing off completely
and own the place free and clear.


Registering Things


I went over the process of registering types for stream I/O last month.
HCALC.PAS contains stream-registration records for all the non-Borland types
that need to be put out to streams. It also contains a procedure,
RegisterAllTypes, that does all the necessary program registering.


Using the Standard Dialogs


One thing HCALC does now that it didn't do before is make use of Borland's
standard file-open dialog. The standard dialogs are a sort of "free gift" that
came along with Turbo Pascal 6.0. They're not mentioned anywhere in the
documentation, and to figure them out you have to read the source code for the
STDDLG.PAS unit and the code for the example programs that use the standard
dialogs defined in STDDLG.PAS.
When you get the chance, I encourage you to print out and read STDDLG.PAS
closely. It's only 1400 lines or so, and it's a wonderful demonstration of
some of Turbo Vision's more arcane features. The TFileList class is a great
lesson in the use of broadcast messages. It communicates which file is
currently selected in a list of files by broadcasting the fact that a new file
has been selected--and then inviting any message receiver to inquire as to the
name of the file in question. Neat trick--one you might want to use yourself
someday.
There are other things in STDDLG.PAS that can be lifted out and used
independently of either TV or OOP: a Contains function that quickly tells
whether characters in one string are present in another, and an Equal function
that allows you to test the first n characters of two strings for
case-insensitive equality. Not a big deal codewise--but they're there; canned,
labeled, and ready to eat.
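For readers without STDDLG.PAS handy, the flavor of those two helpers can be sketched in C. The names and exact semantics here are my own approximations for illustration, not Borland's code:

```c
#include <ctype.h>
#include <stddef.h>

/* Rough C analogue of STDDLG.PAS's Contains: returns 1 if any
   character of set appears somewhere in s. */
static int contains_any(const char *s, const char *set)
{
    for (; *s; s++)
        for (const char *p = set; *p; p++)
            if (*s == *p)
                return 1;
    return 0;
}

/* Rough C analogue of Equal: case-insensitive comparison of the
   first n characters of a and b. */
static int equal_n_ci(const char *a, const char *b, size_t n)
{
    while (n-- > 0) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        if (*a == '\0')     /* both strings ended early and matched */
            return 1;
        a++, b++;
    }
    return 1;
}
```

Nothing deep, but the same two-minute utilities turn up in nearly every file-dialog implementation.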
Using the standard file-open dialog is a snap. You don't have to allocate
space for the dialog before you use it. You allocate it, execute it, and
destroy it all in one statement, which in most cases will look a lot like the
statement in Example 1.
Example 1: Using the standard file-open dialog.

 VAR
 FileName : FNameStr;

 FileName := '*.MTG';
 IF ExecDialog (New(PFileDialog,
 Init('*.*', 'Load Mortgage File',
 '~N~ame', fdOpenButton, 100)),
 @FileName) <> cmCancel
 THEN { You have a filename, so do
 something with it: }
 BEGIN
 ...
 END

ExecDialog creates a dialog object on the heap with New, calls its constructor
to set it up, and then executes it. You pass to the dialog the address of a
variable (here, FileName) that will contain the filename that the dialog
selects. Once you close the dialog box, ExecDialog automatically removes it
from memory. Very slick.
The name of the file-open dialog is misleading; it indirectly returns the name
of a selected file but does not actually open the file. You have to do that.
In fact, what the file-open dialog returns is one of Turbo Vision's predefined
command constants. If the command is cmCancel, you know that the user pressed
the Cancel button on the dialog. Otherwise, you can assume that a filename was
in fact selected.


The Problems of an Object Web


Once you use the file-open dialog to select a filename, you've got the problem
of doing one of two things with it: reading the file stored on disk somewhere
under that name as a stream, or writing some object or objects under that name
to disk as a stream. And before we do this, we had better confront the reality
of what we're trying to stuff onto that stream.
Figure 1 is a conceptual diagram of a TMortgageView object. The relationships
among the parts are critical to getting the whole thing onto and off of a
stream intact. The TMortgageView object contains a TMortgage object as one of
its fields. The TMortgage object, in turn, contains a pointer to a
variable-length array of records that contains the mortgage amortization
table. (I covered the details of the TMortgage object last month.)
The TMortgageView object is associated with two other objects defined in
HCALC.PAS: TMortgageTopInterior and TMortgageBottomInterior. TMortgageView is
a group, and as a group it "contains" one instance of each of the two interior
objects. These interior objects are instantiated and added to the
TMortgageView group in TMortgageView's constructor Init, using the Insert
method.
So far, so good. There is a further complication: Each of the two interior
objects has its own pointer to the TMortgage object contained within
TMortgageView as a field.
This all works, and within Turbo Vision's somewhat Byzantine view of the
world, it all makes sense. The weblike nature of the TMortgageView object
makes it tough to conceptualize how this thing is put onto a stream. As usual,
explaining it is a lot harder than actually doing it. For ample evidence of
that, see the TMortgageView.Store method in Listing One (page 158). It's only
four lines long.



Storing to a Stream is Easy


The TMortgage object knows how to put itself out onto the stream, as we saw
last month. Ditto the TMortgageView object, which inherits streamability from
its parent class, TWindow.
So the first thing you do is allow the TWindow.Store method to store the
TWindow-ness of TMortgageView onto the stream. That done, you must store any
additional fields added by the TMortgageView definition.
Mostly, that's TMortgage. You tell the stream to put TMortgage onto itself,
and the stream will call TMortgage's Store method to do, in a sense, the
actual putting. I explained this complication two issues ago; it kept me
scratching my head for a good long time.
Both TMortgageTopInterior and TMortgageBottomInterior rely entirely on their
parent types to store them to a stream, since they add no new fields (but only
methods) to the definition of their parent types. Better still, the group that
owns the interior objects takes care of telling the interior objects to put
themselves out to the stream, using the hidden machinery contained in the
group object.
Two lines remain in TMortgageView.Store:
PutSubViewPtr(S,BottomInterior);
PutSubViewPtr(S,TopInterior);
These are a little tougher to explain.


Subviews and Subview Pointers


Note from Figure 1 that TMortgageView has its own pointers to the two interior
objects. And remember that these two pointers are pointers added within the
definition of TMortgageView, and have nothing to do with the completely
separate connections that exist between a group and the views that it owns. A
group can call certain methods in the views that it owns--subviews--but the
group can't call the object-specific methods that you add to the views you
define. Worse, this behind-the-scenes access works only when the group wants
it to work, not when you want it to work. If you want to call a certain method
in a view owned by a group, you need your own connection to that view.
Such a connection to a subview is called a subview pointer, and a subview
pointer is exactly what BottomInterior and TopInterior are. Each is a pointer
defined in the group object (here, TMortgageView) that points to one of the
views that the group owns.
These connections are part of the state of the TMortgageView object, and they
need to be saved out to the stream somehow. Keep in mind that we're not
storing the interior objects to the stream here. TMortgageView owns the
objects and will take care of putting them onto the stream behind the scenes.
You have to be sure that your own connections to the subviews don't get mixed
up in the process.
The method TGroup.PutSubViewPtr doesn't store the subview pointer itself, but
an entry that indicates which subview the pointer points to, referenced
against the group's own internal subview list. PutSubViewPtr says, in effect,
"Store the private ID number of the subview to which this pointer points."
This may seem unnecessary when storing a complex object to a stream, but it
gets real damned important later on when you want to bring the object back.
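Outside Turbo Vision, the same trick can be sketched in a few lines of C. All of these names are hypothetical; the point is only that the owner serializes a child's position in its list rather than a raw pointer, and maps the position back to a live pointer on load:

```c
#include <stddef.h>

/* A group that owns a small array of child views. */
typedef struct { int id; } View;

typedef struct {
    View *children[8];
    int   count;
} Group;

/* The "PutSubViewPtr" idea: encode a child pointer as its index in
   the owner's list. An unowned pointer encodes as -1 ("no subview"). */
static int put_subview_ptr(const Group *g, const View *v)
{
    for (int i = 0; i < g->count; i++)
        if (g->children[i] == v)
            return i;
    return -1;
}

/* The "GetSubViewPtr" idea: decode a stored index back into a
   pointer, valid because the children are re-created in the same
   order when the group is loaded. */
static View *get_subview_ptr(const Group *g, int index)
{
    if (index < 0 || index >= g->count)
        return NULL;
    return g->children[index];
}
```

The index survives a trip to disk; the pointer never could.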


Bringing an Object Back from a Stream


Order is the main challenge here. One terrific way to go wrong is to bring
things back from the stream in an order different from the way they went out
to the stream.
If the first thing we stored was the TWindow elements of TMortgageView, that
should be the first thing to come back. (We're looking at TMortgageView.Load
here.) The TMortgage object must be the next thing to come off the stream.
Remember that the TMortgage.Load method called by the stream behind the scenes
is a constructor, and can fully instantiate the object without having to call
TMortgage.Init.
The TMortgage object comes in from the stream to a temporary copy allocated on
the heap. Streams always read from disk to the heap; they're built that way.
Once on the heap, the heap-based TMortgage object can be copied to its
rightful place inside the TMortgageView object with Move.
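That read-to-heap-then-copy dance can be sketched in C. The types and names here are invented for illustration; Turbo Vision's actual stream machinery is considerably more involved:

```c
#include <stdlib.h>
#include <string.h>

/* Invented stand-ins for the objects in the column. */
typedef struct { double principal, rate; int periods; } Mortgage;
typedef struct { Mortgage mtg; /* ...other fields... */ } MortgageView;

/* Stand-in for the stream's Get: always hands back a heap copy. */
static Mortgage *load_mortgage_from_stream(const Mortgage *on_disk)
{
    Mortgage *m = malloc(sizeof *m);
    if (m != NULL)
        *m = *on_disk;
    return m;
}

/* Copy the heap object into its embedded slot, then free the copy --
   the equivalent of Turbo Pascal's Move step. Returns 1 on success. */
static int load_view(MortgageView *view, const Mortgage *on_disk)
{
    Mortgage *tmp = load_mortgage_from_stream(on_disk);
    if (tmp == NULL)
        return 0;
    memcpy(&view->mtg, tmp, sizeof view->mtg);
    free(tmp);
    return 1;
}
```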
Now the dicey stuff starts. Although we have no immediate evidence of it, in
calling TWindow.Load, all the TMortgageView's subviews were already loaded
from the stream. So the subviews are out there already, connected to their
parent group through the parent's private connections. Our own connections to
the subviews do not yet exist.
These connections are the pointers TopInterior and BottomInterior, and we
derive these pointers from the stream using the TGroup.GetSubViewPtr method.
Earlier, we saved the group's private ID codes for its subviews onto the
stream. GetSubViewPtr converts these private ID codes to actual pointers to
the subviews represented by those ID codes. Otherwise, we'd have no
unambiguous way to derive a pointer to either of the subviews.
Make the two calls to GetSubViewPtr in exactly the same order that you made
the calls to PutSubViewPtr when you wrote the object to the stream. I got the
order wrong once, and it earned me some truly bizarre behavior--order counts,
big-time!
Finally, we give each of the subviews its own pointer to the TMortgage object.
You can see these pointers in the arrows in Figure 1. Each subview accesses
the TMortgage object for its own purposes, and both need pointers to the
TMortgage object. These pointers don't need to come from the stream; once we
have pointers to the subviews, we can just use the address-of operator to
generate pointers to the TMortgage instance within the TMortgageView instance.


Whew!


If all of that makes your head spin, don't feel bad. It took me a couple of
months of loose moments to figure it all out and get it right, and some people
consider me a pretty bright guy.
Some pointers on the actual HCALC.PAS code: For space and clarity reasons I
didn't put any error handling in THouseCalcApp's LoadMortgage and
SaveMortgage methods. If either method can't connect with the specified file
for some reason, it just exits without giving the user another chance. If you
intend to build on HCALC, it would pay to add some means to alert the user to
a problem with the disk and give them a chance to correct the problem (say,
trying to read a nonexistent file) and try again.
Now that you understand streams (I hope), you might be able to take a sensible
step and store the two dialog boxes out as resources. Read up on resources in
the TV documentation; they're essentially random-access streams in which you
can stash any standard object type for fast retrieval. A fair amount of code
in HCALC is taken up building the two dialogs, and you can cut the size of the
program down considerably by converting the dialogs to resources. This is the
method used by the excellent Turbo Vision Developer's Toolkit from Blaise
Computing (Berkeley, California).


An Exercise for the Reader


One thing I haven't done with HCALC (because it would make the listing too
large to comfortably print in the magazine) is provide a feature that saves
the current state of the HCALC desktop out to a stream when you exit the
program. This is what people mean when they speak of "saving the program to a
stream." If you have a number of mortgage windows open and positioned at
various points in their amortization tables, you can save the state of every
mortgage window exactly as it exists when you exit HCALC. That way, the next
time you run HCALC, it can automatically load the desktop--and all your
mortgage windows--just as they were when you packed it all in and went to bed.
The way to do this isn't exactly to save the application to the stream. That
wouldn't be necessary, because it's the TDesktop type--not TApplication--that
owns all the windows you open on the desktop. If you save the desktop, you
save everything between the menu bar and status line, including all opened
windows and the background.
So to demonstrate your mastery of Turbo Vision, add all necessary code to
HCALC to save the desktop to a stream or restore it from a stream. Here are
some hints:
Add methods that save and restore the desktop to your application-specific
child class of TApplication. The save and restore methods should work with the
DeskTop predefined global variable, which is a pointer that points to the
current desktop object.
Define commands for saving and restoring the desktop, and add code to identify
and service these commands in the HandleEvent method of your application
object.
Add code that issues the save and restore commands where you want. One place
would be the Mortgage menu, but the obvious place is the code that processes
the cmQuit command.
TProgram.HandleEvent calls EndModal(cmQuit) to end the program. Create a shell
method in your application type that intercepts cmQuit through the
application's HandleEvent method, saves the desktop to the stream, and then
calls EndModal(cmQuit). The place to load the desktop, pretty obviously, is in
the application's constructor. Be sure to call DeskTop^.DrawView to make the
newly loaded desktop visible.


Turbo Vision Wrap


With all that accomplished, I think I'm going to set Turbo Vision aside for a
while. As powerful a platform as it is, there are plenty of times when I mess
with it for a few hours and feel more than a little chewed on. Still, it does
a hell of a lot of work for you, and provides two particular "soft" benefits:

It provides a level of conceptual compatibility with the Windows GUI on
machines incapable of running Windows. If people can use your Turbo Vision
application, they won't suffer a great deal learning an eventual Windows port.
It gives you, the programmer, a taste for Turbo Pascal for Windows programming.
The code itself isn't especially compatible between the two platforms for a
number of reasons, but if you can grasp the ideas behind Turbo Vision, you'll
slide right into TPW like a bearing into a greased sleeve. On the other hand,
if you have tried and abhor Turbo Vision (and judging by my recent mail, a
fair number of people do) TPW will send your agony meter off the scale.
The problem, as I've hinted before, is this: To get good at either Turbo
Vision or TPW, you must use them relentlessly, all the time, forever and ever
amen. There are just too many essential little details, rules, exceptions, and
nonsystematic hassles to keep it all in the back of your head if you get maybe
two hours to program every Sunday night.
Which doesn't mean I don't endorse them both. I do. But in addition to people
who program for a living, there are people who program to support their other
work, part-time and as required. This sort of programming is every bit as
worthy as the all-the-time kind, but it requires tools that do much more of
the work and (most important of all) manage much more of the underlying
complexity of the platform.
A lot of new and intriguing products have piled up while we've been wrestling
with Turbo Vision, and it's high time to haul them out and get a look at them.
I also think it's time to do a little more exploring of easier ways to program
for Microsoft Windows, and simpler ways to implement databases in Pascal. Stay
tuned.


_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann


[LISTING ONE]

PROGRAM HCalc; { By Jeff Duntemann; Update of 5/2/92 }
 { Requires Turbo Pascal 6.0! }
USES App,Dialogs,Objects,Views,Menus,StdDlg,Drivers,
 FInput, { By Allen Bauer; on CompuServe BPROGA }
 Mortgage; { By Jeff Duntemann; from DDJ 10/91 }
CONST
 cmNewMortgage = 199;
 cmExtraPrin = 198;
 cmCloseAll = 197;
 cmCloseBC = 196;
 cmPrintSummary = 195;
 cmLoadMortgage = 194;
 cmSaveMortgage = 193;

 WindowCount : Integer = 0;

TYPE
 MortgageDialogData =
 RECORD
 PrincipalData : Real;
 InterestData : Real;
 PeriodsData : Integer;
 END;
 ExtraPrincipalDialogData =
 RECORD
 PaymentNumber : Integer;
 ExtraDollars : Real;
 END;
 THouseCalcApp =
 OBJECT(TApplication)
 InitDialog : PDialog; { Dialog for initializing a mortgage }
 ExtraDialog : PDialog; { Dialog for entering extra principal }
 CONSTRUCTOR Init;
 PROCEDURE InitMenuBar; VIRTUAL;
 PROCEDURE CloseAll;
 PROCEDURE HandleEvent(VAR Event : TEvent); VIRTUAL;
 PROCEDURE NewMortgage;
 PROCEDURE LoadMortgage;
 PROCEDURE SaveMortgage;
 END;
 PMortgageTopInterior = ^TMortgageTopInterior;
 TMortgageTopInterior =
 OBJECT(TView)
 Mortgage : PMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect);
 CONSTRUCTOR Load(VAR S : TStream);

 PROCEDURE Store(VAR S : TStream);
 PROCEDURE Draw; VIRTUAL;
 END;
 PMortgageBottomInterior = ^TMortgageBottomInterior;
 TMortgageBottomInterior =
 OBJECT(TScroller)
 { Points to Mortgage object owned by TMortgageView }
 Mortgage : PMortgage;
 CONSTRUCTOR Init(VAR Bounds : TRect;
 AHScrollBar, AVScrollbar : PScrollBar);
 CONSTRUCTOR Load(VAR S : TStream);
 PROCEDURE Store(VAR S : TStream);
 PROCEDURE Draw; VIRTUAL;
 END;

 PMortgageView = ^TMortgageView;
 TMortgageView =
 OBJECT(TWindow)
 Mortgage : TMortgage;
 TopInterior : PMortgageTopInterior;
 BottomInterior : PMortgageBottomInterior;
 CONSTRUCTOR Init(VAR Bounds : TRect;
 ATitle : TTitleStr;
 ANumber : Integer;
 InitMortgageData :
 MortgageDialogData);
 CONSTRUCTOR Load(VAR S : TStream);
 PROCEDURE Store(VAR S : TStream);
 PROCEDURE HandleEvent(Var Event : TEvent); VIRTUAL;
 PROCEDURE ExtraPrincipal;
 PROCEDURE PrintSummary;
 DESTRUCTOR Done; VIRTUAL;
 END;
VAR
 HouseCalc : THouseCalcApp; { This is the application object itself }
 TempMtg : PMortgageView; { Temporary pointer for mortgage windows }
CONST
 DefaultMortgageData : MortgageDialogData =
 (PrincipalData : 100000;
 InterestData : 10.0;
 PeriodsData : 360);
 RMortgageView : TStreamRec =
 (ObjType : 1101;
 VMTLink : Ofs(TypeOf(TMortgageView)^);
 Load : @TMortgageView.Load;
 Store : @TMortgageView.Store);
 RMortgageTopInterior : TStreamRec =
 (ObjType : 1102;
 VMTLink : Ofs(TypeOf(TMortgageTopInterior)^);
 Load : @TMortgageTopInterior.Load;
 Store : @TMortgageTopInterior.Store);
 RMortgageBottomInterior: TStreamRec =
 (ObjType : 1103;
 VMTLink : Ofs(TypeOf(TMortgageBottomInterior)^);
 Load : @TMortgageBottomInterior.Load;
 Store : @TMortgageBottomInterior.Store);

PROCEDURE RegisterAllTypes;
BEGIN

 RegisterType(RScrollBar);
 RegisterType(RFrame);
 RegisterFInputLine;
 RegisterType(RMortgage); { RMortgage defined in unit MORTGAGE.PAS }
 RegisterType(RMortgageView);
 RegisterType(RMortgageTopInterior);
 RegisterType(RMortgageBottomInterior);
END;

FUNCTION ExecDialog(P: PDialog; Data: Pointer): Word;
VAR
 Result: Word;
BEGIN
 Result := cmCancel;
 P := PDialog(Application^.ValidView(P));
 IF P <> NIL THEN
 BEGIN
 IF Data <> NIL THEN P^.SetData(Data^);
 Result := DeskTop^.ExecView(P);
 IF (Result <> cmCancel) AND (Data <> NIL) THEN P^.GetData(Data^);
 Dispose(P, Done);
 END;
 ExecDialog := Result;
END;

{------------------------------}
{ METHODS: THouseCalcApp }
{------------------------------}
CONSTRUCTOR THouseCalcApp.Init;
VAR
 R : TRect;
 aView : PView;
BEGIN
 TApplication.Init; { Always call the parent's constructor first! }
 { Create the dialog for initializing a mortgage: }
 R.Assign(20,5,60,16);
 InitDialog := New(PDialog,Init(R,'Define Mortgage Parameters'));
 WITH InitDialog^ DO
 BEGIN
 { First item in the dialog box is input line for principal: }
 R.Assign(3,3,13,4);
 aView := New(PFInputLine,Init(R,8,DRealSet,DReal,0));
 Insert(aView);
 R.Assign(2,2,12,3);
 Insert(New(PLabel,Init(R,'Principal',aView)));

 { Next is the input line for interest rate: }
 R.Assign(17,3,26,4);
 aView := New(PFInputLine,Init(R,6,DRealSet,DReal,3));
 Insert(aView);
 R.Assign(16,2,25,3);
 Insert(New(PLabel,Init(R,'Interest',aView)));
 R.Assign(26,3,27,4); { Add a static text "%" sign }
 Insert(New(PStaticText,Init(R,'%')));

 { Up next is the input line for number of periods: }
 R.Assign(31,3,36,4);
 aView := New(PFInputLine,Init(R,3,DUnsignedSet,DInteger,0));
 Insert(aView);

 R.Assign(29,2,37,3);
 Insert(New(PLabel,Init(R,'Periods',aView)));

 { These are standard buttons for the OK and Cancel commands: }
 R.Assign(8,8,16,10);
 Insert(New(PButton,Init(R,'~O~K',cmOK,bfDefault)));
 R.Assign(22,8,32,10);
 Insert(New(PButton,Init(R,'Cancel',cmCancel,bfNormal)));
 END;

 { Create the dialog for adding additional principal to a payment: }
 R.Assign(20,5,60,16);
 ExtraDialog := New(PDialog,Init(R,'Apply Extra Principal to Mortgage'));
 WITH ExtraDialog^ DO
 BEGIN
 { First item in the dialog is the payment number to which }
 { we're going to apply the extra principal: }
 R.Assign(9,3,18,4);
 aView := New(PFInputLine,Init(R,6,DUnsignedSet,DInteger,0));
 Insert(aView);
 R.Assign(3,2,12,3);
 Insert(New(PLabel,Init(R,'Payment #',aView)));

 { Next item in the dialog box is input line for extra principal: }
 R.Assign(23,3,33,4);
 aView := New(PFInputLine,Init(R,8,DRealSet,DReal,2));
 Insert(aView);
 R.Assign(20,2,35,3);
 Insert(New(PLabel,Init(R,'Extra Principal',aView)));

 { These are standard buttons for the OK and Cancel commands: }
 R.Assign(8,8,16,10);
 Insert(New(PButton,Init(R,'~O~K',cmOK,bfDefault)));
 R.Assign(22,8,32,10);
 Insert(New(PButton,Init(R,'Cancel',cmCancel,bfNormal)));
 END;
END;

{ This method sends out a broadcast message to all views. Only the }
{ mortgage windows know how to respond to it, so when cmCloseBC is }
{ issued, only the mortgage windows react--by closing. }

PROCEDURE THouseCalcApp.CloseAll;
VAR
 Who : Pointer;
BEGIN
 Who := Message(Desktop,evBroadcast,cmCloseBC,@Self);
END;

PROCEDURE THouseCalcApp.LoadMortgage;
VAR
 FileName : FNameStr;
 FetchMtg : PBufStream;
BEGIN
 FileName := '*.MTG';
 IF ExecDialog(New(PFileDialog, Init('*.*', 'Load Mortgage File',
 '~N~ame', fdOpenButton, 100)), @FileName) <> cmCancel
 THEN
 BEGIN

 FetchMtg := New(PBufStream,Init(FileName,stOpenRead,1024));
 IF FetchMtg^.Status <> 0 THEN Halt(1);
 TempMtg := PMortgageView(FetchMtg^.Get);
 IF FetchMtg^.Status <> 0 THEN
 BEGIN
 Writeln('Status code =',FetchMtg^.Status);
 Halt(1);
 END;
 Dispose(FetchMtg,Done);
 DisableCommands([cmSaveMortgage]);
 IF TempMtg <> NIL THEN
 Desktop^.Insert(TempMtg);
 EnableCommands([cmSaveMortgage]);
 END;
END;

PROCEDURE THouseCalcApp.SaveMortgage;
VAR
 FileName : FNameStr;
 SaveMtg : PBufStream;
BEGIN
 FileName := '*.MTG';
 IF ExecDialog(New(PFileDialog, Init('*.*', 'Save Mortgage File',
 '~N~ame', fdOpenButton, 100)), @FileName) <> cmCancel
 THEN
 BEGIN
 SaveMtg := New(PBufStream,Init(FileName,stCreate,1024));
 IF SaveMtg^.Status <> 0 THEN Halt(1);
 IF DeskTop^.Current <> NIL THEN
 BEGIN
 SaveMtg^.Put(DeskTop^.Current);
 IF SaveMtg^.Status <> 0 THEN
 BEGIN
 Writeln('Status value =',SaveMtg^.Status);
 Halt(1);
 END;
 END;
 Dispose(SaveMtg,Done);
 END;
END;

PROCEDURE THouseCalcApp.HandleEvent(VAR Event : TEvent);
BEGIN
 TApplication.HandleEvent(Event);
 IF Event.What = evCommand THEN
 BEGIN
 CASE Event.Command OF
 cmNewMortgage : NewMortgage;
 cmLoadMortgage : LoadMortgage;
 cmSaveMortgage : SaveMortgage;
 cmCloseAll : CloseAll;
 ELSE
 Exit;
 END; { CASE }
 ClearEvent(Event);
 END;
END;

PROCEDURE THouseCalcApp.NewMortgage;

VAR
 Code : Integer;
 R : TRect;
 Control : Word;
 ThisMortgage : PMortgageView;
 InitMortgageData : MortgageDialogData;
BEGIN
 { First we need a dialog to get the initial mortgage values from }
 { the user. The dialog appears *before* the mortgage window! }
 WITH InitMortgageData DO
 BEGIN
 PrincipalData := 100000;
 InterestData := 10.0;
 PeriodsData := 360;
 END;
 InitDialog^.SetData(InitMortgageData);
 Control := Desktop^.ExecView(InitDialog);
 IF Control <> cmCancel THEN { Create a new mortgage object: }
 BEGIN
 R.Assign(5,5,45,20);
 Inc(WindowCount);
 { Get data from the initial mortgage dialog: }
 InitDialog^.GetData(InitMortgageData);
 { Call the constructor for the mortgage window: }
 ThisMortgage :=
 New(PMortgageView,Init(R,'Mortgage',WindowCount,InitMortgageData));
 { Insert the mortgage window into the desktop: }
 Desktop^.Insert(ThisMortgage);
 END;
END;

PROCEDURE THouseCalcApp.InitMenuBar;
VAR
 R : TRect;
BEGIN
 GetExtent(R);
 R.B.Y := R.A.Y + 1; { Define 1-line menu bar }

 MenuBar := New(PMenuBar,Init(R,NewMenu(
 NewSubMenu('~M~ortgage',hcNoContext,NewMenu(
 NewItem('~N~ew','F6',kbF6,cmNewMortgage,hcNoContext,
 NewItem('~O~pen','F3',kbF3,cmLoadMortgage,hcNoContext,
 NewItem('~S~ave Top','F2',kbF2,cmSaveMortgage,hcNoContext,
 NewItem('~E~xtra Principal ','',0,cmExtraPrin,hcNoContext,
 NewItem('~C~lose all','F7',kbF7,cmCloseAll,hcNoContext,
 NewItem('E~x~it','Alt-X',kbAltX,cmQuit,hcNoContext,
 NIL))))))),
 NIL)
 )));
END;

{---------------------------------}
{ METHODS: TMortgageTopInterior }
{---------------------------------}
CONSTRUCTOR TMortgageTopInterior.Init(VAR Bounds : TRect);
BEGIN
 TView.Init(Bounds); { Call ancestor's constructor }
 GrowMode := gfGrowHiX; { Permits pane to grow in X but not Y }
END;


CONSTRUCTOR TMortgageTopInterior.Load(VAR S : TStream);
BEGIN
 TView.Load(S);
END;

PROCEDURE TMortgageTopInterior.Store(Var S : TStream);
BEGIN
 TView.Store(S)
END;

PROCEDURE TMortgageTopInterior.Draw;
VAR
 YRun : Integer;
 Color : Byte;
 B : TDrawBuffer;
 STemp : String[20];
BEGIN
 Color := GetColor(1);
 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,' Principal Int. Periods Payment',Color);
 WriteLine(0,0,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 { Here we convert payment data to strings for display: }
 Str(Mortgage^.Principal:7:2,STemp);
 MoveStr(B[1],STemp,Color); { At position 1 of buffer B }
 Str(Mortgage^.Interest*100:7:2,STemp);
 MoveStr(B[10],STemp,Color); { At position 10 of buffer B }
 Str(Mortgage^.Periods:4,STemp);
 MoveStr(B[20],STemp,Color); { At position 20 of buffer B }
 Str(Mortgage^.MonthlyPI:7:2,STemp);
 MoveStr(B[27],STemp,Color); { At position 27 of buffer B }
 WriteLine(0,1,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,
 ' Extra Principal Interest',
 Color);
 WriteLine(0,2,Size.X,1,B);

 MoveChar(B,' ',Color,Size.X); { Clear the buffer to spaces }
 MoveStr(B,
 'Paymt # Prin. Int. Balance Principal So far So far ',
 Color);
 WriteLine(0,3,Size.X,1,B);
END;

{------------------------------------}
{ METHODS: TMortgageBottomInterior }
{------------------------------------}
CONSTRUCTOR TMortgageBottomInterior.Init(VAR Bounds : TRect;
 AHScrollBar, AVScrollBar : PScrollBar);

BEGIN
 { Call ancestor's constructor: }
 TScroller.Init(Bounds,AHScrollBar,AVScrollBar);
 GrowMode := gfGrowHiX + gfGrowHiY;

 Options := Options OR ofFramed;
END;

CONSTRUCTOR TMortgageBottomInterior.Load(VAR S : TStream);
BEGIN
 TScroller.Load(S);
END;

PROCEDURE TMortgageBottomInterior.Store(Var S : TStream);
BEGIN
 TScroller.Store(S)
END;

PROCEDURE TMortgageBottomInterior.Draw;
VAR
 Color : Byte;
 B : TDrawBuffer;
 YRun : Integer;
 STemp : String[20];
BEGIN
 Color := GetColor(1);
 FOR YRun := 0 TO Size.Y-1 DO
 BEGIN
 MoveChar(B,' ',Color,80); { Clear the buffer to spaces }
 Str(Delta.Y+YRun+1:4,STemp);
 MoveStr(B,STemp+':',Color); { At beginning of buffer B }
 { Here we convert payment data to strings for display: }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PayPrincipal:7:2,STemp);
 MoveStr(B[6],STemp,Color); { At position 6 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PayInterest:7:2,STemp);
 MoveStr(B[15],STemp,Color); { At position 15 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].Balance:10:2,STemp);
 MoveStr(B[24],STemp,Color); { At position 24 of buffer B }
 { There isn't an extra principal value for every payment, so }
 { display the value only if it is nonzero: }
 STemp := '';
 IF Mortgage^.Payments^[Delta.Y+YRun+1].ExtraPrincipal > 0
 THEN
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].ExtraPrincipal:10:2,STemp);
 MoveStr(B[37],STemp,Color); { At position 37 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].PrincipalSoFar:10:2,STemp);
 MoveStr(B[50],STemp,Color); { At position 50 of buffer B }
 Str(Mortgage^.Payments^[Delta.Y+YRun+1].InterestSoFar:10:2,STemp);
 MoveStr(B[64],STemp,Color); { At position 64 of buffer B }
 { Here we write the line to the window, taking into account the }
 { state of the X scroll bar: }
 WriteLine(0,YRun,Size.X,1,B[Delta.X]);
 END;
END;

{------------------------------}
{ METHODS: TMortgageView }
{------------------------------}
CONSTRUCTOR TMortgageView.Init(VAR Bounds : TRect;
 ATitle : TTitleStr;
 ANumber : Integer;
 InitMortgageData :
 MortgageDialogData);
VAR

 HScrollBar,VScrollBar : PScrollBar;
 R,S : TRect;
BEGIN
 TWindow.Init(Bounds,ATitle,ANumber); { Call ancestor's constructor }
 { Call the Mortgage object's constructor using dialog data: }
 WITH InitMortgageData DO
 Mortgage.Init(PrincipalData,
 InterestData / 100,
 PeriodsData,
 12);
 { Here we set up a window with *two* interiors, one scrollable, one }
 { static. It's all in the way that you define the bounds, mostly: }
 GetClipRect(Bounds); { Get bounds for interior of view }
 Bounds.Grow(-1,-1); { Shrink those bounds by 1 for both X & Y }

 { Define a rectangle to embrace the upper of the two interiors: }
 R.Assign(Bounds.A.X,Bounds.A.Y,Bounds.B.X,Bounds.A.Y+4);
 TopInterior := New(PMortgageTopInterior,Init(R));
 TopInterior^.Mortgage := @Mortgage;
 Insert(TopInterior);

 { Define a rectangle to embrace the lower of two interiors: }
 R.Assign(Bounds.A.X,Bounds.A.Y+5,Bounds.B.X,Bounds.B.Y);

 { Create scroll bars for both mouse & keyboard input: }
 VScrollBar := StandardScrollBar(sbVertical + sbHandleKeyboard);
 { We have to adjust vertical bar to fit bottom interior: }
 VScrollBar^.Origin.Y := R.A.Y; { Adjust top Y value }
 VScrollBar^.Size.Y := R.B.Y - R.A.Y; { Adjust size }
 { The horizontal scroll bar, on the other hand, is standard: }
 HScrollBar := StandardScrollBar(sbHorizontal + sbHandleKeyboard);

 { Create bottom interior object with scroll bars: }
 BottomInterior :=
 New(PMortgageBottomInterior,Init(R,HScrollBar,VScrollBar));
 { Make copy of pointer to mortgage object: }
 BottomInterior^.Mortgage := @Mortgage;
 { Set the limits for the scroll bars: }
 BottomInterior^.SetLimit(80,InitMortgageData.PeriodsData);
 { Insert the interior into the window: }
 Insert(BottomInterior);
END;

CONSTRUCTOR TMortgageView.Load(VAR S : TStream);
VAR
 MortgageTemp : PObject;
BEGIN
 TWindow.Load(S); { Load what you've inherited from parent type}

 MortgageTemp := S.Get; { Load the contained TMortgage object }
 { Now we have to copy the heap-based copy of TMortgage to the copy }
 { embedded in the TMortgageView object we're in the process of loading: }
 Move(MortgageTemp^,Mortgage,Sizeof(TMortgage));

 GetSubViewPtr(S,BottomInterior);
 GetSubViewPtr(S,TopInterior);
 TopInterior^.Mortgage := @Mortgage;
 BottomInterior^.Mortgage := @Mortgage;
END;


PROCEDURE TMortgageView.Store(VAR S : TStream);
BEGIN
 TWindow.Store(S); { Store what you've inherited from parent type}
 S.Put(@Mortgage); { Store the contained TMortgage object }
 PutSubViewPtr(S,BottomInterior);
 PutSubViewPtr(S,TopInterior);
END;

PROCEDURE TMortgageView.HandleEvent(Var Event : TEvent);
BEGIN
 TWindow.HandleEvent(Event);
 IF Event.What = evCommand THEN
 BEGIN
 CASE Event.Command OF
 cmExtraPrin : ExtraPrincipal;
 cmPrintSummary : PrintSummary;
 ELSE
 Exit;
 END; { CASE }
 ClearEvent(Event);
 END
 ELSE
 IF Event.What = evBroadcast THEN
 CASE Event.Command OF
 cmCloseBC : Done
 END; { CASE }
END;

PROCEDURE TMortgageView.ExtraPrincipal;
VAR
 Control : Word;
 ExtraPrincipalData : ExtraPrincipalDialogData;
BEGIN
 { Execute the "extra principal" dialog box: }
 Control := Desktop^.ExecView(HouseCalc.ExtraDialog);
 IF Control <> cmCancel THEN { Update the active mortgage window: }
 BEGIN
 { Get data from the extra principal dialog: }
 HouseCalc.ExtraDialog^.GetData(ExtraPrincipalData);
 Mortgage.Payments^[ExtraPrincipalData.PaymentNumber].ExtraPrincipal :=
 ExtraPrincipalData.ExtraDollars;
 Mortgage.Recalc; { Recalculate the amortization table... }
 Redraw; { ...and redraw the mortgage window }
 END;
END;

PROCEDURE TMortgageView.PrintSummary;
BEGIN
END;

DESTRUCTOR TMortgageView.Done;
BEGIN
 Mortgage.Done; { Dispose of the mortgage object's memory }
 TWindow.Done; { Call parent's destructor to dispose of window }
END;

BEGIN
 RegisterAllTypes;

 HouseCalc.Init;
 HouseCalc.Run;
 HouseCalc.Done;
END.

September, 1992
GRAPHICS PROGRAMMING


Pooh and the Space Station


 This article contains the following executables: XSHRP21.ZIP


Michael Abrash


So, here's where Winnie the Pooh lives: in a space station orbiting Saturn.
No, really; I have it straight from my daughter, and a six-year-old wouldn't
make up something that important, would she? One day she wondered aloud,
"Where is the Hundred Acre Wood, exactly?" and before I could give one of
those boring parental responses about how it was imaginary -- but A.A. Milne
probably imagined it to be somewhere near London -- my daughter announced that
the Hundred Acre Wood was in a space station orbiting Saturn, and there you
have it.
As it turns out, that's a very good location for the Hundred Acre Wood,
leading to many exciting adventures for Pooh and Piglet. Consider the time
they went down to the Jupiter gravity level (we're talking centrifugal force
here; the station is spinning, of course) and nearly turned into pancakes of
the Pooh and Piglet varieties, respectively. Or the time they drifted out into
the free-fall area at the core and had to be rescued by humans with wings
strapped on (a tip of the hat to Robert Heinlein here). Or the time they were
caught up by the current in the river through the Wood and drifted for weeks
around the circumference of the station, meeting many cultures and finding
many adventures along the way. (Yes, Riverworld; no one said the stories you
tell your children need to be purely original, just interesting.)
(If you think Pooh and Piglet in a space station is a tad peculiar, then I
won't even mention Karla, the woman who invented agriculture, medicine,
sanitation, reading and writing, peace, and just about everything else while
travelling the length of the Americas with her mountain lion during the last
Ice Age; or the Mars Cats and their trip in suspended animation to the Lesser
Magellanic Cloud and beyond; or most assuredly Little Whale, the baby Universe
Whale that is naughty enough to eat inhabited universes. But I digress.)
Anyway, I bring up Pooh and the space station because a great many people have
asked me to discuss fast texture mapping. Texture mapping is the process of
mapping an image (in our case, a bitmap) onto the surface of a polygon that's
been transformed in the process of 3-D drawing. Up to this point, each polygon
we've drawn in X-Sharp (the 3-D animation package we've been developing in
this column) has been a single, solid color. Recently, we added the ability to
shade polygons according to lighting, but each polygon was still a single
color. Thus, in order to produce any sort of intricate design, a great many
tiny polygons would have to be drawn. That would be very slow, so we need
another approach. One such approach is texture mapping; that is, mapping the
bitmap containing the desired image onto the pixels contained within the
transformed polygon. Done properly, this should make it possible to change
X-Sharp's output from a bland collection of monocolor facets to a lively,
detailed, and much more realistic scene.
"What sort of scene?" you may well ask. This is where Pooh and the space
station come in. When I sat down to think of a sample texture-mapping
application, it occurred to me that the shaded ball we added to X-Sharp
recently looked at least a bit like a spinning, spherical space station, and
that the single unshaded, yellow polygon looked somewhat like a window in the
space station, and it might be a nice example if someone were standing in the
window....
The rest is history.


Principles of Quick-and-Dirty Texture Mapping


The key to our texture-mapping approach will be quickly determining what pixel
value to draw for each pixel in the transformed destination polygon. These
polygon pixel values will be determined by mapping each destination pixel in
the transformed polygon back to the image bitmap, via a reverse
transformation, and seeing what color resides at the corresponding location in
the image bitmap, as shown in Figure 1. It might seem more intuitive to map
pixels the other way, from the image bitmap to the transformed polygon, but in
fact it's crucial that the mapping proceed backward from the destination, in
order to avoid gaps in the final image. With the approach of finding the right
value for each destination pixel in turn, via a backward mapping, there's no
way we can miss any destination pixels. On the other hand, with the
forward-mapping method, some destination pixels may be skipped or
double-drawn, because this is not necessarily a one-to-one or onto mapping.
Although we're not going to take advantage of it now, mapping back to the
source makes it possible to average several neighboring image pixels together
to calculate the value for each destination pixel; that is, to antialias the
image. This can greatly improve texture quality, although it is slower.
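The gap-free property of backward mapping is easy to see in a minimal sketch (hypothetical names, not X-Sharp code): enlarging a 2x2 source to a 4x4 destination by reverse-mapping every destination pixel to its nearest source pixel.

```c
/* Minimal sketch (hypothetical names, not X-Sharp code): enlarge a 2x2
   source to 4x4 by mapping each DESTINATION pixel back to the source.
   Because the loop that uses SamplePixel visits every destination pixel
   exactly once, the backward mapping can never leave gaps in the output. */
static const char Source[2][2] = { {'A','B'}, {'C','D'} };

static char SamplePixel(int DestX, int DestY)
{
    int SourceX = DestX * 2 / 4;   /* reverse-map: destination -> source */
    int SourceY = DestY * 2 / 4;
    return Source[SourceY][SourceX];
}
```

Calling SamplePixel for every (DestX, DestY) in the 4x4 destination yields the rows AABB, AABB, CCDD, CCDD; each source pixel simply covers a 2x2 destination block, with no pixel skipped or double-drawn.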


Mapping Textures Made Easy


To understand how we're going to map textures, consider Figure 2, which maps a
bitmapped image directly onto an untransformed polygon. Here, we simply map
the origin of the polygon's object coordinate system somewhere within the
image, then map the vertices to the corresponding image pixels. (For
simplicity, I'll assume in this discussion that the polygon's coordinate
system--its object coordinate system--is in units of pixels, but scaling
images to polygons is eminently doable. This will become clearer when we look
at mapping images into transformed polygons, below.) Mapping the image to the
polygon is then a simple matter of stepping one scan line at a time in both
the image and the polygon, each time advancing the X coordinates of the edges
according to the slopes of the lines, just as is normally done when filling a
polygon. Since the polygon is untransformed, the stepping is identical in both
the image and the polygon, and the pixel mapping is one-to-one, so the
appropriate part of each scan line of the image can simply be block copied to
the destination.
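That block copy amounts to nothing more than a memcpy per scan line; a sketch (helper name hypothetical):

```c
#include <string.h>

/* Sketch of the untransformed case (helper name hypothetical): the pixel
   mapping is one-to-one, so the span between the polygon's left and right
   edges on each scan line is block-copied straight from the image row. */
static void CopySpan(const unsigned char *ImageRow, unsigned char *DestRow,
                     int LeftX, int RightX)
{
    memcpy(DestRow + LeftX, ImageRow + LeftX, (size_t)(RightX - LeftX + 1));
}
```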
Now matters get more complicated. What if the destination polygon is rotated
in two dimensions? We no longer have a neat direct mapping from image scan
lines to destination polygon scan lines. We still want to draw across each
destination scan line, but the proper source pixels for each destination scan
line may now track across the source bitmap at an angle, as shown in Figure 3.
What to do?
The solution is remarkably simple. We'll just map each transformed vertex to
the corresponding vertex in the bitmap; this is easy, because the vertices are
at the same indices in the original and transformed vertex lists. Each time we
select a new edge to scan for the destination polygon, we'll select the
corresponding edge in the source bitmap, as well. Then--and this is
crucial--each time we step a destination edge one scan line, we'll step the
corresponding source image edge an equivalent amount.
Ah, but what is an "equivalent amount"? Think of it this way. If a destination
edge is 100 scan lines high, so it will be stepped 100 times, then we'll
divide the SourceXWidth and SourceYHeight lengths of the source edge by 100,
and add those amounts to the source edge's coordinates each time the
destination is stepped one scan line. Put another way, we have, as usual,
arranged things so that in the destination polygon we step DestYHeight times,
where DestYHeight is the height of the destination edge. The above approach
arranges to step the source image edge DestYHeight times too, to match what
the destination is doing, as shown in Figure 4.
Now we're able to track the coordinates of the polygon edges through the
source image in tandem with the destination edges. Stepping across each
destination scan line uses precisely the same technique. In the destination,
we step DestXWidth times across each scan line of the polygon, once for each
pixel on the scan line. (DestXWidth is the horizontal distance between the two
edges being scanned on any given scan line.) To match this, we divide
SourceXWidth and SourceYHeight (the lengths of the scan line in the source
image, as determined by the source edge points we've been tracking, as
described above) by the width of the destination scan line, DestXWidth, to
produce SourceXStep and SourceYStep. Then, we just step DestXWidth times,
adding SourceXStep and SourceYStep to SourceX and SourceY each time, and
choose the nearest image pixel to (SourceX,SourceY) to copy to (DestX,DestY).
(Note that the names used above, such as "SourceXWidth," are used for
descriptive purposes, and don't necessarily correspond to the actual variable
names used in Listing Two.)
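The inner loop can be sketched in a few lines of fixed-point C (names hypothetical, not the Listing Two variables): one divide sets up the step, then each destination pixel costs only an add and a shift.

```c
/* Sketch of the inner DDA step (hypothetical names, 16.16 fixed point as
   X-Sharp uses): divide the source span length by the destination span
   length once, then for each destination pixel just add the step and
   truncate the fixed-point coordinate to pick a source column. */
typedef long Fixedpoint;                      /* 16.16 fixed point */
#define INT_TO_FIXED(i) ((Fixedpoint)(i) << 16)
#define FIXED_TO_INT(f) ((int)((f) >> 16))

static void ScanSourceColumns(int SourceXWidth, int DestXWidth, int *Out)
{
    Fixedpoint SourceX = 0;
    Fixedpoint SourceXStep = INT_TO_FIXED(SourceXWidth) / DestXWidth;
    for (int DestX = 0; DestX < DestXWidth; DestX++) {
        Out[DestX] = FIXED_TO_INT(SourceX);   /* current source column */
        SourceX += SourceXStep;
    }
}
```

When the destination is wider than the source, columns repeat (the image stretches); when it is narrower, columns are skipped, which is exactly the foreshortening behavior described below for 3-D rotation and perspective.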
That's a workable approach for 2-D rotated polygons--but what about 3-D
rotated polygons, where the visible dimensions of the polygon can vary with
3-D rotation and perspective projection? First, I'd like to make it clear that
texture mapping takes place from the source image to the destination polygon
after the destination polygon is projected to the screen. That is, the image
will be mapped after the destination polygon is in its final, drawable form.
Given that, it should be apparent that the above approach automatically
compensates for all changes in the dimensions of a polygon. You see, this
approach divides source edges and scan lines into however many steps the
destination polygon requires. If the destination polygon is much narrower than
the source polygon, as a result of 3-D rotation and perspective projection, we
just end up taking bigger steps through the source image and skipping a lot of
source image pixels, as shown in Figure 5. The upshot is that the above
approach handles all transformations and projections effortlessly. It could
also be used to scale source images up to fit in larger polygons; all that's
needed is a list of where the polygon's untransformed vertices map into the
source image, and everything else happens automatically. In fact, mapping from
any polygonal area of a bitmap to any destination polygon will work, given
only that the two polygons have the same number of vertices.


Notes on DDA Texture Mapping


That's all there is to quick-and-dirty texture mapping. This technique
basically uses a two-stage digital differential analyzer (DDA) approach to
step through the appropriate part of the source image in tandem with the
normal scan-line stepping through the destination polygon, so I'll call it
"DDA texture mapping." It's worth noting that there is no need for any
trigonometric functions at all, and only two divides are required per scan
line.
This isn't a perfect approach, of course. For one thing, it isn't anywhere
near as fast as drawing solid polygons; the speed is more comparable to
drawing each polygon as a series of lines. Also, the DDA approach results in
far from perfect image quality, since source pixels may be skipped or selected
twice. I trust, however, that you can see how easy it would be to improve
image quality by antialiasing with the DDA approach. For example, we could
simply average the four surrounding pixels as we did for simple, unweighted
antialiasing in this column last year. Or, we could take a Wu antialiasing
approach (see my June column) and average the two bracketing pixels along each
axis according to proximity. If we had cycles to waste (which, given that this
is real-time animation on a PC, we don't), we could improve image quality by
putting the source pixels through a low-pass filter sized in X and Y according
to the ratio of the source and destination dimensions (that is, how much the
destination is scaled up or down from the source).
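The unweighted averaging mentioned above fits the DDA naturally; a sketch (helper name hypothetical): instead of copying one nearest pixel, average the 2x2 block of source pixels bracketing the fractional source position.

```c
/* Sketch of unweighted antialiasing (helper name hypothetical): rather
   than copying the single nearest source pixel, take the plain mean of
   the 2x2 block of source pixels whose top-left corner is (X, Y). */
static int Average4(const unsigned char *Bits, int Width, int X, int Y)
{
    return (Bits[Y * Width + X]       + Bits[Y * Width + X + 1] +
            Bits[(Y + 1) * Width + X] + Bits[(Y + 1) * Width + X + 1]) / 4;
}
```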
Finally, I'd like to point out that this sort of DDA texture mapping is
display-hardware dependent, because the bitmap for each image must be
compatible with the number of bits per pixel in the destination. That's
actually a fairly serious issue. One of the nice things about X-Sharp's
polygon orientation is that, until now, the only display dependent part of
X-Sharp has been the transformation from RGB color space to the adapter's
color space. Compensation for aspect ratio, resolution, and the like all
happens automatically in the course of projection. Still, we need the ability
to display detailed surfaces, and it's hard to conceive of a fast way to do so
that's totally hardware independent. (If you know of one, drop me a line.)
For now, all we need is fast texture mapping of adequate quality, which the
straightforward, non-antialiased DDA approach supplies. I'm sure there are
many other fast approaches, and, as I've said, there are certainly
better-looking approaches, but DDA texture mapping works well, given the
constraints of the PC's horsepower. Next, we'll look at code that performs DDA
texture mapping. First, though, I'd like to take a moment to thank Jim Kent,
author of Autodesk Animator and a frequent contributor to this column, for
getting me started with the DDA approach.


Fast Texture Mapping: An Implementation


As you might expect, I've implemented DDA texture mapping in X-Sharp. Listing
One (page 164) shows the new header files and defines, and Listing Two (page
164) shows the actual texture-mapped polygon drawer. The set-pixel routine
that Listing Two calls is a slight modification of the mode X set-pixel
routine from my June 1991 column. In addition, INITBALL.C has been modified to
create three texture-mapped polygons and define the texture bitmaps, and
modifications have been made to allow the user to flip the axis of rotation.
In short, you will need the complete X-Sharp archive (available as described
below) to see texture mapping in action, but Listings One and Two are the
actual texture mapping code in its entirety.
There's a lot I'd like to say about DDA texture mapping, and about the use of
it in the DEMO1 program in X-Sharp, but I'll have to save most of it for next
month because I'm running out of space. Here's the big thing: DDA texture
mapping looks best on fast-moving surfaces, where the eye doesn't have time to
pick nits with the shearing and aliasing that's an inevitable by-product of
such a crude approach. Get the X-Sharp archive, compile DEMO1, and run it. The
initial display looks okay, but certainly not great, because the rotational
speed is so slow. Now press the S key a few times to speed up the rotation and
flip between different rotation axes. I think you'll be amazed at how much
better DDA texture mapping looks at high speed. This technique would be great
for mapping textures onto hurtling asteroids or jets, but would come up short
for slow, finely detailed movements.
No matter how you slice it, DDA texture mapping beats boring, single-color
polygons nine ways to Sunday. The big downside is that it's much slower than a
normal polygon fill; move the ball close to the screen in DEMO1, and watch
things slow down when one of those big texture maps comes around. Of course,
that's partly because the code is all in C; if I get a chance, I'll convert to
assembler and turbocharge the DDA texture mapping for next time, and maybe
speed up the general polygon- and rectangle-fill code, too. I'll try to get to
that next month, along with a further discussion of texture mapping and
attending to some rough spots that remain in the DDA texture mapping
implementation, most notably in the area of exactly which texture pixels map
to which destination pixels as a polygon rotates.
And, in case you're curious, yes, there is a bear in DEMO1. I wouldn't say he
looks much like a Pooh-type bear, but he's a bear nonetheless. He does tend to
look a little startled when you flip the ball around so that he's zipping by
on his head, but, heck, you would too in the same situation. And remember,
when you buy the next VGA megahit, Bears in Space, you saw it here first.


Where to Get X-Sharp



The full source for X-Sharp is available in the file XSHRPn.ZIP in the DDJ
Forum on CompuServe and XSHARPn.ZIP in the programming/graphics conference on
M&T Online and the graphic.disp conference on Bix. (XSHARP20 is the first
version that includes texture mapping.) Alternatively, you can send me a 360K
or 720K formatted diskette and an addressed, stamped diskette mailer, care of
DDJ, 411 Borel Ave., San Mateo, CA 94402, and I'll send you the latest copy of
X-Sharp. There's no charge, but it'd be very much appreciated if you'd slip in
a dollar or so to help out the folks at the Vermont Association for the Blind
and Visually Impaired.
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
name mabrash in both cases).
By the way, folks, I very much appreciate both your generous contributions to
the Vermont Association for the Blind and your letters. However, I
can't--honest-to-god, can't--write back to all of you. Not if I ever want to
get any work done, anyway, and I have a family to feed. When I have time, I
try to answer those of you who have written with questions, but there are no
guarantees, especially given the flood of letters that have materialized in
the wake of X-Sharp. You'll stand a lot better chance of getting a reply on
MCI Mail, Bix, or M&T Online. Mind you, I do enjoy getting your letters, and
the feedback and suggestions are great; it's just the responding part that's a
problem. So keep those cards and letters (and e-mail messages) coming!


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash



[LISTING ONE]

/* New header file entries related to texture-mapped polygons */
/* Draws the polygon described by the point list PointList with a bitmap
 texture mapped onto it */
#define DRAW_TEXTURED_POLYGON(PointList,NumPoints,TexVerts,TexMap) \
 Polygon.Length = NumPoints; Polygon.PointPtr = PointList; \
 DrawTexturedPolygon(&Polygon, TexVerts, TexMap);
#define FIXED_TO_INT(FixedVal) ((int) (FixedVal >> 16))
#define ROUND_FIXED_TO_INT(FixedVal) \
 ((int) ((FixedVal + DOUBLE_TO_FIXED(0.5)) >> 16))
/* Retrieves specified pixel from specified image bitmap of specified width. */
#define GET_IMAGE_PIXEL(TexMapBits, TexMapWidth, X, Y) \
 (TexMapBits)[((Y) * (TexMapWidth)) + (X)]
/* Masks to mark shading types in Face structure */
#define NO_SHADING 0x0000
#define AMBIENT_SHADING 0x0001
#define DIFFUSE_SHADING 0x0002
#define TEXTURE_MAPPED_SHADING 0x0004
/* Describes a texture map */
typedef struct {
 int TexMapWidth; /* texture map width in bytes */
 char *TexMapBits; /* pointer to texture bitmap */
} TextureMap;

/* Structure describing one face of an object (one polygon) */
typedef struct {
 int * VertNums; /* pointer to list of indexes of this polygon's vertices
 in the object's vertex list. The first two indexes
 must select end and start points, respectively, of this
 polygon's unit normal vector. Second point should also
 be an active polygon vertex */
 int NumVerts; /* # of verts in face, not including the initial
 vertex, which must be the end of a unit normal vector
 that starts at the second index in VertNums */
 int ColorIndex; /* direct palette index; used only for non-shaded faces */
 ModelColor FullColor; /* polygon's color */
 int ShadingType; /* none, ambient, diffuse, texture mapped, etc. */
 TextureMap * TexMap; /* pointer to bitmap for texture mapping, if any */
 Point * TexVerts; /* pointer to list of this polygon's vertices, in
 TextureMap coordinates. Index n must map to index
 n + 1 in VertNums, (the + 1 is to skip over the unit
 normal endpoint in VertNums) */
} Face;
extern void DrawTexturedPolygon(PointListHeader *, Point *, TextureMap *);


[LISTING TWO]

/* Draws a bitmap, mapped to a convex polygon (draws a texture-mapped polygon).
 "Convex" means that every horizontal line drawn through the polygon at any
 point would cross exactly two active edges (neither horizontal lines nor
 zero-length edges count as active edges; both are acceptable anywhere in
 the polygon), and that the right & left edges never cross. Nonconvex
 polygons won't be drawn properly. Can't fail. */
#include <stdio.h>
#include <math.h>
#include "polygon.h"
/* Describes the current location and stepping, in both the source and
 the destination, of an edge */
typedef struct {
 int Direction; /* through edge list; 1 for a right edge (forward through
 vertex list), -1 for a left edge (backward
 through vertex list) */
 int RemainingScans; /* height left to scan out in dest */
 int CurrentEnd; /* vertex # of end of current edge */
 Fixedpoint SourceX; /* current X location in source for this edge */
 Fixedpoint SourceY; /* current Y location in source for this edge */
 Fixedpoint SourceStepX; /* X step in source for Y step in dest of 1 */
 Fixedpoint SourceStepY; /* Y step in source for Y step in dest of 1 */
 /* variables used for all-integer Bresenham's-type
 X stepping through the dest, needed for precise
 pixel placement to avoid gaps */
 int DestX; /* current X location in dest for this edge */
 int DestXIntStep; /* whole part of dest X step per scan-line Y step */
 int DestXDirection; /* -1 or 1 to indicate way X steps (left/right) */
 int DestXErrTerm; /* current error term for dest X stepping */
 int DestXAdjUp; /* amount to add to error term per scan line move */
 int DestXAdjDown; /* amount to subtract from error term when the
 error term turns over */
} EdgeScan;
int StepEdge(EdgeScan *);
int SetUpEdge(EdgeScan *, int);
void ScanOutLine(EdgeScan *, EdgeScan *);
int GetImagePixel(char *, int, int, int);
/* Statics to save the time that would otherwise go to passing these
 to subroutines. */
static int MaxVert, NumVerts, DestY;
static Point * VertexPtr;
static Point * TexVertsPtr;
static char * TexMapBits;
static int TexMapWidth;
/* Draws a texture-mapped polygon, given a list of destination polygon
 vertices, a list of corresponding source texture polygon vertices, and a
 pointer to the source texture's descriptor. */
void DrawTexturedPolygon(PointListHeader * Polygon, Point * TexVerts,
 TextureMap * TexMap)
{
 int MinY, MaxY, MinVert, i;
 EdgeScan LeftEdge, RightEdge;
 NumVerts = Polygon->Length;
 VertexPtr = Polygon->PointPtr;
 TexVertsPtr = TexVerts;
 TexMapBits = TexMap->TexMapBits;
 TexMapWidth = TexMap->TexMapWidth;
 /* Nothing to draw if less than 3 vertices */
 if (NumVerts < 3) {
 return;
 }
 /* Scan through the destination polygon vertices and find the top of the
 left and right edges, taking advantage of our knowledge that vertices run
 in a clockwise direction (else this polygon wouldn't be visible due to
 backface removal) */
 MinY = 32767;
 MaxY = -32768;
 for (i=0; i<NumVerts; i++) {
 if (VertexPtr[i].Y < MinY) {
 MinY = VertexPtr[i].Y;
 MinVert = i;
 }
 if (VertexPtr[i].Y > MaxY) {
 MaxY = VertexPtr[i].Y;
 MaxVert = i;
 }
 }
 /* Reject flat (0-pixel-high) polygons */
 if (MinY >= MaxY) {
 return;
 }
 /* The destination Y coordinate is not edge specific; it applies to
 both edges, since we always step Y by 1 */
 DestY = MinY;
 /* Set up to scan the initial left and right edges of the source and
 destination polygons. We always step the destination polygon edges
 by one in Y, so calculate the corresponding destination X step for
 each edge, and then the corresponding source image X and Y steps */
 LeftEdge.Direction = -1; /* set up left edge first */
 SetUpEdge(&LeftEdge, MinVert);
 RightEdge.Direction = 1; /* set up right edge */
 SetUpEdge(&RightEdge, MinVert);
 /* Step down destination edges one scan line at a time. At each scan
 line, find the corresponding edge points in the source image. Scan
 between the edge points in the source, drawing the corresponding
 pixels across the current scan line in the destination polygon. (We
 know which way the left and right edges run through the vertex list
 because visible (non-backface-culled) polygons always have the vertices
 in clockwise order as seen from the viewpoint) */
 for (;;) {
 /* Done if off bottom of clip rectangle */
 if (DestY >= ClipMaxY) {
 return;
 }
 /* Draw only if inside Y bounds of clip rectangle */
 if (DestY >= ClipMinY) {
 /* Draw the scan line between the two current edges */
 ScanOutLine(&LeftEdge, &RightEdge);
 }
 /* Advance the source and destination polygon edges, ending if we've
 scanned all the way to the bottom of the polygon */
 if (!StepEdge(&LeftEdge)) {
 break;
 }
 if (!StepEdge(&RightEdge)) {
 break;
 }
 DestY++;
 }
}
/* Steps an edge one scan line in the destination, and the corresponding
 distance in the source. If an edge runs out, starts a new edge if there
 is one. Returns 1 for success, or 0 if there are no more edges to scan. */
int StepEdge(EdgeScan * Edge)
{
 /* Count off the scan line we stepped last time; if this edge is
 finished, try to start another one */
 if (--Edge->RemainingScans == 0) {
 /* Set up the next edge; done if there is no next edge */
 if (SetUpEdge(Edge, Edge->CurrentEnd) == 0) {
 return(0); /* no more edges; done drawing polygon */
 }
 return(1); /* all set to draw the new edge */
 }
 /* Step the current source edge */
 Edge->SourceX += Edge->SourceStepX;
 Edge->SourceY += Edge->SourceStepY;
 /* Step dest X with Bresenham-style variables, to get precise dest pixel
 placement and avoid gaps */
 Edge->DestX += Edge->DestXIntStep; /* whole pixel step */
 /* Do error term stuff for fractional pixel X step handling */
 if ((Edge->DestXErrTerm += Edge->DestXAdjUp) > 0) {
 Edge->DestX += Edge->DestXDirection;
 Edge->DestXErrTerm -= Edge->DestXAdjDown;
 }
 return(1);
}
/* Sets up an edge to be scanned; the edge starts at StartVert and proceeds
 in direction Edge->Direction through the vertex list. Edge->Direction must
 be set prior to call; -1 to scan a left edge (backward through the vertex
 list), 1 to scan a right edge (forward through the vertex list).
 Automatically skips over 0-height edges. Returns 1 for success, or 0 if
 there are no more edges to scan. */
int SetUpEdge(EdgeScan * Edge, int StartVert)
{
 int NextVert, DestXWidth;
 Fixedpoint DestYHeight;
 for (;;) {
 /* Done if this edge starts at the bottom vertex */
 if (StartVert == MaxVert) {
 return(0);
 }
 /* Advance to the next vertex, wrapping if we run off the start or end
 of the vertex list */
 NextVert = StartVert + Edge->Direction;
 if (NextVert >= NumVerts) {
 NextVert = 0;
 } else if (NextVert < 0) {
 NextVert = NumVerts - 1;
 }
 /* Calculate the variables for this edge and done if this is not a
 zero-height edge */
 if ((Edge->RemainingScans =
 VertexPtr[NextVert].Y - VertexPtr[StartVert].Y) != 0) {
 DestYHeight = INT_TO_FIXED(Edge->RemainingScans);
 Edge->CurrentEnd = NextVert;
 Edge->SourceX = INT_TO_FIXED(TexVertsPtr[StartVert].X);
 Edge->SourceY = INT_TO_FIXED(TexVertsPtr[StartVert].Y);
 Edge->SourceStepX = FixedDiv(INT_TO_FIXED(TexVertsPtr[NextVert].X) -
 Edge->SourceX, DestYHeight);
 Edge->SourceStepY = FixedDiv(INT_TO_FIXED(TexVertsPtr[NextVert].Y) -
 Edge->SourceY, DestYHeight);
 /* Set up Bresenham-style variables for dest X stepping */
 Edge->DestX = VertexPtr[StartVert].X;
 if ((DestXWidth =
 (VertexPtr[NextVert].X - VertexPtr[StartVert].X)) < 0) {
 /* Set up for drawing right to left */
 Edge->DestXDirection = -1;
 DestXWidth = -DestXWidth;
 Edge->DestXErrTerm = 1 - Edge->RemainingScans;
 Edge->DestXIntStep = -(DestXWidth / Edge->RemainingScans);
 } else {
 /* Set up for drawing left to right */
 Edge->DestXDirection = 1;
 Edge->DestXErrTerm = 0;
 Edge->DestXIntStep = DestXWidth / Edge->RemainingScans;
 }
 Edge->DestXAdjUp = DestXWidth % Edge->RemainingScans;
 Edge->DestXAdjDown = Edge->RemainingScans;
 return(1); /* success */
 }
 StartVert = NextVert; /* keep looking for a non-0-height edge */
 }
}
/* Texture-map-draw the scan line between two edges. */
void ScanOutLine(EdgeScan * LeftEdge, EdgeScan * RightEdge)
{
 Fixedpoint SourceX = LeftEdge->SourceX;
 Fixedpoint SourceY = LeftEdge->SourceY;
 int DestX = LeftEdge->DestX;
 int DestXMax = RightEdge->DestX;
 Fixedpoint DestWidth;
 Fixedpoint SourceXStep, SourceYStep;
 /* Nothing to do if fully X clipped */
 if ((DestXMax <= ClipMinX) || (DestX >= ClipMaxX)) {
 return;
 }
 if ((DestXMax - DestX) <= 0) {
 return; /* nothing to draw */
 }
 /* Width of destination scan line, for scaling. Note: because this is an
 integer-based scaling, it can have a total error of as much as nearly
 one pixel. For more precise scaling, also maintain a fixed-point DestX
 in each edge, and use it for scaling. If this is done, it will also
 be necessary to nudge the source start coordinates to the right by an
 amount corresponding to the distance between the real (fixed-point)
 DestX and the first pixel (at an integer X) to be drawn */
 DestWidth = INT_TO_FIXED(DestXMax - DestX);
 /* Calculate source steps that correspond to each dest X step (across
 the scan line) */
 SourceXStep = FixedDiv(RightEdge->SourceX - SourceX, DestWidth);
 SourceYStep = FixedDiv(RightEdge->SourceY - SourceY, DestWidth);
 /* Clip right edge if necessary */
 if (DestXMax > ClipMaxX) {
 DestXMax = ClipMaxX;
 }
 /* Clip left edge if necessary */
 if (DestX < ClipMinX) {
 SourceX += SourceXStep * (ClipMinX - DestX);
 SourceY += SourceYStep * (ClipMinX - DestX);
 DestX = ClipMinX;
 }
 /* Scan across the destination scan line, updating the source image
 position accordingly */
 for (; DestX<DestXMax; DestX++) {
 /* Get currently mapped pixel out of image and draw it to screen */
 WritePixelX(DestX, DestY,
 GET_IMAGE_PIXEL(TexMapBits, TexMapWidth,
 FIXED_TO_INT(SourceX), FIXED_TO_INT(SourceY)) );
 /* Point to the next source pixel */
 SourceX += SourceXStep;
 SourceY += SourceYStep;
 }
}









































September, 1992
PROGRAMMER'S BOOKSHELF


Coming to Grips with the Information Age




William F. Jolitz


During the mid-1970s, computer enthusiasts were frequently as knowledgeable
about the telephone system and telecommunications as they were about the
details of their microprocessors. In fact, the details of telephone switching
systems, cross-bar mechanisms, signaling, and other matters were common topics
of discussion at Homebrew Computer Club meetings. (I remember once being
accosted during "random access" by a rather famous fellow waving schematics
related to an ESS switch!) Still, it's surprising that computer and
telecommunications technologies continue to be viewed as immiscible
disciplines, especially when you consider that they face many of the same
problems and that their histories are so intertwined.
During the '70s, the architects of the Internet felt quite safe keeping the
address space at a mere 32-bit size, never imagining this might be consumed
before the end of the century. But the unanticipated and tremendous growth in
computer networking in the '80s changed this. We now use large numbers of
interacting computer systems, so that computer networking forms the basis of a
modern information infrastructure. It's disquieting, though, that this
revolution in information exchange has occurred removed from the world's
largest connected network--the telephone system.
In Global Telecommunications: Layered Networks, Layered Services, Robert
Heldman seeks to outline his vision for the integration of these two worlds
from the telecommunications industry perspective. While the book fails in its
stated goal of showing you how to take advantage of communications technology
(a daunting task for any single book), it does provide what may be a more
valuable service: Global Telecommunications lays bare the frustrations of
bringing about the birth of this global information industry. In a rambling
style, the author leads us through the process by which the telecom industry
structures the problem of a unified voice, video, and data-communications
network. That the book is targeted at the telecommunications industry is
apparent in its use of telecom terminology and approaches, and its attempts to
span the gulf between the nuts-and-bolts physical realities and the top-down
market-driven business analysis of a telecommunications corporate participant.
This approach normally would be guaranteed to annoy both the dyed-in-the-wool
computer user and the telecommunications professional. Computer professionals
quickly become frustrated at being sold a "multimedia soon" future which
requires megabit (perhaps gigabit) service, while having to put up with
current piddling kilobit-per-second analog modem communications. On the other
hand, any astute telecommunications professional is already aware of the
difficulty of implementing even the most mundane of ideas presented in this
book. However, even though it oscillates between microscopic details
(bit-density problems with T1) and cosmic issues (Maslow's Hierarchy of
Needs), Global Telecommunications provides us with a unique view, a "gestalt"
if you will, of an industry at the crossroads. In fact, after a while I began
to feel that I was reading a psychoanalytic profile of the telecommunications
industry in crisis. It underscored how difficult it will be for the industry
to take full advantage of the new information age.
For example, the book's perspective on data-communications networking (page
123) is quite revealing: "We are expanding from the traditional analog
voice-based network, with voice-only services...we are beginning to use
digital capabilities to better operate and expand the network."
In essence, data is viewed as a special case of voice! This isn't really
surprising, since that's what the telecommunications industry made its fortune
on. However, this is exactly opposed to the computer industry's approach,
where digital information is everything. Every medium--voice, data, video--is
digital.
Another example of the book's somewhat myopic view can be found in its
treatment of ISDN. Like fusion, ISDN has been "just around the corner" for at
least a decade. While its genesis occurred at about the same time the first
PCs appeared, it's been slow going for ISDN ever since. The problem is that
the telcos don't view ISDN as an extension of LANs, but as a replacement for
them (pages 147-148).
By delaying the public network offering in the 1970s and 1980s, the RBOCs
[regional Bell Operating Companies] have driven the MIS managers of large
firms to create new networks that are quite autonomous from the telcos....
This lack of success encouraged considerable growth of point-to-point LAN-type
architectures.... This has resulted in a separate "private networking"
approach to interconnect LANs by gateways, bridges and routers. However,
expansive growth has made this cumbersome, quite expensive and somewhat
limited to closed user groups. In the new world of multiple systems
interfacing together, there is a need for the more robust public networks to
carry this traffic to and from the internal private networks.
In fact, LANs have been fantastically successful, partially due to their
relatively low cost and availability, and partially to their appropriateness
in treating data like data, not a special case of voice. It's ludicrous to
believe that ISDN will recapture the bandwidth lost to private data networks
over the next decade without examining issues such as performance requirements
and existing software interfaces.
However, as pointed out in Global Telecommunications, digital communications
still amount to only 3 percent of total current service, so you
can expect voice to be in the driver's seat over the long term (the next 40
years?) as the industry develops new equipment plans. Quite simply, until the
telcos see an economic necessity, they will not take data seriously. (Given
the rapid rate of change in this area, they could end up missing the boat
completely.)
A point not addressed in Heldman's book is "why" the telephone system went
digital in the first place. Steve Hardwick shows in ISDN Design that it was
the economics of digital transmission, switching, and signaling that caused
the industry to migrate incrementally away from ancient analog mechanisms.
This has had the indirect result of tossing the entire telecommunications
industry into the information market, since now voice is just one of many
kinds of information to be transmitted. This was not anticipated, but rather
dictated by circumstance.
In a computer-oriented world, where entire computer systems become obsolete
after a few years, there's little patience for ISDN, even though it
fundamentally extends the reach of computer technology to any point in the
world and has the potential for turning every telephone into a computer
system. Compared to the computer industry's hare, the conservative,
button-down telecommunications industry begins to look like a tortoise by
always attempting to justify each step before making it.
The irony of this is that to be in the "information side" of the business, you
must be immersed in the application of information. Unfortunately, the
monolithic firms capable of orchestrating gigantic phone networks cannot seem
to grasp this simple fact.
Global Telecommunications is valuable for its isolated vignettes focusing on
the needs of the telecommunications customer as well as outlining (sometimes
futile) attempts to provide what the customer desired. It is worthwhile to
compare this approach with the breakneck pace of high-speed computer
networking (like you find within the Internet community). In marked contrast
is the mind-boggling complexity of the low level of abstraction present in
telecommunications (mostly at the ISO physical and link layers) versus the
relatively trivial components in most computer-networking fields. (I used to
have an entire bookshelf dedicated just to a partial description of ISDN,
while a single book on TCP/IP was sufficient.) It is this holdover from the
past that forces the tortoise carrying the world on its back to pale beside
the lightweight hare that uses abstraction to minimize burdens. The tortoise
has a problem in scaling down; the hare has a similar problem in scaling up.
One measures time in decades, while the other is unwilling to wait for a
fraction of a year for fear of missing a technology window.
Global Telecommunications' unstructured "all over the map" approach is its
worst characteristic. You just can't read it in a typically linear fashion and
expect to keep a logical train of thought. Another irritation is the frequent
mention of market-driven customer-application approaches that are completely
naive regarding digital-communications applications. (There is no mention, for
example, of remote file systems, distributed computing, or interoperability.)
In this, the hallmarks of modern computer-networking progress of recent years
are completely ignored. Alas, this is a major omission in an industry that
prides itself on its technical competence and thoroughness.
Telecommunications is rapidly approaching a crossroads. Recent court and
regulatory decisions mean that other firms can access telephone subscribers
directly, as an alternative to local exchange carriers (LECs). Many powerful
interests are already jockeying for position to compete with them on an equal
basis to provide even more services than the existing LECs provide. Global
Telecommunications correctly puts forth a number of barriers for the industry
to hurdle, but it misses the most significant one: employing the existing
computer networking technology to accelerate the demand and acceptance of
modern communications technologies. In other words, the compelling need for
new communications technology is rooted in developing the existing
information-system paradigm, not in trying to invent something totally new or
recast things in a past image.
There may be some historical significance to Global Telecommunications in that
it captures a moment in time before the pent-up innovation of one industry
impacts another, very much like the old saw of an irresistible force meeting
an immovable object. I'm sure it will be an interesting sight, regardless of
the outcome.
































September, 1992
OF INTEREST





New from Aggregate Computing is NetMake, a distributed, parallel version of
the UNIX make utility. NetMake lets you compile individual source files on
separate machines on a network: it sends the files to the most appropriate
remote machine, compiles them in parallel, and collects the results on one
machine to perform linking. Thus, you can compile faster using NetMake than
you can by running make on a single machine. Version 1.1 includes Highland
Software's floating license manager, automated installation and configuration,
and greater parallelization for improved speed.
Prices start at $2500.00. Reader service no. 20.
Aggregate Computing Inc. 5217 Wayzata Blvd., Suite 125 Minneapolis, MN 55416
612-546-5579
Gimpel Software has released C-Vision, a set of tools for analyzing,
understanding, and maintaining C source code. Included in the package are a
cross referencer, a function-call diagrammer, a source code reformatter, and a
source code lister.
The cross referencer provides symbol descriptions that include symbol-usage
information--definition/declaration, assignment to, taking the address of, and
so on--and type information. The tree diagrammer uses a precompiled database
of your program modules and shows different views of your program's functions
based on a call graph; subtrees can be shown either directed away from
functions or into functions or data. The lister prints outlined listings of C
source, with attention to handling of preprocessor conditionals, and the
customizable reformatter deals with source code and comments.
C-Vision runs under DOS or OS/2 and costs $139.00. Reader service no. 21.
Gimpel Software 3207 Hogarth Lane Collegeville, PA 19426 215-584-4261
C-Debug, Softran's C-language debugging program, is now available for VAX/VMS
and OS/2 platforms. C-Debug offers comprehensive searches and can even detect
whether the program is corrupting its own memory. It catches errors before
the application dies, providing a detailed description of the errors' location
and nature. In most situations, C-Debug runs with almost no additional
programming.
Also available is C-Verify, a companion program that improves quality
assurance by identifying program areas missed during testing.
C-Debug can also be used with DOS and most UNIX systems. It comes ready to use
with 12 popular C compilers and can be customized to work with others. C-Debug
prices range from $249.00 to $995.00; C-Verify costs $395.00. Reader service
no. 26.
Softran Corp. One Naperville Plaza Naperville, IL 60563 800-462-3932 or
708-505-3456
ProVoice is First Byte's new toolkit for adding synthesized speech to any DOS
application. ProVoice uses First Byte's Speech Engine to produce high-quality,
synthesized speech from strings of text, numbers, and data. Text is converted
to speech in two phases using a set of phonetic translation and pronunciation
rules. First, the software analyzes and translates text into sound
descriptors, a phonetic language that includes the pitch, duration, and
amplitude codes that produce stress patterns in phrases and sentences. The
second phase converts the intermediate phonetic language into speech signals;
algorithms transform distinct speech signals into continuous, clear speech.
Unlike digitized speech, synthesized speech processes text into words and
sentences internally, using a user-editable dictionary and English grammar
rules.
The toolkit provides source code bindings to the ProVoice Speech Engine for
Microsoft C, Turbo C++, Turbo Pascal, Turbo Assembler, and MASM; sample source
code is provided to illustrate speech programming techniques in all these
languages. For optimum speech output, First Byte highly recommends a sound
accessory, and ProVoice supports many, including AdLib, Sound Blaster, Covox
Speech Thing, and others. The suggested retail price is $595.00. Reader
service no. 23.
First Byte Software 19840 Pioneer Avenue Torrance, CA 90503 800-523-2983
New from Valois Software is Vsearch, a BRIEF enhancement that provides
disk-wide, multifile, multipattern search and replace capabilities. Vsearch is
fully integrated with BRIEF and supports drive/directory/file wildcards and
writes output reports. It can also run in batch mode and undo all file
changes. Vsearch is targeted at developers who make multiple changes to many
files, increasing their productivity by eliminating repetitive editing and
improving reliability.
DDJ spoke with David Bechtel, associate engineer at Mountain Network Solutions
in Scotts Valley, California, who was enthusiastic about Vsearch. "The feature
I use the most is Vsearch's ability to make global changes to several modules
and conditions within those modules, and undo those changes if necessary. It
also gives you a good summary report of all modules and strings modified."
Bechtel liked Vsearch's ability to generate file directory listings within
BRIEF, which can then be used as pattern files for future Vsearch operations.
He concluded, "It's a fantastic product, but I would like to see more
detailed examples of its advanced features."
Vsearch's list price is $89.00. Reader service no. 24.
Valois Software Inc. 2237 Cecilia Avenue San Francisco, CA 94116-1832
415-759-0911
MINC is offering programmable logic users a free toolkit for comparing PLD,
CPLD, and FPGA synthesis tools. The utility generates random Boolean equations
that can be produced in the format required as input for most popular
universal programmable logic tools. The user controls generation of the random
equations through a number of parameters entered in the utility. Parameters
include: the number of outputs, buried outputs, and inputs; the maximum number
of product terms and inputs in any single equation; the percent of equations
registered; and the number of clocks available for register assignments. Given
a set of input parameters, identical designs are produced across all output
formats, allowing you to make a fair and accurate comparison between design
tools.
MINC encourages users to conduct their own benchmark evaluations and send in
the results. MINC will periodically distribute the aggregate results of these
studies. To receive the toolkit, call MINC and request the "BENCHKIT." Reader
service no. 25.
MINC Inc. 6755 Earl Drive Colorado Springs, CO 80918 800-755-3742
The Snooper, a software-based protocol analyzer for Ethernet networks, is now
available from General Software. The Snooper, which runs on any PC, captures
live traffic on Ethernet networks and saves the captured packets for analysis
through summary, hexadecimal, or decoded protocol displays.
The Snooper is actually a source-code toolkit that allows programmers to build
custom protocol analyzers for additional protocols used by application
software. It comes with free drivers (including source) for popular Ethernet
host adapters such as Novell-compatible NE1000 and NE2000 boards and boards
from 3Com, IBM, and others.
Protocol support is provided for Novell Netware file-sharing protocols,
Microsoft LAN Manager protocols, and IBM LAN Server protocols.
The Snooper retails for $350.00. Reader service no. 22.
General Software Inc. P.O. Box 2571 Redmond, WA 98073 206-391-4285
Frontier Software has released FS:pascal 2.0, a 32-bit protected-mode Pascal
compiler that generates native 32-bit code. FS:pascal eliminates runtime
memory limitations and lets you use available extended memory. Absent also is
the 64K segment-size limit set by real-mode programs; thus, variables and
arrays can be as large as available RAM.
Over 250 new library routines have been added to the package, including a
mouse library that works in both text and graphics modes and a screen design
library with scroll-bar menus and save- and restore-screen routines. By taking
advantage of 32-bit registers, FS:pascal can create executables that run
faster than those created by real-mode compilers.
FS:pascal is fully compatible with Turbo Pascal, including the BGI graphics
interface. The built-in DOS extender is automatically linked in with every
program. There are no royalty fees or license agreements, and the retail price
is $149.95. Reader service no. 27.
Frontier Software 66-22 Fleet Street, Suite 2C Forest Hills, NY 11375
800-934-FSDC
The Professional Toolkit for Visual Basic from Microsoft provides a
single-source solution for a variety of Windows-based applications. The
toolkit has many new features, first of which is object linking and embedding
(OLE), which lets you create applications that combine spreadsheets, word
processing, graphics, and other OLE server functionality into customized
applications. The toolkit also supports pen-based computing: It includes
controls to create text boxes that can hold pen input as an ink data type or
provide automatic access to the pen recognition engine. Multimedia developers
can write applications for Windows Multimedia Extensions, including animation
and audio.
Also included are graphing capabilities with ten charting styles; grids for
presenting tabular data with resizable rows, columns, and scroll bars;
multiple document interfaces (MDI) for creating applications with multiple
child windows contained in a single parent form; 15 new controls, including
3-D interface components, animated buttons, gauges, spin buttons, and access
to commonly used dialog boxes; the Windows help compiler for creating custom
online Windows help files; the Windows API online reference; a set-up kit; and
a custom-control development kit.
The Professional Toolkit for Visual Basic costs $299.00; with Visual Basic,
$495.00. Reader service no. 28.
Microsoft Corp. One Microsoft Way Redmond, WA 98052-6399 206-882-8080




















September, 1992
SWAINE'S FLAMES


Swahili Menace Unmasked




Michael Swaine


Under the heading "What you get out of this column depends on what you put
into it, or possibly on what is put into it by some other reader with more wit
than sense and more time than you," I present the following splendid example
of reader involvement above and beyond.
The reader in question is Mike Morton, occasional Dr. Dobb's contributor from
the state of HI, and the subject is anagrams, as in "Swahili Menace" or "whale
is anemic" for "Michael Swaine," "Legal bits" for "Bill Gates," "prig to peer"
for "Tipper Gore," "Scorn in joke rubs old drab Jon" for "Jon Erickson, Dr.
Dobb's Journal," and "Mr. Machine Tool" for "Michael Morton."
Dear Swahili Menace:
Your Brain Games are great; I especially enjoy the anagrams, although I'm too
lazy to do them myself. A little program I wrote found a few answers for me.
"Real friend or nuts" is clearly "Flatus error in den," referring to the
hazards of excessive Jolt consumption while working in a home office--all too
common in the software biz. But this answer had some lively competition from
the following:
Resentful in ardor
Fail not, surrender (dishonor before slight pain?)
Terrain flounders
No nudist referral
Fortran runes lied (see other religious comments in anagram below)
Render Lotus, infra
User, florid tanner
Error, fail--stunned
Error stifled a nun (kick that monochrome habit?)
Dental error is fun (running light without...)
Userland Frontier (Lord knows what this one means...)
Meanwhile, "Magic land of lost tribes" appears to be a reference to those who
worship and continue to attempt the revival of lost programming languages:
"BASIC: Lost idol fragment." But some other possibilities are:
Distracting bosom, fella
Forecasting dismal blot
Octal fossil abridgment (for those who don't find BASIC sufficiently
atavistic)
Food: belittling sarcasm
Comforting sadist label (Intel inside--suffer!)
Gold-lobster fanaticism
Bigots craft medallions
Sanctified global storm (religious slant on a new world order?)
Oft remolding lost BASIC (see also the idol-worship reference above)
Relating to bold fascism
Bestial Microsoft gland (secretes hormones that cause animal-like marketing)
Microsoft: legal bandits (hmm)
Microsoft dating Bealls (all of them?)
Microsoft and Bill Gates (I'm sure this wasn't what you meant)
Yours for a new word order, Mr. Machine Tool



















October, 1992
October, 1992
EDITORIAL


Back on the Copyright Beat




Jonathan Erickson


I don't know about you, but I'm sleeping better knowing the Lotus
look-and-feel copyright infringement suit against Borland has been decided. If
you recall, this was the action that spurred legions of parents to sign up
their offspring for law school. Lotus, by the way, won.
The question remains, though: What did Lotus really win, and who cares, anyway?
Sure, Lotus will likely receive damages if Borland's sure-to-be-filed appeal
fails, maybe even enough money to cover several years' worth of legal wrangling
and corporate-image damage control.
But more to the point, who, except maybe Lotus, cares anymore about the
interface that started the ruckus? Not Borland, which, the day after the
ruling, started shipping Quattro Pro, bereft of the onerous 123.mu file that
generates the look-alike interface. (The file is still available to those who
want it.) Nor, so it seems, does anyone else. When was the last time you saw a
newly released application that resembles the Lotus look-and-feel? In today's
world, what counts is compliance with CUA, not 1-2-3.
In truth, we all should care about the Lotus/Borland ruling. By allowing Lotus
to copyright menu structures, U.S. District Court Judge Robert Keeton said, in
effect, that you can copyright not only the code that implements the software,
but the structure of the program as well. The problem with this is that the
structure of a program represents an idea, while the source code is the
expression of that idea. Copyright law allows protecting the expression, but
not the idea itself.
Keeton's decision seems contrary to the landmark ruling of U.S. District Judge
George Pratt, who decided the Computer Associates vs. Altai
copyright-infringement case. On the surface, the only similarity between the
two cases is that they're both pigeonholed as copyright infringement. The
CA/Altai suit, however, focused on how software interacts with other software,
while Lotus/Borland embraced how the user interacts with software.
The original issues in the CA/Altai case were straightforward. Altai software
contained CA code that a former CA programmer brought to Altai. The company
conceded the infringement, took its lumps, paid its damages, and rewrote its
software.
CA then extended the suit, saying it also owned copyright to the structure of
the operating-system interface routines--disk I/O and the like. Since both
companies' software was written for the same third-party operating system
(MVS), the interface routines were understandably similar in structure. After
all, the OS demands this. Pratt decided program structures can't be
copyrighted, and ruled against CA.
In his decision, Judge Pratt established guidelines that have since been
supported by law professors and courts around the country. (Keeton's Second
District is a rare exception.) As I interpret Pratt's decision, the elements
you need to consider are:
Text. Printouts of a program's source code can be copyrighted just like book
or magazine text. If two programs have identical listings (as in CA/Altai),
infringement is likely.
Appearance. The images that make up a program's screen display can be
copyrighted, as can a photograph or painting. If the artistic screen display
is the same (say, a video game like PacMan), infringement may have occurred.
Behavior. The behavior, or method of operation, of a program can't be
copyrighted because copyright law doesn't cover procedures and methods of
operation. (These are provided for under patent law.)
In short, one court says you can copyright a program structure; the other says
you can't. One court says you can copyright a program's behavior; the other
disagrees. To buy into Keeton's decision, you have to accept the premise that
a menu structure is somehow different from the other structures or submodules
that collectively make up a program. Interestingly, Keeton seems out on his
own in this as other courts line up behind CA/Altai. U.S. District Court Judge
Vaughn Walker (who's trying the Apple vs. Microsoft/Hewlett-Packard case) has
said, for instance, that the sequence and organization of commands should not
be the sole property of a single organization.
With this in mind, you'd think that the recent Unix System Labs' (nee AT&T)
suit against the University of California at Berkeley over copyright
infringement on the BSD Networking Software, Release 2 tape should be as easy
as, well, 1-2-3. If infringement exists, it's a "text" issue. USL should be
able to put its listings side-by-side with Net/2 code and point to the
offending source. If A = A and B = B, then somebody goofed.
Somehow, I don't think all this will be that easy. Take the murky waters of
copyright law and digital video, for instance. As Michael Swaine discusses in
this month's "Programming Paradigms," the latest challenge for Apple's
QuickTime team is to embed copyright information into individual digital
frames, as well as to hone digital-video encryption techniques.
It's a whole new world out there, copyright-speaking. Maybe there's still hope
for those parents who prematurely packed their kids off to law school.

































October, 1992
LETTERS







WEB: Paeans and Pointers


Dear DDJ,
I was pleased to see Andrew Schulman pay some attention to WEB in his August
1992 "Programmer's Bookshelf" column. This was the first coverage of the topic
I've ever seen outside academic journals.
It was interesting to see Knuth admit that logging his errors hadn't
diminished their frequency. I wonder if he's measured the effect of using WEB
on his stumble rate. I haven't measured it myself, but could swear I make
fewer errors since starting to use WEB; it seems to discourage the "code
first, think later" style of work that constantly tempts programmers. Perhaps
this works in a way similar to how having to explain a program discourages you
from loading it down with features that don't belong in it, as both Knuth and
Schulman note. For if you can explain your implementation of an algorithm--or
better still, if you have explained it--there's a good chance that your code
actually works.
One last thing: Readers following up on Schulman's reference to CWEB will be
disappointed if their C dialect is ANSI, let alone C++. For these I strongly
recommend a system called FWEB, which supports these dialects as well as
Fortran and Ratfor. It includes binaries for MS-DOS as well as build
procedures for UNIXes, VMS, and MVS. OS/2 isn't explicitly provided for, but
is an easy port with the OS/2 GNU tools. FWEB is available by anonymous ftp
from lyman.pppl.gov in the directory pub/fweb.
Lew Perin
Jersey City, New Jersey


C Question


Dear DDJ,
Many, if not most, versions of the memcmp and strcmp C library functions
perform an unsigned comparison of the supplied strings (although this is not
usually stated in the accompanying documentation). I was recently involved in
a discussion (with a group of C programmers) about whether or not these
functions should perform signed or unsigned comparisons. Since there was no
general agreement from the discussion, I was wondering if some of your readers
would be prepared to offer an opinion (or two). It would seem to me that the
answer lies in the precise meaning of the phrase, "lexicographical order."
Llewellyn R. Griffiths
Victoria, Australia


One Steak, One Sushi


Dear DDJ,
I'm writing with regard to Homer Tilton's letter on Japanese patents in the
July 1992 issue. I suspect that his steak dinner is safe. Automated language
translation was one of the first applications computer scientists tackled in
the 1960s. It turned out to be an enormous problem. I haven't heard much about
it for ten years or more. But I'm sure the language translation folks haven't
given up, are making progress, and will be able to do something along the lines
of what Mr. Tilton desires in another few decades. Perhaps you'll get a letter
or two from experts who can explain the state of the art.
A computer program for translating English to Japanese (or any other language)
is an enormously difficult undertaking. Almost every English word has multiple
meanings and while Japanese has words or phrases to match most English words,
the word that matches one meaning usually is not the same as the word that
matches another meaning.
Words and phrases often have context-specific meanings. Even translating the
individual words in a phrase "right" is no guarantee that the phrase will make
sense or that it will mean what the writer intended it to mean.
Idioms, homonyms, spelling errors, grammatical errors, and major language
exceptions all need to be handled.
There are a number of things that need to be rephrased in moving between the
languages. Japanese verb tenses are not the same as English verb tenses. While
many words map directly between the two languages, some common ones don't.
The language translator is not just dealing with mechanical replacing of a few
tens of thousands of words with one-to-one equivalents while juggling word
order to get verbs at the end of the sentence. Analysis must go beyond syntax
(bad enough in English) and worry about context. There are lots of special
cases and judgement calls. It's a big job.
One example of the problems: A Japanese vendor managed to translate the phrase
"track jam" (something that happens when a piece of paper sticks while flowing
down a mechanical track) as "the jelly on the track." That happened with human
translators. Lord knows what a computer would do.
Pragmatically, even if you had an English-to-Japanese translation program, how
would you proofread the result?
An inexpensive alternative to a Japanese law firm would be to hire
Japanese-speaking graduate students from the engineering department of a nearby college
or university. Have one translate the document to Japanese and a second
translate it back. Work with one of them to resolve any inconsistencies you
spot in proofreading.
Donald Kenney
Canton, Michigan
Dear DDJ,
I was amazed by the letter by Homer B. Tilton published in the July 1992
issue. I think that if I were to file a patent here in the U.S., I would have
to have my patent application professionally localized to the American variety
of English, if not to Mexican Spanish. I very much doubt that the government
of this country would accept the application written in Japanese. How could
one hope for the Japanese government to accept a patent filing in English? I
thought this highly Bushy.
I have been working on CAT (computer aided translation) for some years using
various translation software. I have only been successful with automatic
translation of simple things such as hello world. Beyond this level of
complexity, the difference in cultural background between the two languages
makes it impossible to generate any comprehensible documents automatically.
The level of complexity of patent filings in Japanese is such that no average
mortal would ever attempt to even read, let alone file, such a document. I
doubt that the situation is much different in the U.S. The government
formalities and specific wordings required in the patent filings in the target
language, whichever it may be, cannot simply be represented by rule-based
logic or any sort of currently available machine intelligence. As a native
Japanese speaker, I have attempted to improve my English writing skills using various
software, without much success. Could software help the average person come up
with a correct patent filing even in his or her own language? My experience
indicates the answer to be negative.
To focus on the cultural background problem, let us as an example develop a
translator from properly written Fortran to properly written C code. These two
languages are alike except for the philosophical background, historical
evolutions, and system environments. Unless the translation system recognizes
and appreciates these differences, the result will still be Fortran code
written in C syntax. No serious programmer would accept such code as a C
program. Wouldn't the same sentiment apply to the "Japatent" filing? Instead
of asking for such an impossible product as automatic patent-filing software,
I would go for a real Fortran-to-C translator. If someone comes up with such
software, my treat will be sushi instead of cholesterol-rich red meat.
Shohei Nakazawa
Sunnyvale, California


CRCMEN


Dear DDJ,
I take exception to Mark R. Nelson's statement in the article "File
Verification Using CRC" (May 1992) that "it is possible that an exceptionally
clever virus will be able to defeat it [CRCMAN]." He then continues, "Further
modifications to the program would improve its ability to fight viruses. For
example, just storing the length of the file along with its CRC would make a
virus programmer's job much more difficult, if not impossible."
On page 65, Mark also states, "The challenge to the virus programmer would
then be to add new bytes to the end of the file so that the original CRC was
restored." This is not only misleading but also demonstrates that Mark is not
familiar with a class of viruses known as "stealth" viruses. It is not
necessary for a virus to forge a scheme like CRC-32 to evade detection. There
are currently a number of viruses (e.g. 4096, Fish-6, 512) that fool all such
schemes, including those that employ cryptographic checksums. It might
surprise Mark that none of these viruses attempt a direct attack on checksum
algorithms, but they still succeed in almost every case!

For any detection mechanism to work against such beasts, the verifier must
first establish a secure channel between the objects on the disk and its
memory buffers. The reason is simple: Stealth viruses intercept disk access
and undo the modifications they have made. In other words, the verifier will
get clean input. Therefore, no matter how sophisticated the algorithm is, it
will be fooled every time.
This does not mean that CRC-32 and such schemes cannot be used to implement an
integrity checker. In fact, they are quite adequate as far as catching
modifications, as long as the verifier can get to the actual contents of the
objects on the disk. If a stealth virus is active in memory, other measures
must be taken to gain untampered access to the disk. Under MS/PC DOS, this is
not as trivial as it sounds, and it may be one of the reasons why there are
many professional software packages that deal with viruses. Homemade solutions
such as CRCMAN are simply not adequate. Mark should have at least advised his
readers to boot from a clean, write-protected floppy diskette and then run
CRCMAN from the same diskette (in case the copy on the hard disk is infected).
Otherwise, CRCMAN not only fails to find any modifications, but also may
spread the infection to all programs it checks. Besides the problem mentioned
above, the article was excellent, and I must express my admiration for Mark's
Data Compression Book (M&T Publishing, 1991). Keep up the good work.
Tarkan Yetiser
Dear DDJ,
Mark Nelson's article, "File Verification Using CRC," and its accompanying C
code are clearly written and make an enjoyable reading. The only thing I don't
agree with and find potentially dangerous is Mark's claim that it would take
"a few weeks of computing power" to come up with a file with a CRC given in
advance, or to produce two files with the same CRC. This is true if a simple
brute force attack is made, but with a little help from linear algebra it can
be done in a matter of seconds; see Example 1.
Example 1

 ttttttttttttttttttttttttttttttttttttttttt
 (41 t's (i.e. 74 hex) followed by the end-of-file mark (1A hex)).

 qtttttqttqqtttttqtttqqqtqqtqqtqqqtttttttt
 (some t's replaced by q's. Use s instead of q if you wish). Both files
 have CRC=3EEFECA4.

This algebraic approach is based on a simple observation (which follows from
the properties of polynomial division): The CRC bits are linear functions of
the file bits. To be a bit more specific, consider only the last 32 bits (four
bytes) of a given file. They can be thought of as components of a binary
vector x of size 32. Let y be the corresponding CRC. It can also be
interpreted as a binary vector; moreover, we have y = Ax + b for some constant
binary matrix A and vector b. They can be found by setting x equal to various
unit and 0 vectors and computing the corresponding CRCs. It follows that Ax =
y + b (addition and subtraction are the same in the 0-1 number field) and
finally x = A^-1(y+b). I used Maple V to invert A in the 0-1 field and produce
the examples below. Once you have A^-1, you can, given any CRC y, find x from
the aforementioned formula and thus construct a file whose CRC equals y.
The last four bytes in the files in Example 1 were 74 74 74 1A (hex). Change
them to 31 D9 B5 54 to obtain CRC = 00000000. Or change them to 50 F7 10 7A to
obtain CRC = BEEFBEEF.
I apologize for this full-scale lecture, but since it is so easy to forge
CRCs, I believe that a warning of some sort to the unsuspecting readers is in
order. Please feel free to use this material if you like it. In my opinion,
message digest algorithms (such as MD5) are much more reliable for virus
detection than CRCs. MD5 is described in RFC 1321, available by anonymous FTP
from rsa.com.
Miroslav D. Asic
Newark, Ohio
Mark responds: I appreciate (without necessarily understanding) Professor
Asic's demonstration of CRC-32 inversion. While my article pointed out that a
32-bit CRC could be broken fairly easily under a brute-force attack, I did not
explain that a more sophisticated approach could reduce the procedure to a
matter of seconds.
What I do need to re-emphasize is that a CRC check by itself does not
constitute protection against a virus. Professor Asic shows us that the
algorithm is no match for a card-carrying practitioner of linear algebra. My
article just demonstrated that even the mythical "teenage hacker" can invert a
CRC given a spare CPU and a C compiler. Let's just hope that Professor Asic
and his colleagues side with the good guys in this fight.









































October, 1992
 SIZING UP APPLICATION FRAMEWORKS AND CLASS LIBRARIES


Tools for reducing coding effort and leveraging prefab functionality


 This article contains the following executables: DDJAFX.TXT (program spec),
BOROWL.ARC (Borland OWL), OWLYAO.ARC (OWL version by Paul Yao), INZAPP.ARC
(Inmark zApp), OBJMNU.ARC (Island Systems object-Menu), CPPVUS.ARC (Liant
C++/Views), and MSMFC.ARC (Microsoft MFC).


Ray Valdes


Ray is senior technical editor at DDJ. He can be reached through the DDJ
offices, at 76704,51 on CompuServe, or at rayval@well.sf.ca.us.


As today's applications increase in complexity, so do the tools for building
them. Application frameworks, class libraries, and GUI toolkits are three such
categories of tools, yet they represent very different technological
approaches. Still, the tool categories, though different, all address the same
basic software-development issues: reducing coding effort, speeding time to
market, increasing maintainability, adding robustness, and leveraging prefab
functionality.
Although comparing these tools is like comparing apples, oranges, and bananas,
when you start a development project, you might end up saying, "All these
products look like fruit to me." In this article we'll provide you with a
basis for deciding which of these approaches is right for you, focusing on
five object-oriented application frameworks and class libraries. In future
articles, we'll cover GUI toolkits not based on object-oriented languages, as
well as additional class libraries.


The View from Square One


When your project is at square one, there's a simple question: Do you stay
with the raw API (basic DOS or Windows services), or do you choose something
at a higher level? If you choose the latter (the "fruit scenario"), then
you're at square two: choosing between an application framework, a class
library, or a GUI toolkit.
Once you decide on one of these three categories, the next question is, which
particular toolkit? Your decision can be as critical to your success as your
choice of algorithm or programming language. These categories are new enough
that no clear standards have emerged, and many plausible contenders exist.
Even properly defining these categories is sticky. (See the textbox, "What is
an Application Framework?")
The trade-offs are numerous. With an application framework, you can cruise
above the API at a high altitude, but you'll have the steepest learning curve.
A framework's productivity-increasing power implies great complexity. Also, because
the framework constrains your design decisions, you run the risk of crashing
into a brick wall. If you use a GUI toolkit, you stay much closer to the
ground-level API and enjoy a gentle learning gradient as well as free choice
in architectural-design approach. The downside, of course, is that you must do
much more of the work yourself. Smack-dab in the middle between GUI toolkit
and application framework is the class-library category.
This is where we found ourselves when we needed a program to display digitized
samples for our "Handprinting Recognition Contest." We chose the safe but slow
ground-level route of writing the program using the native Windows API.
(Actually, we first wrote a DOS version which used the compiler's library of
rudimentary graphics primitives.) Like programmers everywhere, we had a
deadline and were leery of diving into a large, complex product we had no
experience with. In retrospect, we wondered which tool was the right one for
the job; hence this article.


The Problem with "Saturday Night" Reviews


Because much of an application framework's machinery lies under the surface,
evaluating application frameworks is a uniquely difficult task. A few hours
spent with the manual or looking over example programs won't cut it for
complex and subtle tools.
So how do you make an informed choice? Not by reading most product reviews
we've seen. The limited hours allotted for a standard product
review--typically, an evening with the manual, plus a weekend working with
sample code--do not give the reviewer sufficient exposure to real-world use of
the framework. (A recent review of several application frameworks in another
programming magazine presents, with a straight face, the source code to "Hello
World" using four different products, averaging about a dozen lines of code
each. Although "Hello World" is a convenient way to meet a framework, it's not
something on which you want to build a long-term relationship. It's like
buying a house based on the look of the front door.) As with a programming
language, the best way to evaluate a tool is to live with it for a while on a
nontrivial development project.
But the live-in approach has problems too. First, the steep learning curve
precludes conducting more than one or two such evaluations. Although the
products are meant to make life simpler, they are complex pieces of software
machinery, as Tables 1 and 2 show. Second, weeks or even months can elapse
before you run into product limitations that stymie further development or
require contrived workarounds. When you finally discover that your framework
is in fact a straitjacket, it may be too late. A third problem is that,
even if a framework is well designed and bug free, it may offer multiple ways
of accomplishing a given task, and it is not always clear to the novice which
is the best approach. By contrast, the experienced user of a framework knows
which methods offer best performance and are easiest to code, but such
knowledge is often hard-won.
Table 1: A simple OOP metric for various frameworks and class libraries.

 Number of
 classes
 --------------------------------

 Apple MacApp 3.0 70
 Borland OWL 75
 Digitalk Smalltalk/V 110
 Go PenPoint 250
 Inmark zApp 121
 Island object-Menu 81
 Liant C++/Views 102
 Microsoft MFC 87
 Think C Class Library 61
 Xerox Smalltalk-80 120

Table 2: Source-code size for the five packages described in this article.

 Lines of Source in
 Code Kilobytes
 -------------------------------------------


 Borland OWL + ClassLib
 CPP files 11,300 332K
 HPP files 13,678 411K
 Total 24,978 743K

 Inmark zApp
 CPP files 19,299 612K
 HPP files 3,802 100K
 Total 24,101 712K

 Liant C++/Views
 CPP files 27,000 600K
 HPP files 10,139 275K
 Total 37,000 875K

 Microsoft MFC
 CPP files 17,202 403K
 HPP files 8,211 285K
 Total 25,413 688K

 Island object-Menu
 CPP files 35,268 1150K
 H files 6,484 250K
 Total 41,752 1400K

The tack we settled on was to ask expert programmers at various tool vendors
to code up the same small, but non-trivial example program. We wrote a program
specification (see the textbox, "The Spec for the Sample Application") that
exercised many of the most interesting features of these different packages,
yet can be implemented by an expert user of a framework in three or fewer
working days. Someone new to the framework, by contrast, could spend a couple
of weeks accomplishing the same task.
By examining different implementations of the same functional requirements,
you can obtain concrete information to help you decide which available tools
are appropriate. You can then weigh the resulting programs in light of your
own particular trade-offs. The program specification, programmer's notes,
complete source code, and executables are available electronically; see
"Availability" on page 5.
This isn't a confrontational "shoot-out." Arguing about the "best" framework
is like arguing about the best cuisine, text editor, or programming language.
Our position is that different tools satisfy different needs and tastes: Some
support portability between platforms, others focus on high-level object
orientation, others emphasize closeness to the Windows API, and still others
swear by performance optimization. Every programmer has a unique set of
opinions about which characteristics are important and which can be sacrificed
in real-world trade-offs. Not only that, but strongly held opinions often
change in the course of development, in response to changing market
conditions.
We find it more interesting to get a useful, real-world application that works,
rather than a contrived example that displays 20 different dialog boxes that
we'll never use. With this in mind, our specification offers plenty of leeway
to the implementor to solve the problem in a way best suited to the
application and tool. This wide latitude in design choices reflects the
implementations described here: We had no dogmatic preferences about UI
widgets; we just cared about getting the job done. Our philosophy is that any
program is better than no program, and the world's fanciest empty dialog box
is useless compared to a complete rudimentary program that accomplishes the
given task. In the case of our own implementations (the raw Windows API and
the DOS versions), the results are indeed homely but usable nevertheless.
Our focus is on the single-user, graphics-oriented application. The example
does not do anything related to text-editing, multiform data entry, database
access, or multiuser processing. Nor does our spec address cut, copy, paste,
selection by pointing, raster-image data, and the all-important Undo command.
In homage to real-world randomness, we'll leave these for another time.


The Participants


We've not tried to be comprehensive in our selection of vendors. (There are
far too many to cover in one or two issues of the magazine.) Our focus is on
technology issues rather than specific products, so our list is representative
rather than exhaustive. Comparing apples, oranges, and bananas is a stated
goal; comparing every instance of each is not.
Although the boundaries are not always clear-cut, we look this month at
object-oriented application frameworks and class libraries. Future articles
will focus on GUI toolkits and bare-bones class libraries.
This article sizes up the following tools:
Borland's ObjectWindows Library (OWL).
Inmark's zApp Application Framework.
Island Systems' object-Menu for C++.
Liant's C++/Views Class Library.
Microsoft's Foundation Classes (MFC).
In the coming months, we'll present implementations using packages such as
Autumn Hill's Menuet, MetaGraphics' MetaWindow, XVT Software's XVT, and
others.


The Results


The results are indeed striking in both diversity and homogeneity -- a
fascinating case study in user-interface design as well as engineering
trade-offs. Figure 1 through Figure 5 show screen displays from the various
implementations. Table 3 summarizes which features were actually implemented.
Tables 4, 5, and 6 provide some rudimentary metrics (lines of code, executable
size, and runtime memory profile, respectively) on the various
implementations. These metrics, like EPA mileage estimates, are for rough
comparisons only; relative standings may vary depending upon your application.
Table 3: Feature sets in the different implementations of the DDJ HWX Browser
(missing features do not imply lack of support by product).

 Borland Inmark Island Liant Microsoft
 OWL zApp object-Menu C++/Views MFC
 -------------------------------------------------------------------------

 File-open dialog x x x x x
 Data window is
 resizable -- x x -- --

 Data window is
 scrollable -- -- x -- --
 Access commands
 via menu x x x x x
 Access commands via
 toolbar or button x x x x x

 Show menu help
 in status pane -- x x -- --
 Show general help
 in Help window -- x -- x --
 Select instance by
 pointing x x x x x
 Select letter
 by pointing x x x x x
 Select instance by
 keyboard x x -- -- --
 Select letter
 by keyboard x x -- -- --
 Select instance by
 scrollbars x x x x x
 Select letter
 by scrollbars x x x -- x
 Show all letters
 and instances -- -- x -- --
 Multiple kinds of
 views of instance x x x -- x
 Display custom
 sequence of letters -- -- x -- --
 Letters in drop-down
 graphic list -- -- -- -- x
 Change line
 color of letter x x x x x
 Change background
 color of letter x x x -- x
 Change line
 width of letter x x x x x
 Change scaling
 of letter x x -- -- --
 Print letter x x x x x
 MDI-style
 child windows -- -- x -- --
 Tear-off menus -- -- x -- --

Table 4: Source-code size (in lines of code) of different implementations of
the DDJ HWX Browser.

 Line Borland Inmark Island Liant Microsoft
 Type OWL zApp object-Menu C++/Views MFC
 ---------------------------------------------------------

 CPP 2470 1116 1592 1512 412
 HPP 269 -- -- -- --
 C -- -- -- -- 251
 H 816 29 197 334 175
 RC 106 74 -- 5 85
 DEF 6 9 -- 8 11
 ____________________________________________
 Total 3398 1497 1789 1859 934


Table 5: Size of executable files (in bytes) of different implementations of
the DDJ HWX Browser.

 Borland OWL 56,848
 Inmark zApp 334,848
 Island object-Menu 470,048
 Liant C++/Views 269,312
 Microsoft MFC 46,536

Table 6: Profile of memory consumption after initial program load.

 Segment Type Number of Number
 Segments of Bytes
 -----------------------------------------------------------

 Borland OWL Code 11 24,928
 DGroup 1 22,304
 Resource 2 1,536
 Private data 3 4,896
 Total Application 17 54,784{*}

 Inmark zApp Code 17 208,352
 DGroup 1 39,328
 Resource 2 1,024
 Private data 2 69,632
 Other 10 2,176
 Total Application 32 320,512{*}

 Liant C++/Views Code 14 197,792
 DGroup 1 51,104
 Resource 9 5,248
 Private data 5 69,952
 Other 2 1,024
 Total Application 31 325,120{*}

 Microsoft MFC Code 1 29,984
 DGroup 1 15,520
 Resource 8 992
 Other 3 1,280
 Total Application 12 47,776{*}

{*} The OWL implementation has an additional runtime requirement of 312,640
bytes for the OWL runtime DLLs. In addition, the OWL, zApp, C++/Views, and MFC
implementations are all Windows-based programs, and therefore require an
additional 915,584 bytes for the Windows runtime environment. Because Island
Systems' implementation is DOS-based there are no comparable figures; after
program load, the memory manager reports 346 Kbytes available for use by the
application.

The following sections discuss, in alphabetical order, each of the five
toolkits and their corresponding implementations. While there's not enough
room in this article to fully describe any one of the five products covered
here, we'll introduce them briefly and touch upon interesting highlights. Keep
in mind that our goal is not detailed coverage, but rather to direct your
attention to areas for further investigation.
The tools in this article all run on the PC platform. Four of the five
packages--Borland's OWL, Microsoft's MFC, Inmark's zApp, and Liant's
C++/Views--are Windows-based, while Island's object-Menu is DOS-based. Others,
such as Glockenspiel's CommonView and Autumn Hill's Menuet/CPP, are certainly
worth considering. Fortunately or unfortunately, application frameworks and
class libraries will, for some time to come, remain an area with many
available choices and little consensus about which is best.


Borland's ObjectWindows Library


OWL from Borland, while not the first framework on the PC market, is certainly
the most visible and likely the most widely installed, given the popularity of
Borland's C++ language products.
The technology in OWL evolved from an earlier effort by the Whitewater Group
(authors of the Actor language/environment for Windows), which has been in use
for some years and is generally well regarded. OWL relies on a nonportable
extension to C++ known as dynamic-dispatch virtual tables (DDVT), which
simplifies the handling of Windows messages. While Borland has shown a version
running on NT and has announced plans to support OS/2, OWL currently runs only
on Windows. Note that OWL does not support DOS apps; for that, you must use
Borland's companion framework, Turbo Vision.
OWL classes (of which there are 25, plus 405 methods) cover the middle layer
of user-interface components, such as dialogs, controls, and text-edit
windows, as well as some higher-level components (application classes, MDI
frame windows). Graphics classes (those that correspond to the Windows GDI)
can be found in the Whitewater Group's ObjectGraphics library, designed to
complement OWL. OWL does not provide low-level components such as
data-structure classes (collections, sets, arrays, lists) and date and time
classes. These low-level classes are part of Borland's Class Library, which
runs on both DOS and Windows and supports both OWL and Turbo Vision. Borland's
Class Library comes with the OWL package, and for this reason we count OWL and
the Class Library together in our discussion and in the table summaries.
(Windows expert and noted author Paul Yao has ported our DOS implementation to
Windows using OWL, resulting in an implementation of 1050 lines. This
implementation is also available electronically, if you want to see another
approach to OWL programming.)
Figure 1 shows the interface Borland programmers designed. Listing One (page
106) shows some of the code implementing the main application window.
Borland's implementation of our sample application required the most lines of
code; see Table 4. Despite the number of lines, the resulting executable is
still small; see Table 5. The small executable comes at a price, however:
programs written with OWL require about 312K of runtime DLLs. That runtime
support is shared by all OWL applications running at the time, much as Windows
itself is shared by Windows apps.


Inmark's zApp



Of the Windows-hosted products discussed here, zApp and C++/Views are the two
which strive to take the high road above the Windows API. zApp and C++/Views
each define higher-level abstractions that offer a modern, event-driven GUI
application model, which is portable to platforms other than Windows (OS/2,
DOS, and Motif). The advantage of the "high-road" approach is that the design
is not burdened or warped by the historical vagaries of the Windows API. The
disadvantage for the programmer is that there is a brand-new set of concepts to
learn, with the potential danger that some of these abstractions may be poorly
conceived and not stand the test of time.
Unlike C++/Views, which is strongly influenced by the classic Smalltalk MVC
paradigm, zApp has its own unique design. The design, however, results from
years of experience in building GUI apps. (The architects of zApp started out
as a consulting firm doing GUI software development in C++.)
There's a saying that good design is invisible. As Ward Cunningham puts it,
"Good class libraries whisper the design in your ear." This is the case with
zApp, which straightforwardly provides all the usual classes you'd expect in
building event-driven GUI programs: a main application class, a small
hierarchy of event-handling classes, classes for graphic display,
window-containing classes, and the usual GUI widgets (push button, check box,
list box, combo box, and so on). There are no radical concepts or unpleasant
surprises, greatly easing any learning curve associated with a brand-new API.
Inmark's implementation is shown in Figure 2. Listing Two (page 108) shows
some of the principal member functions in Inmark's implementation of our
sample application.


Island Systems' object-Menu


object-Menu is the only DOS-based product we cover in this article. Like the
other packages here, it's written in C++ and provides a set of classes that
serve as a configurable framework for a GUI application. Island Systems plans
to release a Windows version later this year.
The programmers at Island Systems came up with the most complete
implementation of our sample application (see Table 3), showcasing many of the
features in their product. The application's visual components (see Figure 3)
have a sculpted, three-dimensional look reminiscent of Motif. One slightly
disconcerting element for people used to Windows (and the Mac) is that the
mouse pointer arrow angles to the right rather than to the left.
Among the features of the sample implementation, multiple scrollable browser
windows can be opened to view part or all of the handwriting-data samples.
Listing Three (page 110) shows excerpts from Island's implementation. Each
window can be iconized, and has its own menu. An object-Menu window includes
functionality for maximize, minimize, close, resize, drag, and autoplacement
of internal items. object-Menu can use automatic placement directives to
define positioning of visual components, similar to the facility in zApp and
C++/Views. Unlike other implementations that use scrollbars to select next and
previous characters in the alphabet, the scroll bars here truly scroll the
items in a given window's display.
One nice feature in Island Systems' implementation is a window that allows you
to type in a string and see the corresponding letters from the handwriting
sample data. Another unique feature is the use of tear-off menus, similar to
those on the Mac and in OpenLook; the Edit/Copy menu is the only tear-off menu
in this implementation. Island's implementation apparently does not directly
support printing, but accomplishes the task via the use of Graf-Drive Plus
(from Fleming Software). Note also that object-Menu does not contain graphics
primitives, but can work with a number of low-level graphics libraries
(Borland's BGI, MetaWindow, Genus FX, and Flash Graphics).
The executable size of Island Systems' implementation was larger than the
others. However, considering that the other packages are relying on Windows
for extensive support (approximately one megabyte of runtime support, up to
five megabytes on disk), the size of the Island Systems implementation does
not seem so large.
With regard to performance, all of the Windows-based implementations felt
about the same on our test hardware (a 386/33 with 8 Mbytes of RAM). Island's
DOS-based implementation was the only one that seemed sluggish. One reason for
this is that the other implementations were running in Windows Enhanced mode,
which uses both extended and virtual memory. In DOS, of course, any memory
above 640K generally lies unused. Because the code we supplied reads the
entire data file (250K worth) into memory, there's not much room left, and the
overlay manager must kick into action. Using a disk cache helped, but the
sluggishness remained. object-Menu can get around this overlay swapping by
using Phar Lap's DOS extender and Metaware's 32-bit C++ compiler, but we did
not try this version.
Another way to speed up performance might be to use an optimized graphics
library, such as MetaWindow or Flash Graphics, instead of BGI.


Liant's C++/Views


Of the five tools covered here, Liant's C++/Views may have the longest
heritage. C++/Views evolved from a C-based package that attempted to bring to
C some of the benefits of the Smalltalk language and class libraries. The
package was called "c_talk" (from CNS, later acquired by Liant) and came out
in 1988. It was noteworthy because it included a class-browser utility similar
to that found in Smalltalk-80 and Digitalk's Smalltalk/V. A class browser is a
program that allows you to view and edit the source code for classes by using
a multipane window display. In one window pane is a scrolling hierarchical
list of classes, in a second pane is a scrolling list of methods (termed
"member functions" in C++), and in a third pane is the code for the method
itself, which can be edited and changed.
The design of the class browser stems from the object-oriented nature of the
Smalltalk environment, in which there is no real concept of source files,
header files, or module linking as there is in C and C++. By contrast, the
integrated development environments (IDE) from Borland and Microsoft are
designed with the individual source-code file as the main focus of a
programmer's attention. Over time, these IDEs have evolved to better
support object-oriented programming by supplying graphical views of class
hierarchy and so on. In the DOS or UNIX C/C++ environment, you can always drop
down out of the IDE to work at the command-line level, editing source files
with your favorite text editor and running the compiler directly (or via
make). In the Smalltalk environment, the class browser provides the sole
access to the source code. Liant's version allows you to work in both ways.
Source code is maintained in source files, rather than in a persistent object
database of classes and methods.
C++/Views is strongly influenced by Smalltalk's Model/ View/Controller (MVC)
paradigm. This approach to structuring applications consists of three parts: a
control component that manages events, a view component that manages the
presentation of data to the user, and a model component that encapsulates
application-specific data and processes. (See Adele Goldberg's "Information
Models, Views, and Controllers," DDJ, July 1990.) According to this protocol,
any changes made to the application's data model (say it's a collection of
numbers) can be automatically reflected in multiple views (such as a bar chart,
pie chart, or spreadsheet view) without the model knowing which views are
presenting the data. In my opinion, this design strategy has not been
surpassed in the 15 years since it was first conceived.
Despite this heritage, C++/Views is more than a clone of Smalltalk-80; it has
its own design. The VNotifier class is the controller that handles events. The
VView class (a subclass of VWindow) is an abstract class from which view
classes are derived. Graphics output is accomplished through the VPort class.
The application's data model can be constructed using user-defined classes in
conjunction with C++/Views data-structure classes (sets, dictionaries, lists,
and the like).
Liant's implementation of our sample application is not the most
feature-filled, but has, as Figure 4 shows, a clean, intuitive feel. The
implementation strategy decomposes the main window into a nested set of panes,
or views, corresponding to the various visual elements on the application.
Listing Four (page 111) shows a sample of the browser code. One nice feature
in Liant's class library is the VRatio class, which is used in conjunction
with the VFrame class (a window frame) to flexibly specify the screen size of
display objects. This is similar to the automatic sizing facility in zApp.


Microsoft's Foundation Class Library


Most articles on Microsoft's MFC say it's a "thin veneer" or "wrapper" around
the Windows native API. This myth is not substantiated by the facts. Metrics
such as line count, number of classes, and number of methods indicate
otherwise.
The reason behind the myth is likely that MFC is designed to sit at a close
conceptual distance from the Windows API. Classes in MFC map on a one-to-one
basis with entities in the Windows API, such as display context, bitmap,
brush, button, MDI child window, and so on.
In terms of lines of code, the source for MFC library is larger than the
source for OWL or zApp (although all are in the same general ballpark). Those
lines of MFC source code are not just idly sitting there: Microsoft was able
to implement our sample application using the fewest lines of code. In
addition, Microsoft's implementation had the smallest executable and required
the least amount of runtime support from DLLs. Listing Five (page 113) shows
sample code that makes up the browser UI.
Microsoft's implementation was not as comprehensive as some of the others
(such as Inmark's or Island Systems'), although it did fully satisfy the specs
we provided. Microsoft's implementation (Figure 5 shows the UI implemented by
Microsoft) provided one nice feature that the others did not: a drop-down list
box that shows the graphical shapes of the handwriting data. Using this UI
component as a means for selecting letters, rather than having to handcode the
tabular display of shapes, may have contributed to the compact implementation.
One possible reason for the small executable size is the use of a dialog box
as the main application window, rather than using a "heavy-weight,"
user-defined window. In Windows, dialog boxes are built-in components, in that
they don't require user code for registration or full-fledged window procs,
thus saving space in the executable and/or the runtime. Using dialogs means
the main window is not scrollable or resizable (not required by the spec).
Note that other implementors also used the dialog-box technique (in Borland's
implementation, for example, TMainWindow is a subclass of TDialog), but
without equivalent savings in space.
Microsoft has demonstrated MFC in DOS text mode (via MEWEL, like Inmark's
zApp) and on Windows/NT, and apparently has plans to create a portability
layer for the Macintosh platform (according to leaks published in the trade
press). Inelegant, yes, but portable nonetheless.
The design of MFC requires a form of exception handling and template classes,
which are implemented in a nonstandard way due to deficiencies in Microsoft's
C7 compiler (unlike Borland's compiler, C7 does not support templates). But
these mechanisms are implemented using macros and other standard constructs,
and are therefore theoretically portable to other compilers. It would be an
interesting exercise to port MFC to Borland's compiler. For those who don't
mind sitting within spitting distance of the native Windows API, such a
combination may well be the best of both worlds.


Conclusion


All the tools discussed here were able to deliver on our program spec. Each
has different strengths and weaknesses; you'll have to decide which one fits
your requirements profile. Try out the sample implementations. Ask the vendors
for additional information and demo diskettes.
Although these products are full-featured and mature, it seems that none of
the PC-based products yet contain the rich functionality present in
non-PC-based frameworks like MacApp or PenPoint. This situation will likely
change. In the meantime, you can still get a leg up on program development by
using one of these products.


What is an Application Framework?


Although application frameworks are not a new concept, they have only recently
arrived as a mainstream development tool on the PC platform. Object-oriented
languages are ideal vehicles in which to embody an application framework, and
the advent of C++ to the PC platform has allowed mainstream PC programmers to
finally enjoy the benefits of application frameworks.
From the early '80s to the start of this decade, C++ was found mostly on UNIX
systems and researchers' workstations, rather than on PCs and in commercial
settings. C++, along with other object-oriented languages, enabled a number of
university and research projects to produce the precursors to today's
commercial frameworks and class libraries. The most visible of these early
efforts were Mark Linton's InterViews, Keith Gorlen's NIH class library, the
Andrew toolkit from CMU, and Erich Gamma's ET++ framework.
These tools had their roots in the Smalltalk-80 system, the granddaddy of
frameworks and class libraries. In Smalltalk-80, there's no concept of an
application distinct from the enclosing framework or environment. The act of
programming in Smalltalk consists of navigating the class hierarchy in search
of appropriate classes to reuse or subclass. And, it turns out, there are
quite a few. The typical Smalltalk-80 environment has over 120 classes, 4000
methods, 6000 instantiated objects, and 1.3 Mbytes of source code.
In 1985, Apple Computer's MacApp system rigorously systematized the key ideas
of a commercial application framework: a generic app on steroids that provides
a large amount of general-purpose functionality within a well-planned,
well-tested, cohesive structure. More specifically, Apple defines an
application framework as "an extended collection of classes that cooperate to
support a complete application architecture or application model, providing
more complete application development support than a simple set of class
libraries." This support means not just visual components such as list boxes
and dialogs, but all the other facilities needed by an application, such as
support for undoing commands, printing, and debugging.
For our purposes, we define an application framework as an integrated
object-oriented software system that offers all the application-level classes
(documents, views, and commands) needed by a generic application. An
application framework is meant to be used in its entirety, and fosters both
design reuse and code reuse. An application framework embodies a particular
philosophy for structuring an application, and in return for a large mass of
prebuilt functionality, the programmer gives up control over many
architectural-design decisions. The architectural approach used by many
application frameworks is an evolutionary descendant of Smalltalk's
Model/View/Controller triad.
Class libraries and GUI toolkits are generally smaller and simpler systems
than application frameworks. We define a class library as an object-oriented
set of workhorse classes that can be incorporated piecemeal into an
application, rather than the other way around. (Apple's definition of a class
library is "a collection of classes designed to work together to make a given
set of programming tasks easier, i.e., numerics, graphics, memory management,
etc.") By allowing the programmer to pick and choose, a class library does not
enforce any particular architectural approach. Class libraries are
object-oriented implementations, unlike GUI toolkits. The services offered by
class libraries include both user-interface functions (menus, dialog boxes,
and graphics primitives), as well as general-utility functions (date and time
conversion, data-structure manipulations, and so on).
Finally, GUI toolkits offer services similar to those of a class library, but
using a procedure-oriented rather than an object-oriented interface. GUI
toolkits predate object-oriented languages, and some of the most successful
products in this category are written in assembler. Most DOS-based GUI
libraries are at the same basic level of abstraction as Windows (or at least
the early versions of Windows). Unlike class libraries, GUI toolkits generally
do not offer nongraphical utility routines (queue management routines, for
example).
Both class libraries and GUI toolkits often come bundled with design tools
that allow you to interactively specify application components, such as dialog
boxes and menus.

In practice, the distinctions between one product and another are not as
clear-cut as these definitions. For example, one class-library vendor directly
compares their product against a popular application framework. Another
class-library vendor avoids comparisons with application frameworks, yet the
sample programs bear an uncanny philosophical resemblance to those of the
application framework.
Although the ancestral lines are often blurred, a collective unconscious
(community memory) is at work here. For example, one particular vendor (not
part of this issue's roundup) was unaware of the basic concepts of MacApp,
Smalltalk, and the MVC triad, even though his product competes directly
against frameworks that embody the architectural heritage of these OOP
pioneers.
--R.V.



The Spec for the Sample Application


We've all been there. Your boss comes to you on a Friday afternoon with a
request for a "simple program" that's needed ASAP. In this particular case, it
happens to be a graphics-oriented program to display some digitized vectors.
"It's real easy," your boss tells you. "All you have to do is let the user
pick from a list of files, then read in the data, which is just a bunch of
points, and then display those points using MoveTo and LineTo. Do you think
you can have it ready by Monday morning?"
This, in essence, is what we asked the tool vendors to do. The application is
called HWX Browser, and it allows users to read in different sets of
handwriting data from files on disk and display the alphabets in an
application window.
The program is not a contrived example meant to exercise items on vendors'
feature lists. Rather, the specs come from an entirely different project and
predate the current effort. The program is in many ways representative of
graphics-oriented programs, as well as offering its own unique challenges to
the implementor.
The DDJ HWX Browser allows the user to view handwriting samples which have
been gathered from a number of people. Each file represents samples collected
from a single person, in the form of a series of alphabets. The file has a
particular structure, consisting of a short header, followed by the data for
individual letters of the alphabet, each letter being represented by multiple
instances or versions. The data for a single instance is basically a sequence
of coordinates as gathered by the digitizer; a set of points for each stroke
in a character.
Each vendor received the same specs and had the same amount of time. The basic
areas of functionality are: file operations, character selection, display of
the image in a window, printing, and debugging. For each area, we specified
minimum requirements and identified optional features. Our spec provided
plenty of leeway to the implementor to highlight features unique to their
product. The complete spec is available electronically.
We supplied the data files and the specification for the file format; we also
implemented DOS-based functions to read the data from the disk into an
in-memory data structure, and other functions to display a letter on the
screen using MoveTo() and LineTo() callback functions. Implementors were free
to change the code to their liking, as long as functionality was preserved.
This was not a programming exam in the sense of trick questions or hidden
problems to solve. We're only interested in looking at the final result, in
understanding what it took to get there, and in discussing the technological
issues surrounding frameworks, class libraries, and GUI toolkits. Also, we
hoped to provide some good fun. In retrospect, it looks like we succeeded in
that goal: More than one implementor told us that he welcomed the chance to do
some uninterrupted programming.
--R.V.

_SIZING UP APPLICATION FRAMEWORKS AND CLASS LIBRARIES_
by Ray Valdes


[LISTING ONE]

/******* BORLAND ******/

....standard #includes...

_CLASSDEF( TMainWindow )
_CLASSDEF( TMainApp )

//---------------------------- Constructor of Application's MainWindow.
TMainWindow::TMainWindow( PTWindowsObject AParent, LPSTR AName, PTModule AModule )
 : TDialog( AParent, AName, AModule )
{
 fDataLoaded= FALSE;
 DrawData.cur_char = 'A';
 DrawData.cur_inst = 0;
 DrawData.rgbBackGroundColor = GetSysColor( COLOR_WINDOW );
 DrawData.rgbLineColor = GetSysColor( COLOR_WINDOWTEXT );
 DrawData.nLineThickness = 1;
 DrawData.nScaleFactor = NORM_SCALE;
 GridWindow = new TGridWindow( this, ID_GRIDWINDOW , &DrawData );
 ViewWindow = new TViewWindow( this, ID_VIEWWINDOW , &DrawData );
 Printer = new TPrinter;
 ScaleScrollBar = new TScrollBar( this, ID_SCROLLSCALE );
 ScaleText = new TStatic ( this, ID_SCALETEXT,-1);
}
//-------------------------------------- Destructor of App's MainWindow
TMainWindow::~TMainWindow()
{
 if ( Printer ) delete Printer;
}
//------------------------------------- return Main Window's Class Name
LPSTR TMainWindow::GetClassName()
{
 return "bordlg_MainDialog";
}

//---- Use the Borland Custom Dialog Class and indicate the Application's Icon
void TMainWindow::GetWindowClass( WNDCLASS& AWndClass )
{
 TDialog::GetWindowClass( AWndClass );
 AWndClass.lpfnWndProc = BWCCDefDlgProc;
 AWndClass.hIcon = LoadIcon( GetApplication()->hInstance,"ApplicationIcon" );
}
//----- Function called as a result of WM_INITDIALOG. Center Dialog on Screen.
void TMainWindow::SetupWindow()
{
 RECT rc;
 TDialog::SetupWindow();
 GetWindowRect( HWindow, &rc );
 OffsetRect( &rc, -rc.left, -rc.top );
 MoveWindow( HWindow, (( GetSystemMetrics( SM_CXSCREEN ) -
 rc.right ) / 2 + 4 ) & ~7,
 ( GetSystemMetrics( SM_CYSCREEN ) - rc.bottom ) / 2,
 rc.right, rc.bottom, 0 );
 ScaleScrollBar->SetRange( 1, 10 );
 ScaleScrollBar->SetPosition( 5 );
 EnableScaleScrollBar( FALSE );
 ScaleText->SetText( "&Scale: 50%" );
 SetFocus( GridWindow->HWindow );
}
void TMainWindow::EnableScaleScrollBar( BOOL fFlag )
{
 EnableWindow( ScaleScrollBar->HWindow, fFlag );
 EnableWindow( ScaleText->HWindow, fFlag );
}
//---- Function responds to FILEREAD Menu Option...
// Prompts user for FileName and Read Data...
void TMainWindow::CMFileRead ( RTMessage )
{
 strcpy( lpszFileName , "*.dat" );
 GetApplication()->ExecDialog( new TFileDialog( this ,
 SD_FILEOPEN , lpszFileName ));
 SetWait();
 if ( file_LoadHWXData( lpszFileName ))
 {
 fDataLoaded = TRUE;
 GridWindow->UpdateView();
 ViewWindow->UpdateView();
 }
 ReleaseWait();
}
//-------- Responds to the Print Request...
void TMainWindow::CMPrint(RTMessage)
{
 PTPrintCharInst Printout = 0;

 if ( Printer )
 {
 Printout = new TPrintCharInst( "Char Instance", DrawData.cur_char,
 DrawData.cur_inst, DrawData.nLineThickness,
 DrawData.nScaleFactor );
 if ( Printout )
 {
 Printout->SetBanding( TRUE );
 Printer->Print(this, Printout);

 delete Printout;
 }
 }
}
//------- Allows user to Select/Configure Printer
void TMainWindow::CMPrinterSetup(RTMessage)
{
 if ( Printer ) Printer->Setup(this);
}
//--------- Request to Exit Menu Choice - Terminates App...
void TMainWindow::CMExit( RTMessage )
{
 PostMessage( HWindow, WM_SYSCOMMAND, SC_CLOSE, 0 );
}
//--------------------------------------Bring up AboutBox
void TMainWindow::CMAbout( RTMessage )
{
 GetApplication()->ExecDialog( new TDialog( this , "ABOUTBOX" ));
}
void TMainWindow::UpdateSubViews( void )
{
 GridWindow->UpdateView();
 ViewWindow->UpdateView();
}
//----- Allows user to select Foreground/Line Color.
void TMainWindow::IDForColor( RTMessage )
{
 PTColorDialog ColorDialog = new TColorDialog( this , DrawData.rgbLineColor );
 if ( ColorDialog )
 {
 delete ColorDialog;
 UpdateSubViews();
 }
}
//--------------------------------------user selects Background Color
void TMainWindow::IDBackColor( RTMessage )
{
 PTColorDialog ColorDialog = new TColorDialog( this ,
 DrawData.rgbBackGroundColor , TRUE );
 if ( ColorDialog )
 {
 delete ColorDialog;
 UpdateSubViews();
 }
}
//-------------------------------------- Responds to Line Thickness Selection
void TMainWindow::IDLineRad1( RTMessage )
{
 DrawData.nLineThickness = 1;
 UpdateSubViews();
}
//---------------------------------- user can specify scale for GridDisplay
void TMainWindow::IDScrollScale( RTMessage )
{
 int nPos = ScaleScrollBar->GetPosition();
 if ( nPos != DrawData.nScaleFactor )
 {
 char buff[80];
 wsprintf( buff, "&Scale: %d%%", nPos*10 );

 DrawData.nScaleFactor = MAX_SCALE + MIN_SCALE - nPos;
 ScaleText->SetText( buff );
 GridWindow->UpdateView();
 }
}
//---- Sent by GridWindow to MainWindow, which then informs ViewWindow of change
void TMainWindow::SetCurChar( int ch, int inst )
{
 if ( DrawData.cur_char != ch || DrawData.cur_inst != inst )
 {
 DrawData.cur_char = ch;
 DrawData.cur_inst = inst;
 ViewWindow->UpdateView(); // Allows View to Refresh
 }
}
//---------------------------------- Constructor of Main Application Class

TMainApp::TMainApp( LPSTR AName, HINSTANCE hInstance,
 HINSTANCE hPrevInstance, LPSTR lpCmd, int nCmdShow )
 : TApplication( AName, hInstance, hPrevInstance, lpCmd, nCmdShow )
{
}
//--------------------------------- Specifying the Application's Main Window
void TMainApp::InitMainWindow()
{
 BWCCGetVersion(); // Force Implicit Loading of BWCC.DLL !
 MainWindow = new TMainWindow( NULL, "MainDialog" );
}
//---------------------------------- Create Application Class and run !
int PASCAL WinMain( HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpCmd, int nCmdShow )
{
 TMainApp MyApp( "HWX Browser", hInstance, hPrevInstance, lpCmd, nCmdShow );
 MyApp.Run();
 return( MyApp.Status );
}






[LISTING TWO]

/******** INMARK *******/

#include "zapp.hpp"
#include "ddj.hpp"

typedef int (zEvH::*KeyProc)(zKeyEvt*);
Alphabet *alpha=0;
char *types[]={
 "HWX Files (*.dat)","*.dat",
 "All Files (*.*)","*.*",
 0,0
};
void Browser::removeViews() {
 if (grid!=0) {
 grid->sizer()->remove();

 has()->remove(grid);
 delete grid;
 grid=0;
 }
 if (curVis!=0) {
 curVis->sizer()->remove();
 has()->remove(curVis);
 }
 if (ilet!=0) {
 ilet->sizer()->parent()->remove();
 delete ilet->sizer()->parent();
 has()->remove(ilet);
 delete ilet;
 ilet=0;
 }
}
void Browser::changeLetInInstView() {
 removeViews();
 curInst = 0;
 ilet=new VisLetter(this,center,new zPercentSizer(left,100,50,sizer()),curLet,1);
 has()->append(ilet);
 grid=new GridCompound(this,sizer());
 select(grid->addLetter(alpha,curVis->letter()->ch()));
 sizer()->update();
 canvas()->setDirty();
 updateStats(selVis);
}
void Browser::changeInstanceInAlphaView(int spos) {
 removeViews();
 curInst = spos;
 grid=new GridCompound(this,sizer());
 select(grid->addAlphabet(alpha,curInst));
 updateStats(selVis);
 sizer()->update();
 canvas()->setDirty();
}
void Browser::changeLetterFromScrollPos(int spos) {
 int cnt=0,let=0;
 while (let <HICHAR && cnt <=spos)
 if (alpha->instance(let++,curInst)!=0) cnt++;
 changeChar(let-1,curInst);
}
void Browser::changeLetFromInstScr(int spos) {
 int cnt=0,let=0;
 while (let <HICHAR && cnt <=spos)
 if (alpha->instance(let++,curInst)!=0) cnt++;
 curLet = alpha->instance(let-1,curInst);
 curVis->letter(curLet);
}
InfoPane::InfoPane(zWindow *w, zSizer *siz):VisualPane(w,siz) {
 VisPanel *pan;
 pan=new VisPanel(this,new zPercentSizer(left,100.0,33.3,siz));
 add(pan);
 zSizer *x2;
 x2=new TextFrameSizer(middle,zDimension(0,0),sizer(),pan);

 totalc=new InfoText(this,x2,"Total Chars",pan);
 fsize=new InfoText(this,x2,"Size",pan);
 totalseg=new InfoText(this,x2,"Total Segments",pan);


 pan=new VisPanel(this,new zPercentSizer(left,100.0,33.3,siz));
 add(pan);
 x2=new TextFrameSizer(middle,zDimension(0,0),sizer(),pan);
 wchar=new InfoText(this,x2,"Char",pan);
 cinst=new InfoText(this,x2,"Instance",pan);
 scalef=new InfoText(this,x2,"Scaling Factor",pan);

 pan=new VisPanel(this,new zPercentSizer(left,100,33.3,siz));
 add(pan);
 x2=new TextFrameSizer(middle,zDimension(0,0),sizer(),pan);
 nstrok=new InfoText(this,x2,"Strokes",pan);
 npoints=new InfoText(this,x2,"Points",pan);
 csize=new InfoText(this,x2,"Size",pan);
 backgroundColor(zColor(192,192,192));
}
Browser::Browser(zWindow *w,zSizer *siz, InfoPane
*inp):VisualPane(w,siz,zSCROLLV|zSCROLLH) {
 ilet=selVis=0;
 curLet=0;
 curInst=curWidth=0;
 curView=IDM_SVIEW;
 ip=inp;
 curVis=new VisLetter(this,center,sizer(),0,1);
 add(curVis);
 has()->alwaysHit();
}
BOOL Browser::getPreferredSize(zDimension& d) {
 d.width() = 500;
 d.height() = alpha->maxSize().height()*grid->vertKids();
 return 1;
}
void Browser::updateStats(VisLetter *vl) {
 char buf[50];
 wsprintf(buf," %d - %c ",(int)vl->letter()->ch(),
 (char)vl->letter()->ch());
 ip->wchar->setVal(buf);
 wsprintf(buf," %d of %d ",vl->letter()->instance()+1,
 alpha->letterInstances(vl->letter()->ch()));
 ip->cinst->setVal(buf);
 wsprintf(buf," %d x %d ",
 vl->maxSize().width(),vl->maxSize().height());
 ip->csize->setVal(buf);
 ip->nstrok->setVal(vl->letter()->strokes());
 int tot=vl->letter()->strokes();
 for (int i=0;i<vl->letter()->strokes();i++)
 tot+=vl->letter()->path(i).length();
 ip->npoints->setVal(tot);
 ip->scalef->setVal(vl->scaling());
}
int Browser::ch(zKeyEvt* e) {
 if (curView == IDM_SVIEW || curView == IDM_IVIEW) {
 Letter *test=0;
 if (alpha!=0) test=alpha->instance(e->ch(),0);
 if (test!=0 && test!=curLet) {
 if (curView == IDM_SVIEW) changeChar(e->ch(),0);
 else {
 curLet = test;
 curVis->letter(curLet);
 int mnum = 0;

 for (int cnt = 0; cnt < curVis->letter()->ch();cnt++)
 if (alpha->instance(cnt,curInst)!=0) mnum++;
 scrollBarVert()->pos(mnum);
 scrollBarVert()->oldpos(mnum);
 changeLetInInstView();
 }
 }
 }
 return 0;
}
int Browser::printView(zCommandEvt* e) {
 if (alpha==0) {
 zMessage(app->rootWindow(),"No data file loaded","Error");
 return 1;
 }
 zPrinterDisplay *pr=new zPrinterDisplay;
 if (!pr->isValid()) {
 zMessage mess(this,"No Printer drivers installed","Unable to print");
 return 1;
 }
 if (pr->printerSetup()) {
 zDefPrJobDlg *prDlg=new zDefPrJobDlg(parent(), zResId(PRINT));
 zPrintJob *pj = new zPrintJob(this, pr, prDlg);
 pj->setJobName("HWX Print Job");
 pj->go();
 }
 return 1;
}
int Browser::print(zPrinterDisplay *pd, zRect *re) {
 zRect r;
 pushDisplay(pd);
 canvas()->lock();
 // Re-lay out the display based on the printer's display.
 canvas()->getVisible(r);
 zRect save(*sizer());
 sizer()->update(&r);
 draw(0);
 // Restore the old display layout.
 canvas()->unlock();
 sizer()->update(&save);
 popDisplay();
 return 0;
}
int Browser::changeView(zCommandEvt* e) {
 int which,cnt=0;
 if (e==0) {
 curView=0;
 which=IDM_SVIEW;
 } else which=e->cmd();

 if (which==curView || alpha==0) return 1;
 selVis=0;
 curInst = 0;
 app->rootWindow()->menu()->checkItem(curView,FALSE);
 app->rootWindow()->menu()->checkItem(which,TRUE);
 if (scroller()) delete scroller();
 removeViews();
 switch (which) {
 case IDM_SVIEW:

 sizer()->append(curVis->sizer());
 has()->append(curVis);
 scroller(new BroScroller(this));
 scrollBarVert()->limits(zRange(0,alpha->totalChars()-1));
 for (cnt = 0; alpha->instance(cnt,curInst)==0;cnt++);
 changeChar(cnt,curInst,0);
 scrollBarVert()->oldpos(scrollBarVert()->pos());
 scrollBarHoriz()->pos(curInst);
 scrollBarHoriz()->oldpos(curInst);
 ip->totalc->setVal(alpha->totalChars());
 ip->fsize->setVal(alpha->fileSize());
 ip->totalseg->setVal(alpha->totalSegments());
 break;
 case IDM_IVIEW:{
 ilet=new VisLetter(this,center,new
zPercentSizer(left,100,50,sizer()),curLet,1);
 has()->append(ilet);
 grid=new GridCompound(this,sizer());
 select(grid->addLetter(alpha,curLet->ch()));
 updateStats(selVis);
 scroller(new InstanceScroller(this));
 scrollBarVert()->limits(zRange(0,alpha->totalChars()-1));
 int mnum=0;
 for (cnt = 0; cnt < curVis->letter()->ch();cnt++)
 if (alpha->instance(cnt,curInst)!=0) mnum++;
 scrollBarVert()->pos(mnum);
 scrollBarVert()->oldpos(mnum);
 break;}
 case IDM_AVIEW:
 grid=new GridCompound(this,sizer());
 select(grid->addAlphabet(alpha,curInst));
 updateStats(selVis);
 scroller(new AlphabetScroller(this));
 scroller()->respondToSize();
 }
 sizer()->update();
 canvas()->setDirty();
 curView=which;
 return 1;
}
void Browser::changeChar(int wh,int ins,int updateNow) {
 curInst=ins;
 curLet=alpha->instance(wh,ins);
 curVis->letter(curLet);
 updateStats(curVis);
 scrollBarHoriz()->limits(zRange(0,
 (alpha->letterInstances(curVis->letter()->ch())-1)));
 int mnum=0;
 for (int cnt=0; cnt < curVis->letter()->ch();cnt++)
 if (alpha->instance(cnt,curInst)!=0) mnum++;
 scrollBarVert()->pos(mnum);
 if (updateNow) {
 canvas()->setDirty();
 UpdateWindow(*this);
 }
}
void Browser::select(VisLetter *vl) {
 if (selVis!=vl) {
 if (selVis!=0) selVis->select(FALSE);
 selVis=vl;

 selVis->select(TRUE);
 if (ilet!=0) {
 ilet->letter(selVis->letter());
 updateStats(ilet);
 } else
 updateStats(selVis);
 }
}
int Browser::changeLineWidth(zCommandEvt* e) {
 AskNumForm *ask;
 ask=new AskNumForm(new zFormDlg(((HwxApp *)parent()),
 zResId("ASKNUMFORM")),curWidth);
 if (ask->completed() && ask->width()!=curWidth) {
 VisLetter::setPen(new zPen(def[IDM_LINEC-IDM_BACK],Solid,
 curWidth=ask->width()));
 canvas()->setDirty();
 }
 delete ask;
 return 1;
}
int Browser::changeColor(zCommandEvt* e) {
 zColorSelForm *tmp=new zColorSelForm(this,def[e->cmd()-IDM_BACK]);
 if (tmp->completed()) {
 def[e->cmd()-IDM_BACK]=tmp->color();
 switch (e->cmd()) {
 case IDM_BACK:
 backgroundColor(tmp->color());
 break;
 case IDM_LINEC:
 VisLetter::setPen(new zPen(tmp->color(),Solid,curWidth));
 break;
 case IDM_HILIGHTC:
 VisLetter::setHighlightColor(tmp->color());
 break;
 case IDM_LINEB:
 VisLetter::setBrush(new zBrush(tmp->color()));
 }
 canvas()->setDirty();
 }
 return 1;
}
HwxToolBar::HwxToolBar(zWindow *w,zSizer *s,DWORD d):ToolBar(w,s,d) {
 backgroundColor(zColor(LTGRAY));
 frame->append(new VisBmpButton(this,top,sizer(),IDM_OPEN,LINEI,LINEI+1));
 frame->append(new VisBmpButton(this,top,sizer(),IDM_PRINT,PRI,PRI+1));
 frame->append(new VisBmpButton(this,top,sizer(),IDM_SVIEW,SVI,SVI+1));
 frame->append(new VisBmpButton(this,top,sizer(),IDM_IVIEW,IVI,IVI+1));
 frame->append(new VisBmpButton(this,top,sizer(),IDM_AVIEW,AVI,AVI+1));
}
HwxApp::HwxApp(zWindow *w,zSizer *siz,DWORD d,const char *title)
: zAppFrame(w,siz,d,title) {
 menu(new zMenu(this,zResId("TOPMENU")));
 menu()->setCommand(this,(CommandProc)&HwxApp::fOpen,IDM_OPEN);
 setIcon(new zIcon(zResId("DRAWICON")));
 sline=new StatusLine(this,new zGravSizer(bottom,0));
 sline->show();
 sline->setupHilite();
 toolbar=new HwxToolBar(this,new zGravSizer(left,32),WS_BORDER);
 toolbar->show();

 ip = new InfoPane(this, new zGravSizer(top,zPrPoint(0,590)));
 ip->show();
 image=new Browser(this,new zGravSizer(middle),ip);
 menu()->setCommand(image,
 (CommandProc)&Browser::changeColor,IDM_BACK,IDM_LINEB);
 menu()->setCommand(image,
 (CommandProc)&Browser::changeLineWidth,IDM_LINEW);
 menu()->setCommand(image,
 (CommandProc)&Browser::changeView,IDM_SVIEW,IDM_AVIEW);
 menu()->setCommand(image,(CommandProc)&Browser::printView,IDM_PRINT);
 image->show();
}
HwxApp::~HwxApp() {
 delete sline;
 delete toolbar;
 delete image;
}
int HwxApp::fOpen(zCommandEvt*e) {
 zFileOpenForm *tmp=new zFileOpenForm(this,"Select File",0,types);
 if (tmp->completed()) {
 sline->message()->printf("Loading File...");
 setCursor(zCursor(Hourglass));
 char buf[300];
 wsprintf(buf,"zApp HWX Browser - %s",tmp->name());
 caption(buf);
 alpha=new Alphabet();
 alpha->readFile(tmp->name());
 if (alpha->isValid()) {
 image->addVertScrollBar();
 image->addHorzScrollBar();
 image->changeView();
 sline->message()->printf("File Loaded");
 } else alpha=0;
 setCursor(zCursor(Arrow));
 }
 return 1;
}
int HwxApp::focus(zFocusEvt* e) {
 if (e->gainFocus() && image!=0) image->setFocus();
 return 1;
}
int HwxApp::command(zCommandEvt*e) {
 switch (e->cmd()) {
 case IDM_EXIT:
 app->quit();
 break;
 case IDM_HELP:
 WinHelp(*this,"ddj.hlp",HELP_INDEX,0);
 break;
 case IDM_ABOUT:{
 AboutBox *ab = new AboutBox(this,new zSizer,"About Box");
 delete ab;}
 }
 return 1;
}
void zApp::main() {
 HwxApp* p=new HwxApp(0,new zSizer,zSTDFRAME,"zApp HWX Browser");
 p->show();
 go();

 delete p;
}

[LISTING THREE]

/***** ISLAND SYSTEMS IMPLEMENTATION USING OBJECT-MENU*****/

#include "omExt.h"

#include "h_config.h"
#include "h_stddef.h"
#include "h_list.h"
#include "h_grafic.h"

#include "h_om.h"


extern void LoadHWXData(LPSTR name);
extern void omDisplayInstance(lpRect R, int char_code, int instance_num,
 int shift, int boxcolor, int drawcolor);

extern void RegisterCallback(
 HDISPLAY display_entity,
 void (*pf_move_to)(HDISPLAY,INT16,INT16),
 void (*pf_line_to)(HDISPLAY,INT16,INT16));

extern void myMoveTo(HDISPLAY display_entity,INT16 h,INT16 v);
extern void myLineTo(HDISPLAY display_entity,INT16 h,INT16 v);

//----- globals
const char titlePredicate[] = "Sample File: ";
omDispHelpBlock *dispHelp;

...subordinate routines deleted...

//-----------------MAIN FUNCTION------------------

int omRunIt()
{
 //------ initialize help and icon libraries: hw.dlb, hw.ilb

 omLibMgr = new omLibMgrType("hw");
 if (omLibMgr==NULL) {omBeep(); return 1;} // enough memory?
 if (!omLibMgr) {omBeep(); return omLibMgr->lastErr;} // init ok?


 RegisterCallback(0, myMoveTo, myLineTo); // DDJ callbacks

 dispHelp = new omDispHelpBlock( 100, 4*omFontHeight, 0,0);
 dispHelp->usedNew = TRUE;
 dispHelp->pageBkColor = LIGHTGRAY;
 dispHelp->theDress = RIDGE;

 //------- FILE MENU: allocate and assign items to the "File" pulldown menu

 hwFileMenu *fileMenu = new hwFileMenu(4);
 if (fileMenu==NULL) return 1;
 fileMenu->usedNew = omTRUE;
 *fileMenu + "~Open" + "~About" + "_" + "E~xit";
 fileMenu->assignId( ID_FILEMENU );
 omIDTABLE.add(ID_FILEMENU, fileMenu);
 fileMenu->setEventHelp( "file" ); // set help prefix
 fileMenu->setDisplayHelp( dispHelp ); // attach help

 //------- EDIT MENU: allocate and assign items to the "Edit" pulldown menu

 omVertMenu *editMenu = new omVertMenu( 5 );
 if (editMenu==NULL) return 1;
 editMenu->usedNew = omTRUE;
 *editMenu + "~Cut" + DISABLE + "~Copy" + "?copy" //+ DISABLE
 + "~Paste" + DISABLE + "_" + "Undo" + DISABLE + IS_TEAROFF;
 editMenu->setEventHelp( "edit" ); // set help prefix
 editMenu->setDisplayHelp( dispHelp ); // attach help

 //------- OPTION MENU: allocate and assign items to the "Browse" pulldown menu

 hwOptMenu *optMenu = new hwOptMenu( 2 );
 if (optMenu==NULL) return 1;
 optMenu->usedNew = omTRUE;
 *optMenu + "~Open new browser !!"
 + "?open" + "~Single letter !!" + "?single";
 optMenu->setEventHelp( "optmenu" ); // set help prefix
 optMenu->setDisplayHelp( dispHelp ); // attach help

 //------- TOP MENU: allocate and assign items to Horizontal menu bar

 omHorizMenu *topMenu = new omHorizMenu( 6 );
 if (topMenu==NULL) return 1;
 topMenu->usedNew = omTRUE;
 *topMenu + "~File" + "?file" + *fileMenu
 + "~Edit" + "?edit" + *editMenu
 + "~Browse" + "?browse" + *optMenu
 + "~Printer setup" + "?printcfg" + psetup
 + "dela~yedhelp" + omIconFcnRadioDiam + changeHelpTimeOut
 + "?delayedhelp" + IS_RADIO
 + "~immediatehelp" + omIconFcnRadioDiam + changeHelpTimeOut
 + "?immedhelp";

 if ( omPrefs.helpTimeout>0 ) topMenu->autoRadioToggle(4); // delayed help
 else topMenu->autoRadioToggle(5); // immed help

 topMenu->setEventHelp( "topmenu" ); // set help prefix
 topMenu->setDisplayHelp( dispHelp ); // attach help
 topMenu->assignId(ID_TOPMENU);


 //------------ WINDOW: define the window

 omBkWindow *w = new omBkWindow(); w->bkColor = CYAN;
 if (w==NULL) return 1;
 w->usedNew = omTRUE;

 w->setTitle("Island Systems Handwriting Utility"
 " Rev 1.1 - Dr. Dobb's Journal, October 1992");
 w->assignId(ID_BKWINDOW);
 w->setDrawModeBk( omCOLOR );

 *w + *topMenu + omTL + *dispHelp + omBL;
 w->makeTask();

 //------------- RUN: put the window on the event Queue and run.
 *omMEQ + *w;
 omMEQ->run();

 return 0;
}


[LISTING FOUR]

/****** LIANT ******/

#include "about.h"
#include "brush.h"
#include "bttnrbbn.h"
#include "button.h"
#include "charview.h"
#include "file.h"
#include "fileslct.h"
#include "font.h"
#include "hwxappvw.h"
#include "notifier.h"
#include "objarray.h"
#include "pen.h"
#include "pointrry.h"
#include "popupmen.h"
#include "port.h"
#include "printer.h"
#include "printdlg.h"
#include "printab.h"
#include "statusbr.h"

#include <stdio.h> // for sprintf()

#define BR_HT 40 // height of the Button Bar window
#define STAT_HT 24 // height of the Status Bar window

defineClass(HwxAppView, VAppView)

/* menu definition (could also come from a resource) */
char *file[] = { "&Open...", "",

 "&Print...", "",
 "&About...", "",
 "E&xit", 0
};
method fmthds[] = { methodOf(HwxAppView, openFile),NIL_METHOD,
 methodOf(HwxAppView, print),NIL_METHOD,
 methodOf(HwxAppView, about),NIL_METHOD,
 methodOf(HwxAppView, closeApp),NIL_METHOD
};
char *display[] = { "&Style...", "",
 "&Instance",
 "&Alphabet",
 0
};
method dmthds[] = { methodOf(HwxAppView, setDisplayStyle),NIL_METHOD,
 methodOf(HwxAppView, showByInstance),
 methodOf(HwxAppView, showByAlphabet),NIL_METHOD
};
/* icon bar definition */
char *bttns[] = { "FILEBOX","PRINTER","PALETTE","ALPHA",0};
method bMthds[] = {
 methodOf(HwxAppView, btnOpen),
 methodOf(HwxAppView, btnPrint),
 methodOf(HwxAppView, btnStyle),
 methodOf(HwxAppView, btnAlpha),
 NIL_METHOD
};
HwxAppView::HwxAppView()
{
 /* set title and background brush for main window */
 setTitle("Liant HWX: (no file loaded)");
 setBackground(new VBrush(BLUE));
 /* create the icon bar */
 bttnRbbn = new ButtonRibbon(VFrame(0, 0, 1.0F, BR_HT),
 this, this, bttns, bMthds);
 /* create a status bar */
 int x, y, w, h;
 getArea(&x, &y, &w, &h);
 statBar = new StatusBar(VFrame(0, h - STAT_HT, 1.0F, STAT_HT), this);
 statBar->putText("HWX Browser");
 statBar->setFont(new VFont("TMS RMN", 10));
 /* create the pull down menu (using arrays defined above) */
 VPopupMenu *pm = new VPopupMenu("&File");
 addPopup(pm);
 pm->addItems(file, fmthds, this);
 displayMenu = new VPopupMenu("&Display");
 addPopup(displayMenu);
 displayMenu->addItems(display, dmthds, this);
 displayMenu->enableAll(FALSE);
 displayMenu->checkAt(TRUE, 2);
 /* create the single character view sub-window */
 charView = new CharacterView(VFrame(10, 50, 100, 100), this);
 /* create the multi-instance view sub-window */
 multiView = new MultiCellView(VFrame(130, 50, 7 * 60, 200), this);
 multiView->setDims(6, 13, 30, 30);
 multiView->uponClick(this, methodOf(HwxAppView, multiSelect));
 /* create the character selection sub-window */
 selectView = new AlphaCellView(VFrame(10, 300,
 17 * 32 + 6, 17 * 3 + 6), this);

 selectView->setDims(3, 32, 17, 17);
 selectView->uponClick(this, methodOf(HwxAppView, charSelect));
 /* create pen for drawing single character instance */
 charPen = new VPen();
 charData = 0;
 byInstance = TRUE;
 setCharacter(48, 0);
}
HwxAppView::~HwxAppView()
{
 /* destroy status bar font */
 VFont *f = statBar->getFont();
 if (f) {
 f->free();
 }
 /* destroy background brush */
 if (getBackground()) {
 delete getBackground();
 }
 /* destroy character pen */
 if (charPen) {
 delete charPen;
 }
 /* destroy associated character data */
 if (charData) {
 delete charData;
 }
}
boolean HwxAppView::free()
{
 delete this;
 return(TRUE);
}
boolean HwxAppView::paint()
{
 VPort port(this);
 int x, y, w, h;
 getArea(&x, &y, &w, &h);
 port.open();
 /* draw a separator above the status bar */
 port.moveTo(0, h - STAT_HT - 1);
 port.lineTo(w, h - STAT_HT - 1);
 port.close();
 return(TRUE);
}
boolean HwxAppView::resized(int w,int h) // called when window changes size.
{
 // resize the ButtonRibbon
 bttnRbbn->resize(w, BR_HT);
 // move the StatusBar up to bottom of MDI client
 statBar->move(0, h - STAT_HT, w, STAT_HT);
 return(TRUE);
}
boolean HwxAppView::openFile(VMenuItem *m) // Load a new data file
{
 VString *temp;
 VString filter("*.dat");
 char str[100];
 temp = VFileSelect::dialog(this, "Select Data File", NIL, &filter);

 if (temp) {
 notifier->beginWait();
 statBar->putText("Loading File...");
 VFile f(temp->gets());
 if (f.open(ReadOnly)) {
 if (charData != 0) {
 delete charData;
 charData = 0;
 }
 charData = new CharacterData(f);
 f.close();
 /* update window title */
 sprintf(str, "Liant HWX: (%s)", temp->gets());
 setTitle(str);
 /* enable display menu */
 displayMenu->enableAll(TRUE);
 }
 notifier->endWait();
 setCharacter(48, 0);
 }
 return(TRUE);
}
boolean HwxAppView::closeApp(VMenuItem *m) // Close the application
{
 return(VAppView::close());
}
boolean HwxAppView::print(VMenuItem *m) // Print the current character.
{
 VPrinter *printer = 0;
 /* get a printer object via the printer dialog */
 boolean ret = VPrintDialog::print(NIL, &printer, this);
 if (!ret) {
 statBar->putText("Printer not ready.");
 }
 else {
 /* display the print abort dialog */
 VPrintAbort *pab = new VPrintAbort("HWX",
 "Printing character.", printer, this);
 /* open a port on the printer */
 VPort port(printer);
 VRectangle rect(CornerDim, 50, 50, 300, 300);
 port.open();
 /* draw the character on the printer */
 charView->drawCharacter(&port, &rect, 5.0);
 // printer->newPage();
 port.close();
 pab->free();
 printer->free();
 }
 return(TRUE);
}
boolean HwxAppView::about(VMenuItem *m) // Display the About box
{
 VAbout::dialog("Liant HWX Browser\n\n(c) 1992 by Liant Software Corp."
 "\nWritten by Joe DeSantis\n"
 "\nDeveloped with C++/Views 2.0", this);
 return(TRUE);
}
boolean HwxAppView::setDisplayStyle(VMenuItem *m)

/* Put up a dialog, and set the width and color of the character display. */
{
 int width;
 rgbColor color;
 width = charPen->width();
 color = charPen->color();
 DisplayDialog::dialog(this, &width, &color);
 charPen->width(width);
 charPen->color(color);
 charView->update();
 return(TRUE);
}
boolean HwxAppView::showByAlphabet(VMenuItem *m)
{
 displayMenu->checkAt(FALSE, 2);
 displayMenu->checkAt(TRUE, 3);
 setByInstance(FALSE);
 return(TRUE);
}
boolean HwxAppView::showByInstance(VMenuItem *m)
{
 displayMenu->checkAt(TRUE, 2);
 displayMenu->checkAt(FALSE, 3);
 setByInstance(TRUE);
 return(TRUE);
}
void HwxAppView::setCharacter(int cno, int ino)
{ // Set the current character/instance displayed.
 char str[100];
 currChar = cno;
 currInst = ino;
 int icount = (charData) ? charData->getInstanceCount(currChar) : 0;
 /* update status bar */
 sprintf(str, "Character %d, Instance %d out of %d",
 currChar, currInst, icount);
 statBar->putText(str);
 /* update views */
 charView->setCharData(charData, currChar, currInst);
 multiView->setCharData(charData, currChar, currInst);
 selectView->setCharData(currChar, currInst);
}
void HwxAppView::charSelect(long row, long col)
{ // AlphaCellView box clicked in cell row, col
 if (byInstance) { /* view a new character */
 setCharacter((int) ((row + 1) * 32 + col), currInst);
 }
 else { /* view a new instance */
 setCharacter(currChar, (int) (row * 16 + col));
 }
}
void HwxAppView::multiSelect(long row, long col)
{ // MultiCellView box clicked in cell row, col
 if (byInstance) {
 /* view a new instance */
 setCharacter(currChar, (int) (row * 13 + col));
 }
 else {
 /* view a new character */
 setCharacter((int) (row * 13 + col + 48), currInst);

 }
}
void HwxAppView::setByInstance(boolean tf)
/* Specifies whether to display all instances of a single character,
 or an alphabet of a specific instance. */
{
 if (byInstance != tf) {
 byInstance = tf;
 if (tf) {
 selectView->setDims(3, 32, 17, 17);
 }
 else {
 selectView->setDims(3, 16, 34, 17);
 }
 multiView->setByInstance(tf);
 selectView->setByInstance(tf);
 setCharacter(48, 0);
 selectView->update();
 }
}
boolean HwxAppView::btnOpen(VButton *b) // Callback for file icon
{
 openFile(NIL);
 return(TRUE);
}
boolean HwxAppView::btnPrint(VButton *b) // Callback for print icon
{
 print(NIL);
 return(TRUE);
}
boolean HwxAppView::btnStyle(VButton *b) // Callback for style icon
{
 setDisplayStyle(NIL);
 return(TRUE);
}
boolean HwxAppView::btnAlpha(VButton *b)// Callback for alpha/instance icon
{
 if (byInstance) {
 showByAlphabet(NIL);
 }
 else {
 showByInstance(NIL);
 }
 return(TRUE);
}


[LISTING FIVE]

/****** MICROSOFT ******/

//------ CHwbDlg: The main user interface to this application is a dialog
class CHwbDlg : public CModalDialog
{
// Constructors

public:
 CHwbDlg();
 ~CHwbDlg();
// Attributes
 // global state of browser
 COLORREF m_colorCur;
 int m_nLineWidth;
 // current state (based on selection in .DAT file)
 BOOL m_bLoaded; // .DAT file loaded
 char m_chCur; // current character
 int m_nInstanceCount; // number of instance of this character
 int m_iInstanceCur; // current instance of that character
// Operations
 void RenderInstance(CDC* pDC, char ch, int iInstance,
 CRect rect, int inflate, BOOL bAttrib);
// Implementation
 CImageBox m_charBox; // all possible characters
protected:
 // brushes for drawing
 CBrush m_brBack, m_brGray;
 CFont m_statusFont;
 // special controls
 CBitmapButton m_buttons[NUM_BITMAPBUTTONS];
 CImageBox m_imageBox; // image selection
 CStatic m_display; // display output control
 CScrollBar m_scrollBar;
 CRect GetOutputRect();
 // message handlers
 virtual BOOL OnInitDialog();
 virtual BOOL OnCommand(WPARAM wParam, LPARAM lParam);
 afx_msg void OnPaint();
 afx_msg void OnHScroll(UINT nSBCode, UINT nPos, CScrollBar*);
 afx_msg void OnMenuSelect(UINT nItemID, UINT nFlags, HMENU hSysMenu);
 // command handlers
 afx_msg void OnOpen();
 afx_msg void OnCopy();
 afx_msg void OnPrint();
 afx_msg void OnAbout();
 afx_msg void OnPrevChar();
 afx_msg void OnNextChar();
 afx_msg void OnLineColor();
 // control notification handlers
 afx_msg void OnInstanceSelect();
 afx_msg void OnCharSelect();
#ifdef _DEBUG
 afx_msg void OnShowStrokes();
#endif
 DECLARE_MESSAGE_MAP()
};
// Message map defines messages handled by our main window
BEGIN_MESSAGE_MAP(CHwbDlg, CModalDialog)
 ON_WM_PAINT()
 ON_WM_HSCROLL()
 ON_WM_MENUSELECT()
 ON_COMMAND(IDM_OPEN, OnOpen)
 ON_COMMAND(IDM_COPY, OnCopy)
 ON_COMMAND(IDM_PRINT, OnPrint)
 ON_COMMAND(IDM_ABOUT, OnAbout)
 ON_COMMAND(IDC_PREV, OnPrevChar)

 ON_COMMAND(IDC_NEXT, OnNextChar)
 ON_COMMAND(IDM_LINE_COLOR, OnLineColor)
#ifdef _DEBUG // debug commands
 ON_COMMAND(IDM_DEBUG_SHOW_STROKES, OnShowStrokes)
#endif //_DEBUG
 ON_CBN_SELCHANGE(IDC_INSTANCE_BOX, OnInstanceSelect)
 ON_CBN_SELCHANGE(IDC_ALLCHARS_BOX, OnCharSelect)
END_MESSAGE_MAP()
//-----------------------------------------------------------------
CHwbDlg::CHwbDlg()
 : CModalDialog("MAINDIALOG"),
 m_brBack(::GetSysColor(COLOR_WINDOW)),
 m_brGray(RGB(128, 128, 128))
{
 m_bLoaded = FALSE;
 m_colorCur = RGB(0, 0, 0); // black to start with
 m_nLineWidth = 1;
}
//-----------------------------------------------------------------
CHwbDlg::~CHwbDlg()
{
 if (m_bLoaded)
 ::UnloadHWXData(); // free old data file
}
//-----------------------------------------------------------------
BOOL CHwbDlg::OnInitDialog()
{
 // initialize bitmap buttons
 for (int i = 0; i < NUM_BITMAPBUTTONS; i++)
 VERIFY(m_buttons[i].AutoLoad(IDC_BITMAPBUTTON_MIN + i, this));
 // wire in the dialog controls
 VERIFY(m_imageBox.SubclassDlgItem(IDC_INSTANCE_BOX, this));
 m_imageBox.m_pParent = this;
 VERIFY(m_charBox.SubclassDlgItem(IDC_ALLCHARS_BOX, this));
 m_charBox.m_pParent = this;
 VERIFY(m_scrollBar.SubclassDlgItem(IDC_SCROLLBAR, this));
 // select a lighter font for the status bar
 LOGFONT logfont;
 GetFont()->GetObject(sizeof(logfont), &logfont);
 logfont.lfWeight = FW_NORMAL;
 VERIFY(m_statusFont.CreateFontIndirect(&logfont));
 GetDlgItem(IDC_STATUS)->SetFont(&m_statusFont);
 OnOpen(); // require an open data file
 if (!m_bLoaded)
 EndDialog(IDCANCEL); // abort
 return TRUE;
}
//-----------------------------------------------------------------
void CHwbDlg::OnOpen()
{
 CFileDialog dlg(TRUE, "dat", NULL, OFN_FILEMUSTEXIST | OFN_HIDEREADONLY,
 "Data Files (*.dat)|*.dat|All Files (*.*)|*.*||");
 // constructor for standard file open dialog
 if (dlg.DoModal() != IDOK)
 return; // stay with old data file
 HCURSOR hCurs = ::SetCursor(::LoadCursor(NULL, IDC_WAIT));
 // set the wait cursor
 if (m_bLoaded)
 ::UnloadHWXData(); // free old data file

 // open new data file
 ::LoadHWXData(dlg.GetPathName());
 m_bLoaded = TRUE;
 // fill listbox with all valid characters using exported Dr. Dobb's APIs
 m_charBox.ResetContent();
 for (char ch = 0; ch <= 126; ch++) // simple ASCII range
 {
 if (::GetInstanceCount(ch) == 0)
 continue; // skip characters with no instances
 int nIndex = m_charBox.AddString(MAKEINTRESOURCE(ch));
 if (ch == 'A')
 m_charBox.SetCurSel(nIndex); // initial selection
 }
 OnCharSelect(); // update instances
 ::SetCursor(hCurs); // change cursor back to standard
}
//-----------------------------------------------------------------
void CHwbDlg::OnCopy()
{
 // copy current image to clipboard
 if (!OpenClipboard())
 return;
 CMetaFileDC dc;
 if (dc.Create())
 {
 RenderInstance(&dc, m_chCur, m_iInstanceCur, CRect(0,0,0,0), 0, TRUE);
 HGLOBAL hData = ::GlobalAlloc(GPTR, sizeof(METAFILEPICT));
 if (hData != NULL)
 {
 LPMETAFILEPICT lpMFP = (LPMETAFILEPICT)::GlobalLock(hData);
 lpMFP->mm = MM_ANISOTROPIC;
 lpMFP->hMF = dc.Close();
 VERIFY(EmptyClipboard());
 ::GlobalUnlock(hData);
 SetClipboardData(CF_METAFILEPICT, hData);
 }
 VERIFY(CloseClipboard());
 }
}
//-----------------------------------------------------------------
void CHwbDlg::OnPrint()
{
 // print the current image
 CPrintDialog dlg(FALSE, PD_NOPAGENUMS | PD_NOSELECTION);
 if (dlg.DoModal() != IDOK)
 return;
 HCURSOR hCurs = ::SetCursor(::LoadCursor(NULL, IDC_WAIT));
 // set the wait cursor
 CDC dc;
 dc.Attach(dlg.GetPrinterDC());
 CRect rect;
 dc.GetClipBox(rect);
 dc.StartDoc("HandWriting");
 dc.StartPage();
 RenderInstance(&dc, m_chCur, m_iInstanceCur, rect, 0, FALSE);
 dc.EndPage();
 dc.EndDoc();
 ::SetCursor(hCurs); // return to standard cursor
}

void CHwbDlg::OnAbout()
{
 CModalDialog dlg("ABOUTDIALOG");
 dlg.DoModal();
}
void CHwbDlg::OnNextChar()
{
 m_charBox.SetCurSel(m_charBox.GetCurSel() + 1);
 OnCharSelect();
}
void CHwbDlg::OnPrevChar()
{
 m_charBox.SetCurSel(m_charBox.GetCurSel() - 1);
 OnCharSelect();
}
void CHwbDlg::OnCharSelect()
{
 ASSERT(m_bLoaded);
 if (m_charBox.GetCurSel() == -1)
 m_charBox.SetCurSel(0); // select first one
 m_chCur = (char)m_charBox.GetItemData(m_charBox.GetCurSel());
 // update the character indicator and box
 SetDlgItemText(IDC_CHARACTER, CString(m_chCur));
 m_nInstanceCount = ::GetInstanceCount(m_chCur);
 m_scrollBar.SetScrollRange(0, m_nInstanceCount-1);
 // update the list of possible instances
 m_imageBox.ResetContent();
 for (int i = 0; i < m_nInstanceCount; i++)
 m_imageBox.InsertString(i, NULL); // data not used
 m_imageBox.SetCurSel(m_iInstanceCur = 0); // start at first one
 OnInstanceSelect(); // initial update
}
void CHwbDlg::OnInstanceSelect()
{
 int iSel = m_imageBox.GetCurSel();
 if (iSel != -1)
 {
 ASSERT(iSel >= 0 && iSel < m_nInstanceCount);
 m_iInstanceCur = iSel;
 }
 // update output
 ASSERT(m_bLoaded);
 ASSERT(m_iInstanceCur >= 0 && m_iInstanceCur < m_nInstanceCount);
 m_scrollBar.SetScrollPos(m_iInstanceCur);
 InvalidateRect(GetOutputRect(), FALSE); // will redraw at paint time
}
//-------- Encapsulated drawing code
CRect CHwbDlg::GetOutputRect()
{
 CRect rect;
 GetDlgItem(IDC_OUTPUT)->GetWindowRect(rect);
 ScreenToClient(rect);
 rect.InflateRect(-2, -2); // leave border alone
 return rect;
}
//----- functions to map Dr Dobbs API calls to CDC member functions
static void _MoveTo(CDC* pDC, int x, int y)
 { pDC->MoveTo(x, y); }
static void _LineTo(CDC* pDC, int x, int y)

 { pDC->LineTo(x, y); }
//------- Render a character in the DC passed in. All drawing takes place here
void CHwbDlg::RenderInstance(CDC* pDC, char ch, int iInstance,
 CRect rect, int inflate, BOOL bAttrib)
{
 // fill in background
 if (inflate)
 pDC->FillRect(rect, &m_brBack);
 if (iInstance < 0)
 return; // nothing more to do
 rect.InflateRect(inflate, inflate);
 if (inflate != 0 && bAttrib)
 pDC->FrameRect(rect, &m_brGray);
 pDC->SaveDC();
 // draw scaled to fit with optional attributes
 CPen pen(PS_INSIDEFRAME, m_nLineWidth*2+1, m_colorCur);
 if (bAttrib)
 pDC->SelectObject(&pen);
 pDC->SetMapMode(MM_ANISOTROPIC);
 if (!rect.IsRectEmpty())
 {
 pDC->SetViewportExt(rect.Size());
 pDC->SetViewportOrg(rect.TopLeft());
 }
 CRect rect2 = ::CalcInstanceBoundingBox(ch, iInstance);
 pDC->SetWindowExt(rect2.Size());
 pDC->SetWindowOrg(rect2.TopLeft());
 ::RegisterCallback(pDC, _MoveTo, _LineTo);
 ::DisplayInstance(ch, iInstance);
 pDC->RestoreDC(-1);
}
void CHwbDlg::OnPaint()
{
 CPaintDC dc(this);
 RenderInstance(&dc, m_chCur, m_iInstanceCur, GetOutputRect(), -10, TRUE);
}
void CHwbDlg::OnHScroll(UINT nSBCode, UINT nPos, CScrollBar*)
{
 static int deltas[] = { -1, 1, -1, 1, 0, 0, -1000, 1000, 0 };
 int iSel = nPos;
 if (nSBCode != SB_THUMBTRACK && nSBCode != SB_THUMBPOSITION)
 iSel = m_iInstanceCur + deltas[nSBCode];
 if (iSel < 0)
 iSel = 0;
 else if (iSel >= m_nInstanceCount)
 iSel = m_nInstanceCount-1;
 m_imageBox.SetCurSel(iSel);
 OnInstanceSelect();
}
// menu prompt string tracking
void CHwbDlg::OnMenuSelect(UINT nItemID, UINT nFlags, HMENU)
{
 CString strPrompt;
 if (nItemID != 0 &&
 !(nFlags & (MF_SEPARATOR | MF_POPUP | MF_MENUBREAK | MF_MENUBARBREAK)))
 strPrompt.LoadString(nItemID);
 SetDlgItemText(IDC_STATUS, strPrompt);
}
// Line width: handle with one function for efficiency for a range of commands

BOOL CHwbDlg::OnCommand(WPARAM wParam, LPARAM lParam)
{
 int nWidth = LOWORD(wParam) - IDM_LINE_WIDTH_BASE;
 if (nWidth < 0 || nWidth >= MAX_LINE_WIDTH)
 return CModalDialog::OnCommand(wParam, lParam); // not line width
 m_nLineWidth = nWidth;
 // update the menu
 for (int i = 0; i < MAX_LINE_WIDTH; i++)
 GetMenu()->CheckMenuItem(IDM_LINE_WIDTH_BASE + i,
 (i == m_nLineWidth) ? MF_CHECKED : MF_UNCHECKED);
 InvalidateRect(GetOutputRect(), FALSE); // redraw
 return TRUE;
}
void CHwbDlg::OnLineColor()
{
 CColorDialog dlg(m_colorCur);
 if (dlg.DoModal() != IDOK)
 return;
 m_colorCur = dlg.GetColor();
 InvalidateRect(GetOutputRect(), FALSE);
}


October, 1992
OBJECT-ORIENTED PROGRAM CONSTRUCTION


Interconnecting components is as important as fabricating them




William G. Wong


Bill was formerly director of PC Labs. Currently he is with Nu-Mega
Technologies, where he is working on a new version of Soft-ICE. You can reach
him on CompuServe at 71561,2502.


Object-oriented programming brings us encapsulation, inheritance,
polymorphism, and the promise of reusable software. Although OO languages make
creation of building blocks easier, at present blocks are connected in the
same fashion as in conventional programming languages. For example,
application frameworks such as Microsoft's Foundation Classes (MFC) or
Borland's ObjectWindows Library (OWL) provide numerous building blocks so you
don't have to fabricate window classes from scratch. However, when you connect
objects together, you must use the pointers and functions unique to a
particular class.
Program construction consists of both component construction (class
definition) and component interconnection. I've invented a concept which I
call plugs to address the issue of component interconnection. Plugs provide a
standard syntax and semantics for connecting disparate components. Plugs are
an object class used in conjunction with conventional classes.
Plug objects work like their electrical counterparts. They are "keyed" so
that, as in the physical counterpart, you can plug a keyboard connector into
the back of a PC but not into the power connector. Likewise, the power
connector can be plugged into a wall socket but not into the monitor's video
connector. A plug class can provide a simple connection like a single wire or
multiple connections with a single plug like a VGA cable.
The act of cabling together a PC and its components is analogous to building a
simple computer program. The difference is that a conventional program makes
the connections between its components using a variety of function calls and
pointer assignments. The plug classes make this process consistent. For
example, assume we have object classes defined for a power outlet, a PC, a
monitor, and a keyboard. The process of connecting together these objects
should be simple, as shown in Example 1(a).
Example 1: (a) The plugs approach to interconnection; (b) the conventional
approach to interconnection.

 (a)

 void BuildAPC ()
 {

 PowerOutlet po1, po2 ;
 PC pc ;
 Monitor monitor ;

 po1 <= pc.PowerConnector ;
 po2 <= monitor.PowerConnector ;

 monitor.VideoConnector <= pc.VideoConnector ;

 monitor.TurnOn () ;
 pc.TurnOn () ;

 }

 (b)

 void BuildAPC ()
 {

 PowerOutlet po1, po2 ;
 PC pc ;
 Monitor monitor ;

 pc.PowerConnector.Connect ( & po1 ) ;
 monitor.PowerConnector.Connect ( & po2 ) ;

 monitor.VideoConnector.Connect ( & pc.VideoConnector ) ;

 monitor.TurnOn () ;
 pc.TurnOn () ;
 }


Example 1(a) makes a number of assumptions. First, the PowerOutlet objects are
plugs, and the Monitor and PC objects both have two plugs: one for power and
the other for video support. The <= method is used to connect two plugs of the
appropriate type together. By definition, it does not matter which plug is on
the left side of the operator. The TurnOn method is used to show how a set of
connected objects might be put into operation.
But how might this be accomplished using conventional programs? Again we make
some assumptions about the objects, so the code would look something like
Example 1(b).
The Connect methods are used with pointer parameters. The effect might be
similar, but implications are different: The use of pointers provides only a
one-way link between the objects; plugs actually provide a two-way link. Also,
the choice of Connect for a function name is arbitrary. Function names in
dissimilar classes tend to be very different.
Bringing consistency to the interconnection aspect of program construction is
more than just syntactic sugar. It means that program construction can be
brought to a higher level, where a programmer does not have to be concerned
with the details of an object but rather with its connections and high-level
functions. The details of how a connection is used are unimportant to the
programmer, just as the type of signals and the amount of current flowing
through the physical PC cables are unimportant to the PC user.
Before going further, I should discuss the basic plug methods. Their number is
limited and should not be expanded upon lest the class lose its simplicity and
consistency. Table 1 lists the principal methods and operators, and Example 2
shows how some of these are used.
Example 2: How to use the basic methods and operators.

 BOOL t ;
 plugA a, b;

 t = a.connected () ; // is A connected?
 t = b.usable () ; // is B usable?

 a <= b ; // plug B into A
 a -> VariableInB = 1; // set VariableInB to 1

 - a ; // disconnect but don't free other end
 -- b ; // disconnect and free other end

Table 1: The principal methods and operators used with plugs.

 Method What it does
 -------------------------------------------------------------------------

 A<=B Connects A to B. First disconnects them by
 invoking --A and --B.

 -A Disconnects A, returns otherEnd. Does not
 free otherEnd.

 --A Disconnects A. This method does free otherEnd.

 A.connected() Returns 0 if A is disconnected.

 A.usable() Returns 0 if A is not usable (that is, an uninitialized
 redirector).

 a->V Accesses member variable V of object at other end of A.
 Normally assumes something is at the other end.

 R=& A Changes redirector R's reference to A. Returns prior
 reference.

 A.addBefore(B,C) Connects A to B, and C to whatever A was connected to.
 If A has a redirector, then it points to C.

 A.addAfter (B,C) Like addBefore. If A has a redirector, it does not
 change.

Table 1 shows the principal functions used with plugs. Other functions used
for internal operations are discussed later. The connected and usable
functions return status information about a plug. A plug itself is always
usable; the usable function is included for the sake of redirectors, an
adjunct to plugs that is addressed later. We have already discussed the <=
connection method. The element-reference method -> provides access to
elements at the other end of a connection, so in a sense, a plug operates
like a conventional pointer.
The minus (-) and double-minus (--) operators are used to disconnect a plug.
The minus operator indicates that the plug at the other end will be used
again, whereas the double-minus operator tells the plug at the other end it
will no longer be needed. This is important for dynamically allocated plug
objects that must be deleted automatically. Interestingly, it does not matter
which end is disconnected first, since the other end is notified and
disconnected at the same time. There are no dangling pointers when dealing
with plug connections.
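The two-way behavior can be sketched in plain C++. This is a simplified illustration with hypothetical names (Plug, connectTo, disconnect), not the macro-based implementation presented in the listings:

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of a two-way plug link: each end holds a pointer
// back to the other, so disconnecting either side clears BOTH ends.
class Plug
{
  public:
    Plug() : other(NULL) {}
    ~Plug() { disconnect(); }

    // Like the article's A <= B: break any prior links on both
    // sides, then join the two ends.
    void connectTo(Plug &newEnd)
    {
        disconnect();
        newEnd.disconnect();
        other = &newEnd;
        newEnd.other = this;
    }

    // Like the article's -A: notify the far end before clearing,
    // so no dangling pointer is left behind.
    void disconnect()
    {
        if (other != NULL)
        {
            other->other = NULL;    // tell the far end it is free
            other = NULL;
        }
    }

    bool connected() const { return other != NULL; }

    Plug *other;
};
```

Because each end holds a pointer back to the other, it does not matter which end initiates the disconnect: both pointers are cleared in the same operation.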


Redirectors


Plugs are very useful by themselves. However, the concept of redirector plug
classes can further enhance the program construction process. A redirector for
a particular plug class looks and acts like a plug, but connections are made
to a plug referenced by the redirector. For example, given a and b, where a is
of type plugA and b is of type plugB, and given ra where ra is of type
plugRedirectorA, the statements ra=&a; b<=ra; will connect plugs a and b
together.
This approach is more efficient than connecting b to ra and forwarding all
requests to a through ra. Disconnecting ra disconnects a, but the redirection
remains in effect. A redirector is essentially turned off by assigning a Null
value. The connected function reflects the state of the redirected plug, while
the usable function reflects the state of the redirector (that is, it returns
False when the redirector is set to Null).
Redirecting a redirector has the expected effect of allowing connections to
occur elsewhere, while remaining efficient once a connection is made.
Redirectors can be on either side of a connection operator, and there is no
limit to the number of redirectors in a chain.
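The redirection idea can be sketched as follows, again with hypothetical names (SimplePlug, Redirector) rather than the article's macro-generated classes:

```cpp
#include <cassert>
#include <cstddef>

// Bare-bones plug: just a mutual pointer pair.
class SimplePlug
{
  public:
    SimplePlug() : other(NULL) {}
    void connectTo(SimplePlug &newEnd) { other = &newEnd; newEnd.other = this; }
    bool connected() const { return other != NULL; }
    SimplePlug *other;
};

// A redirector refers to a plug; connecting "through" the redirector
// wires up the referenced plug directly, so no forwarding overhead
// remains once the connection is made.
class Redirector
{
  public:
    Redirector() : target(NULL) {}
    Redirector &operator=(SimplePlug *p) { target = p; return *this; }

    // usable() reflects the redirector itself; connected() reflects
    // the state of the redirected plug, as described in the text.
    bool usable() const { return target != NULL; }
    bool connected() const { return target != NULL && target->connected(); }

    void connectTo(SimplePlug &newEnd)
    {
        if (target != NULL)
            target->connectTo(newEnd);    // the real link bypasses us
    }

    SimplePlug *target;
};
```

Assigning NULL turns the redirector off (usable() returns false) without disturbing any connection already made through it.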

Redirectors are very important when creating components from other components.
Typically, the other components have public plugs or redirectors referenced by
the outer components' redirectors. Connections to the other components'
redirectors efficiently connect to the inner components.


The Source Code


Listing One (page 116) presents PLUGS.H, which defines the macros used to
implement plug classes. Listing Two (page 118) contains PLUGS.CPP, which
includes both connection definitions and corresponding function definitions.
Simple plug operations are also shown. The program's main purpose is to
demonstrate the various ways that plugs can be used.
I did not use C++ templates in my implementation, so the code will work with
C++ compilers that lack template support (like Microsoft C7). The macros allow
plugs to be type-safe so that only one type of plug can be connected to
another. Plug class definitions normally appear in the header files that
define the base classes used with a plug class.
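For comparison, on a compiler that does support templates, the type-safe pairing that the macros provide could be sketched roughly like this. This is a simplification with hypothetical names, not an equivalent of the full macro implementation:

```cpp
#include <cassert>
#include <cstddef>

// Each Plug<A,B> can only connect to a Plug<B,A>, so mismatched plug
// types are rejected at compile time, like the keyed connectors in
// the article's analogy.
template <class Base, class OtherBase>
class Plug : public Base
{
  public:
    Plug() : other(NULL) {}
    void connectTo(Plug<OtherBase, Base> &newEnd)
    {
        other = &newEnd;
        newEnd.other = this;
    }
    bool connected() const { return other != NULL; }
    Plug<OtherBase, Base> *other;
};

// Base classes carry whatever is shared across a connection.
struct XBase { int commonItem; XBase() : commonItem(0) {} };
struct YBase { };

typedef Plug<XBase, YBase> PlugX;   // connects only to PlugY
typedef Plug<YBase, XBase> PlugY;   // connects only to PlugX
```

A statement like `x.connectTo(y)` compiles only when x and y are complementary plug types; connecting two PlugX objects fails at compile time, which is the same guarantee the DECLARE_CONNECTION macro arranges.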
The statement DECLARE_CONNECTION (plugX, plugXBase, plugY, plugYBase) declares
a pair of abstract plug classes, plugX and plugY, and a pair of abstract
redirector classes. It assumes that the base classes, plugXBase and plugYBase,
are already defined. The base classes contain the functions and variables to
be shared when a connection is made.
You need to declare at least two plug classes, one for each end, to make a
connection. If you need polymorphic plugs, just declare another plug class.
Redirectors are defined in the same manner, but typically only one class will
be defined per end. For example, the statement DECLARE_PLUG (aPlugX, plugX)
followed by DECLARE_REDIRECTOR (plugXRedirector, plugX) accomplishes this
task.
Plug-function definitions are done using the IMPLEMENT_PLUG macro. It defines
all but two functions declared for every plug; see Example 3.
Example 3: Defining plugs by invoking the IMPLEMENT_PLUG macro.

 IMPLEMENT_PLUG ( aPlugX, plugX )

 void aPlugX::afterConnect ()

 {
 // operations performed after an initial connection are placed here
 }

 void aPlugX::beforeDisconnect ()
 {
 // operations performed before a connection is broken are placed here
 }

The afterConnect function is called for each plug after a successful
connection is made. The order is not specified. These functions allow a plug
to initialize the other end accordingly. For example, a plug with internal
plug references would make these connections after the main plug is connected.
The beforeDisconnect function is also called for each plug, but just before a
connection is broken. This is where plugs connected by the afterConnect
function would be disconnected. Other plug functions can be redefined by
subclassing a plug class.
You can redefine a redirector without additional definitions by invoking the
redirector implementation macro, as in: IMPLEMENT_REDIRECTOR (plugXRedirector,
plugX). Redirectors need only be defined as needed.


Program Construction


Building a program using plugs assumes that the appropriate components are
either available or can easily be built from existing components. You create
and connect the various components together in a manner that parallels program
data-flow designs. You then initiate operations in the constructed program by
invoking appropriate methods. Connections can be dynamically made and broken
and pluggable components created and destroyed as necessary.
Program construction using plugs is both powerful and simple. A single
connection might interconnect two complex components in a variety of ways.
Pluggable objects also need to be viewed at two levels: use and
implementation. A programmer using a plug needs to know what kind of plug is
at the other end, the general effects of making a connection, and how to use
high-level functions provided by the base classes.
The plug implementor must know about connection initialization, termination,
and what really happens once a connection is made. You can program at either
or both levels. You may even use one set of components while creating another.
With plugs, the fewer changes you make to a class, the better. This is in
contrast to the subclassing that occurs with class libraries. The latter are
used to build new components, while plugs are used to build compound
components.


Other Topics


There are many more aspects to the use and implementation of plugs than can be
covered in this article (which is why I am writing a book on the subject).
Just to give you an idea of what can be done with plugs, consider plugs that
allow multiple connections at one time or plugs that can generate new objects
for an actual connection and are coordinated in some fashion. The program in
Listing Two shows how a plug object can dynamically create new plug objects
when a connection is made.
Another aspect of plugs not thoroughly addressed here is their two-sided
nature and what can be done with it. In particular, plugs provide the ability
to insert new components into an existing structure, as well as the ability to
remove a component by disconnecting all the appropriate connections. Why would
you want to do this? How about inserting a debugging component within an
existing system or replacing a broken component with a working one? With this
approach it would be possible for an object library to be supplied, without
source, and enhanced by changing the internals. This is normally impractical.
The addBefore and addAfter functions are included in the source code examples,
but a thorough explanation of them is beyond the scope of this article.


[LISTING ONE]

//***** PLUGS.H -- by William G. Wong, Copyright (c) June 1992 *****

#if !defined(_PLUGS_H_)
#define _PLUGS_H_

// ---- Plug Class Definition Macros ----
// DECLARE_CONNECTION(abstractPlug1,base1,abstractPlug2,base2)
// DECLARE_PLUG(plug,abstractPlug)
// DECLARE_REDIRECTOR(redirector,abstractPlug)
//

// ---- Plug Class Implementation Macros ----
// IMPLEMENT_PLUG(plug,abstractPlug)
// void plug::afterConnect () {}
// void plug::beforeDisconnect () {}
// IMPLEMENT_SIMPLE_PLUG(plug,abstractPlug)
// IMPLEMENT_REDIRECTOR(redirector,abstractPlug)
//
// ---- plugCheck alternate definition ----
// Define a check macro if you want to check for invalid access using ->.

#if !defined(plugCheck)
#define plugCheck(ptr)
#endif

// ---- Simple plug macro definitions ----
#define IMPLEMENT_SIMPLE_PLUG(c1,p1)\
IMPLEMENT_PLUG(c1,p1)\
void c1::afterConnect () {}\
void c1::beforeDisconnect () {}

#if !defined(NULL)
#define NULL ((void *)0)
#endif
#if !defined(FALSE)
#define FALSE 0
#endif
#if !defined(TRUE)
#define TRUE 1
#endif

#define DECLARE_CONNECTION(p1,b1,p2,b2)\
class p1##HiddenRedirector;\
class p2##HiddenRedirector;\
class p2;\
DECLARE_ABSTRACT_PLUG(p1,b1,p1##HiddenRedirector,p2)\
DECLARE_ABSTRACT_PLUG(p2,b2,p2##HiddenRedirector,p1)\
DECLARE_ABSTRACT_REDIRECTOR(p1##HiddenRedirector,p1)\
DECLARE_ABSTRACT_REDIRECTOR(p2##HiddenRedirector,p2)

#define DECLARE_ABSTRACT_PLUG(p1,b1,r1,p2)\
typedef r1 rd##p1 ;\
typedef p2 px##p1 ;\
class p1 : public b1\
{\
 friend class r1;\
 friend class p2;\
 public:\
 virtual ~p1 () {}\
 virtual int isRedirector () = 0 ;\
 virtual int connected () = 0 ;\
 virtual int usable () = 0 ;\
 virtual void operator <= (px##p1 & newEnd) = 0 ;\
 virtual px##p1 * operator -> () = 0 ;\
 virtual px##p1 * operator - () = 0 ;\
 virtual void operator -- () = 0 ;\
 virtual void afterConnect () = 0 ;\
 virtual void beforeDisconnect () = 0 ;\
 virtual void addBefore ( px##p1 * inner, p1 * newEnd ) = 0 ;\
 virtual void addAfter ( px##p1 * inner, p1 * newEnd ) = 0 ;\
 virtual rd##p1 * getRedirector() = 0 ;\
 virtual void setRedirector ( rd##p1 * newRedirector ) = 0 ;\
 virtual p1 * connectedto ( px##p1 * newEnd ) = 0 ;\
 virtual void disconnectedFree () = 0 ;\
 virtual void disconnected () = 0 ;\
};

#define DECLARE_ABSTRACT_REDIRECTOR(r1,p1)\
class r1 : public p1\
{\
 public:\
 virtual p1 * operator = ( p1 * newPlug ) = 0 ;\
 virtual void resetRedirection () = 0 ;\
};

#define DECLARE_PLUG(c1,p1)\
class c1 : public p1\
{\
 public:\
 c1 () {myPlug=NULL;myRedirector=NULL;}\
 virtual ~c1 () {-- (* this);}\
 virtual int isRedirector () {return FALSE;}\
 virtual int connected () ;\
 virtual int usable () ;\
 virtual void operator <= (px##p1 & newEnd) ;\
 virtual px##p1 * operator -> () ;\
 virtual px##p1 * operator - () ;\
 virtual void operator -- () ;\
 virtual void addBefore ( px##p1 * inner, p1 * newEnd ) ;\
 virtual void addAfter ( px##p1 * inner, p1 * newEnd ) ;\
 virtual rd##p1 * getRedirector () ;\
 virtual void setRedirector ( rd##p1 * newRedirector ) ;\
 virtual p1 * connectedto ( px##p1 * newEnd ) ;\
 virtual void disconnectedFree () ;\
 virtual void disconnected () ;\
 virtual void afterConnect () ;\
 virtual void beforeDisconnect () ;\
 rd##p1 * myRedirector ;\
 px##p1 * myPlug ;\
};

#define DECLARE_REDIRECTOR(r1,p1)\
class r1 : public p1##HiddenRedirector\
{\
 public:\
 virtual p1 * operator = ( p1 * newPlug ) ;\
 virtual void resetRedirection () ;\
 r1 () {myPlug=NULL;myRedirector=NULL;}\
 virtual ~r1 () {(* this) = NULL;}\
 virtual int isRedirector () {return TRUE;}\
 virtual int connected () ;\
 virtual int usable () ;\
 virtual void operator <= (px##p1 & newEnd) ;\
 virtual px##p1 * operator -> () ;\
 virtual px##p1 * operator - () ;\
 virtual void operator -- () ;\
 virtual void addBefore ( px##p1 * inner, p1 * newEnd ) ;\
 virtual void addAfter ( px##p1 * inner, p1 * newEnd ) ;\
 virtual rd##p1 * getRedirector() ;\
 virtual void setRedirector ( rd##p1 * newRedirector ) ;\
 virtual p1 * connectedto ( px##p1 * newEnd ) ;\
 virtual void disconnectedFree () ;\
 virtual void disconnected () ;\
 virtual void afterConnect () ;\
 virtual void beforeDisconnect () ;\
 rd##p1 * myRedirector ;\
 p1 * myPlug ;\
};

// ---- Implementation definitions ----
#define IMPLEMENT_PLUG(c1,p1)\
int c1::connected () {return myPlug != NULL;}\
int c1::usable () {return TRUE;}\
void c1::operator <= (px##p1 & newEnd)\
{\
 if (myPlug != NULL)\
 myPlug -> disconnectedFree () ;\
 if ((& newEnd) != NULL)\
 {\
 myPlug = newEnd.connectedto ( this ) ;\
 if (myPlug)\
 {\
 (*myPlug).afterConnect();\
 (*this).afterConnect() ;\
 }\
 }\
}\
px##p1 * c1::operator -> () {return myPlug;}\
px##p1 * c1::operator - ()\
{\
 px##p1 * prior = myPlug ;\
 if ( prior != NULL )\
 {\
 beforeDisconnect();\
 prior -> beforeDisconnect();\
 disconnected () ;\
 prior -> disconnected () ;\
 }\
 return prior ;\
}\
void c1::operator -- ()\
{\
 px##p1 * prior = - (* this) ;\
 if (prior) prior -> disconnectedFree () ;\
}\
void c1::addBefore ( px##p1 * inner, p1 * newEnd )\
{\
 rd##p1 * oldRedirector = myRedirector ;\
 addAfter ( inner, newEnd ) ;\
 if (oldRedirector!=NULL)\
 {\
 oldRedirector -> resetRedirection () ;\
 (* oldRedirector) = newEnd ;\
 }\
}\
void c1::addAfter ( px##p1 * inner, p1 * newEnd )\
{\
 if (myPlug && newEnd)\
 (* newEnd) <= (* (- (* this))) ;\
 (* this) <= (* inner) ;\
}\
rd##p1 * c1::getRedirector () {return myRedirector;}\
void c1::setRedirector ( rd##p1*newRedirector ) {myRedirector=newRedirector;}\
p1 * c1::connectedto ( px##p1 * newEnd )\
{\
 if (myPlug)\
 -- (* myPlug) ;\
 myPlug = newEnd ;\
 return this ;\
}\
void c1::disconnectedFree () {myPlug=NULL;}\
void c1::disconnected () {myPlug=NULL;}

// ---- Define for redirector methods -----
#define IMPLEMENT_REDIRECTOR(r1,p1)\
p1 * r1::operator = ( p1 * newPlug )\
{\
 p1 * prior = myPlug ;\
 resetRedirection () ;\
 if (newPlug != NULL)\
 {\
 myPlug = newPlug ;\
 newPlug -> setRedirector(this) ;\
 }\
 return prior ;\
}\
void r1::resetRedirection () {myPlug=NULL;}\
int r1::connected ()\
 { return (myPlug == NULL) ? FALSE : myPlug -> connected () ; }\
int r1::usable ()\
 { return (myPlug == NULL) ? FALSE : myPlug -> usable () ; }\
void r1::operator <= (px##p1 & newEnd)\
 {if (myPlug) (* myPlug) <= newEnd ;}\
px##p1 * r1::operator -> ()\
 {return (myPlug) ? myPlug -> operator -> () : NULL ;}\
px##p1 * r1::operator - ()\
 {return (myPlug) ? - * myPlug : NULL ;}\
void r1::operator -- ()\
 {if (myPlug) -- * myPlug ;}\
void r1::addBefore ( px##p1 * inner, p1 * newEnd )\
{\
 if (myPlug)\
 {\
 myPlug -> setRedirector (NULL) ;\
 (* myPlug).addBefore ( inner, newEnd ) ;\
 }\
 if (newEnd)\
 newEnd -> setRedirector ( this ) ;\
 myPlug = newEnd ;\
}\
void r1::addAfter ( px##p1 * inner, p1 * newEnd )\
{\
 if (myPlug)\
 (* myPlug).addAfter ( inner, newEnd ) ;\
}\
rd##p1 * r1::getRedirector () {return myRedirector;}\
void r1::setRedirector ( rd##p1 * newRedirector )\
 {myRedirector=newRedirector;}\
p1 * r1::connectedto ( px##p1 * newEnd )\
 {return (myPlug) ? (* myPlug).connectedto(newEnd) : NULL ;}\
void r1::disconnectedFree ()\
 {if (myPlug) myPlug -> disconnectedFree();}\
void r1::disconnected ()\
 {if (myPlug) myPlug -> disconnected();}\
void r1::afterConnect ()\
 {if (myPlug) myPlug -> afterConnect();}\
void r1::beforeDisconnect ()\
 {if (myPlug) myPlug -> beforeDisconnect();}

#endif






[LISTING TWO]

// *** PLUGS.CPP -- Example of Use of Plugs by William G. Wong, Copyright 1992 ***
// ****************************************************************

#include "plugs.h"

// ==== Class definitions ====
//
// These definitions are normally found in a header file.
//
// ---- Simple abstract plug that can link to objects like itself ----

class plugXBase // simple accessible class
{
 public :
 int plugXBaseCommonItem ;
} ;

class plugYBase {} ; // nothing provided here
 // plug used to access plugXBase only

// ---- Declare plugs ----

DECLARE_CONNECTION(plugX,plugXBase,plugY,plugYBase)
DECLARE_PLUG(aPlugX,plugX)
DECLARE_PLUG(aPlugY,plugY)
DECLARE_REDIRECTOR(plugXRedirector,plugX)


// ---- Dynamic allocation example ----
//
// This pair of plug classes work like plugX except that a single
// plug creates a new object each time a connection is made. The
// plug object is freed when the connection is broken.
//
// Both <= and connectedto must be redefined because a connection
// will come through one or the other depending upon which side the
// multiPlugX object is when the initial connection is initiated.


class dynamicPlugX : public aPlugX
{
 public:
 void disconnectedFree () { delete this ; } ;
} ;

class multiPlugX : public aPlugX
{
 public:
 void operator <= ( plugY & newEnd )
 { newEnd <= * new dynamicPlugX ; }
 plugX * connectedto ( plugY * newEnd )
 { return (* new dynamicPlugX).connectedto ( newEnd ) ; }
};

// ---- sample1 to sample2 connection ----
class sample1Base
{
 public:
 aPlugX x, y ;
} ;

class sample2Base {} ;

// ---- Declare plugs ----

DECLARE_CONNECTION(Sample1,sample1Base,Sample2,sample2Base)
DECLARE_PLUG(sample1,Sample1)
DECLARE_PLUG(sample2,Sample2)

// ---- A slightly more complex plug base ----
class myPlugBase
{
 public:
 int i ;
 virtual void read ( int x ) { i = x ; } ;
 virtual void write ( int x ) { i = x ; } ;

 myPlugBase () { i = 1 ; } ;
} ;


// ---- Declare plugs ----
DECLARE_CONNECTION(MyPlug1,myPlugBase,MyPlug2,myPlugBase)
DECLARE_PLUG(myPlug1,MyPlug1)
DECLARE_PLUG(myPlug2,MyPlug2)


// ==== Implementation Definitions ====

IMPLEMENT_PLUG(aPlugX,plugX)
IMPLEMENT_PLUG(aPlugY,plugY)

void aPlugX::afterConnect ()
{
 // initial connection code goes here
}

void aPlugX::beforeDisconnect ()
{
 // disconnect code goes here
}

void aPlugY::afterConnect ()
{
 // initial connection code goes here
}

void aPlugY::beforeDisconnect ()
{
 // disconnect code goes here
}

IMPLEMENT_REDIRECTOR(plugXRedirector,plugX)


// ---- Implementation for simple plugs ----
//
// Generates null afterConnect and beforeDisconnect functions.

IMPLEMENT_SIMPLE_PLUG(sample1,Sample1)
IMPLEMENT_SIMPLE_PLUG(sample2,Sample2)


// ---- Implement plugs ----
//
// myPlug1a and myPlug2a redefine methods from the base class.

IMPLEMENT_SIMPLE_PLUG(myPlug1,MyPlug1)
IMPLEMENT_SIMPLE_PLUG(myPlug2,MyPlug2)

class myPlug1a : public myPlug1
{
 public :
 int i1 ;
 virtual void write ( int x ) { i1 = x ; } ;
} ;


class myPlug2a : public myPlug2
{
 public :
 int i2 ;
 virtual void write ( int x ) { i2 = x ; } ;
} ;


// ==== Sample program ====

void test ( myPlug1 & pp1 )
{
 aPlugX x1, x2, x3 ;
 plugX * ppx ;
 plugXRedirector rx1, rx2 ;
 aPlugY y1, y2, y3 ;
 sample1 s1 ;
 sample2 s2 ;
 myPlug1a p1 ;

 myPlug2a p2 ;
 multiPlugX mpx ;

 // ---- General connections ----
 x1 <= y1 ; // connect x1 and y1
 y1 <= x2 ; // disconnect x1
 // connect y1 and x2
 s1 <= s2 ; // connect types must match

 // ---- Compound linkages ----
 s2 -> x <= y1 ; // linking s1.x to y1
 s2 -> y <= y2 ; // linking s1.y to y2


 // ---- Redirector examples ----
 rx1 = & x1 ; // setup redirector
 rx1 <= y1 ; // connect x1 to y1 through rx1
 y1 -> plugXBaseCommonItem = 1 ;
 // access item at other end
 rx1.addBefore ( & y2, & x3 ) ;// rx1 redirects x3, x1 connected to y2
 // and x3 is attached to y1
 rx2 = & x3 ; // setup a different redirector type
 rx2 <= y2 ; // connects y2 to x3 through rx2
 rx2 = NULL ; // reset redirector

 // ---- Multiplug examples ----
 // These connections generate two new objects, which are deleted when
 // they are no longer in use. Note how ppx keeps a reference to the
 // object created for y1, which is later used to link to y3.

 mpx <= y1 ; // y1 connected to new object
 mpx <= y2 ; // y2 connected to different object

 ppx = - y1 ; // disconnect but keep around object

 (* ppx ) <= y3 ; // link object to y3

 -- x2 ; // free up the other end of x2

 ppx = - y3 ; // - disconnects and returns pointer
 -- * ppx ; // -- * frees result


 // ---- Plugs with different variables or methods ----
 p1 <= p2 ;

 p1 -> read ( p1 -> i ) ; // using read and i from p2

 pp1 <= p2 ; // connects pp1 to p2
 // disconnects p1

 // ---- Explicit disconnects ----
 -- s2 ; // forced disconnect
 -- s1 ;

 if (x1.connected()) // determine if connection exists
 -- x1 ;
 // ---- Implicit disconnects for locally defined objects ----
}

// ------------main()--------
int main ()
{
 myPlug1a p1 ;

 test ( p1 ) ;
 return 0 ;
}
================================================================



October, 1992
SUPER DISTRIBUTION AND ELECTRONIC OBJECTS


What if there is a silver bullet...and the competition gets it first?




Brad Cox


Brad is the author of Object-oriented Programming: An Evolutionary Approach.
He can be reached at The Program on Social and Organizational Learning, George
Mason University, Fairfax, VA 22030 or at bradcox@infoage.com. A version of
this article first appeared in the Journal of Object-oriented Programming
(June, 1992), and is reprinted with the permission of SIGS Publications, 588
Broadway, New York, NY 10012.


Few programmers could develop a compiler, word processor, or spreadsheet to
compete in today's crowded software market. The cost and complexity of
modern-day applications far exceed the financial and intellectual capacity of
even the rarest of individuals. Even large-granularity subcomponents like
window systems, persistent-object databases, and communication facilities can
be larger than most individuals can handle. But most of us could provide
smaller (so-called "reusable") software components that others could assemble
into larger objects, components as small as stacks and queues.
So why don't we? Why do we drudge away our lives in companies with the
financial, technical, and marketing muscle to build the huge objects we call
"applications?" Why don't we start software companies (like Intel) to invent,
build, test, document, and market small-granularity objects for other
companies to buy? Think of the reduction in auto-emission pollution if more of
us stayed home to build small-granularity components for sale! Think of not
having to get along with the boss!
Object-oriented programming technologies have brought us tantalizingly close
to making this dream technically, if not economically, feasible. Subroutines
have long been able to encapsulate functionality into modules others can use
without needing to look inside, just as with Intel's silicon components.
Object-oriented programming languages have extended our ability to encapsulate
functionality within "software-ICs" that can support higher-level objects than
subroutines ever could. Such languages have already made the use of
prefabricated data-structure and graphical-user-interface classes a viable
alternative to fabricating cut-to-fit components for each application. All
this is technically feasible already, although the software industrial
revolution has hardly begun.
Yet these technical advances have not really changed the way we organize to
build software--they've only provided better tools for building software just
as before. The prefabricated small components of today are not bought and sold
as assets in their own right. They are bundled (given away) inside something
larger. Sometimes they are bundled to inflate the value (and price!) of some
cheap commodity item, as in Apple's ROM software, which turns a $50 CPU chip
into a $5000 Macintosh computer. Sometimes they play the same role with
respect to software objects, as in the libraries that come with
object-oriented compilers.
There is no way of marketing the small, active objects that we call "reusable
software components," at least not today. The same is true of the passive
objects we call "data." For example, nearly 50 percent of the bulk waste in
our landfills is newspapers and magazines. Nearly half of our bulk-waste
problem would be eliminated if we could break the habit of fondling the
macerated remains of some forest critter's home as we drink our morning
coffee. But this is far more than a bad habit from the viewpoint of newspaper
publishers. If they distributed news electronically, how would they charge for
their labor?
Paper-based information distribution makes certain kinds of information
unavailable, even when the information is easily obtainable. For example, I
hate price-comparison shopping and would gladly pay for high-quality
information about where to buy groceries and gasoline cheaply within driving
distance of my home. This information is avidly collected by various
silver-haired ladies in my community, but solely for their own use. There is
no incentive for them to electronically distribute their expertise to
customers like myself.
What if entrepreneurs could market electronic information objects for other
people to buy? Couldn't geographically specialized, but broadly relevant
objects like my gasoline-price example be the "killer apps" that the hardware
vendors are so desperately seeking? Think of what it could mean to today's
saturated market if everyone who buys gasoline and groceries bought a computer
simply to benefit from Aunt Nellie's coupon-clipping acumen.


Information-age Economics


These questions outline the fundamental obstacle of the
manufacturing-to-information age transition. While we're adept at selling
tangible goods such as Twinkies, automobiles, and newspapers, we've never
developed a commercially robust way of buying and selling easily copied,
intangible goods like electronic data and software.
Of course, there are more obstacles to building a robust market in electronic
objects than I could ever cover here. Many are technological deficiencies that
could be easily corrected, such as the lack of suitably diverse encapsulation
and binding mechanisms in today's object-oriented programming languages,
insufficient telecommunications bandwidth and reliability, and the dearth of
capable browsers, repositories, and software-classification schemes.
The biggest obstacle is that electronic objects can be copied so easily that
there is no way to collect revenue the way Intel does, by exacting a fee each
time another copy of a silicon object is needed. More than any other reason,
this is why nobody would ever quit their day job to build small-granularity
software components for a living.
A striking vestige of manufacturing-age thinking is the still-dominant
practice of charging for information-age goods like software by the copy.
Since electronic goods can be easily copied by every consumer, the producers
must inhibit copying with such abominations as shrinkwrap license agreements
and copy-protection dongles. Since these are not reliable and are increasingly
rejected by software consumers, the Software Publishers Association and
Business Software Alliance have started using handcuffs and jail sentences as
copy-protection technologies that actually do work, even for information-age
products like software.
The lack of robust information-age incentives explains why so many corporate
reuse-library initiatives have collapsed under a hail of user complaints.
"Poorly documented. Poorly tested. Too hard to find what I need. Does not
address my specific requirements." Except for the often rumored "Not invented
here" syndrome, the problem exists only occasionally on the demand side. The
big problems are on the supply side. There are no robust incentives to
encourage producers to provide minutely specialized, tested, documented, and
(dare I hope?) guaranteed components that quality-conscious engineers might
pay good money to buy. As long as these "repositories" are waste-disposal
dumps where we throw poorly tested and undocumented trash for garbage pickers
to "reuse," quality-conscious engineers will rightly insist, "Not in my
backyard!"
Paying for software by the copy (or "reusing" it for free) is so widespread
today that it may seem like the only option. But think of it in
object-oriented terms. Where is it written that we should pay for an object's
instance variables (data) according to usage (in the form of network-access
charges), yet pay for methods (software) by the copy? Shouldn't we also
consider incentive structures that could motivate people to buy and sell
electronic objects, in which the historical distinction between program and
data is altogether hidden from view?


Superdistribution


Let's consider a different approach that might work for any form of
computer-based information, an approach based on the following observation.
Software objects differ from tangible objects in being fundamentally unable to
monitor their copying but trivially able to monitor their use. For example, it
is easy to make software count how many times it has been invoked, but hard to
make it count how many times it has been copied. So why not build an
information-age market economy around this difference between
manufacturing-age and information-age goods?
If revenue collection were based on monitoring the use of software inside a
computer, vendors could dispense with copy protection altogether. They could
distribute electronic objects for free in expectation of a usage-based revenue
stream.
Legal precedents for this approach already exist. The distinction between
copyright (the right to copy or distribute) and useright (the right to
"perform," or to use a copy once obtained) is made in existing copyright laws.
These distinctions were stringently tested in court earlier this century as the music
publishers came to terms with broadcast technologies such as radio and TV.
When we buy a record, we acquire ownership of a physical copy (copyright). We
also acquire a severely limited useright that only allows us to use the music
for personal enjoyment. Conversely, large television and radio companies often
have the very same records thrust upon them by the publishers for free. But
they pay substantial fees to acquire the useright that allows them to play the
music on the air. The fees are administered by the American Society of
Composers, Authors and Publishers (ASCAP) and Broadcast Music, Inc. (BMI),
which monitor how often each record is broadcast and to how large a
listening audience.
A Japanese industry-wide consortium, the Japan Electronic Industry
Development Association (JEIDA), is developing an analogous approach for
software. Each computer is thought of as a station that broadcasts not the
software itself, but the use of the software, to an audience of a single
"listener."
The approach, which originated with Ryoichi Mori, is called superdistribution
because, like superconductivity, it allows information-age goods to flow
freely, without the resistance of copy protection and piracy. Its premise is
that copy protection is exactly the wrong idea for intangible, easily copied
goods such as software. Superdistribution leverages ease of copying by
encouraging such goods to be freely distributed and freely acquired via
whatever distribution mechanism you please. Users are actively encouraged to
acquire superdistribution software from networks, to give it away to their
friends, or even send it as junk mail to people they've never met. Broadcast
my software from satellites if you want. (Please!)
This generosity is possible because the software is really "meterware." It has
strings attached that make revenue collection independent of distribution. The
software contains embedded instructions that make it useless except on
machines equipped for this new kind of revenue collection.
The computers that can run superdistribution software are otherwise quite
ordinary. In particular, they will run ordinary pay-by-copy software just
fine. They just have additional capabilities that only superdistribution
software uses. In JEIDA's current prototype, these services are provided by a
silicon chip that plugs into a Macintosh coprocessor slot.
Electronic objects (not just applications, but active and/or passive objects
of every granularity) intended for superdistribution invoke this hardware to
ensure that the revenue collection hardware is present, that prior-usage
reports have been uploaded, and that prior-usage fees have been paid.
The hardware is not complicated (the main complexities being tamper-proofing,
not base functionality). It merely provides several instructions that must be
present before superdistribution software can run. The instructions count how
many times they have been invoked by the software, storing these usage counts
temporarily in a tamper-proof persistent RAM. Periodically (say monthly) this
usage information is uploaded to an administrative organization for billing,
using public-key encryption technology to discourage tampering and to protect
the secrecy of this information.
The end user gets a monthly bill for their usage of each top-level component.
Their payments are credited to each component's owner in proportion to the
component's usage. These accounts are then debited according to each
application's usage of any subcomponents. These are credited to the
subcomponent owners, again in proportion to usage. In other words, the end
user's payments are recursively distributed through the producer-consumer
hierarchy. The distribution is governed by usage-metering information
collected from each end user's machine, plus usage-pricing data provided to
the administrative organization by each component vendor.
Since communication is infrequent and involves only a small amount of metering
information, the communication channel could be as simple as a modem that
autodials a hardwired 800 number each month. Many other solutions are viable,
such as flash cards or even floppy disks mailed back and forth each month.


A Revolutionary Approach


Whereas software's ease of replication is a liability today (by
disincentivizing those who would provide it), superdistribution turns this
liability into an asset (by allowing software to be distributed for free).
Whereas software vendors must spend heavily to overcome software's
invisibility, superdistribution thrusts software out into the world to serve
as its own advertisement. Whereas the PC revolution isolates individuals
inside a stand-alone PC, superdistribution establishes a
cooperative/competitive community around an information-age market economy.
Of course, there are many obstacles to this ever really happening. A big one
is the information-privacy issues raised by usage monitors in every computer
from video games to workstations to mainframes. Although we are accustomed to
usage monitoring for electricity, telephone, gas, water, and electronic data
services, information privacy is an explosive political issue.
Superdistribution could easily be legislated into oblivion out of the fear
that the usage information would be used for purposes other than billing.
A second obstacle is the problem of adding usage-monitoring hardware to a
critical number of computers. This is where today's computing establishment
could be gravely exposed to those less inclined to maintain the status quo.

It is significant that superdistribution was not developed by the American
computer establishment, which presently controls 70 percent of the world
software market. It was developed by JEIDA, an industry-wide consortium of
Japanese computer manufacturers.
The Japanese are clearly capable of building world-class computers. Suppose
that they were to simply build superdistribution capabilities into every one
of them, not as an extra-price option but as a ubiquitous capability of every
computer they build? What if the pair of superdistribution metering
instructions were built into every next-generation CPU chip, much as ADD and
JSR instructions are built in today? Think about the benefits I've discussed
in this article and then ask: Whose computers would you buy? Whose computers
would Aunt Nellie and her friends buy? What if superdistribution really is a
silver bullet for the information-age issues that I've raised in this article?
And what if the competition builds it first?


References


Cox, Brad J. Object-oriented Programming: An Evolutionary Approach. Reading,
MA: Addison-Wesley, 1986.
Cox, Brad J. Object Technologies: A Revolutionary Approach. Reading, MA:
Addison-Wesley; available late 1992.
Cox, Brad J. "Planning the Software Industrial Revolution." IEEE Software
(November, 1990).
Cox, Brad J. "There is a Silver Bullet." BYTE (October, 1990).
Mori, Ryoichi and Masaji Kawahara. "Superdistribution: An Overview and the
Current Status." ISEC 89-44.
Mori, Ryoichi and Masaji Kawahara. "Superdistribution: The Concept and the
Architecture." The Transactions of the IEICE, vol E 73 (July, 1990).

















































October, 1992
A TASTE OF DYLAN


Apple's new object-oriented dynamic language




David Betz


David is a DDJ contributing editor and the author of XLisp, XScheme, Bob, and
numerous other programming languages. He can be contacted through the DDJ
offices.


Dylan is a new language designed by the Advanced Technology Group at Apple
Computer. Like C++, Dylan is an object-oriented language. Unlike C++, Dylan is
a DYnamic LANguage. This means that it provides automatic storage management
as well as runtime type checking and dynamic linking. It shares these features
with other dynamic languages such as Lisp and Smalltalk.
In fact, Dylan looks like a cross between Scheme (a small but powerful dialect
of Lisp) and CLOS (the Common Lisp Object System). Like Scheme, Dylan is a
compact language. It is designed to be compiled efficiently and to run on
machines with limited resources. This is in contrast to Common Lisp, which
typically requires many megabytes of memory.


Dylan's Object System


One of Dylan's most notable features is its object system. Like CLOS from
Common Lisp, it supports multiple inheritance and provides polymorphism
through generic function calls. With Dylan, however, the object system goes
all the way down to the primitive data types. In Common Lisp, the object
system was grafted on top of a non-object-oriented language. Dylan is "objects
all the way down," like Smalltalk.
Example 1 shows a Dylan class definition, which creates a class
and stores its definition as the value of the symbol <point>. The class
inherits from the class <object> and has slots named x and y. You can create a
new instance of the class <point> by passing it as an argument to the make
function along with any initialization keywords. In this case, x: and y: are
used to provide initial values for x and y, respectively. You can create a
point with the coordinates (12,23) using the form: (make <point> x: 12 y: 23).
Example 1: A Dylan class definition.

 (define-class <point> (<object>)
 (x required-init-keyword: x:)
 (y required-init-keyword: y:))

In addition to defining a new class, the define-class form also creates getter
and setter functions for each of the slots. In Dylan, slot values are always
accessed through functions. The getter function gets the value of a slot and
the setter function sets its value. The default names for these functions are
derived from the name of the slot. The default getter function name is simply
the slot name. The default setter function name is the list (setter
<slot-name>). You can override these defaults using slot options.
Example 2(a), for instance, creates a point and reads its slots with the
getter functions. To change the value of a slot, you can call the setter
function directly, as in Example 2(b), but it is much more common to use the
form in Example 2(c).
Example 2: (a) Creating a point and reading its slots; (b) calling the setter
function directly; (c) the more common way to use the setter function.

 (a)

 (define foo (make <point> x: 12 y: 23))
 (x foo) => 12
 (y foo) => 23

 (b)

 ((setter x) foo 99)

 (c)

 (set! (x foo) 99)

The symbol required-init-keyword: is called a "slot option." It states that
the keyword that follows is used to initialize its associated slot and that
the keyword must be present in any call to make for the class <point>. Table 1
lists other slot options that Dylan supports; Table 2 lists space-allocation
options.
Table 1: Dylan slot options.

 Slot Option Purpose
 ------------------------------------------------------------------------

 getter: Specify the getter function name.
 setter: Specify the setter function name.
 type: Specify the type of values to be stored in the slot.
 init-value: Specify the initial value of the slot.

 init-function: Specify a function to compute the initial value of the
 slot.
 init-keyword: Specify the make function keyword used to initialize the
 slot.
 allocation: Specify how space for the slot is to be allocated.

Table 2: Dylan space-allocation options.

 Option Purpose
 ------------------------------------------------------------------------

 instance Each instance has storage for the slot.
 class Every instance of the class and its subclasses share the
 same storage for the slot.
 each-subclass The class and each subclass get their own storage for the
 slot; each instance uses the storage associated with its
 direct class.
 constant The value of the slot is a constant.
 virtual The slot has no storage allocated for it and its value
 must be managed by its getter and setter functions.

A class can be defined as a subclass of several other classes. In this case,
the new class inherits the structure and behavior of all its parent classes.
For instance, you may want to define a subclass of <point> that allows a bunch
of points to be linked together into a doubly linked circular list. You could
define a subclass of <point> with two additional fields, next and prev, but
this would bury the definition of the doubly linked circular lists in the
definition of linked-point. It would be better to separate out that
functionality to make it available for building other types of doubly linked
lists. You can do this by defining a mixin class, a class that isn't really
useful by itself but can be mixed into other classes to give them additional
functionality. Example 3(a) shows a definition of the doubly linked list
mixin. Example 3(b) then defines the <linked-point> class. Objects of the new
class <linked-point> will contain the slots of both <point> and
<linked-entity>. All slot accessors for both classes will work on instances of
the new subclass, and any methods that apply to either superclass (or their
superclasses) will apply to instances of the subclass.
Example 3: (a) A definition of the doubly linked list mixin; (b) a definition
of the <linked-point> class.

 (a)

 (define-class <linked-entity> (<object>)
 next
 prev)

 (b)

 (define-class <linked-point> (<point> <linked-entity>))



Polymorphism


One of the important concepts of object-oriented programming is polymorphism.
Polymorphism makes it possible to customize the behavior of a function based
on the type of its arguments.
Dylan implements polymorphism through generic functions, collections of
methods that understand how to handle particular types of arguments. When a
generic function is applied to arguments, the generic-function dispatch
facility selects an appropriate method based on the classes of the arguments
passed in the generic function call.
Unlike message-passing languages such as Smalltalk that only dispatch on the
class of the object receiving the message, Dylan takes into account the
classes of all of the required arguments to a function. This is called
"multiple dispatch" (or "multimethods"), and makes it much easier to define
functions whose behavior depends on more than one of their arguments. For
instance, a print function could behave differently based on both its target
output device and the type of object being printed.


Method Definitions


Now, it's time to look at method definitions. Example 4 defines a few methods
useful for classes that use the <linked-entity> mixin. The first method is
used to initialize a new instance of the class. It simply sets the next and
prev pointers to the object itself. This creates a doubly linked circular list
of one element. Note that initialize calls next-method before doing anything
else. The next-method function calls the next applicable method, which would
have been called if this initialize method did not exist. This is useful for
adding behavior in a subclass since it causes an inherited method to be
called. Dylan doesn't support before, after, and around methods like CLOS, so
this is the primary means of method combination.
Example 4: Defining methods for classes that use the <linked-entity> mixin.

 (define-method initialize ((x <linked-entity>))
 (next-method)
 (set! (next x) x)
 (set! (prev x) x))

 (define-method add-after ((e1 <linked-entity>)
                           (e2 <linked-entity>))
   (bind ((e3 (next e1)))
     (set! (next e1) e2)
     (set! (next e2) e3)
     (set! (prev e3) e2)
     (set! (prev e2) e1)))

The second method adds the element e2 to a list immediately after the element
e1, updating the links accordingly.
Methods usually have one or more required arguments, whose classes are used to
select the method when it is associated with a generic function. In the
add-after method, both e1 and e2 are required arguments.
Each required argument may have a specializer that restricts the objects that
can appear as the value of that argument. In the definition of add-after, the
specializer (e1 <linked-entity>) says that the first argument is called e1 and
its value must be a general instance of the class <linked-entity>. (A general
instance of a class is an instance of either the class itself or one of its
subclasses.) Even when you call a method directly, the constraints implied by
the specializers are enforced. It's impossible to call a method with arguments
that don't match its argument specializers.
In addition to required arguments, methods can have optional keyword arguments
and rest arguments. Keyword arguments are passed in pairs. The first element
of the pair is the keyword, and the second is the value. You can also specify
a default value to be used if the keyword argument isn't specified.
Sometimes it is convenient to allow a function to take an arbitrary number of
arguments. This is handled by the rest argument. When you call a method with a
rest argument, all of the arguments beyond the required arguments are
collected together and passed as a sequence as the value of the rest argument.
The method is then free to examine these additional arguments using the
sequence functions provided by Dylan. When a method accepts both keyword
arguments and a rest argument, the same actual arguments are used to satisfy
both.
Dylan allows methods to return more than one value. For instance, if you want
to define a method for <point> that returns both coordinates, you do it as
shown in Example 5(a). This method will return two values, the x and y
coordinates. To use the two values, you can use the bind form to bind the
values to variables; see Example 5(b).
Example 5: (a) Defining a method for <point> that returns both coordinates;
(b) binding the two values in (a) to variables.

 (a)

 (define-method coordinates ((p <point>))
 (values (x p) (y p)))

 (b)

 (bind ((x y (coordinates foo)))
 ... do something with x and y ...
 )

All object-oriented languages allow instance variables or slots to be
associated with a class of objects. Dylan also allows slots to be associated
with particular instances. This can be handy when you don't want to create a
new class that will have only one instance just to allow that instance to have
a new slot. You can also define a method that applies only to a particular
instance using what Dylan calls a "singleton." A singleton can be used in
place of a class in the specializer for an argument to a method and will cause
the method to apply only to that particular instance.


Conclusion


Dylan has many other interesting features that I haven't discussed here:
object-oriented condition handling, a standardized protocol for handling
iteration, introspective functions for allowing programs to inspect the way
classes and methods work, and so on. I hope this article has given you a
flavor of this new object-oriented, dynamic programming language from Apple.
Incidentally, you can get a copy of the Dylan manual by writing to: Apple
Computer Inc., One Main Street, Cambridge, MA 02142.








































































October, 1992
DPMI MEETS C++


An object-oriented abstraction of DPMI


This article contains the following executables: DPMI.ZIP


Frederick Hewett


Fred is vice president of Cypress Software Ltd. and can be contacted on
CompuServe (72647,3472) or MCIMail (FHEWETT).


The DOS Protected Mode Interface (DPMI) has been a part of the DOS programming
environment since the release of Windows 3.0. One measure of its success is
the impact it has had on DOS-extender vendors and programmers who develop
protected-mode applications. Virtually all of them have introduced tools that
operate in a DPMI environment and are therefore compatible with Windows
Enhanced mode. Both Microsoft and Borland have introduced compiler products
not simply compatible with DPMI, but that require a DPMI host in order to run;
both vendors bundle a DPMI host as part of their package. As a result, DPMI
will be found on far more systems than it has to date, and an increasing
number of applications will require and take advantage of it.
Given the likelihood that DPMI is on the threshold of wide proliferation,
those who write large non-Windows programs for DOS stand to profit from a
sound understanding of DPMI, its interface structure, and its capabilities. To
that end, this article looks at DPMI from an object-oriented perspective,
using a C++ class library developed by Qualitas (Bethesda, Maryland) as a
basis for exploring DPMI. (Qualitas' 386MAX memory manager is a DPMI host.)
This class library is available electronically; see "Availability" on page 5
of this issue.


DPMI Backgrounder


DPMI is a programming interface that allows application-level code to run in
protected mode under DOS in a virtual 8086-mode operating environment.
Programs ran in protected mode under DOS long before DPMI was conceived. Prior
to 1988, DOS-extended programs were incompatible with virtual 8086-mode memory
managers such as 386MAX from Qualitas and QEMM from Quarterdeck. The Virtual
Control Program Interface (VCPI) provided a means for protected-mode
applications to run in these environments. To do so, however, the memory
managers had to allow the application to run at the processor's most
privileged level. The architects of Microsoft Windows 3.0 felt that this
compromised system integrity, and offered DPMI as an alternative.
The key distinction between DPMI and VCPI is that DPMI allows applications to
operate only at nonprivileged levels, thereby preventing them from having
direct access to system-sensitive structures such as descriptor tables and
page tables. Instead, DPMI hosts provide a set of services which, in a
controlled fashion, expose the functionality required for protected-mode
operation. Specifically, these services allow clients to do the following:
allocate, free, and modify entries in the local descriptor table (LDT);
allocate and free DOS memory (in the low megabyte); hook interrupts in both
real and protected modes; trap processor exceptions (for instance,
general-protection faults and segment-not-present faults); allocate, free, and
resize blocks of extended memory; enable and disable interrupts; and access
the processor's breakpoint capability (debug registers).
DPMI is specified at the assembly language level. Applications call DPMI by
loading registers and issuing an INT 31h. For example, a program running in
protected mode requests DPMI to allocate three entries in the LDT by setting
AX=0000h (the function identifier) and CX=3 (the number of descriptors to
allocate), then issuing an INT 31h. The DPMI host returns the selector of the
first of three consecutive descriptors that it allocates in AX, and clears the
carry flag to indicate success.
DPMI does a decent job of abstracting the processor's protected-mode
capabilities, and assembly language is the appropriate level for its
specification. However, to learn about the capabilities of a DPMI host in
general, and gain insight into the interface structure, a higher level of
abstraction is required. An object-oriented perspective serves this purpose.


DPMI Meets OOP


The process of making the abstraction consists of first isolating the
different data structures DPMI supports (that is, memory blocks and interrupt
handlers). The relationships between these entities, along with the operations
that DPMI clients may perform on them, lead to the definition of C++ classes.
The classes codify the structure of DPMI, clarifying aspects of the interface
that might otherwise be obscure.
Over the last few years, much effort has gone into developing methodologies
for determining optimal class structure. Since the problem at hand is small,
an iterative approach is the most practical: Examine the interface, postulate
classes based on that analysis, test the conceptualization with a prototype,
then apply the results of the test to further analysis until a smoothly
working set of classes emerges.
Before getting into the specifics of the classes, a word about some
constraints on the implementation is necessary. The library is designed for
16-bit small-model operation (single code segment, single data segment). A
large-model or 32-bit implementation would require a program loader that
intelligently performs segment fix-ups on the executable image for
protected-mode execution. While technically feasible, this is the domain of
DOS extenders, and is beyond the scope of this instructional project. With
small-model architecture, near pointers may be used in both real and protected
mode.


The DPMI Class Library


Consider first the class DPMIhost (see Example 1), which abstracts the DPMI
host itself. By simply declaring an instance of a DPMI host object, a program
performs all the necessary operations for detecting the presence of and
gaining linkage to a DPMI host. The program may then use the getStatus()
member function to verify that a host was indeed found.
Example 1: The class DPMIhost.

 class DPMIhost {
 public:
     DPMIhost(void);
     DPMIerr getStatus(void);
     boolean enterProtectedMode(uChar bitness);
     void getVersion(uChar *major, uChar *minor);
     uChar getSelectorDelta(void);
 };

The enterProtectedMode() member function makes the initial switch from real or
virtual 8086 mode into protected mode. The function also installs a set of
handlers that gracefully exit to DOS if a fatal exception occurs. Once in
protected mode, it is safe to use the other classes in the DPMI library.
The class library defines a set of classes that abstract the DPMI
implementation of exception handlers, interrupt handlers, real procedure
calls, and real-mode callbacks. Due to space constraints, the complete library
is available only electronically; see "Availability," on page 5 for details.
The classes that support memory usage under DPMI, diagrammed in Figure 1,
illustrate the basic ideas of the library, and are perhaps the most useful. To
see why, consider what DPMI presents to the developer with regard to memory.
Conspicuously absent is a call with functionality equivalent to the familiar
DOS memory-allocation call (INT 21h, AH=48h). For good reasons, the interface
decouples descriptor management and linear-memory management. It requires
several DPMI calls to allocate a block of linear memory and to fully
initialize a descriptor that addresses the block.
The class library is designed to allow the same level of control that the raw
DPMI interface allows, and at the same time make it easy to do common
operations like allocation of an addressable block of memory, modification of
individual attributes of segments, and release of memory and descriptors back
to the host. To achieve both ease of use and low-level control you must create
a primitive set of classes that map onto low-level DPMI entities, then use
inheritance to synthesize higher-level functionality from these base classes.
Consider first raw linear memory: It contains little room (or need) for
abstraction. The library defines the class Block, as shown in Example 2. The
constructor takes as an argument a 32-bit value that specifies the desired
size of the block. It makes the DPMI call to allocate the memory, and if
successful, stores the linear address and a handle in protected data members
of the class. The class provides member functions for reading these values.
Example 2: The class Block.

 class Block {
 public:
     Block(uLong size);
     ~Block(void);
     boolean setSize(uLong);
     uLong blockHandle(void);
     uLong blockSize(void);
     uLong blockBase(void);
 protected:
     uLong handle;
     uLong base;
     uLong size;
 };

An instance of the class Block allocates raw linear memory from DPMI, but raw
linear memory is of little use by itself. To be useful, one or more
descriptors must target that memory. DPMI has a number of ways to create
descriptors, and the class library must abstract the differing behaviors of
descriptors created by each of these services.
To start, let's itemize the DPMI services whose invocations result in creation
of descriptors; see Table 1. The first three create descriptors with the same
behavior: They are fully modifiable and are released by DPMI function 0001h.
The map-real-paragraph-to-descriptor service creates a descriptor that may not
be modified or freed, and is therefore distinguished from the others.
Similarly, the descriptors created by the allocate-DOS-memory service require
special DPMI calls to modify or free them.
Table 1: DPMI services.

 Function Name Description
 -------------------------------------------------------------------------

 0000h Allocate Descriptor Simply allocates one or more LDT entries.
 000Ah Create Alias Allocates a descriptor, then sets its
 Descriptor base, limit, and attributes according to
 an already-existing LDT entry.
 000Dh Allocate Specific Allocates a particular entry in the
 Descriptor LDT. (First 16 entries are reserved for
 this call.)
 0002h Map Real Paragraph Creates a descriptor that maps a paragraph
 to Descriptor in the low megabyte, with a limit of 64
 Kbytes.
 0100h Allocate DOS Allocates memory from DOS, and allocates
 Memory one or more LDT entries to map that
 memory.

This analysis gives rise to three distinct classes in the class library:
Class Segment corresponds to fully modifiable descriptors.
Class CommonRealSegment corresponds to those created by function 0002h.
Class DOSMemory corresponds to those created by function 0100h.
Although the three classes are distinct, one set of operations is common to
all. Some examples of such operations are obtaining the segment's base address
and size and querying the segment's properties. Each operation may be
performed on any descriptor, regardless of how it came into existence. Other
operations, such as resizing the segment, are illegal for the
CommonRealSegment class, but make sense for Segment and DOSMemory, although
different code is required to perform the task in question.
In this situation, you can leverage the power of C++ by introducing a base
class that implements the behavior shared between the three segment classes,
and that defines member functions corresponding to operations that may be
performed on them (albeit in distinct ways). In the DPMI class library, the
class AbstractSegment, shown in Example 3, serves this purpose. The only data
member of AbstractSegment is selector, which simply stores the selector of the
descriptor in question. The member functions segmentSize(), segmentBase(), and
queryProp() each use DPMI calls to read information about the descriptor from
the LDT. Since it is possible to read this information via DPMI for any
descriptor regardless of how it was created, these methods are inherited and
used by Segment, CommonRealSegment, and DOSMemory.
Example 3: The class AbstractSegment.

 class AbstractSegment
 {
 public:
 virtual uLong segmentSize(void);
 virtual uLong segmentBase(void);
 virtual boolean queryProp(SegmentProp_t);
 virtual boolean operator+(SegmentProp_t)=0;
 virtual boolean operator-(SegmentProp_t)=0;
 virtual boolean resize (uShort)=0;
 virtual boolean move (uLong)=0;
 void far *ptrTo(void);
 protected:
 selector_t selector;
 };

Note that in class AbstractSegment, the addition and subtraction operators are
overloaded when they act on the type SegmentProp_t. The type SegmentProp_t is
an enumeration defined in Example 4. Each member of the enumeration
corresponds to one or more bits of the modifiable descriptor attributes. The
class library abstracts the operation of modifying descriptor attributes by
allowing programmers to simply add or subtract properties from instances of
classes derived from AbstractSegment. The resulting Boolean value indicates
the DPMI host's success in effecting the requested modification.
Example 4: The type SegmentProp_t.


 typedef enum SegmentProperty {
 present,
 executable,
 readable,
 writable,
 big,
 expandDown
 } SegmentProp_t;

For example, the code in Example 5 shows how a program changes a data segment
to an executable segment, and branches on the success of the operation. The
syntax for modifying descriptor properties is natural, and demonstrates how
operator overloading can improve code readability.
Example 5: Program change of a data segment to an executable segment.

 if (myDataSeg + executable)
 {
 // operation succeeded
 }
 else {
 // operation failed
 }
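Outside a DPMI host, the property-modification pattern itself can be exercised with a simplified stand-in that keeps the properties in a bitmask rather than in a real LDT entry. The class name MockSegment and its always-succeeding operators are ours, purely for illustration; the real classes defer to DPMI calls:

```cpp
#include <cassert>

enum SegmentProp_t { present, executable, readable, writable, big, expandDown };

// Simplified stand-in for a segment class: properties live in a bitmask
// instead of a descriptor, and every "DPMI call" trivially succeeds.
class MockSegment {
    unsigned props;
public:
    MockSegment() : props(0) {}
    bool operator+(SegmentProp_t p) { props |=  (1u << p); return true; }
    bool operator-(SegmentProp_t p) { props &= ~(1u << p); return true; }
    bool has(SegmentProp_t p) const { return (props >> p) & 1u; }
};
```

With this stand-in, the test from Example 5 reads exactly the same way: if (mySeg + executable) branches on the success of the operation.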

It's important to note that the operator+() and operator-() member functions
are pure virtual functions--they are not implemented in the base class, and
must be implemented in any derived class that is to be instantiated. The same
is true
for the member functions move() and resize(), which change the base and limit
of a segment, respectively. It is incorrect to implement these members for
AbstractSegment because their exact semantics depend on the actual descriptor
type. AbstractSegment is, in fact, an abstract class; it provides a generic
definition of common functionality. AbstractSegment itself, however, cannot be
instantiated, because that would require allocation of a descriptor via some
DPMI call, and would thereby fix a specific behavior of that object.
The last member function of AbstractSegment is ptrTo(). Using the selector
stored within the class instance, this function simply returns a far pointer
to the base of the segment that the descriptor defines. The class
AbstractSegment implements ptrTo() as an inline function.


Deriving the Memory Classes


Moving up the hierarchy, the derived class Segment has three distinct
constructors that correspond to the three DPMI functions that create fully
modifiable descriptors. The first constructor has no argument, and uses the
basic Allocate Descriptor call (function 0000h) to allocate one descriptor.
The second takes a selector (unsigned 16-bit integer) as an argument, and uses
the Allocate Specific Descriptor call to allocate one of the first 16
descriptors in the LDT. The third constructor takes a reference to an
AbstractSegment (or derived class) and uses the Create Alias Descriptor call
to allocate a descriptor and initialize its base, type, and limit in agreement
with the descriptor passed to it. If any constructor is unable to allocate a
descriptor, the constructor sets the selector field to 0, so that ptrTo()
returns a null pointer.
The class Segment implements the pure virtual members of the class
AbstractSegment using DPMI's LDT management services; the mappings between
these services and the member functions are intuitive. The destructor calls
DPMI to release the descriptor to the host, which relieves the programmer from
freeing descriptors when they are no longer needed.
The member functions for the class CommonRealSegment return False to indicate
that the corresponding operations cannot be performed on them. The class does
not define a destructor, since DPMI does not permit clients to free these
descriptors. Similarly, the DOSMemory class supports the resize() member
function and has a destructor that uses DPMI function 0101h to free the
corresponding DOS memory block. It does not, however, allow setting the base
address (the move() member function returns False) or modifying segment
properties.
What's missing from the set of classes defined so far? You may recall that the
Allocate Descriptor call takes as an argument the number of consecutive
descriptors to allocate, but an instance of class Segment only allocates a
single descriptor. DPMI allows programmers to obtain an ordered set of
descriptors that can be set up to span a partition of memory larger than 64
Kbytes--an important capability for addressing huge objects in 16-bit mode.
The library includes the class HugeSegment, derived from AbstractSegment. The
constructor for HugeSegment takes a 32-bit integer argument that specifies the
size (in bytes) of the memory region the segment must span; the class
implementation determines from this how many consecutive descriptors it will
require. The constructor sets the bases of consecutive component descriptors
at intervals of 64 Kbytes. The HugeSegment class supports all the member
functions of AbstractSegment, although the implementation is more complex. All
member functions, including add or remove property, act on all the component
descriptors, and the destructor releases all component descriptors to the
host.
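As a back-of-the-envelope check of what the HugeSegment constructor must compute, the descriptor count is the requested size rounded up to 64-Kbyte granules, with component bases spaced 64 Kbytes apart. The helper names below are ours, for illustration only:

```cpp
#include <cassert>

// Number of consecutive descriptors needed to span 'bytes' of memory,
// each descriptor covering at most 64 Kbytes (0x10000 bytes).
unsigned long descriptorsNeeded(unsigned long bytes)
{
    return (bytes + 0xFFFFUL) / 0x10000UL;   // round up to 64-Kbyte granules
}

// Base of the i-th component descriptor, relative to the region's start.
unsigned long componentBase(unsigned long i)
{
    return i * 0x10000UL;
}
```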
Even with all these classes, there is not enough functionality to allocate
memory and address it via a descriptor. The class library reflects DPMI's
decoupling of memory and descriptors by defining a Block class, along with the
descriptor classes derived from AbstractSegment. A class is needed that brings
these classes together to create addressable blocks of memory.
You might think an elegant solution would be to simply override the global new
operator so that all dynamic allocations use memory and descriptors allocated
from DPMI. Unfortunately, this is not possible in a small-model environment,
because new is then defined to return near pointers, while addressable memory
blocks obtained from DPMI are necessarily far. In a large-model
implementation, however, this would be a good solution.
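In a large-model program the idea would look roughly like the following sketch, with malloc standing in for the DPMI memory- and descriptor-allocation calls. The counter and the substitution are ours, purely to make the hook visible:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstddef>

// Sketch of a replaced global new: a real large-model version would call
// the DPMI allocation services here instead of malloc.
static std::size_t bytesRequested = 0;

void *operator new(std::size_t n)
{
    bytesRequested += n;        // a real version would allocate via DPMI
    return std::malloc(n);
}

void operator delete(void *p) noexcept
{
    std::free(p);               // and release the block and descriptor here
}
```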
For the small-model implementation, two additional classes provide the desired
functionality: MemorySegment and HugeMemorySegment. MemorySegment is derived
from Block and Segment, and HugeMemorySegment is derived from Block and
HugeSegment.
Declaration of a MemorySegment constructs both a Block, resulting in
allocation of raw linear memory, and a Segment, which the class initializes to
address the allocated memory. After declaring a MemorySegment, a program uses
the inherited ptrTo() member function to get a far pointer to the memory
allocated. The class overrides virtual members of AbstractSegment in a
sensible way; the resize() member function, for example, resizes the memory
block and updates the base and limit of the descriptor. Behavior is analogous
for HugeMemorySegments, except that the size is not restricted to less than 64
Kbytes. Multiple derivation of these classes, reflecting the structure of
DPMI, yields
a higher level of functionality, while retaining the granularity of low-level
DPMI services.
_DPMI MEETS C++_
by Frederick Hewett


Example 1

class DPMIhost {
public:
 DPMIhost(void);
 DPMIerr getStatus(void);
 boolean enterProtectedMode(uChar bitness);
 void getVersion(uChar *major, uChar *minor);
 uChar getSelectorDelta(void);
};




Example 2

class Block {
public:
 Block(uLong size);
 ~Block(void);

 boolean setSize(uLong);
 uLong blockHandle(void);
 uLong blockSize(void);
 uLong blockBase(void);
protected:
 uLong handle;
 uLong base;
 uLong size;
};




Example 3

class AbstractSegment
{
public:
 virtual uLong segmentSize(void);
 virtual uLong segmentBase(void);
 virtual boolean queryProp(SegmentProp_t);
 virtual boolean operator+(SegmentProp_t)=0;
 virtual boolean operator-(SegmentProp_t)=0;
 virtual boolean resize(uShort)=0;
 virtual boolean move(uLong)=0;
 void far *ptrTo(void);
protected:
 selector_t selector;
};




Example 4

typedef enum SegmentProperty {
 present,
 executable,
 readable,
 writable,
 big,
 expandDown
} SegmentProp_t;




Example 5

if (myDataSeg + executable)
{
// operation succeeded
}
else
{
// operation failed
}




October, 1992
TIMED CALLBACKS IN C++


Implementing real-time clock services for small embedded systems




Christian Stapfer


Christian, a freelance programmer who specializes in small-system development,
can be contacted at Friesenbergstrasse 38, CH-8055 Zurich, Switzerland.


Recently, I was the systems programmer of a group that developed an 80186-based
embedded system designed to monitor and control moisture. The software
controlled the analyzer and a variety of peripherals: a force-compensation
cell, an infrared heater, several temperature sensors, a customized keyboard
and LCD display, a speaker, several serial channels, and a printer. One of the
problems I faced was how to keep
track of time--triggering rescheduling of compute-bound tasks, detecting
time-out on I/O devices, making LCD display elements blink, implementing
autorepeat keys, polling the temperature sensors with a chosen frequency,
playing a few simple tunes with the speaker, and the like.
Following the design maxim that "data flow across module boundaries must be
accompanied by control flow," I implemented in Borland C++ a "timed callback"
scheme, whereby functions are queued to be invoked after a given number of
system-clock ticks. The implementation uses a bounded priority queue and is
quite efficient. The timed callback transmits the information that a certain
number of clock ticks has passed by invoking a client-specified function from
within the clock module. From within the callback, therefore, the client can
switch a semaphore, write to several message queues, output a byte to a
hardware port, and so on. I added the ability to transmit context information
from the client to its callback function. By using the return value of the
callback to specify that it should be called again after so many clock ticks,
I also implemented autorepeat keys, blinking display elements, polling
sensors, and sounds. Listing One (page 120) is TMDCBK.HPP, the timed-callback
interface, and Listing Two (page 120) is TMDCBK.CPP, the actual
implementation.
Ignoring the problems of concurrency for the moment, I proposed the scheme in
Figure 1. You call timedCallback as in Example 1, then queue the callback. For
example, if you have detected a keypress and want to generate a first
duplicate after Delta clock ticks: AutoRepeat.queue(Delta);.
Example 1: Calling timedCallback.

 timedCallback AutoRepeat (RepeatFun);
 assuming that you have defined the function:
 long RepeatFun (timedCallback* Self)
 {
 // Eg. generate duplicate keypress..
 return NewDelta;
 }

Figure 1: Class timedCallback.

 class timedCallback
 {
 public:
 typedef long (*timedCallbackFun) (timedCallback* Self);
 timedCallback(timedCallbackFun SomeFun);
 void queue(long Delta);
 //-- Queues a timed callback for Delta ticks.
 void cancel ();
 // -- Cancels a queued callback
 static void tick ();
 // -- To be called for each clock tick
 private:
 ...
 };

If your RepeatFun gets invoked, it can generate the first duplicate keypress
and requeue itself. That's an easy way to have the repeat frequency
increase--up to a certain limit, say--if the user keeps the key pressed for a
long time. If you detect a key release, however, you just cancel the callback:
AutoRepeat.cancel();.
The pointer passed to the callback function is meant to allow the client to
transmit context information to its callback by embedding the timedCallback
object in another class and using a cast within the callback to access it; see
Example 2. In this example, I used inheritance, not embedding, to relate class
context to a timed callback.
Example 2: Embedding the timedCallback object in another class and using a
cast within the callback to access it.

 class context : timedCallback

 {
 // Whatever RepeatFun needs to know!
 };
 long RepeatFun (timedCallback* Self)
 {
 context *ContextPtr = (context *)Self;
 ...// Use ContextPtr ->* to access data

 return NewDelta;
 }
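Stripped of the clock machinery, the context-recovery pattern can be sketched in isolation. All names here are ours, for illustration; the real base class is timedCallback from Listing One:

```cpp
#include <cassert>

// Minimal model of the pattern: a base class that knows only the callback
// function, and a derived class that carries the client's context.
struct callbackBase {
    long (*fun)(callbackBase *self);
    long invoke() { return fun(this); }   // what the clock module would do
};

struct keyContext : callbackBase {
    int repeatCount;                      // whatever RepeatFun needs to know
};

long repeatFun(callbackBase *self)
{
    // The cast recovers the embedding object from the base-class pointer.
    keyContext *ctx = (keyContext *)self;
    ctx->repeatCount += 1;
    return 4;                             // delta until the next call (made up)
}
```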



Top of the Heap


The scheme described here can be efficiently implemented using a heap. A heap
is a fully balanced binary tree mapped neatly into a linear array (hence the
fixed upper boundary on queued callbacks) and maintained in such a way as to
keep the heap property invariant at all times.
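The array mapping used in Listing Two follows the classic 1-based scheme: the parent of entry K lives at K >> 1, and its children at K << 1 and (K << 1) + 1. The helper names below are ours:

```cpp
#include <cassert>

// 1-based heap index arithmetic, as used by upHeap() and downHeap()
// in Listing Two.
unsigned parentOf(unsigned k)     { return k >> 1; }
unsigned leftChildOf(unsigned k)  { return k << 1; }
unsigned rightChildOf(unsigned k) { return (k << 1) + 1; }
```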
The callback to be invoked first is placed on top of the heap, using as the
key value the required delivery time, expressed as the number of clock ticks
since startup. To cope with overflow of the tick counter, it's possible to use
a signed long tick counter instead of an unsigned one. You then replace the
restriction on the total number of clock ticks with a restriction on the Delta
parameter of member queue(), the number of ticks to wait before delivery of a
callback. Requiring 0 <= Delta < 2^31, we can handle the wraparound problem by
defining Key1 <= Key2 to mean 0 <= Key2 - Key1 and basing the heap on that
ordering; see Listing Two.
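This wraparound-safe comparison can be checked in isolation; the signed difference stays meaningful as long as the two keys are fewer than 2^31 ticks apart, which is exactly the restriction placed on Delta. The function below is our restatement of the test used throughout Listing Two:

```cpp
#include <cassert>
#include <stdint.h>

// "Key1 is due no later than Key2" -- valid across tick-counter wraparound
// as long as the keys are fewer than 2^31 ticks apart.
int precedes(int32_t key1, int32_t key2)
{
    // Unsigned subtraction wraps cleanly; the sign of the 32-bit
    // difference then gives the ordering.
    return 0 <= (int32_t)((uint32_t)key2 - (uint32_t)key1);
}
```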


An Example


The example program in Listing Three, page 124, plays a simple tune in lieu of
a more complete example from our embedded system (which would require
inordinate amounts of detail and, above all, a multitasking system). Other
examples that could make use of timed callbacks to solve time-related problems
might include the following:
Whenever the scheduler wants to run a task for which time slicing has been
enabled, it queues a timed callback. However, the scheduler need never cancel
that callback. Instead, if the callback is invoked by the clock module, it
checks whether the task for which it has been queued is still active and
forces rescheduling if necessary.
To implement time-out control for I/O devices, you can add a timed-callback
member to each I/O control block. Again, whenever a task gets blocked waiting
for some I/O to complete, queue that callback. If the callback is invoked, it
uses its argument pointer to access the I/O stream control block of which it
is a part and unblocks the task with an appropriate status indication.
You can use a timed callback to implement a simple automaton for sounds. If
the automaton is idle and gets invoked--because we queued it from without--it
starts playing a tune as indicated to it by, say, a list of triples. Each list
entry specifies pitch, duration, and mode. The callback keeps working on that
list, modifying its internal state as needed and requeuing itself until it
reaches the end of the list, at which point it becomes idle again. Listing
Three is a much-simplified version of this idea.
To delay tasks for a given number of clock ticks, you can design a class of
timers like that in Figure 2. These objects can be made more useful than a
mere task delay: They can be made drift-free sources of periodic time events,
requiring fewer system resources than additional tasks. (Remember, this is for
a small system.) Additionally, multiple tasks can wait for the same timer to
expire.
Figure 2: Class timer.

 class timer : timedCallback
 {
 public:
 typedef enum {

 stopped, started, expired
 } timerState;
 timer();

 void start(long Delta);
 // -- Timer will time out after Delta clock ticks
 timerState wait ();
 // -- Blocks task until the timer expires or stops
 void stop ();
 // -- Stop timer and reactivate blocked tasks

 void setPeriod();
 // -- Makes the timer into a driftfree periodic
 // source of events
 timerState state();
 private:
 ...
 };

This example shows that callbacks offer time-related services at a level below
that of tasks. If your system is rich in MIPS and bytes and your multitasking
system already offers time services, you may want to use tasks instead of
callbacks.


What About Concurrency?


An easy way to implement concurrency--and one that will get by during the
development process--is to disable interrupts when accessing the shared data
structure used to support timed callbacks--essentially, the priority queue. We
can simply use the clock interrupt to drive tick(). But the more you use this
facility, the more you'll find that either interrupt latency caused by these
interrupt-disable periods is unacceptable or you need more processing within
timed callbacks than is acceptable while servicing the clock interrupt.
In this case, you can offload callback invocation and heap maintenance from
the clock-interrupt handler to a high-priority task. The clock interrupt can
still be used to drive member tick(), but we would move the body of the
current implementation of tick() to that task and reduce tick() to something
like Example 3.
Example 3: Moving and reducing tick().

 void timedCallback::tick()
 {
 tickCount += 1;

 if (0 <= tickCount - nextOut)
 wakeUp.up();
 }

This implementation of timedCallback uses nextOut to remember when the next
callback is to be invoked. The high-priority task simply sits in a loop: It
first blocks on a semaphore wakeUp until it is time for it to invoke some
callbacks. The task must be careful, however, not to get confused if it cannot
finish dishing out callbacks before the next clock interrupt arrives.


Conclusion


I occasionally envy programmers who develop for targets larger than that
described here. The sometimes unnecessary complexity dictated by such systems,
however, can eliminate much of the joy experienced when working on small
embedded systems--those that still force you into "running light without
overbyte."
_TIMED CALLBACKS IN C++_
by Christian Stapfer


[LISTING ONE]

//===== tmdcbk.hpp -- timedCallback: Interface ======
#ifndef tmdcbk_hpp
#define tmdcbk_hpp
// -- Maximum number of simultaneously queued callbacks:
#define MaxTimedCallbacks 16 // Any number > 0 will do
class timedCallback
 {
 public:
 typedef
 long (*timedCallbackFun)(timedCallback *Entry);
 // Type of function to be called after a given number of tick()
 // invocations.
 // Return value = Number of ticks to wait till next call (0 if
 // the callback must not be requeued).
 timedCallback( );
 // Constructs a callback using a default 'do-nothing' function.
 // This allows you to create arrays of timedCallback objects and
 // define the callback function later by use of member setFun().
 timedCallback(timedCallbackFun SomeFun);
 // Constructs a real-time clock callback to be used for later calls
 // to queue() and cancel().
 void queue(long Delta);
 // Queues the callback to be called after the specified number of
 // invocations of member tick(). Delta == 0 cancels the callback.
 // Raises : NoRoom Callback queue cannot hold another entry.
 // (Not implemented)
 // Note : It is ok to requeue an already queued callback.
 void cancel( );
 // Cancels 'this' callback - if it is queued.
 // Note : It is ok to cancel a callback that is not queued.

 int isQueued( )
 // Returns 1 if 'this' callback is queued, 0 otherwise.
 { return index != 0; }
 static void tick( );
 // Advances the time by one tick. Queued callbacks may time out and
 // are invoked from within member tick().
 void setFun(timedCallbackFun SomeFun);
 // Redefines the callback function to be used for 'this' callback.
 ~timedCallback( )
 // Cancels the callback before it goes out of scope.. .
 { cancel(); }

 private:
 timedCallbackFun fun; // Address of the function to be called back.
 long clock;// System clock count to wait for
 unsigned index;// Aux. handshake index into the heap
 // Priority heap of queued callbacks is contained in:
 static timedCallback *heap[MaxTimedCallbacks + 1];
 static unsigned inUse;
 // Number of invocations of member tick():
 static long tickCount;
 void upHeap( );
 // Repositions 'this' heap entry if count member has been decreased
 void downHeap( );
 // Repositions 'this' heap entry if count member has been increased
 static long defaultFun(timedCallback* );
 // Default callback function used by the constructor timedCallback().
 // Prevent copying of timedCallback objects:
 // (You may want to modify this.. )
 timedCallback(const timedCallback& );
 timedCallback& operator = (const timedCallback& );
 };// class timedCallback
#endif // ndef tmdcbk_hpp
// tmdcbk.hpp







[LISTING TWO]

//===== timedCallback: Implementation. timedCallback::queue() silently drops
//===== queueing requests that cannot be satisfied if the heap is already full.

#include "tmdCbk.hpp" // timedCallback interface
//#include "exc.hpp" // Exception handling (not used)

// static timedCallback data members
long timedCallback::tickCount = 0;
timedCallback *timedCallback::heap[MaxTimedCallbacks + 1];
unsigned timedCallback::inUse = 0;

// timedCallback::*() members
 long timedCallback::defaultFun(timedCallback* )
// Callback used as a default for the constructor.
 {
 return 0;// Don't requeue
 }// defaultFun()
 timedCallback::timedCallback( )
 {
 clock = index = 0;
 fun = timedCallback::defaultFun;
 }// timedCallback()
 timedCallback::timedCallback(timedCallbackFun SomeFun)
 {
 clock = index = 0;
 fun = SomeFun;
 }// timedCallback()
 void timedCallback::setFun(timedCallbackFun SomeFun)

 {
 fun = SomeFun;
 }// setFun()
 void timedCallback::queue(long Delta)
 {
 long OldClock = clock;
 if (Delta == 0) {
 cancel();
 }
 else {
 clock = tickCount + Delta;
 if (0 < index) {
 // Still queued ..
 if (0 <= clock - OldClock) {
 downHeap();
 }
 else {
 upHeap();
 }
 }
 else if (inUse < MaxTimedCallbacks) {
 // Not currently queued (and there is room!)
 index = inUse += 1;
 heap[inUse] = this;
 upHeap();
 }
 else {
 // NoRoom.raise(); Exception handling not available!
 // You may want to exit() to DOS or something.
 }
 }
 }// queue()
 void timedCallback::cancel( )
 {
 unsigned Index;
 Index = index;
 if (0 < Index) {
 index = 0;
 inUse -= 1;
 if (Index <= inUse) {
 heap[Index] = heap[inUse + 1];
 heap[Index]->downHeap();
 }
 // else cancelling the last entry is trivial ..
 }
 }// cancel()
 void timedCallback::tick( )
 {
 timedCallback **First = heap + 1;
 long NewTicks;
 // Advance tick count
 tickCount += 1;
 // Deliver timed-out callbacks
 while (0 < inUse && (*First)->clock == tickCount) {
 // Expired - deliver!
 NewTicks = (*(*First)->fun)(*First);
 if (NewTicks != 0) {
 // Callback wants to be requeued
 (*First)->clock = tickCount + NewTicks;

 }
 else {
 // Callback doesn't want to be requeued
 heap[inUse]->index = 1;
 (*First)->index = 0;
 *First = heap[inUse];
 inUse -= 1;
 }
 if (0 < inUse) {
 (*First)->downHeap();
 }
 }// while
 }// tick()
// PRIORITY-QUEUE (HEAP) MANAGEMENT
 void timedCallback::upHeap( )
 {
 unsigned K = index;
 // Use alias as a sentinel entry to ensure we'll drop out of this loop:
 heap[0] = this;
 // Move 'this' up until the heap condition is satisfied again:
 while (0 < heap[K >> 1]->clock - clock) {
 heap[K] = heap[K >> 1];
 heap[K]->index = K;
 K >>= 1;
 }
 // Actually insert 'this' at its new position
 heap[K] = this;
 index = K;
 }// upHeap()
 void timedCallback::downHeap( )
 {
 unsigned J,
 K = index,
 Kmax = inUse >> 1;
 // Scan down the heap to locate the new position for 'this':
 while (K <= Kmax) {
 J = K << 1;
 if (J < inUse) {
 if (heap[J + 1]->clock - heap[J]->clock < 0) {
 J += 1;
 }
 }
 if (0 <= heap[J]->clock - clock)
 break;
 heap[K] = heap[J];
 heap[K]->index = K;
 K = J;
 }
 // Actually insert 'this' at its new position
 heap[K] = this;
 index = K;
 }// downHeap()
// tmdcbk.cpp




[LISTING THREE]


//===== play.cpp -- Example usage of timedCallback objects: Play a tune.
//===== Compiled for MS-DOS with Borland C++.

#include "tmdcbk.hpp" // Defines timed callbacks
#include <dos.h> // We need sound()
#include <time.h> // .. and clock()

// -- A sound is defined by the struct:
typedef struct {
 unsigned freq;
 unsigned delta;
 } aSound; // More elaborate sounds include fading, up/down sweeps, etc.
// To create a tune we must hand it a list of sounds:
aSound List[] = { {1000,4},{0,1},{500,3},{0,1},{600,3},{0,1},
 {700,1},{0,3},{700,1},{0,3},{700,4},{0,0} };
// A tune is defined as:
class tune : timedCallback {
 static long PlayFun(timedCallback* Self);
 aSound *toPlay;
 unsigned nextSound;
 public:
 tune(aSound* Tune): timedCallback(tune::PlayFun)
 { toPlay = Tune; };
 void play( )
 { nextSound = 0; queue(1); };
 void stop( )
 { cancel(); };
 private:
 tune();// Only allow tune(aSound*) to be used!
 };
 long tune::PlayFun(timedCallback* Self)
//-- Timed callback: Walks down the list of sounds, sets the speaker
// and requeues itself accordingly until it reaches delta == 0.
 {
 tune *This = (tune *)Self;
 aSound ThisSound = This->toPlay[This->nextSound];
 if (ThisSound.freq == 0) {
 nosound(); // sound(0) will not do!
 }
 else {
 sound(ThisSound.freq);
 }
 if (ThisSound.delta != 0) {
 This->nextSound += 1;
 }
 return ThisSound.delta;
 }// PlayFun()
void DriveTick(unsigned Delta)
//-- Aux. function used to avoid fooling around with the clock interrupt.
 {
 static clock_t LastClock = 0;
 for ( ; 0 < Delta ; Delta -= 1) {
 while (LastClock == clock())
 ;// Wait till next clock tick
 LastClock = clock();
 timedCallback::tick();
 }
 }// DriveTick()
 void main ( )

// Plays a single tune and quits.
 {
 tune Tune(List);
 Tune.play(); // If we had multi-tasking we'd simply do this ..
 DriveTick(50); // .. without having to drive the callbacks ourselves!
 }// main()




Example 1


timedCallback AutoRepeat(RepeatFun);
assuming that you have defined the function:
long RepeatFun(timedCallback* Self)
 {
 // Eg. generate duplicate keypress..
 return NewDelta;
 }





Example 2


class context : timedCallback

 {
 // Whatever RepeatFun needs to know!
 };
long RepeatFun(timedCallback* Self)
 {
 context *ContextPtr = (context *)Self;
 ...// Use ContextPtr->* to access data
 return NewDelta;
 }



Example 3:

void timedCallback::tick()
 {
 tickCount += 1;
 if (0 <= tickCount - nextOut)
 wakeUp.up();
 }












October, 1992
 IMPLEMENTING NLM-BASED CLIENT/SERVER ARCHITECTURES


Writing NLMs isn't as hard as you think


 This article contains the following executables: NLMCS.ARC


Michael Day


Michael is a documentation engineer at Novell in the NetWare operating system
group. He is the coauthor of Troubleshooting NetWare for the 386 (M&T Books,
1991) and is working on a new book about NLM programming for NetWare 4.0 to be
published by Novell Press. You can reach him on CompuServe at 71670,475.


NetWare Loadable Modules (NLMs) are applications that are executed by the
NetWare 3.x operating system. As such, they are 32-bit protected-mode programs
able to take full advantage of the multitasking, multithreaded architecture of
the NetWare operating system.
This article presents a distributed file manager made up of two
modules--ENGINE.NLM, an NLM running on a NetWare 3.x server, and CLIENT.EXE, a
DOS-based front end running on the client. ENGINE.NLM is not a full-featured
file manager; rather, it's a basic implementation designed to illustrate
several key aspects of NLM development:
Implementing client-server communications using IPX.
Using thread-control mechanisms for NLMs, including semaphores.
Informing other servers about the NLM via the Service Advertising Protocol
(SAP).
Using transaction-tracking services (TTS) to ensure the integrity of the data
file.
Additionally, ENGINE.NLM illustrates multithreaded techniques in the context
of non-preemptive multitasking, and general issues such as NLM exit routines
and the NLM development environment.


About NLMs and NLM Programming


The most accurate way to view NLMs is as extensions to the NetWare operating
system. The NetWare loader and linker essentially bind a loaded NLM to the
operating system, and thereafter do not make a distinction between that NLM
and the OS itself. Like the NetWare OS, NLMs use a flat 32-bit memory
addressing model.
While it's possible to develop interactive, client-style applications as NLMs,
it isn't appropriate to do so. NLMs should be "server" modules, providing
"services" to clients located on other network stations. Moreover, NLMs should
provide services that take advantage of the NetWare OS architecture. For
example, a distributed file manager can take advantage of the fast 32-bit
NetWare file system. It wouldn't make sense to develop an NLM for rendering
bit-mapped graphics. Not only would this NLM slow file service to NetWare
clients, but the NetWare OS offers no inherent advantage to this type of
application over other 32-bit operating systems.
Even though NLMs are protected-mode programs, you don't need to worry about
protected-mode issues; the NetWare operating system and loader handle the
low-level details for you. The NetWare 386 runtime interface is a superset of
ANSI C, so there's no need to delve into the more complex NetWare APIs unless
you are particularly compelled to do so. Finally, NLMs use a flat memory
space, which means you don't have to worry about segmentation, near or far
pointers, and the like.
The most oft-repeated criticism of NLMs is that they run on an operating
system that lacks memory protection. An errant NLM can therefore bring down
the NetWare server by corrupting memory it doesn't own. Because NLMs don't run
on a memory-protected OS, some say that they're not a suitable platform for
application development. I believe this criticism has been blown out of
proportion. The implied assumption is that bugs are acceptable unless they
bring the server down--even if they corrupt something like an accounting
database. The fact remains that buggy programs can do serious damage even when
running on a memory-protected OS. Protection or not, it's the developer's
responsibility to ensure the program is bug free and robust. The lack of
memory protection hasn't kept developers from writing thousands of successful
applications for unprotected-environment platforms like DOS and the Macintosh
OS.
Memory protection benefits the developer by allowing detection of errant
memory operations early in the development process, before releasing the
program for QA test and production. You can, of course, trap NLM memory errors
during the development cycle using tools like Nu-Mega Technologies' NLM
Developers Kit, which includes a profiler, both network and NLM memory
checkers, and a low-level debugger.


About NetWare 3.x


The NetWare 3.x environment is different from most mainstream commercial
operating systems because it is highly optimized for networking operations.
Therefore, it lacks certain features, such as memory protection and preemptive
multitasking, found in general-purpose operating systems. (Multitasking in
NetWare 3.x is non-preemptive.)
Other aspects of NetWare 3.x present opportunities and challenges to NLM
developers. For example, NetWare uses all its free memory to cache data files,
thus speeding network file service. This has interesting implications for my
distributed record manager, discussed later.
NetWare performs load-time linkage of NLMs to external routines. This reduces
memory consumption by NLMs because multiple NLMs may share the same libraries.
NLMs may also export routines for linkage by other NLMs when they load. The
external routines called by ENGINE.NLM are all exported by the NetWare C
Interface (CLIB.NLM).


The Development Environment


To create NLMs, you need Novell's software development kit (SDK) for NetWare
3.x. The SDK documents the NetWare C interface and provides the Watcom 32-bit
C/386 compiler, the Novell linker (NLMLINK), header files, and miscellaneous
utilities. You don't need to ship runtime support with your NLM because
CLIB.NLM, which provides the actual public symbols of the NetWare C interface,
ships with the standard NetWare package purchased by end users. For debugging,
there's the NetWare internal debugger, an assembly language debugger which is
part of the NetWare OS, and the Watcom Debugger, which runs on DOS.
NLMs must be compiled to support a flat memory model (using the /mf compiler
switch for Watcom C 8.0 and earlier), to use stack-based calling conventions
(using the /3s compiler switch), and to generate object files in Phar Lap's
Easy-OMF format (using the /ez compiler switch).
The Novell linker (NLMLINK) requires a definition file that describes the
characteristics of the NLM being linked, including which symbols it imports
from CLIB, which symbols it exports to other NLMs, and so on. All NLMs must be
linked to a special object file called PRELUDE.OBJ, which provides startup
code for the NLM.
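The shape of such a definition file can be sketched as follows; the directive spellings and the CLIB import list here are from memory and may differ from the SDK's documentation, so treat this as illustrative only:

```text
OUTPUT      ENGINE
INPUT       ENGINE, PRELUDE
MODULE      CLIB
IMPORT      @CLIB.IMP
DESCRIPTION "Distributed record manager back end"
STACKSIZE   8192
```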


ENGINE.NLM


ENGINE.NLM is the back end of a distributed record manager. Because of its
size (over 1000 lines of C source code), the program is only available
electronically; see "Availability" on page 5. The operations supported by
ENGINE are: adding a new record to the database; editing an existing record;
reading an existing record; and marking an existing record as deleted. ENGINE
performs all the file I/O itself, on behalf of the client, 32 bits at a time.
ENGINE is designed to clearly demonstrate NLM programming methods; as such,
it's too simple a record manager for practical use.
ENGINE performs the following basic operations:
1. Initialization.
2. Listening for request packets from clients.

3. Spawning a worker thread to perform the client's request when a packet
comes in.
4. Returning to step 2.
A record (rec) consists of a record header (rHeader) and a data structure
(rData). Example 1 shows the relevant typedefs in the header files ENGINE.H
and CLIENT.H.
Example 1: A record's two-part structure.

 typedef struct recordHeader {
 unsigned long status;
 unsigned long offset;
 unsigned long hashkey;

 unsigned long recordNumber;
 unsigned long transactionNumber;
 unsigned char key [128];
 } rHeader;

 typedef struct recordData {
 time_t creationTime;
 time_t lastReferenceTime;
 time_t lastUpdateTime;
 unsigned char nodeAddress[10];
 unsigned long objectID;
 unsigned char data[128];
 } rData;

 typedef struct record {
 rHeader header;
 rData data;
 } rec;

The status field of the recordHeader structure indicates the state of the
record. A value of 0 means the record is deleted or free, 1 means the record
is occupied, and 2 means that the record is the special database header always
located at offset 0 of the data file. The offset field gives the offset of the
record within the data file. The hashkey field is not used in this version of
the record manager, but will be included in a future version which supports
hashed keys. The transactionNumber field is used in conjunction with TTS, a
feature of the NetWare OS which preserves the integrity of data files in the
face of I/O errors. The recordNumber and key fields of the record header are
self explanatory.
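A small helper makes the status codes self-documenting; the constant and function names here are hypothetical (the actual ENGINE.H may define its own):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the status values described above. */
#define REC_FREE     0UL  /* record is deleted or free           */
#define REC_OCCUPIED 1UL  /* record holds live data              */
#define REC_DBHEADER 2UL  /* special database header at offset 0 */

/* Map a status value to a printable name. */
const char *record_status_name(unsigned long status)
{
    switch (status) {
    case REC_FREE:     return "free";
    case REC_OCCUPIED: return "occupied";
    case REC_DBHEADER: return "db-header";
    default:           return "invalid";
    }
}
```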
The first three fields of the record's data portion indicate when the record
was created, when it was last read, and when it was last updated. The
nodeAddress field gives the network address of the last station which
requested an update of the record. The objectID field gives the NetWare ID of
the user who last updated the record. Finally, the data field contains the
record's data.
Note that an entire record is designed to fit within a single IPX packet,
which may be up to 576 bytes in size (including the 30-byte IPX header). This
simplifies things considerably, because ENGINE does not need to construct
multipacket messages in order to send an entire record to the client, nor does
the client need to defragment them. (Multipacket-message support is more
easily implemented using NetWare's Sequenced Packet eXchange, or SPX.)
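That size budget can be checked mechanically. The sketch below redeclares the structures with fixed-width types (under the 32-bit Watcom compiler, the article's unsigned long and time_t are both 4 bytes, which uint32_t reproduces on any host) and verifies that a full record plus the IPX header fits in one packet:

```c
#include <stdint.h>

/* On-the-wire layout sketch using fixed-width stand-ins for the
   article's unsigned long and time_t fields. */
typedef struct recordHeader {
    uint32_t status;
    uint32_t offset;
    uint32_t hashkey;
    uint32_t recordNumber;
    uint32_t transactionNumber;
    uint8_t  key[128];
} rHeader;

typedef struct recordData {
    uint32_t creationTime;
    uint32_t lastReferenceTime;
    uint32_t lastUpdateTime;
    uint8_t  nodeAddress[10];
    uint32_t objectID;
    uint8_t  data[128];
} rData;

typedef struct record {
    rHeader header;
    rData   data;
} rec;

typedef struct longPacket {
    uint16_t responseCode;
    uint8_t  operation;
    rec      record;
} lPacket;

enum { IPX_HEADER_SIZE = 30, IPX_MAX_PACKET = 576 };

/* Returns nonzero if a long packet (plus the 30-byte IPX header)
   fits within a single 576-byte IPX packet, as required. */
int long_packet_fits_ipx(void)
{
    return IPX_HEADER_SIZE + sizeof(lPacket) <= IPX_MAX_PACKET;
}
```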


CLIENT.EXE


CLIENT.EXE is the front-end component of the distributed record manager. Like
its ENGINE.NLM counterpart, CLIENT.EXE is available electronically; see
"Availability" on page 5. CLIENT provides all data input and display for the
user. CLIENT doesn't perform file I/O; it simply makes requests for ENGINE.NLM
to do so on its behalf and presents the results to the user. This lets the
application take full advantage of the speed and robustness of the NetWare
file system.
CLIENT.EXE's basic operations are:
1. Initialization.
2. Scanning for ENGINE.NLMs located on the internetwork.
3. Allowing the user to select a specific ENGINE.
4. Allowing the user to select specific operations.
5. Making a request of the ENGINE to perform the operation selected by the
user.
6. Displaying the results of the requested operation.
7. Returning to step 4.
CLIENT.EXE communicates with ENGINE.NLM using a simple client/server protocol.
Each packet contains an operation code telling the ENGINE which operation the
client is requesting. Supported operations are specified using #defined
constants; see header files ENGINE.H and CLIENT.H. For example, the code for
ADD_RECORD is 2 and for READ_RECORD is 0xF5. Other codes include EDIT_RECORD,
DELETE_RECORD, and FIND_RECORD_KEY.
Packets sent or received by either ENGINE or CLIENT consist of an IPX header
followed by either a shortPacket or longPacket structure; see Example 2. The
only difference between the two is that the shortPacket structure contains the
record header, while the longPacket structure contains the entire record. Both
structures include a responseCode field and an operation field. The operation
field contains the operation code of a client's request, while the
responseCode contains a value unique to each request-response sequence. CLIENT
uses the responseCode field to verify the integrity of the response packets it
receives from ENGINE.NLM.
Example 2: Short and long IPX packets.

 typedef struct ipxheader {
 WORD checkSum;
 WORD length;
 BYTE transportControl;
 BYTE packetType;
 LONG destNet;
 BYTE destNode [6];
 WORD destSocket;
 LONG sourceNet;

 BYTE sourceNode [6];
 WORD sourceSocket;
 } IPX_HEADER;

 typedef struct shortPacket {
 WORD responseCode;
 BYTE operation;
 rHeader header;
 } sPacket;

 typedef struct longPacket {
 WORD responseCode;
 BYTE operation;
 rec record;
 } lPacket;
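The request/response pairing can be sketched as follows; the helper names and the stub packet layout are illustrative, not taken from CLIENT.C:

```c
#include <stdint.h>

typedef uint16_t WORD;
typedef uint8_t  BYTE;

/* Minimal stand-in for the article's shortPacket (record header omitted). */
typedef struct {
    WORD responseCode;
    BYTE operation;
} sPacketStub;

WORD nextResponseCode = 0;

/* Hypothetical helper: stamp a request with the next unique code
   before sending it to the ENGINE. */
WORD stamp_request(sPacketStub *req, BYTE opcode)
{
    req->operation = opcode;
    req->responseCode = ++nextResponseCode;
    return req->responseCode;
}

/* Hypothetical helper: a response is valid only if it echoes the
   code of the request it answers. */
int response_matches(const sPacketStub *resp, WORD expected)
{
    return resp->responseCode == expected;
}
```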



The Client/Server Protocol


Client-to-server communication follows a simple sequence that is slightly
different for each possible operation; see Figure 1.
Figure 1: The operation sequences for each command.
Add a Record
1. CLIENT allows the user to input the record key and data fields.
2. CLIENT sends an AddRecord request to the ENGINE. The packet includes the
entire record to be added.
3. ENGINE attempts to add the record to the data file.
4. ENGINE sends a response packet back to the client containing the record
header.
5. CLIENT infers from the status field of the record header (contained in the
response packet) whether or not the record was successfully added.
Read a Record
1. CLIENT allows the user to input a record key.
2. CLIENT sends a FindRecordKey request to the ENGINE. The packet includes
only the record header.
3. ENGINE attempts to find the matching record.
4. ENGINE sends a response packet back to the client containing the entire
record.
5. CLIENT infers from the status field of the record header (contained in the
response packet) whether or not the record was found.
6. If the record was found, CLIENT displays the record.
Edit a Record
Steps 1 through 5 are the same as "Read a Record" above.
6. If the record was found, CLIENT displays the record and allows the user to
edit the record's data field.
7. CLIENT sends an EditRecord request to ENGINE. The request packet contains
the entire edited record.
8. ENGINE updates the record by writing the edited record to the data file.
9. ENGINE sends a response packet back to the client containing the updated
record's header.
10. CLIENT infers from the updated record's status field whether or not the
edit operation was successful.
Delete a Record
Steps 1 through 5 are the same as "Read a Record" above.
6. If the record was found, CLIENT changes the record's status field to 0.
Steps 7 through 10 are the same as "Edit a Record" above.

Initially, sending and receiving packets using IPX is tricky, but it quickly
becomes familiar. All IPX operations use a data structure called an event
control block (ECB) that fully describes the packet an application wishes to
send or receive. Information contained in an ECB includes the socket number
upon which to send or receive the packet and the address and length of buffers
which contain the packet header and data. The ECB is declared in CLIENT.H; see
Example 3.
Example 3: Event control block (ECB) structure.

 typedef struct fragment {
 void far *fragAddress; // DOS version. All pointers
 WORD fragSize; // are NEAR for an NLM
 } ECBFragment;

 typedef struct ecb { // DOS version. NLM version
 void far *linkAddress; // is slightly different.
 void far *ESRAddress;
 BYTE inUseFlag;
 BYTE completionCode;
 WORD socket;
 BYTE IPXWorkspace[4];

 BYTE driverWorkspace[12];
 BYTE immediateAddress[6];
 WORD fragCount;
 ECBFragment fragList[2];
 } IPX_ECB;

Not all ECB fields must be initialized when you send or receive a packet. For
example, the immediateAddress field contains the network address of the
nearest router which knows the path to the ultimate destination of the packet.
You only need to initialize the immediateAddress field when you send a packet.
The most important ECB field is the fragment-descriptor field which contains
the address and length of the buffers that make up the packet. Buffer 0 must
always be the IPX header. Other buffers can be anything the application
defines. Combined, they make up the data field of the packet.
When you send a packet, IPX copies the buffers described in the sending ECB's
fragment-descriptor field and combines them into a packet, which it sends over
the network. When you receive a packet, IPX fragments the packet and copies the
different components to the buffers described in the receiving ECB's
fragment-descriptor field.
Note that when you send a packet, you must initialize the packet's IPX header
with the packet-type code (a value of 4 for IPX packets) and with the packet's
destination address.
Once you've initialized the ECB and (if necessary) the IPX header, you can
post the ECB for sending or receiving by calling either IPXSendPacket(IPX_ECB
*) or IPXListenForPacket(IPX_ECB *).
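A send-side ECB setup might look like the sketch below, using the DOS-style structures from Example 3 with plain pointers; prepare_send_ecb is a hypothetical helper, and a real program would follow it with the IPXSendPacket call:

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t WORD;
typedef uint8_t  BYTE;

typedef struct fragment {
    void *fragAddress;
    WORD  fragSize;
} ECBFragment;

typedef struct ecb {
    void *linkAddress;
    void *ESRAddress;
    BYTE  inUseFlag;
    BYTE  completionCode;
    WORD  socket;
    BYTE  IPXWorkspace[4];
    BYTE  driverWorkspace[12];
    BYTE  immediateAddress[6];
    WORD  fragCount;
    ECBFragment fragList[2];
} IPX_ECB;

/* Sketch: prepare an ECB for sending. Buffer 0 must always describe
   the IPX header; buffer 1 carries the application data. A real NLM
   would then pass the ECB to IPXSendPacket(). */
void prepare_send_ecb(IPX_ECB *ecb, WORD socket,
                      void *ipxHeader, WORD headerLen,
                      void *data, WORD dataLen)
{
    memset(ecb, 0, sizeof(*ecb));
    ecb->socket = socket;
    ecb->fragCount = 2;
    ecb->fragList[0].fragAddress = ipxHeader;  /* fragment 0: IPX header */
    ecb->fragList[0].fragSize    = headerLen;
    ecb->fragList[1].fragAddress = data;       /* fragment 1: payload */
    ecb->fragList[1].fragSize    = dataLen;
}
```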


ENGINE.NLM Thread Control


ENGINE uses three primary threads to accomplish its work. The first thread is
main, which allocates required resources from the OS; registers functions with
the OS for cleaning up the environment when the NLM is unloaded; begins the
other two threads; and then goes to sleep until the user unloads ENGINE.
An important item within main is the call to AdvertiseService, which causes
the operating system to send broadcasts every 60 seconds informing other
servers on the network of the name, type, and network address of the
ENGINE.NLM. Novell calls this feature the service-advertising protocol (SAP).
Once ENGINE is advertising itself, CLIENT.EXE can discover the name and
location of the ENGINE.NLM by scanning the bindery of any server on the
network.
The second thread is InitMain, whose only job is to listen for query packets
from CLIENTs wishing to begin a session with the ENGINE. After initializing
itself, InitMain goes to sleep by calling WaitOnLocalSemaphore. The OS wakes
up InitMain as soon as a query packet comes in from a client. InitMain then
sends a query response packet back to the client and goes back to sleep. The
entire purpose of InitMain is to provide a starting point for the
client-server dialogue.
The third thread is EngineMain--the workhorse of the entire NLM. It listens
for request packets, evaluates the op code of packets it receives, and spawns
worker threads to accomplish the appropriate tasks on behalf of the client.
Like InitMain, EngineMain sleeps when there are no packets for it to process.
As soon as a request packet comes in, the OS awakens EngineMain. Because most
of the client/server traffic is in the form of request packets, EngineMain is
able to handle up to six incoming packets per execution cycle. Moreover, it
can spawn a worker thread and begin to listen for additional packets even
before the worker thread has started to perform the task requested by the
client.
ENGINE starts all its threads by making a call to BeginThread. This call
requires a pointer to the function that ENGINE wishes to execute as a thread,
and a pointer to a stack for the thread. Alternately, if you pass a NULL stack
pointer to BeginThread, the OS allocates a stack for the new thread.
Because NetWare is a non-preemptive multitasker, it is possible for a thread
to execute in a tight loop without relinquishing the CPU, thus shutting other
threads out and slowing overall server performance. For this reason, each
thread in ENGINE.NLM makes calls to the ThreadSwitch function within the body
of loops. ThreadSwitch merely moves the calling thread to the back of the
kernel's run queue, giving other threads a chance to run.
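The pattern can be illustrated with a stub in place of ThreadSwitch (a NetWare CLIB call that can't run outside the server); the shutdown flag anticipates the unload logic discussed below:

```c
/* Stub standing in for NetWare's ThreadSwitch(); here it only counts
   how often the worker offered to relinquish the CPU. */
int yields = 0;
void ThreadSwitchStub(void) { yields++; }

volatile int shutdown_flag = 0;  /* set when the NLM is being unloaded */

/* A cooperative worker loop: do one unit of work, then move to the
   back of the run queue so other threads can execute. */
int worker_loop(int units_of_work)
{
    int done = 0;
    while (!shutdown_flag && done < units_of_work) {
        done++;              /* one unit of (simulated) work */
        ThreadSwitchStub();  /* a real NLM calls ThreadSwitch() here */
    }
    return done;
}
```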


Unloading ENGINE Cleanly


ENGINE continues to run until a user unloads it using the Unload command at
the NetWare server console. When this occurs, the OS calls two functions in
the NLM which were registered by main at run time. The first function,
UnloadCleanUp, was registered using the signal API. The second function,
ShutdownCleanUp, was registered using the atexit API.
The OS calls UnloadCleanUp as soon as the user has issued the UNLOAD ENGINE
command. At this point, all threads continue to run. UnloadCleanUp sets the
global shutdown variable to 1, causing threads to exit their loops and return.
Next, UnloadCleanUp awakens sleeping threads by calling SignalLocalSemaphore,
and returns. When the threads wake up, they see that the shutdown variable is
equal to 1, and they kill themselves by exiting to main.
The OS calls ShutdownCleanUp after it has killed any threads which didn't kill
themselves. At this point, the only thing the NLM can do is free OS resources
it previously allocated, such as semaphores and sockets. ShutdownCleanUp does
this and then dies, allowing the NLM to unload cleanly.


Improving the Record Manager


Although ENGINE.NLM is admittedly simple, it does sport some advanced
features. For example, it supports an unlimited number of concurrent clients.
It also uses NetWare's TTS to ensure the integrity of the data file. I've
designed the architecture of ENGINE.NLM so it's easy to extend into a more
sophisticated record manager without substantial changes to the program's
structure. For example, defining the record header and data structures
separately allows for easy migration to an indexed record manager (which uses
a separate index file). In such a case, TTS support would be essential,
because updating a record requires updating two separate files.
Despite the simplicity of ENGINE.NLM's record-handling components, the basic
design scales up well because of its intelligent use of the NetWare OS. For
efficiency, all threads sleep when they don't have any work to do. Each client
request is handled by a different worker thread, which distributes the total
workload evenly, meaning that no single client may consume a disproportionate
share of computing resources. These are NLM design fundamentals that can apply
to any industrial-strength NLM.
If I were extending ENGINE.NLM to become a full-fledged record manager, I'd be
tempted to construct an index of the data file in memory, thus speeding access
to records. However, this approach can backfire in an NLM because NetWare uses
all its free memory to cache data files. An index file for ENGINE.NLM would,
over a short period of time, become cached by the OS. Therefore, it would be
as though the index were buffered in memory. However, the OS caches data files
on a per-block basis using an intelligent LRU algorithm. If I were to buffer
the index file myself, I would be overriding the OS caching algorithm with a
brute-force approach. This would be less effective overall than simply
allowing the OS to cache the index file for me.
_IMPLEMENTING NLM-BASED CLIENT/SERVER ARCHITECTURE_
by Michael Day


October, 1992
SAFE PROGRAMMING WITH MODULA-3


A full-featured language for software engineering and object-oriented
programming




Sam Harbison


Sam is the director of commercial systems at Tartan Inc. For the past two
years he has been working under contract to DEC SRC on the Modula-3
programming language. He is also the author of C: A Reference Manual and can
be contacted at Tartan Inc., 300 Oxford Dr., Monroeville, PA 15146 or at
harbison@tartan.com.


Programmers who prefer strongly typed, structured programming are frustrated
by languages that are either too simplistic, such as Pascal, or too costly and
complex, like Ada. They are looking for a language that is "just right," to
quote Goldilocks--a language that supports long-term reliability and
maintainability, but also has enough modern, practical features to handle
large problems efficiently. A language, in fact, like Modula-3.
Modula-3 was designed by Digital Equipment Corp. and Olivetti in 1989 for
systems programming. (See the text box entitled "History of Modula-3.") The
designers had several goals:
To provide the abstractions necessary to structure large systems programs:
modules, objects, threads, and generics.
To provide the mechanisms for making programs safe and robust: strong type
checking, exceptions, isolation of unsafe code, and automatic garbage
collection.
To keep the language simple. Features were chosen that had been proven in
other languages, but compatibility with older languages wasn't important.
The result was a full-featured language for software engineering and
object-oriented programming. A feature-by-feature comparison puts Modula-3
roughly on par with Ada and C++; see Table 1. However, Modula-3 avoids the
complexity of those larger languages by simplifying individual features. For
example, Modula-3 supports object-oriented programming but implements single
rather than multiple inheritance. It supports generics, but the mechanism is
considerably simpler than that of Ada or C++. In practice, these
simplifications do not affect day-to-day programming. Paradoxically, Modula-3
is also the most stable language: C++, Ada, and Modula-2 are all being
"enhanced" in standards committees, in many cases to add features already
found in Modula-3.
Table 1: Feature comparison of some popular programming languages.

                         Modula-3  C++     Ada     Modula-2  Turbo       C
                                                             Pascal 5.5
 -------------------------------------------------------------------------
 Generics                yes       no {*}  yes     no        no          no
 Exceptions              yes       no {*}  yes     no {*}    no          no
 Threads                 yes       no      yes     no        no          no
 OOP                     yes       yes     no {*}  no        yes         no
 User-defined operators  no        yes     yes     no        no          no
 Interfaces              yes       no      yes     yes       yes         no
 Strong typing           yes       some    yes     yes       yes         no
 Runtime safety checks   yes       no      yes     yes       yes         no
 Isolate unsafe features yes       no      yes     yes       no          no
 Procedure types         yes       yes     no      yes       yes         yes
 Case-sensitive names    yes       yes     no      yes       no          yes
 Garbage collection      yes       no      no      no        no          no


{*} These features are coming in new versions of the language.


SRC Modula-3


Before discussing language features, I should note that DEC provides
free-of-charge a high-quality Modula-3 compiler, called SRC Modula-3, which is
available in source form on the Internet (gatekeeper.dec.com in directory
/pub/DEC/Modula-3). SRC Modula-3 runs on most UNIX workstations and is in use
at many universities, companies, and research laboratories. SRC Modula-3 also
includes a rich runtime library, including UNIX and X Window interfaces, and
an object-oriented X Window programming system called Trestle.


Touring the Language


Modula-3's syntax holds no big surprises; it is based on Modula-2 and,
therefore, Pascal. Statements, expressions, and declarations are similar to
those found in other Pascal-family languages. Modula-3, however, deviates from
Modula-2 when necessary. For example, the precedence of arithmetic and logical
operators follows the more natural convention found in C, Ada, and Fortran
rather than the one used in Pascal and Modula-2; see Table 2.

Table 2: Some differences between Modula-2 and Modula-3.

 Modula-3 Modula-2
 ------------------------------------------------------------------------

 Declarations Names visible throughout Must declare names
 scope; can initialize before use; cannot
 variables. initialize variables.

 Types Structural equivalence. Name equivalence.

 Expressions C-like precedence; A.B Pascal-like precedence;
 shorthand for A^.B, and so on. no shorthand for A^.B.

 Statements FOR loop declares its own FOR-loop variable must
 variable. be declared by
 programmer.

 Pointers Syntax: REF T or REFANY; Syntax: POINTER TO T;
 runtime type testing and no runtime type
 garbage collection. testing; manual
 deallocation of storage.

 Built-in MIN, MAX apply to value MIN, MAX apply to
 functions pairs, yield smaller and types, yield smallest
 larger values. and largest elements.

 Strings Variable-length, read-only; Fixed maximum length,
 different from ARRAY OF CHAR. read-write, same as
 ARRAY OF CHAR.

 Isolation of UNSAFE keyword reveals Most unsafe features
 unsafe features unsafe features of are provided by
 language. SYSTEM interface.

 OOP, generic, Supported. Not currently available.
 exceptions

In large programs, it is important to place some structure on collections of
procedures and variables, restricting the proliferation of names. Modula-3
programs are structured as collections of modules and interfaces. An interface
specifies a set of public facilities: types, variables, constants, and
procedures. The interface is a contract between the facilities' developers and
their clients. A module implements an interface by supplying private data and
bodies for interface procedures. To use an interface, a client must import the
interface. Interfaces and modules are stored in different files and compiled
separately.
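A minimal sketch of the interface/module split (the names here are invented for illustration):

```modula3
INTERFACE Counter;          (* Counter.i3: the public contract *)
  PROCEDURE Incr ();
  PROCEDURE Value (): INTEGER;
END Counter.

MODULE Counter;             (* Counter.m3: the private implementation *)
  VAR count := 0;           (* type inferred from the initial value *)
  PROCEDURE Incr () = BEGIN INC(count) END Incr;
  PROCEDURE Value (): INTEGER = BEGIN RETURN count END Value;
BEGIN
END Counter.
```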
You can change a module without recompiling clients of the interface. Consider
the Modula-3 version of "Hello, World" in Figure 1. The Hello module exports
(implements) an interface named Main, a built-in interface that identifies the
starting point of a program. Hello also imports two interfaces, Wr and Stdio,
that provide basic stream-oriented I/O facilities. (These interfaces are part
of the standard libraries supplied with SRC Modula-3.) Wr.PutText is an
example of how names of imported procedures and variables are qualified by the
interface name. This makes it easy to keep track of name sources in large
programs.
Figure 1: Modula-3 version of the classic "Hello, World!" program.

 MODULE Hello EXPORTS Main;
 IMPORT Wr, Stdio;
 BEGIN
 Wr.PutText(Stdio.stdout, "Hello, World!\n");
 Wr.Close(Stdio.stdout);
 END Hello.

More Details.


Field Lists


In the rest of this article I'll illustrate the features of Modula-3 by
looking at a realistic example--an input line parser modeled on the input
facilities of the Awk language. In most programming languages, dealing with
free-form numeric and text input is a hassle. Even C, which has a pretty good
I/O library, forces you to descend into the mysteries of scanf to read
numbers. In Awk, input is effortless: Input lines are automatically broken
into whitespace-delimited fields that are referred to by number and can be
used as text or in numeric expressions. The goal is to write a module to
provide Awk-like input for Modula-3.
Modula-3 supports both object-oriented and traditional programming models. In
this case, our input-parser interface uses an object model in which a client
creates an object called a "field list" and then uses it to read fields from
input lines. The steps to follow are:
1. Create a field-list object, fl.
2. Read a line into fl by calling fl.getLine().

3. Get the value of the nth field as a string with fl.text(n). Get the value
of the nth field as a number with fl.number(n).
4. When finished with the current line, go back to step 2.
The design for field lists gives us the opportunity to discuss two
particularly interesting features of Modula-3: opaque types and threads.
Opaque object types allow you to reveal only the user-visible part of an
object definition in an interface, hiding its implementation in a module.
Modula-3's support for threads lets you take advantage of preemptive
multitasking on any computer, and real multiprocessing on computers that
support it.


The Interface


The FieldList interface (see Listing One, page 126) declares several types,
two constants, and an exception. The principal type (the field list) is named
T by convention; clients will refer to it as FieldList.T. The relevant
declarations in Listing One begin with the statement T<: Public; and include
the METHODS ..END block. The <: operator signals that this is an opaque-type
declaration. Translated into English, the declarations say, "Type T is an
object type descended from type Public, which in turn is descended from type
MUTEX. Type Public (and hence T) has these methods." Thus, we have the method
specifications for T, but have yet to explain the private data and methods.
(We'll see how type T's declaration is completed later.)
The FieldList interface also includes declarations to support error handling
(the EXCEPTION declaration and the RAISES clauses in the method declarations)
and multitasking (the type MUTEX in the declaration of Public and the
Thread.Alerted exception in the declaration of getLine).
There are a few other interesting things to note in this interface. Some names
are used before they are declared. This is because in Modula-3, the visibility
of a name extends both before and after the name in the current scope. Type
Rd.T is a "reader," a general input stream that acts like the input side of a
C FILE * stream. Type NumberType is the particular floating-point type used to
represent the numeric values of fields. Modula-3 has three floating-point
types: REAL, LONGREAL, and EXTENDED, corresponding to the IEEE floating-point
standard types. Several convenient shortcuts are also demonstrated. The
declaration getLine(rd: Rd.T := NIL) RAISES {...} indicates that method
getLine takes a single parameter of type Rd.T, and that the parameter has a
default value of NIL. You can omit the argument when calling the method. Even
more concisely, in the declaration init(ws := DefaultWS): T, the parameter of
the init method has a default value, and the parameter type is omitted.
Modula-3 determines the type from the default value, so this declaration is
the same as init(ws: SET OF CHAR := DefaultWS): T. You can also omit the type
in variable declarations if you supply an initial value of the same type.


The Client


Listing Two (page 126) is Sum, a small program that reads lines, sums all the
numbers on each line, and prints the result. The basic idea is simple, but it
demonstrates several features of Modula-3. The field list is created in the
top-level variable declaration VAR fl := NEW(FieldList.T).init(WhiteSpace).
The NEW function creates a new field-list object, to which the init method is
immediately applied. The init method returns the initialized object. Modula-3
does not have automatic constructors (like C++), but use of the init method is
a convention. The program calls fl.getLine() to read a line from the standard
input and loops over the input fields, summing the numbers.
The loop body also shows an example of a WITH statement. Modula-3's WITH
statement differs from those of Pascal and Modula-2, which are used to make
record field names visible. In Modula-3, WITH is used to introduce a new
identifier and bind it to an arbitrary variable or value for the duration of
the enclosed statements. If the value is a variable designator (such as an
array element or record field), the new name is aliased to the variable, and
it can be read or written. Otherwise, as in this case, the new identifier is a
read-only value. WITH is surprisingly convenient, and you will see many
examples of it in the implementation of FieldList.
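A WITH binding has this shape; the identifiers below are hypothetical:

```modula3
WITH n = fl.number(i) DO     (* n is a read-only name for the value *)
  sum := sum + n;
END;
(* Binding a variable designator instead makes the name writable: *)
WITH slot = a[i] DO
  slot := 0.0;               (* writes through to a[i] *)
END;
```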


Exceptions


Error handling in Modula-3 programs is accomplished with exceptions. By using
exceptions, you don't have to check return values on every procedure
call--something so tedious that most C programmers don't bother to do it. The
FieldList interface exports the exception Error, which is raised by certain
methods in response to an error. A typical client error would be trying to
read the 12th field in a line with only 11 fields.
When an exception is raised, it propagates out of the current procedure into
the caller. If it is not handled there, it continues to propagate outward
until a handler is found. If no procedure handles the exception, the Modula-3
runtime system terminates the program with a suitable message. Exceptions are
part of the specification of procedures in Modula-3; every procedure or method
that can raise an exception must list that exception in a RAISES clause. Note
the RAISES {Error} clauses in the method declarations in FieldList.
The Sum program deals with exceptions in two ways: by handling some exceptions
and ignoring others. The outer loop in the main program is surrounded by a
TRY..EXCEPT..END block, which handles any exceptions raised by procedures
inside the loop. This particular exception handler simply prints a message and
allows the program to finish normally. One message is provided for the
end-of-file exception, and another message for all other exceptions.
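In outline, such a handler has the following shape; ProcessLine and the messages are placeholders, and the loop relies on the end-of-file exception to terminate:

```modula3
TRY
  LOOP
    fl.getLine();             (* raises Rd.EndOfFile at end of input *)
    ProcessLine(fl);          (* hypothetical per-line work *)
  END;
EXCEPT
| Rd.EndOfFile =>
    Put("done\n");
| FieldList.Error =>
    Put("input error\n");
END;
```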
An alternative to providing an exception handler can be seen in the Put
procedure, which includes the pragma <*FATAL Wr.Failure, Thread.Alerted*>.
Because there is no exception handler or RAISES clause in the Put procedure,
it is a checked runtime error to raise any exception within that procedure.
But the compiler knows from the Wr interface that Wr.PutText can raise the
Wr.Failure and Thread.Alerted exceptions, so it will warn the programmer of
the potential runtime error. The FATAL pragma says, in effect, that it's OK to
halt the program if these exceptions are raised within Put, and so the warning
message should be suppressed.
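
For contrast, the nearest portable C idiom to TRY..EXCEPT is setjmp/longjmp. This hypothetical sketch mimics the shape of Sum's outer handler; all names here are illustrative, not taken from the listings:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf handler;                  /* where control resumes on a "raise" */
enum { OK = 0, END_OF_FILE = 1 };

/* Stand-in for fl.getLine(): signals end-of-file by longjmp, not a return code. */
static void get_line(int at_eof)
{
    if (at_eof)
        longjmp(handler, END_OF_FILE);   /* RAISE Rd.EndOfFile, roughly */
}

/* Mimics Sum's TRY..EXCEPT..END: returns which handler (if any) ran. */
int run_with_handler(int at_eof)
{
    int raised = setjmp(handler);        /* TRY */
    if (raised == OK) {
        get_line(at_eof);                /* body that may "raise" */
        return OK;                       /* normal completion */
    }
    return raised;                       /* EXCEPT arm selected by the value */
}
```

Unlike Modula-3 exceptions, nothing here is checked by the compiler: no RAISES clause, no warning if a handler is missing.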


The Module


It's time to turn to the implementation of FieldList, shown in Listing Three,
page 126. Given the interface shown in Listing One, this module must do at
least two things: complete the opaque declaration of FieldList.T, and supply
procedures to implement the methods for that type.
Note the REVEAL declaration in Listing Three. This "revelation" adds to the
previous declaration of T in the interface a set of private fields that
clients cannot see. The keyword BRANDED is required for reasons we won't go
into here. Notice also that the object fields have initializers. These
initializations are performed whenever the object is created, and take the
place of C++ constructors. For more complex initialization, a separate method
must be used.
The revelation also associates procedures with methods by a series of lines of
the form methodname := procedurename. In the FieldList example we've given the
procedures the same names as the methods. The method and its procedure must
have compatible signatures. The signatures for isANumber are shown in Figure
2. The procedure includes an extra argument representing the object on which
the method is operating. Some programmers name the extra parameter self by
convention. The method call fl.isANumber(n) is equivalent to isANumber(fl, n),
assuming that this procedure is currently bound to the method.
Figure 2: Signatures for isANumber.

 Method:
 isANumber (n: FieldNumber) : BOOLEAN RAISES {Error}
 Procedure:
 isANumber (self: T; n: FieldNumber) : BOOLEAN RAISES {Error}

In addition to providing better abstraction and information hiding, keeping
the type revelation in the module means that changing the hidden fields or
procedures of the object type does not force clients to be recompiled.


Strings and Arrays


You'll notice that the FieldList interface uses the built-in string type TEXT,
but the module also uses arrays of characters. Strings (type TEXT) are
extremely convenient in Modula-3. They are dynamically allocated and can be of
any length. Once created, a TEXT value cannot be changed, so strings have
value semantics--no one can change the value of a string you are holding. The
built-in interface Text provides basic construction and testing operations on
strings, but no searching functions.
For intense character manipulation, you can also convert a TEXT value to an
array of characters, as in the getLine procedure. In most languages, arrays of
characters are difficult to deal with because they must have a fixed
compile-time size. Modula-3 provides a compromise: Although stack-based arrays
must have a fixed size, dynamic arrays can be allocated with a runtime size.
For example, the FieldList.T object contains a field declared as chars: REF
ARRAY OF CHAR. This is a reference (pointer) to an open (unbounded) array of
characters. In getLine, the NEW and Text.SetChars statements store in
self.chars a reference to an array of lineLength characters and then fill
that array with the characters from string text. Unlike C and C++, dynamic
arrays in Modula-3 have subscript bounds checking built in. The AddDescriptor
procedure is a good example of how to use dynamic arrays, including how to
grow them when necessary. All open arrays are 0 based, and the function
NUMBER(a) can be used to determine the number of elements in any array a.
SUBARRAY can be used to designate a contiguous set of elements in an array.
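
In C the same facility must be built by hand. A minimal sketch of an "open array" that carries its own length, making NUMBER-style queries, bounds checks, and AddDescriptor-style doubling possible (all names here are mine):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A hand-rolled "open array" of char: the length travels with the data,
   which is what makes NUMBER()-style queries and bounds checks possible. */
typedef struct {
    size_t number;   /* NUMBER(a) in Modula-3 terms */
    char  *elems;
} CharArray;

CharArray *new_char_array(size_t n)
{
    CharArray *a = malloc(sizeof *a);
    a->number = n;
    a->elems = calloc(n, 1);
    return a;
}

/* Checked subscript: Modula-3 does this automatically for every a[i]. */
char at(const CharArray *a, size_t i)
{
    assert(i < a->number);   /* the "subscript out of bounds" check */
    return a->elems[i];
}

/* Grow by doubling, as AddDescriptor does with its descriptor array. */
void grow(CharArray *a)
{
    char *bigger = calloc(2 * a->number, 1);
    memcpy(bigger, a->elems, a->number);   /* SUBARRAY-style copy */
    free(a->elems);
    a->elems = bigger;
    a->number *= 2;
}
```

In Modula-3 every one of these checks comes built in, and the garbage collector makes the free() bookkeeping unnecessary.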


Threads


Multitasking with threads is a useful structuring tool in many applications.
Threads are independent control points, or mini processes, that execute within
your program's address space. Each thread has its own stack but shares access
to global variables and the heap. It is not unreasonable for a large program
to have dozens of threads performing various activities. Most new operating
systems, including Windows NT, OS/2, and Mach (OSF/1) provide built-in support
for preemptive multitasking with threads. So does Modula-3, even on operating
systems that don't provide thread support directly.
Coupled with the benefits of threads is the danger of race conditions, which
occur when two threads attempt to modify a shared data structure at the same
time. One thread may be suspended after partially modifying the data
structure, leaving the data structure in an inconsistent state. A second
thread might then trip over the inconsistency. The solution is to synchronize
access by "locking" the data structure while it is being used, using a
mutual-exclusion semaphore ("mutex" for short). Each thread locks the mutex
while using the data structure; if a thread already has the mutex locked, the
second thread will be forced to wait.
The implementation of FieldList does not use threads, but it is "thread
friendly." That means FieldList provides the necessary synchronization so that
clients can be multithreaded without worrying about race conditions. In
Modula-3, mutexes are provided by the built-in object type MUTEX. You can
store MUTEX objects in your data structures or, as in FieldList, you can
simply make your object type a descendant of MUTEX, effectively turning your
object into a mutex itself. In using mutexes, there is a special block
statement, LOCK mu DO..END, so that you can't forget to unlock a locked mutex.
The LOCK statement locks the mutex mu while the enclosed statements are
executed, and ensures that the mutex is properly unlocked when the statements
terminate, even if they are terminated by an exception or RETURN within the
LOCK statement. Throughout the FieldList module, you will see LOCK statements
surrounding access to field lists.
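
Modula-3's LOCK mu DO..END has no direct C equivalent; with POSIX threads the same discipline must be written out by hand. A minimal sketch, with type and field names of my own loosely mirroring FieldList:

```c
#include <assert.h>
#include <pthread.h>

/* The object carries its own mutex, like a FieldList.T descended from MUTEX. */
typedef struct {
    pthread_mutex_t mu;
    int n_fields;
} FieldListLike;

/* C has no LOCK..END block, so every exit path must unlock by hand --
   precisely the omission Modula-3's LOCK statement makes impossible. */
int number_of_fields(FieldListLike *fl)
{
    pthread_mutex_lock(&fl->mu);
    int n = fl->n_fields;                /* the protected read */
    pthread_mutex_unlock(&fl->mu);
    return n;
}
```

Add an early return or an error path and the C version silently leaks the lock; the LOCK statement unlocks on every exit, including exceptions.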



Safety


What is Modula-3's most important advantage? Without a doubt, it is safety.
While the exception mechanism encourages the creation of robust programs, the
Modula-3 language is inherently safe and does not require special attention by
the programmer.
Some of the hardest bugs to find are those that cause a valid source-code
statement to execute incorrectly at runtime. For example, the C statement
s->a[i]=v could fail for many reasons: The pointer s might be null or might
point to unallocated storage; the value of i might be too large to use as an
index into the array; or v might contain an out-of-bounds value because it was
never initialized. ANSI C lists 97 ways in which a C program's behavior might
be unpredictable at compile time or run time. Even Ada, usually considered a
"safe" language, does not protect you against dangling pointers or
uninitialized variables.
Modula-3 guarantees that all runtime assumptions remain valid. It will
initialize your variables (if you don't) to ensure that they always contain
values of their declared types. It checks pointer conversions at runtime for
type safety, and does not permit you to deallocate storage directly. Some
features found in other languages, such as taking the address of a local
variable, are prohibited because they are unsafe and because detecting their
misuse at run time would be too costly. These checks and rules avoid many bugs
and catch the remaining ones quickly, before their effects can spread. In my
experience, the error messages provided by the Modula-3 runtime system are so
good that I rarely have to use the debugger to locate the cause of an error.
A good example of this safety and the new programming features it makes
possible is runtime type testing. Consider the GetReal procedure in Figure 3,
which accepts a pointer of any type (type REFANY) and returns as a
floating-point number the value pointed to.
Figure 3: Procedure to accept a pointer of any type and return as a
floating-point number the value pointed to.

 PROCEDURE GetReal(ptr: REFANY) : REAL = (* Return ptr^ as a REAL *)
 VAR realPtr:= NARROW(ptr, REF REAL);
 BEGIN
 RETURN realPtr^;
 END GetReal;

The built-in NARROW function is used here to convert a "pointer to anything"
to a "pointer to REAL" (REF REAL). In most languages, this code would be
unsafe: The argument ptr might point to some other type of value whose bits
don't constitute a valid floating-point number. Modula-3, however, checks at
run time that ptr is a value of type REF REAL. If it is not, the call to
NARROW will fail. This runtime type checking is possible because all dynamic
storage contains type information primarily intended for the garbage
collector. You can use this type information yourself. For example, Figure 4
is another version of GetReal that shows how runtime type testing can be made
explicit.
Figure 4: Making explicit runtime type testing in the GetReal procedure.

 PROCEDURE GetReal2(ptr: REFANY) : REAL = (* Return ptr^, or 0.0 *)
 BEGIN
 IF ptr # NIL AND ISTYPE(ptr, REF REAL) THEN
 RETURN NARROW(ptr, REF REAL)^;
 ELSE
 RETURN 0.0; (* ptr is not what we expected *)
 END;
 END GetReal2;
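
C keeps no type information in heap storage, so the nearest idiom to REFANY and ISTYPE is an explicit tag carried alongside the value. A hedged sketch mirroring GetReal2 (all names are illustrative):

```c
#include <assert.h>

/* C heap storage carries no type tag, so a REFANY/ISTYPE analogue must store
   the tag explicitly -- and nothing stops careless code from lying about it. */
enum Tag { TAG_INT, TAG_REAL };

typedef struct {
    enum Tag tag;                  /* the type information Modula-3 keeps for you */
    union { int i; float r; } u;
} Any;

/* Mirrors GetReal2: yield the value only if the tag says "real", else 0.0. */
float get_real2(const Any *ptr)
{
    if (ptr != 0 && ptr->tag == TAG_REAL)    /* ISTYPE(ptr, REF REAL) */
        return ptr->u.r;
    return 0.0f;                             /* ptr is not what we expected */
}
```

The difference is that Modula-3 maintains the tag itself for every heap object, so the test can never be fooled.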

Some programmers worry about garbage-collection overhead, but what are the
real costs and benefits? Good garbage-collection algorithms seem to impose no
more than a 10 percent overhead on runtime performance, and can actually save
time on smaller programs or programs that use inefficient heap managers. This
is a reasonable investment for reliability. Garbage collection also shortens
development and makes programs smaller by eliminating the need to write
storage-management code. Many OOP languages, such as Smalltalk, Eiffel, and
Trellis include garbage collection.


Loopholes


Of course, systems programmers cannot restrict themselves to "safe"
programming at all times. Languages that are too strict about safety may not
be usable for writing low-level code, like storage allocators. Therefore, if
you declare a module UNSAFE, Modula-3 gives you access to a variety of unsafe
but practical features, such as unrestricted type conversions (via function
LOOPHOLE), address arithmetic, and the DISPOSE procedure to deallocate
storage. The compiler cannot ensure the safety of UNSAFE modules, but at least
it makes you isolate and identify unsafe code.
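
The closest well-defined C counterpart to LOOPHOLE is bit-for-bit reinterpretation via memcpy. A small sketch, assuming IEEE 754 floats:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The C analogue of LOOPHOLE(f, uint32_t): reinterpret the bits of one type
   as another. memcpy is the well-defined way; pointer casts are not. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* unrestricted "type conversion" */
    return bits;
}
```

In Modula-3 such a conversion compiles only inside a module marked UNSAFE, which is the point: the loophole exists, but it is fenced off and labeled.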


The Tools


A programming language without good tools is almost useless. As previously
mentioned, SRC Modula-3 includes a rich runtime library, including UNIX and
X Window interfaces and an object-oriented X Window programming system.
Program rebuilding is easy in SRC Modula-3 using the supplied m3 driver
program. The command line m3 -o Prog *.i3 *.m3 will cause the driver to
inspect all the interfaces (*.i3) and modules (*.m3) in the current directory;
compute dependencies based on IMPORT and EXPORT declarations; determine which
source files have changed and which interfaces are out of date; recompile
anything that needs to be recompiled; and link program Prog. For more complex
programs, the m3make utility allows you to describe your program abstractly
without computing file dependencies by hand. You can also add arbitrary
make-like dependencies to your m3makefile.
Modula-3 brings together the long-term maintainability of Ada, the simplicity
of Pascal, and the modern object-oriented programming facilities of C++. The
result is a clean language that makes it easy to write robust and maintainable
programs.


References


Harbison, Samuel P. Modula-3. Englewood Cliffs, NJ: Prentice Hall, 1992.
Modula-3 News. Pittsburgh, PA: Pine Creek Software.
Nelson, Greg. Systems Programming with Modula-3. Englewood Cliffs, NJ:
Prentice Hall, 1991.
Usenet news group: comp.lang.modula3.


History of Modula-3


Modula-3 was developed by researchers at DEC's Systems Research Center (SRC)
and the Olivetti Research Center in 1989. It borrows from two evolutionary
lines of programming languages: an academic line, represented by Niklaus
Wirth's Pascal, Modula-2, and Oberon languages; and an industrial research
line, represented by the Mesa, Cedar, and Euclid languages from the Xerox Palo
Alto Research Center. Its immediate parent is Modula-2+, an extension of
Modula-2 developed at SRC in the early 1980s and used in their research
systems. In 1986, Maurice Wilkes, who had developed the first practical
electronic stored-program computer at Cambridge 37 years earlier, sparked an
effort to "clean up" Modula-2+. With Niklaus Wirth's blessing, this became a
design for a new language, Modula-3. The original language report was issued
in 1988, with minor revisions in 1989 and 1990.
Modula-3 emphasizes safety and maintainability and is gaining important
converts outside DEC. The Computer Science Laboratory at Xerox PARC has
adopted the language for its research software, and the University of
Cambridge in England is now teaching Modula-3 to its computer science
students.
--S.H.


_SAFE PROGRAMMING WITH MODULA-3_
by Sam Harbison


[LISTING ONE]

INTERFACE FieldList;
(* Breaks text lines into a list of fields which can be treated
 as text or numbers. This interface is thread-safe. *)
IMPORT Rd, Wr, Thread;
EXCEPTION Error;
CONST
 DefaultWS = SET OF CHAR{' ', '\t', '\n', '\f', ','};
 Zero: NumberType = 0.0D0;
TYPE
 FieldNumber = [0..LAST(INTEGER)]; (* Fields are numbered 0, 1, ... *)
 NumberType = LONGREAL; (* Type of field as floating-point number *)
 T <: Public; (* A field list *)
 Public = MUTEX OBJECT (* The visible part of a field list *)
 METHODS
 init(ws := DefaultWS): T;
 (* Define whitespace characters. *)
 getLine(rd: Rd.T := NIL)
 RAISES {Rd.EndOfFile, Rd.Failure, Thread.Alerted};
 (* Reads a line and breaks it into fields that can be
 examined by other methods. Default reader is Stdio.stdin. *)
 numberOfFields(): CARDINAL;
 (* The number of fields in the last-read line. *)
 line(): TEXT;
 (* The entire line. *)
 isANumber(n: FieldNumber): BOOLEAN RAISES {Error};
 (* Is the field some number (either integer or real)? *)
 number(n: FieldNumber): NumberType RAISES {Error};
 (* The field's floating-point value *)
 text(n: FieldNumber): TEXT RAISES {Error};
 (* The field's text value *)
 END;
END FieldList.






[LISTING TWO]

MODULE Sum EXPORTS Main; (* Reads lines of numbers and prints their sums. *)
IMPORT FieldList, Wr, Stdio, Fmt, Rd, Thread;
CONST WhiteSpace = FieldList.DefaultWS + SET OF CHAR{','};
VAR
 sum: FieldList.NumberType;
 fl := NEW(FieldList.T).init(ws := WhiteSpace);
PROCEDURE Put(t: TEXT) =
 <*FATAL Wr.Failure, Thread.Alerted*>
 BEGIN
 Wr.PutText(Stdio.stdout, t);
 Wr.Flush (Stdio.stdout);
 END Put;

BEGIN
 TRY
 LOOP
 Put("Type some numbers: ");
 fl.getLine();
 sum := FieldList.Zero;
 WITH nFields = fl.numberOfFields() DO
 FOR f := 0 TO nFields - 1 DO
 IF fl.isANumber(f) THEN
 sum := sum + fl.number(f);
 END;
 END;
 WITH sumText = Fmt.LongReal(FLOAT(sum, LONGREAL)) DO
 Put("The sum is " & sumText & ".\n");
 END(*WITH*);
 END(*WITH*);
 END(*LOOP*)
 EXCEPT
 Rd.EndOfFile =>
 Put("Done.\n");
 ELSE
 Put("Unknown exception; quit.\n");
 END(*TRY*);
END Sum.






[LISTING THREE]

MODULE FieldList;
(* Designed for ease of programming, not efficiency. We don't bother to reuse
 data structures; we allocate new ones each time a line is read. *)
IMPORT Rd, Wr, Text, Stdio, Fmt, Thread, Scan;
CONST DefaultFields = 20; (* How many fields we expect at first. *)
TYPE
 DescriptorArray = REF ARRAY OF FieldDescriptor;
 FieldDescriptor = RECORD
 (* Description of a single field. The 'text' and 'number'
 fields are invalid until the field's value is first requested.
 (Invalid is signaled by 'text' being NIL.) *)
 start : CARDINAL := 0; (* start of field in line *)
 len : CARDINAL := 0; (* length of field *)
 numeric: BOOLEAN := FALSE; (* Does field contain number? *)
 text : TEXT := NIL; (* The field text *)
 number : NumberType := 0.0D0; (* The field as a real. *)
 END;
REVEAL
 T = Public BRANDED OBJECT
 originalLine: TEXT; (* the original input line *)
 chars : REF ARRAY OF CHAR := NIL; (* copy of input line *)
 nFields : CARDINAL := 0; (* number of fields found *)
 fds : DescriptorArray := NIL; (* descriptor for each field *)
 ws : SET OF CHAR := DefaultWS; (* our whitespace *)
 OVERRIDES (* supply real procedures for the methods *)
 init := init;
 getLine := getLine;

 numberOfFields := numberOfFields;
 line := line;
 isANumber := isANumber;
 number := number;
 text := text;
 END;
PROCEDURE AddDescriptor(t: T; READONLY fd: FieldDescriptor) =
 (* Increment the number of fields, and store fd as the
 descriptor for the new field. Extend the fd array if necessary. *)
 BEGIN
 IF t.nFields >= NUMBER(t.fds^) THEN
 WITH
 n = NUMBER(t.fds^), (* current length; will double it *)
 new = NEW(DescriptorArray, 2 * n)
 DO
 SUBARRAY(new^, 0, n) := t.fds^; (* copy in old data *)
 t.fds := new;
 END;
 END;
 t.fds[t.nFields] := fd;
 INC(t.nFields);
 END AddDescriptor;
PROCEDURE getLine(self: T; rd: Rd.T := NIL)
 RAISES {Rd.EndOfFile, Rd.Failure, Thread.Alerted} =
 (* Read an input line; store it in the object; find all the
 whitespace-terminated fields. *)
 VAR
 next : CARDINAL; (* index of next char in line *)
 len : CARDINAL; (* # of characters in current field *)
 lineLength: CARDINAL; (* length of input line *)
 BEGIN
 IF rd = NIL THEN rd := Stdio.stdin; END; (* default reader *)
 LOCK self DO
 WITH text = Rd.GetLine(rd) DO
 lineLength := Text.Length(text);
 self.originalLine := text;
 self.fds := NEW(DescriptorArray, DefaultFields);
 self.nFields := 0;
 self.chars := NEW(REF ARRAY OF CHAR, lineLength);
 Text.SetChars(self.chars^, text);
 END;
 next := 0;
 WHILE next < lineLength DO (* for each field *)
 (* Skip whitespace characters *)
 WHILE next < lineLength AND (self.chars[next] IN
 self.ws) DO INC(next);
 END;
 (* Collect next field *)
 len := 0;
 WHILE next < lineLength
 AND NOT (self.chars[next] IN self.ws) DO
 INC(len); INC(next);
 END;
 (* Save information about the field *)
 IF len > 0 THEN
 AddDescriptor(self, FieldDescriptor{start:=
 next - len, len := len});
 END;
 END(*WHILE*);

 END(*LOCK*);
 END getLine;
PROCEDURE GetDescriptor(t: T; n: FieldNumber): FieldDescriptor RAISES {Error} =
 (* Return the descriptor for field n. Depending on user's wishes,
 treat too-large field numbers as empty fields or as an error. *)
 BEGIN
 (* Handle bad field number first. *)
 IF n >= t.nFields THEN
 RAISE Error;
 END;
 (* Be sure text and numeric values are set. *)
 WITH fd = t.fds[n] DO
 IF fd.text # NIL THEN RETURN fd; END; (* Already done this *)
 fd.text := Text.FromChars(SUBARRAY(t.chars^, fd.start,
 fd.len));
 TRY (* to interpret field as floating-point number *)
 fd.number := FLOAT(Scan.LongReal(fd.text), NumberType);
 fd.numeric := TRUE;
 EXCEPT
 Scan.BadFormat =>
 TRY (* to interpret field as integer *)
 fd.number := FLOAT(Scan.Int(fd.text),
 NumberType);
 fd.numeric := TRUE;
 EXCEPT
 Scan.BadFormat => (* not a number *)
 fd.number := Zero;
 fd.numeric := FALSE;
 END;
 END;
 RETURN fd;
 END(*WITH*);
 END GetDescriptor;
PROCEDURE numberOfFields(self: T): CARDINAL =
 BEGIN
 LOCK self DO RETURN self.nFields; END;
 END numberOfFields;
PROCEDURE isANumber(self: T; n: FieldNumber): BOOLEAN RAISES {Error} =
 BEGIN
 LOCK self DO
 WITH fd = GetDescriptor(self, n) DO RETURN fd.numeric; END;
 END;
 END isANumber;
PROCEDURE number(self: T; n: FieldNumber): NumberType RAISES {Error} =
 BEGIN
 LOCK self DO
 WITH fd = GetDescriptor(self, n) DO RETURN fd.number; END;
 END;
 END number;
PROCEDURE line(self: T): TEXT =
 BEGIN
 LOCK self DO RETURN self.originalLine; END;
 END line;
PROCEDURE text(self: T; n: FieldNumber): TEXT RAISES {Error} =
 BEGIN
 LOCK self DO
 WITH fd = GetDescriptor(self, n) DO
 RETURN self.fds[n].text;
 END;

 END(*LOCK*);
 END text;
PROCEDURE init(self: T; ws := DefaultWS): T =
 BEGIN
 LOCK self DO
 self.ws := ws;
 END;
 RETURN self;
 END init;
BEGIN
 (* No module initialization code needed *)
END FieldList.



Figure 1: Modula-3 version of the classic "Hello, World!" program


MODULE Hello EXPORTS Main;
IMPORT Wr, Stdio;
BEGIN
 Wr.PutText(Stdio.stdout, "Hello, World!\n");
 Wr.Close(Stdio.stdout);
END Hello.








October, 1992
A SOURCE CODE PROFILER


Identifying code that needs to work faster




Keith W. Boone


Keith is the general manager of Computer Maintenance Organization in
Tallahassee, Florida and is a contributing author to Tricks of the Windows 3.1
Masters (Howard Sams, 1992). He can be reached on CompuServe at 75230,2070.


The well-known 90/10 rule of programming says that users will spend 90
percent of their time using 10 percent of your code. Knowing this,
you should spend most of your time optimizing that code, but how do you avoid
spending 90 percent of your time just identifying it? This problem is
especially apparent in highly complex or recursive algorithms -- you cannot
easily determine how often functions are used. Without special hardware or use
of assembly language, how can you identify the code that really needs to work
faster?
Fortunately, this is a well-studied problem, and tools called "profilers"
exist to help you locate which functions are being used. A number of profilers
are available (Microsoft's Source Profiler, Borland's Turbo Profiler, and
Watcom's Profiler and Sampler all spring to mind); some are bundled with
compilers, others are not. However, effective profilers are not always
efficiently used. In this article, I'll discuss ways you can make the most out
of a profiler, using as an example a working profiler you can build yourself.
While most of my discussion focuses on the design and implementation of my
profiler, many of the techniques governing the program's use can be extended
to other profilers.
I've used my profiler on 286/386/486 systems under DOS 3.3, 4.01, and 5.0. It
makes use of the hardware clock interrupt (INT 8) which executes a service
routine 18.2 times per second. The profiler uses this interrupt to
periodically determine where your program is currently executing.
The program was originally written using Microsoft's C 5.1 compiler and has
been tested with C 5.1, C 6.0, and QuickC 2.0. It should work with little or
no modification in other environments, provided that the extended keywords far
and interrupt are supported. It may cause problems in systems using
disk-caching programs, and it will not run under Windows. In order to generate
useful reports, your linker must be able to create MAP files that supply the
segment and offset address of the functions in your program in a format
similar to that created by Microsoft's linker.


How the Profiler Works


The profiler is divided into three parts: PROFILER.C (Listing One, page 128);
SPEEDUP.C (Listing Two, page 128); and PROF.C (Listing Three, page 128).
Listings One and Two contain the runtime functions executed by your program.
Listing One contains an interrupt service routine (ISR) which traps interrupt
8, the system timer interrupt. When a timer interrupt occurs, this routine
compares the return address on the stack to the region of code being profiled.
If the address is not in the region being profiled, no further action is
taken. Otherwise, the address is scaled to the size of the buffer, and the
word in the buffer corresponding to that address is incremented. The output of
the profiler is essentially an array of unsigned integers. Each word indicates
the number of times the program was found executing code at an address
corresponding to the position of the word in the array. (For more details on
the IBM timer, see the accompanying textbox, "Hardware Notes.")
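
The scaling step can be sketched in C as follows; the function and parameter names are mine, not taken from the listings:

```c
#include <assert.h>

/* Map a sampled return address into a histogram slot, as the ISR does:
   addresses in [first, last) scale linearly onto buffer indexes [0, len). */
unsigned long bucket_for(unsigned long addr, unsigned long first,
                         unsigned long last, unsigned long len)
{
    return (addr - first) * len / (last - first);
}
```

Each timer tick then amounts to buf[bucket_for(...)]++, so the histogram accumulates a statistical picture of where the program spends its time.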
Listing Two contains a function called speedup(), which is used to increase
the number of interrupts received per second. This code was originally
developed on an 80286/12-MHz processor. At the time, 18.2 samples per second
was sufficient. Today I use an 80486/50-MHz system. Needless to say, my
programs run a lot faster; therefore, I need many more samples per second.
This function would be better coded in assembly to reduce the overhead of the
timer interrupt. However, my intention is to provide the entire profiler using
the C language.
The last part of the program reports on the data output by the writemon()
function. It uses the MAP file output by your linker and a few additional
bytes of information to determine what functions were executed. It accumulates
the time used by each function and outputs them in sorted order.
Several problems came up while I was writing this profiler. A number of these
occurred simply because the profiler intercepts the timer interrupt. The
timer-interrupt function can be running at any time, and any attempt to change
variables used by the interrupt handler can cause a "race" condition -- a
condition where executing two processes produces different (and possibly
incorrect) results, depending on which process finished first. Consider the
case in which a buffer address needs to be changed. The 8086 CPU does not have
an uninterruptible instruction that could transfer four bytes to memory. The
assignment of the buffer address takes at least two, and possibly several more
instructions. If, during this process, the timer-interrupt handler is
activated, the pointer to the buffer would be half right. That is to say, it
would contain the segment address of the new buffer and the offset address of
the old buffer, or some other equally bad combination. The pointer could
address memory that is not a part of either buffer, or even part of your
program. The solution is to disable any profiling when these variables are
being changed. This is the purpose of the global variable status in Listing
One.
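
The status-flag fix can be sketched like this. It is a simplification: on the PC the ISR itself must also test the flag before touching the buffer, and that side is omitted here:

```c
#include <assert.h>

/* Clear the enable flag before the multi-instruction pointer update, so a
   timer tick that lands mid-update sees "profiling off" and touches nothing. */
volatile int status = 0;                  /* 0 = profiling disabled */
unsigned short *buffer = 0;               /* the sample buffer */

void set_buffer(unsigned short *new_buf)
{
    status = 0;          /* ISR now ignores ticks */
    buffer = new_buf;    /* safe: no one reads the half-written pointer */
    status = 1;          /* re-enable once the pointer is consistent */
}
```
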
Further problems arise when speeding up the system timer. Fortunately, I
happened across an assembly routine by Allen Holub in his "C Chest" column
(DDJ, December 1987). I translated his code into C and added this speedup()
function to the profiler for this article. I separated the speedup() function
from the rest of the program for two reasons. You can disable the speedup()
feature of the profiler by including an empty speedup() function in your
program. Furthermore, I think I will find speedup() useful in other
programming projects (and hopefully you will as well). Passing speedup() too
large a value for your system can cause your system to hang because of lost
interrupts. Values up to 1024 seem to be acceptable for most uses; larger
values produce too much overhead and really slow down your system; even larger
values will hang it.
The next problem was resolving function addresses. The 8086 CPU uses a
segmented architecture in which several different values can actually be used
to refer to the same address. My profiler is not intended to support
protected-mode programs, although it could with major modification. Fortunately,
converting a segmented address into a linear (or physical) address is not
difficult. The linear() function performs the same math that the 8086 CPU uses
to convert a SEGMENT:OFFSET address into a physical address.
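
A linear() along the lines the article describes might look like this; the version below is my sketch, not the listing's code:

```c
#include <assert.h>

/* The 8086's address math: physical = segment * 16 + offset. Distinct
   SEGMENT:OFFSET pairs can name the same byte, which is why the profiler
   normalizes everything to a linear address before comparing. */
unsigned long linear(unsigned segment, unsigned offset)
{
    return (unsigned long)segment * 16UL + (unsigned long)offset;
}
```
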
The last major design problem I had to solve was relocation. In order to
identify the function being executed, the reporting program must know where
your program was loaded in memory. The profiler assumes that the program will
be loaded in a contiguous block of memory, and that all functions in your
program will have the same position in relationship to the start of that
block, no matter where the code is loaded. If these conditions are met, you
only need to know the location of one function in the program. Since nearly
all C programs have a main function, its address is used as the reference.
Microsoft's linker outputs virtual function addresses in the MAP file. These
addresses would only be correct if your program were loaded at physical
address 0:0. However, the relationship between the function addresses found in
the MAP file and the actual addresses in memory are the same.


Using the Profiler in Your Application


To initialize the profiler, simply call profile(buf, len, first, last, speed).
buf is an array of len unsigned integers that should be initialized to 0. The
profile() function does not initialize buf for you. You could take advantage
of the fact that buf is not reset to 0 to accumulate profiling data over
several program sessions. If you do this, look out for overflow errors caused
by accumulating more than 65,535 hits at one code address. Overflow could have
been checked using the _profmon() function, but I've decided to let it occur
silently. You can, however, modify _profmon() to check for overflow. If
overflow occurs, you have several choices: You could simply increment the next
or previous word in the buffer (but what if that overflows?); you could
generate an error message; or you could generate a table of overflowed
addresses. Remember that anything done in _profmon() must be done fast and
cannot use most DOS or runtime routines because it is executing at interrupt
time. You can reset the profiler by passing it a NULL buffer pointer.
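
One possible overflow guard for _profmon(), sketched here as a saturating counter; this is my variation, not part of the listings:

```c
#include <assert.h>
#include <limits.h>

/* A saturating counter: once a slot reaches 65,535 it stays there instead of
   wrapping to zero, so a hot address remains visibly hot across sessions. */
void count_hit(unsigned short *slot)
{
    if (*slot < USHRT_MAX)     /* skip the increment once the counter is full */
        (*slot)++;
}
```

The single compare adds little to the ISR's cost, which matters since this code runs on every timer tick.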
The first and last addresses given to the profile() function are used to
determine the region of code being profiled. Any functions not between first
and last will not be profiled. Note that the last function will not be
profiled since its starting address is the end of the profiled region. Some C
compilers automatically create a function etext(), which is simply the last
address used by any program code. You can also determine the last function
address by examining the MAP file generated by the linker. If you want to
profile your entire program, simply create an empty function after the last
function listed in the linker MAP and use it as the argument last. It is
perfectly legal to pass NULL and 0xFFFF:0x000F as the values for first and
last. The profiler will then profile everything being executed in memory.
The value provided in speed is passed to the speedup() function whenever the
profiler is enabled. The profiler disables speedup() when it is disabled. In
my initial implementation, speedup() was not called by any of the profiler
functions. You simply called it before profiling to speed up the system timer,
and afterwards to restore the system timer to normal speed. Since these two
functions intercept interrupt vector 8, it was possible to hang the system by
calling them in incorrect order. By including a call to speedup() in the
monitor() function, I was able to control the order and further bullet-proof
the code. It also allowed me to include the speedup factor in the profiler
output file, so you will see correct times.
Having initialized the profiler, you must enable it by calling monitor(flag).
If flag is True, profiling will be enabled. Conversely, profiling will be
disabled if flag is False. This feature allows you to profile just a few
sections of code without suffering from profiler overhead where it is not
needed. For example, I initially used the profiler to speed up the evaluation
function of a chess program. I simply placed a monitor(1) statement before the
call to the evaluation function, and a monitor(0) after it. Thus I profiled
only those functions used to evaluate the chessboard, and ignored the
functions that were part of the rather weighty user interface.
Always disable the profiler before exiting your program. Failing to disable it
or reset it (by calling profile with a NULL buffer) will cause errors, because
the timer interrupt will still be pointing to an ISR in your program. The code
and memory for the routine may still work through several simple DOS programs,
but eventually your system will crash. If your program crashes while running
the profiler, reboot the system to remove the profiler; otherwise dangerous
things could happen. Using Control-Break to exit a program while it is
profiling could hang your system. This inadequacy could be fixed by adding
code to remove the profiler on any abnormal exit, but this requires
intercepting more vectors. I have found little need for this feature as I use
the profiler on already debugged code (which should be the normal order of
things).
After profiling your program, the data collected must be output to a file for
processing by PROF.EXE. This is handled by calling writemon(filename), where
filename is the name of the file in which you want to store the data. It saves
the position of your program in memory, the total number of clock ticks that
occurred while the profiler was active, the addresses of the program region
being profiled, the system-timer speedup factor used, and the length and
contents of the profile buffer. The writemon() function should be called
before resetting the profiler; otherwise the profiler will not know what data
should be output.


Notes on Using the Profiler


TESTPROF.C (see Listing Four, page 131) is a simple "Sieve of Eratosthenes"
program. It shows the proper use of the profiler runtime functions. Figure 1
shows sample output from PROF.EXE and the effect of different speedup values
on profiler output from the same program. Many things can affect the output of
the profiler, including disk caching, RAM caching, and various TSR programs.
When profiling, disable any disk-caching programs you are using, especially
any cache that performs delayed writes. This will give you more accurate
profiler output. The profiler itself increases program-execution time because
it steals cycles during a timer interrupt to perform its work. Using larger
speedup values will increase this overhead, and you will begin to see altered
results in the output of the profiler.
Figure 1: Sample profiler output for the TESTPROF.C program.

 Start: 24624
 End: 24952
 Length: 4096
 Scale: 1
 Speed: 1
 Time: 00:10.384
 Time Routine % of Total % of Accounted

 00:02.197 _markbit 21.16% 40.82%
 00:02.087 _markall 20.11% 38.78%
 00:00.659 _findnext 6.35% 12.24%
 00:00.439 _testbit 4.23% 8.16%
 00:05.384 Total 51.85%
 Start: 24624
 End: 24952
 Length: 4096
 Scale: 1
 Speed: 2
 Time: 00:15.741
 Time Routine % of Total % of Accounted
 00:02.060 _markbit 13.09% 41.90%
 00:01.483 _markall 9.42% 30.17%
 00:00.741 _testbit 4.71% 15.08%
 00:00.467 _findnext 2.97% 9.50%
 00:00.164 DOS 1.05% 3.35%
 00:04.917 Total 31.24%
 . . .
 Start: 24624
 End: 24952
 Length: 4096
 Scale: 1
 Speed: 512
 Time: 00:39.369
 Time Routine % of Total % of Accounted
 00:01.964 _markbit 4.99% 57.89%
 00:01.161 _testbit 2.95% 34.21%
 00:00.182 _findnext 0.46% 5.38%
 00:00.082 _markall 0.21% 2.42%
 00:00.002 DOS 0.01% 0.09%
 00:03.393 Total 8.62%

For best results, start the profiler with no speedup value (0 or 1) and a
buffer at least as long as all of the functions you wish to profile. Continue
doubling the speedup value until you get acceptable results. Buffer size also
affects the output. If your buffer is significantly smaller than the region of
code you are profiling, the profiler may incorrectly assign clock ticks to the
function before the one actually being executed. Your buffer should have at
least one element for every 16 to 32 bytes of code being profiled. Using
larger scales (say, one element per 64 bytes) will give erroneous results in
some cases, because results from smaller functions will be combined with the
functions immediately before them in memory. Smaller functions are often
heavily used, which could further distort your results.


Reporting


PROF.C in Listing Three (page 128) provides the reporting features of the
profiler. The readmap() function imports the MAP file produced by the linker.
It searches for the string "by Value" to locate the section of the MAP file
containing function addresses sorted by address. It ignores all absolute
symbols found in the MAP. You could replace this function with one that
imported symbols from other file formats. The readmap() function should set
nmap to the number of map entries used, and set emaina to the address of the
main function. emaina is used as a reference address to determine the location
of other functions in memory.
My readmap() function adds two entries used to indicate the amount of time
spent in other code. The DOS entry simply collects all time spent in any code
whose address is before the first address of your program. The BIOS entry
collects all of the time spent in code that occurs after your program in
memory. If you profile all memory in a plain vanilla MS-DOS system (no
HIMEM.SYS driver or other use of upper-memory blocks), these entries will
correctly report the amount of time your program spends executing MS-DOS or
BIOS routines.


Possible Improvements


A good profiler not only tells you how much time is spent in each function,
but also indicates how many times each function is called. This is useful if
you want to identify functions that would benefit from fast calling
conventions, conversion to inline code (if your compiler supports it), or
macros (if not). Counting function calls could be done by modifying the stack
check (__CHKSTK) function provided with your compiler. The __CHKSTK function
could call a routine similar to _profmon(), passing its return address. This
would require a second buffer, or you could split the buffer passed to
profile() in two.
You could also add support for different symbol-table formats or additional
features. For example, some compilers will output the line numbers and
addresses of each line of code. Adding support for line numbers would involve
adding a little more code to the reporting program, and the code would be very
similar to that already used for counting time spent in a function. Note that
reporting on line numbers would provide limited data unless the system timer
was sped up by an appreciable factor. At that point, it might be necessary to
code _profmon() directly in assembly and to limit scaling to powers of two in
order to limit the amount of time spent in the ISR.


Hardware Notes


The IBM PC and compatible systems use three programmable timers for system
timing, memory refresh, and speaker control. Originally, the timers were all
part of one chip on the system board--the Intel 8254 programmable interval
timer. Today, a VLSI chip usually handles these functions and much more, but
the timing functions work the same as with the 8254. The input of Timer 0 is
connected to a 1.19318-MHz signal which decrements the timer by 1 each time
the signal goes high. The output of Timer 0 is connected to the interrupt 0
input of the first 8259 interrupt controller, which is used to control and
prioritize interrupts occurring in the system. The 8259 essentially forces the
processor to execute an INT 8 instruction whenever the counter element of
Timer 0 reaches 0.
Each timer has a control register, status register, counter register, counter
element, and output latch. The counter register contains the starting value
loaded into the counter element after it reaches 0. The output latch is used
to save the contents of the counter element to be read by the CPU. Your system
BIOS loads Timer 0's counter register with a value of 65,536 shortly after
your system is booted. This generates approximately 18.2 interrupts per second
(1.19318 MHz/65536=18.2065 Hz). You can speed up the system timer by
decreasing the value contained in its counter register.
Listing Two contains a function that speeds up the system timer in this
manner. Note that it calls the old timer interrupt only when its internal
counter is 0. Otherwise, it sends an end-of-interrupt command to the 8259
controller. The end-of-interrupt command is necessary to allow the 8259 to
continue processing interrupt requests.
--KB

_YOUR OWN SOURCE CODE PROFILER_

by Keith W. Boone


[LISTING ONE]

/* PROFILER.C -- Copyright (c) 1992 by Keith W. Boone, Tallahassee, FL
 * This program contains the runtime functions required to profile C source
 * code. Permission is granted to use or distribute this source code without
 * any restrictions as long as this entire notice is included intact. You may
 * use or include this software in any product that you sell for profit.
 * This software is distributed as is, and no liability is assumed.
 */

#include <stdio.h>
#include <dos.h>

static long start, /* Linear code start address */
 end, /* Linear code end address */
 reference, /* Main function address */
 ticker; /* Global total time in profiler */
static int *buffer, /* Pointer to CS:IP tick buffer */
 length, /* Length of the buffer */
 scale, /* Scaling factor for the buffer */
 status, /* Status = TRUE if profiling */
 (interrupt far *old8)(); /* Old clock interrupt */
unsigned speedup_factor = 1; /* Clock speedup factor */

/***** Convert a far pointer to a linear address *****/
long linear(void far *ptr)
{
 long addr = FP_SEG(ptr);
 if ((long)ptr == -1l)
 return 0x100000;
 addr <<= 4;
 addr += FP_OFF(ptr);
 return addr;
}
/***** _profmon is a clock interrupt handler which is used to mark what
function your program is in during the interrupt. *****/
interrupt far _profmon(int es, int ds, int di, int si,
 int bp, int sp, int bx, int dx, int cx, int ax,
 unsigned ip, unsigned cs
)
{
 if (status) /* If profiling */
 { long addr = cs;
 addr <<= 4;
 addr += ip; /* Compute Phys address */
 ticker++; /* Increase total time */
 if ((addr >= start) && (addr <= end)) /* If in range of code */
 { addr -= start;
 buffer[addr / scale]++; /* Update buffer */
 }
 }
 _chain_intr(old8);
}
/***** profile(buf, len, first, last, speed) initializes the profiler. Buf is
 a pointer to an array of integers that keeps track of CS:IP locations
 while profiling. First and last are pointers to the functions where profiling
 will occur. Note that the function last is never profiled! Speed is the clock
 frequency multiplier passed to speedup() when turning profiling on. *****/
profile(int *buf, int len, int (*first)(), int (*last)(), unsigned speed)
{
 unsigned i, j;
 long size;
 int main();
 buffer = buf;
 length = len;
 start = linear((void far *)first); /* Convert to linear address */
 end = linear((void far *)last);
 reference = linear((void far *)main); /* main() is relocation ref */
 status = 0; /* Profiling off initially */
 speedup_factor = speed; /* Used with speedup() */

 if (buf == NULL) /* buf==NULL is reset */
 { if (old8 != NULL)
 _dos_setvect(0x08, old8);
 return 0;
 }
 size = (end - start); /* Compute size and scale */
 scale = ((size - 1) / length) + 1;
 return 0;
}
/***** monitor(flag) turns profiling on or off. If turning profiling on then
 the program needs to trap the clock interrupt, then set status = TRUE.
 Otherwise the program resets the clock interrupt, then sets status = FALSE.
 This ensures that status does not change to FALSE while in _profmon(),
 not really a big deal, but cleaner. ******/
monitor(int flag)
{
 if (buffer == NULL) /* Turn on w/out initialization is error */
 return -1;
 if (flag) /* If turning profiling on */
 { if (speedup_factor > 1) /* Speed up clock by saved */
 speedup(speedup_factor); /* speedup factor */
 old8 = _dos_getvect(0x08); /* get old clock handler */
 _dos_setvect(0x08, _profmon); /* set new vector */
 }
 else if (old8 != NULL) /* Else turning off */
 { _dos_setvect(0x08, old8); /* reset old vector */
 old8 = NULL;
 if (speedup_factor > 1) /* Restore clock speed */
 speedup(1); /* Note reverse order! */
 }
 status = flag; /* Set on/off flag */
 return 0;
}
/***** writemon(filename) copies the profiling data to the file specified
 for later use by PROF.EXE. *****/
writemon(char *filename)
{
 FILE *fp = fopen(filename, "wb");
 if (fp == NULL)
 return -1;
 monitor(0);
 fwrite(&reference, 1, sizeof(reference), fp);
 fwrite(&ticker, 1, sizeof(ticker), fp);
 fwrite(&start, 1, sizeof(start), fp);

 fwrite(&end, 1, sizeof(end), fp);
 fwrite(&length, 1, sizeof(length), fp);
 fwrite(&scale, 1, sizeof(scale), fp);
 fwrite(&speedup_factor, 1, sizeof(speedup_factor), fp);
 fwrite(buffer, length, sizeof(*buffer), fp);
 fclose(fp);
 return 0;
}






[LISTING TWO]

#include <dos.h>

static int (interrupt far *oldclk)(); /* Old clock interrupt */
static unsigned counter, /* Current counter value */
 factor;
#define PICCMD 0x20 /* 8259 PIC command port */
#define EOI 0x20 /* EOI command */

#define TCTRL 0x43 /* Timer control port */
#define T0DATA 0x40 /* Timer 0 data port */
#define T0LOAD 0x36 /* Timer 0 load command */

/****** _spdup is an interrupt handler which processes clock interrupts which
 have been sped up. *****/
interrupt far _spdup()
{
 if (counter--) /* If counter non-zero */
 outp(PICCMD, EOI); /* Send EOI and return */
 else
 { counter = factor; /* Otherwise reset counter */
 _chain_intr(oldclk); /* and exec old handler */
 }
}
speedup(unsigned newfact)
{
 unsigned divider;
 if (newfact <= 1) /* Reset timer handler */
 { factor = 1;
 if (oldclk) /* reset only if set */
 { _dos_setvect(0x08, oldclk); /* in the first place */
 outp(TCTRL, T0LOAD);
 outp(T0DATA, 0);
 outp(T0DATA, 0); /* standard divisor */
 oldclk = 0;
 }
 }
 else
 { if (!oldclk)
 oldclk = _dos_getvect(0x08); /* Save old handler */
 counter = factor = newfact;
 divider = 65536L / newfact;
 _disable(); /* Disable interrupts */
 outp(TCTRL, T0LOAD);
 outp(T0DATA, divider);

 outp(T0DATA, (divider >> 8)); /* load new divisor */
 _enable(); /* Enable interrupts */
 _dos_setvect(0x08, _spdup); /* use our handler */
 }
}






[LISTING THREE]

/* PROF.C -- Copyright (c) 1992 by Keith W. Boone, Tallahassee, FL
 * This program contains the reporting facilities used to profile C code.
 * Permission is granted to use or distribute this source code without any
 * restrictions as long as this entire notice is included intact. You may use
 * or include this software in any product that you sell for profit.
 * This software is distributed as is, and no liability is assumed.
 */

#include <stdio.h>
long start, /* linear address of Start of profiled code */
 end, /* End of profiled code */
 reference, /* Address of main function to resolve reloc */
 ticker; /* Value of ticker at end of profiling */
int *buffer, /* Pointer to profile buffer */
 length, /* Length of profile buffer */
 scale, /* Scale used (code address to buffer address) */
 factor; /* Clock speedup factor */

typedef struct mapentry /* Used to store MAP file information */
{ long address, /* Linear address of function */
 count; /* computed # of ticks in function */
 char *name; /* Function Name */
} MAP;

MAP map[4096]; /* Limited to 4096 functions */
int nmap = 0; /* Number of Map table entries used */
long emaina; /* Linear address of main function */

/***** Sortmap(m1, m2) is a callback function used by qsort to sort
 map table entries by linear address. *****/
sortmap(MAP *m1, MAP *m2)
{ if (m1->address < m2->address)
 return -1;
 if (m1->address > m2->address)
 return 1;
 return 0;
}
/***** Readmap(filename) reads in the MAP stored in 'filename' and loads the
data into map[]. *****/
readmap(char *filename)
{
 char line[80], *ptr, *strrchr(), *strdup();
 long off, seg;
 FILE *fp = fopen(filename, "rt");

 if (fp == NULL) /* Return error code if cannot open MAP file */

 return -1;
 map[nmap++].name = "DOS"; /* Addresses below the first are in DOS */

 /* Read each line of the map file until you find "Publics by Value".
 * This is VERY linker dependent! */
 while (fgets(line, 80, fp))
 { if (ptr = strrchr(line, 'b'))
 { if (strcmp(ptr, "by Value\n") == 0)
 break;
 else ptr = NULL;
 }
 }
 if (ptr == NULL) /* If we didn't find "Publics by Value" */
 { fclose(fp); /* then return with a failure code. */
 return -1;
 }
 while (fgets(line, 80, fp))
 { /* remove trailing linefeed from line */
 if (ptr = strrchr(line, '\n'))
 *ptr = 0;
 if (!*line)
 continue; /* Ignore empty lines */
 else if (strncmp(line + 12, "Abs", 3) == 0)
 continue; /* And absolute values */
 sscanf(line, " %lX:%lX", &seg, &off); /* Get the seg:off */
 map[nmap].address = (seg << 4) + off; /* convert to linear */
 ptr = strrchr(line, ' '); /* find function name */
 map[nmap].name = strdup(++ptr); /* make a copy of it */

 if (strcmp(ptr, "_main") == 0) /* Found main() */
 emaina = map[nmap].address; /* save its address */
 nmap++;
 }
 qsort(map + 1, nmap - 1, sizeof(MAP), sortmap); /* Sort by address */
 map[nmap++].name = "BIOS"; /* Addresses beyond the last are in BIOS */
 return 0;
}
/***** Readmon(filename) reads in the profile data stored in 'filename' and
 outputs vitals on the profiled data. *****/
readmon(char *filename)
{
 FILE *fp = fopen(filename, "rb");
 void *calloc();
 int m, s, k;
 float T;

 if (fp == NULL)
 return -1;
 fread(&reference, 1, sizeof(reference), fp);
 fread(&ticker, 1, sizeof(ticker), fp);
 fread(&start, 1, sizeof(start), fp);
 fread(&end, 1, sizeof(end), fp);
 fread(&length, 1, sizeof(length), fp);
 fread(&scale, 1, sizeof(scale), fp);
 fread(&factor, 1, sizeof(factor), fp);
 buffer = (int *)calloc(sizeof(int), length);
 fread(buffer, length, sizeof(int), fp);
 fclose(fp);


 T = ticker / (18.2 * factor);
 m = T / 60;
 T -= m * 60;
 s = T;
 T -= s;
 k = T * 1000;
 printf(" Start: %6ld\n", start);
 printf(" End: %6ld\n", end);
 printf("Length: %6d\n", length);
 printf(" Scale: %6d\n", scale);
 printf(" Speed: %6d\n", factor);
 printf(" Time: %02d:%02d.%03d\n", m, s, k);
 return 0;
}
/***** mapcmp(m1, m2) is a callback function used by qsort to sort map table
 entries by amount of time used by each function. *****/
mapcmp(MAP *m1, MAP *m2)
{ if (m1->count > m2->count)
 return -1;
 else if (m1->count == m2->count)
 return 0;
 else return 1;
}
main(int argc, char **argv)
{
 int i, /* profile buffer index */
 j; /* MAP entry index */
 long addr; /* Current linear address */
 int m, /* Minutes */
 s, /* Seconds */
 t; /* thousandths */
 char buf[32], /* File name buffer */
 *strchr();
 float seconds, /* time in a function in seconds */
 accounted, /* accounted for in all profiled functions */
 percent; /* Percent of total accounted */
 if (argc < 2)
 { fprintf(stderr, "Usage: %s program[.map] [program[.mon]]\n", *argv);
 exit(1);
 }
 /* Append .map to files that have no extension specified for arg 1 */
 if (strchr(argv[1], '.'))
 strcpy(buf, argv[1]);
 else sprintf(buf, "%s.map", argv[1]);
 if (readmap(buf))
 { fprintf(stderr, "Error reading map '%s'.\n", buf);
 exit(1);
 }
 /* If second arg specified, then use it, otherwise use first arg */
 if (argc > 2)
 i = 2;
 else
 { i = 1;
 if (strchr(argv[1], '.')) /* remove ext from first arg */
 *strchr(argv[1], '.') = 0;
 }
 /* Append .mon to files that have no extension specified for arg 2 */
 if (strchr(argv[i], '.'))
 strcpy(buf, argv[i]);

 else sprintf(buf, "%s.mon", argv[i]);
 if (readmon(buf))
 { fprintf(stderr, "Error reading '%s'.\n", buf);
 exit(1);
 }
 start -= reference; /* Offset start by the &main */
 end -= reference;
 for (i = 1; i < (nmap - 1); i++) /* adjust map by &main */
 map[i].address -= emaina;
 map[0].address = 0x8000000; /* DOS start = max neg */
 map[nmap].address = 0x7FFFFFF; /* BIOS end = max pos */

 /***** For each entry in the buffer, find the first map address (j)
 whose function encompasses addr, and update the time spent in
 that function. *****/
 accounted = 0.0;
 addr = start;
 j = 0;
 for (i = 0; (i < length) && (j < nmap); i++)
 { if (buffer[i])
 { while ((j < nmap) && (addr > map[j+1].address))
 j++;
 map[j].count += buffer[i];
 accounted += buffer[i];
 }
 addr += scale;
 }
 qsort(map, nmap, sizeof(MAP), mapcmp); /* Sort by time used */
 /* For each function, output time used, and accumulate the total */
 printf("Time\t\tRoutine\t\t %% of Total\t %% of Accounted\n");
 for (i = 0; i < nmap; i++)
 { if (map[i].count)
 { seconds = map[i].count;
 seconds /= (18.2 * factor);
 m = seconds / 60;
 seconds -= m * 60;
 s = seconds;
 seconds -= s;
 t = seconds * 1000;
 printf("%02d:%02d.%03d\t%-20s\t%5.2f%%\t%5.2f%%\n",
 m, s, t, map[i].name,
 map[i].count * 100.0 / ticker,
 map[i].count * 100.0 / accounted);
 }
 }
 /* Output accounted/total times */
 percent = (100.0 * accounted) / ticker;
 accounted /= 18.2 * factor; /* Convert accounted to seconds */
 m = accounted / 60;
 accounted -= m * 60;
 s = accounted;
 accounted -= s;
 t = accounted * 1000;
 printf("%02d:%02d.%03d\t%-20s\t%5.2f%%\n", m, s, t, "Total", percent);
}







[LISTING FOUR]

#include <stdio.h>
#define PLEN 4096
unsigned pbuf[PLEN];

#define ALEN 8000
#define MAX (ALEN*8L)
char array[ALEN];

markbit(char *array, long bit)
{
 array[bit/8] |= 1 << (bit % 8l);
}
testbit(char *array, long bit)
{
 return array[bit/8] & (1 << (bit % 8l));
}
markall(char *array, long bit)
{
 long temp;
 for (temp = bit + bit; temp < MAX; temp += bit)
 markbit(array, temp);
}
long findnext(char *array, long bit)
{
 long temp;
 for (temp = bit + 1; temp < MAX; temp++)
 if (!testbit(array, temp))
 return temp;
 return 0;
}
extern last();
main()
{
 long value,
 start,
 clock();
 int speed;
 char buff[16];
 printf("Speedup\tTime\n");
 for (speed = 1; speed <= 2048; speed += speed)
 { /* reset program workspace on each iteration */
 memset(pbuf, 0, sizeof(pbuf));
 memset(array, 0, sizeof(array));
 /* Set profile buffer, program addresses and speedup */
 /* note that main() is not profiled */
 profile(pbuf, PLEN, markbit, main, speed);
 start = clock();
 printf("%5d\t", speed);
 /* Enable profiling */
 monitor(1);
 /* Execute sieve of Eratosthenes */
 for (value = 2; value; value = findnext(array, value))
 markall(array, value);
 /* Disable profiling */
 monitor(0);

 /* Output profile data for this pass */
 printf("%ld\n", clock() - start);
 sprintf(buff, "tst%05d.mon", speed);
 writemon(buff);
 }
}



Figure 1: Sample profiler output for the TESTPROF.C program

 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 1
 Time: 00:10.384
Time Routine % of Total % of Accounted
00:02.197 _markbit 21.16% 40.82%
00:02.087 _markall 20.11% 38.78%
00:00.659 _findnext 6.35% 12.24%
00:00.439 _testbit 4.23% 8.16%
00:05.384 Total 51.85%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 2
 Time: 00:15.741
Time Routine % of Total % of Accounted
00:02.060 _markbit 13.09% 41.90%
00:01.483 _markall 9.42% 30.17%
00:00.741 _testbit 4.71% 15.08%
00:00.467 _findnext 2.97% 9.50%
00:00.164 DOS 1.05% 3.35%
00:04.917 Total 31.24%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 4
 Time: 00:18.434
Time Routine % of Total % of Accounted
00:01.895 _markbit 10.28% 45.10%
00:01.208 _markall 6.56% 28.76%
00:00.659 _testbit 3.58% 15.69%
00:00.398 _findnext 2.16% 9.48%
00:00.041 DOS 0.22% 0.98%
00:04.203 Total 22.80%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 8
 Time: 00:19.848
Time Routine % of Total % of Accounted
00:01.737 _markall 8.75% 36.19%
00:01.675 _markbit 8.44% 34.91%
00:00.748 _testbit 3.77% 15.59%

00:00.521 _findnext 2.63% 10.87%
00:00.116 DOS 0.59% 2.43%
00:04.800 Total 24.19%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 16
 Time: 00:20.669
Time Routine % of Total % of Accounted
00:02.026 _markbit 9.80% 44.66%
00:01.146 _markall 5.55% 25.28%
00:00.831 _testbit 4.02% 18.32%
00:00.484 _findnext 2.34% 10.67%
00:00.048 DOS 0.23% 1.06%
00:04.536 Total 21.95%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 32
 Time: 00:21.299
Time Routine % of Total % of Accounted
00:02.558 _markbit 12.01% 49.03%
00:01.237 _markall 5.81% 23.72%
00:00.781 _testbit 3.67% 14.97%
00:00.506 _findnext 2.38% 9.71%
00:00.133 DOS 0.63% 2.57%
00:05.218 Total 24.50%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 64
 Time: 00:22.092
Time Routine % of Total % of Accounted
00:02.423 _markbit 10.97% 45.22%
00:01.655 _markall 7.49% 30.88%
00:00.728 _testbit 3.30% 13.58%
00:00.510 _findnext 2.31% 9.53%
00:00.042 DOS 0.19% 0.78%
00:05.359 Total 24.26%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 128
 Time: 00:23.507
Time Routine % of Total % of Accounted
00:01.892 _markall 8.05% 37.09%
00:01.677 _markbit 7.14% 32.89%
00:01.008 _testbit 4.29% 19.77%
00:00.518 _findnext 2.21% 10.16%
00:00.004 DOS 0.02% 0.08%
00:05.101 Total 21.70%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1

 Speed: 256
 Time: 00:27.389
Time Routine % of Total % of Accounted
00:04.228 _markall 15.44% 46.14%
00:02.544 _markbit 9.29% 27.77%
00:01.134 _findnext 4.14% 12.38%
00:00.973 _testbit 3.55% 10.62%
00:00.283 DOS 1.04% 3.09%
00:09.164 Total 33.46%
 Start: 24624
 End: 24952
Length: 4096
 Scale: 1
 Speed: 512
 Time: 00:39.369
Time Routine % of Total % of Accounted
00:01.964 _markbit 4.99% 57.89%
00:01.161 _testbit 2.95% 34.21%
00:00.182 _findnext 0.46% 5.38%
00:00.082 _markall 0.21% 2.42%
00:00.002 DOS 0.01% 0.09%
00:03.393 Total 8.62%




October, 1992
PROGRAMMING PARADIGMS


Confessions and Conversations




Michael Swaine


Last month I had intended to write something about debugging, the theme of
that issue. I didn't, and the reason probably goes back to the middle of the
last decade, to a decision that haunts me to this day.
It's not true, as former Apple evangelist and current spokesmodel Guy Kawasaki
wrote recently, that all computer writers have serious conflicts of interest.
Guy is a master of speaking for himself, but he probably shouldn't try to
speak for other writers. For the record, I have never been an employee of a
computer or software company, have never held stock in a computer or software
company, have never been romantically involved with an employee of a computer
or software company, and have, with one exception, never done contract work
for any computer or software company.
I'm not telling you this to brag, as will be apparent momentarily, or even to
show you how dull my life is. Rather, I'm setting up a context for confession.
Because there was that one exception.


The Laconic Programmer


It was back when I was warming the chair for Jon Erickson as editor-in-chief,
the Macintosh was new, and I was spending a lot of time talking with the
intrepid developers for that new platform. One day one of the most intrepid,
Steve Jasik, called me. Steve had written an interesting code-snooping tool
called MacNosy. Steve had also written the manual for MacNosy, and it could
also be called interesting. He felt that maybe it could use some work and
wanted to know if I wanted the job.
I hadn't done any documentation and thought I should get some experience. I
also thought it would be good for the magazine for me to work closely with a
software developer on a commercial programming tool. I had programmed, I had
looked over the shoulders of developers, I had edited code, but I hadn't sat
at a developer's elbow day after day as he coded an application. I understood
how difficult the documentation writer's job was, but fool that I was, I
thought I could handle it.
Working with Steve proved to be a Beckettesque experience. Laconic is the word
for Steve's conversational style. Mostly he typed at me. "And you can ..." he
would begin, then type, and a table of hex codes would appear in a window. Or:
"Of course, you've got this..." and he would triple-click on a piece of code
and some more code would appear in another window. Any reasonable person would
have insisted that Steve explain all these off-hand demonstrations, but a deep
flaw in my character makes it difficult for me to admit to ignorance of
anything. Conversing productively with Jasik would have required just such an
admission about 90 times an hour. My ego wasn't up to it.
Ultimately I gave up and passed the project on to someone else. Steve's
documentation still acknowledges me, I think, but for what I couldn't say.
I never took any money from Steve for my efforts. But that doesn't get me off
the hook for conflict of interest, because there remains the possibility that
anything I write about Steve's products is tainted with a sense of guilt at
not finishing that documentation project.
I ran into Steve in an Apple Developer room at MacWorld Expo in Boston in
August, demo-ing his debugger. "Want to see the latest and greatest?" he
asked, and I, motivated I know not to what degree by guilt, fixed my eyes on
the screen. The debugger, I was happy to see, is a much more visual product
than the original MacNosy: For example, it uses bar graphs to show usage of
subroutines. This time, when Steve said "And you can..." and typed, these
clearly labeled bar graphs popped up, and I got it.
But I didn't ask to see the docs.


The Democrat, the Republican, and Various Other Parties


There were some cute products at the show, including a solar panel for
PowerBooks and a morphing tool that jammed the aisle with people watching the
demo of George Bush's face slowly and eerily transforming into Bill Clinton's.
It's worrisome to a writer to see a joke told without words. Video, I
sometimes worry, is the enemy of words. Then I remember that video, like
everything else in the world, is subject matter, and I quit worrying.
The hot products were chiefly in the video realm, where it's getting more
possible to do serious video post-production on the desktop with equipment
that doesn't cost an arm and a leg. I know it doesn't make any more sense to
say "more possible" than it does to say "a little bit dead," but the state of
this field seems best described by this kind of logic. Right now, there are
tools for controlling analog video devices from the computer, tools for
editing digital video on the computer, a standard format for video, powerful
compression software, boards for capturing analog video digitally, boards and
monitors for displaying video--all the pieces seem to be available, but
putting them together is difficult. But it's getting more possible.
As usual, the most interesting part of the show was the parties, at one of
which I ran into another pioneer Mac developer who is now doing distribution.
Well, sort of distribution; he is making a business of acquiring software for
companies. Although he currently represents large companies that buy in
sufficient volume to justify hiring a buying service, he had some thoughts
about how his service could be provided to ordinary consumers. Although he's
not ready to go public with his ideas yet, it is intriguing to consider the
consequences of a shift in emphasis between buyer and seller in the software
market. I mentioned Brad Cox's concept of pay-per-use as expressed in his
then-upcoming DDJ article "Superdistribution and Electronic Objects" (see page
44 in this issue), and he agreed that some new ideas about connecting buyers
and sellers were going to be needed to make object libraries really useful.
NeXT (no, it wasn't exhibiting at the Expo, although Silicon Graphics was) is
using one fairly conventional technique for connecting customers and vendors
for software objects. The NeXT ObjectWare Catalog contains a hundred or so
objects, ranging from interesting tricks to highly useful components. Some of
the objects are commercial products and others are freebies from academics.
There's nothing new about selling software by catalog, but the dynamics of the
object market may make channels like this more important than before.


Media Trolling in the Basements of Boston


I also ran into Steve Rosenthal at the Expo, an old friend and a writer who
has appeared in these pages and in many other mags. He is one of the more
trenchant observers of media today, and he took me media trolling.
We screened Todd Rundgren's latest Toaster video, Theology. NewTek, whose Video
Toaster is Rundgren's video tool of choice, has been showing his Change Myself
video for some time now, but this latest work is technically and artistically
much better. It was enthusiastically received at this year's SIGGRAPH, as it
should have been. With all due respect to the masters of new video at
Lucasfilm and elsewhere, what Rundgren is up to is the best demonstration of the
artistic potential of computers and video I've ever seen.
We wound up in an MIT basement where Steve moderated the third Media Event, an
informal gathering of video developers and others interested in issues related
to the confluence of technologies that the Media Lab folks are so deeply into.
One viewpoint expressed at the media event interested me particularly. In a
discussion of rights, it became clear that those present don't want software
to enforce copyright, even if that is possible. The copyright page in a book
informs; it does not enforce. The developers indicated that they would
like to see such a scheme of informing rather than enforcing in video.
How to inform is still a tricky issue. Do you label every frame somehow? The
issue is crucial, because digital technology makes it easy and tempting to put
together works by using existing video. The more this is done, the more often
it will be the case that the video clip you want from Fred is not really owned
by Fred: He got it from Alice, who got it from Geoff, but Alice doesn't know
how to get in touch with Geoff any more. The solution is to attach
owner/creator information to the clip, but "clip" is not a well-defined unit.
Hence the unappealing idea that you might have to attach this kind of
information to every single frame of a video. The extra bytes are no big deal,
but all copying, editing, and encryption software would have to know about the
copyright information, and preserve it through transformations. Tricky.
But not everyone feels that informing users about copyright is adequate
protection, and there will continue to be proposals for ways to incorporate
some protection into software and other digital products. The concept of
monitoring use and charging users on a per-use basis, as discussed by Brad Cox
and by Jon (in his October 1991 editorial), is certainly interesting.


Subscription Software


I have a variant proposal. I think of it as software subscription, but that
term may already be claimed; I know that Microsoft calls their automatic
upgrade deal, Microsoft Maintenance, a software subscription. But there is
nothing new about my idea, anyway, except possibly how the already-existing
pieces fit together. And someone else may already have put them together.
Maybe this is another one of those things done 20 years ago on mainframes.
Maybe some software vendor is doing it today and I just haven't noticed. I
claim no priority. But on the theory that we need several fleshed-out
proposals to pick away at, I put mine forth for the picking.
Your application is distributed freely online just like freeware or shareware.
All documentation is electronic and supplied with the application. The
application may also be sent directly and freely to target customers through
mail-order lists, and other distribution methods are possible. But it won't
appear on dealer shelves, and nobody will be charged for merely receiving the
disk containing the program. Payment is for use.
The program contains a use counter, and is shipped with the counter set at a
low value, like ten. Anyone who gets his or her hands on one of your disks, or
on a copy someone has made of one of your disks, gets ten free uses of the
application. Or fewer than ten: Fred might try your application once, then
make a copy for a friend. Copying the disk copies the current counter value,
so Fred's friend will have only nine free uses. The application should
probably do something useful when the counter gets to zero, such as telling
Fred's friend how to get in touch with you.


Buying Uses



If Fred (or his friend) decides the application is worth using more than ten
times, he calls the number you display in a splash panel, and orders more
uses. He can place the order before the counter gets down to zero, but it's
not a problem if he waits until the last minute, because the purchased uses
are available immediately.
The technology for delivering uses is basically the same as is used in font
CD-ROMs. With these products, Fred can call and give his serial number and a
credit card number and the font he wants to buy, and he is given, over the
phone, an encryption key to unlock the font on the disk. He keys it in, and
he's got the use of the font. (He already had the font itself; it was there on
the disk.)
The process is similar for buying uses of your application. Fred gives you his
serial number and a credit card number and the number of uses he wants. He
gets back a key that, rather than decrypting a particular file, resets the
application counter to a particular value. If this is his first purchase and
he got the application from someone else, there will need to be an additional
interchange: He gives you his credit card number and you give him a different
key, which, when he types it in, inserts a new serial number into his copy of
the program.
You might give Fred the ability to specify any number of uses, or you might
want to restrict him to certain values. One plan might be: $20 for 10 uses,
$100 for 100 uses, $200 for 1000 uses, and $500 for unlimited uses (or
effectively unlimited; maybe you set the counter to 65,535). Another, more
security-minded plan would allow only relatively small use purchases: no more
than 100 uses per call, say, or in an extreme case, no more than ten.


Fixing the Holes


There are some obvious security problems in this scheme. For one thing, the
counter is on the disk, so it can obviously be changed by a sufficiently adept
user. But this would require some serious effort, since the portion of the
code containing the counter would be encrypted. Another problem is that, as I
have described the system, there is zero copy protection. To encourage wide
distribution of sample copies of the application, you let all copies be full
working copies, indistinguishable from the originals. This means that if Fred
has purchased 1000 uses and now copies the disk, he can give a friend a
1000-use free copy. Not good.
There are various ways to deal with this, from only allowing reasonably small
numbers of uses to be purchased at one time to a fairly benign copy-protection
gimmick that would allow unlimited copying, but would reset the counter on any
copy to a low number.
Note, though, that the copy is only free until its counter runs down. Now when
Fred's friend tries to get more uses, he will have to get himself a new serial
number and start paying. You know that he needs to get himself a serial number
because your software notices that his serial number and his credit card
number don't agree (unless he has stolen Fred's, in which case Fred has a
problem, not you).
More than this, your software can also determine: 1. that this is a copy (by
the lack of agreement of serial number and credit card number); and 2. that it
was copied with the counter set high (because the last usage purchase on this
serial number was a large number). Given these facts, you could then tell
Fred's friend that you would be glad to sell him more uses, but that he will
first have to pay you what he owes you for past uses.
You could only do this if you had informed him before he ever used it that his
copy was a special kind of copy, and that, say, only the first ten uses were
free. But it would be possible to drop that warning message into the
application when the original user (Fred) made that large use purchase. Since
Fred's disk and his friend's are identical, the message would have to make
sense to both of them. A splash screen like this might do it:
WARNING: Use of this program may cost you money. 100 uses of this program have
been purchased by Fred Grimly. While this program may be copied freely, its
use by anyone other than Fred Grimly will be charged to the user at the rate
of $5 per use. If you are not Fred Grimly and do wish to use the program, we
recommend that you register it by calling 1-800-555-5555. Registering the
program will allow you to purchase uses at a substantially lower cost and will
allow you to be informed of upgrades, bugs, and related products.
It seems to me that a system like this should work, and would result in
greatly reduced marketing costs, no fighting for shelf space in computer
stores, elimination of paper documentation costs, excellent information on the
actual use of your product, more flexibility in pricing, fairer pricing, and
better contact with your customers.
That's how it seems to me. But running into Steve Jasik at MacWorld Expo has
temporarily humbled me, so that I suspect that this idea is fatally flawed.
So where is the fatal flaw?


October, 1992
C PROGRAMMING


D-Flat, the Home Stretch


 This file contains the following executables: DFLT14.ARC D14TX.ARC


Al Stevens


As I write this, the Democratic National Convention is on TV in the background
trying to grab my attention. It's old news by the time you read this. Bill and
Hillary and Al and Tipper are riding the donkey, and in November we'll know
where it carried them. For now, we are watching the world a little closer and
occasionally seeing how our technology changes it.
Ollie North was amazed to learn that even though he deleted sensitive e-mail
messages between himself and Admiral Poindexter, PROFS made archive copies of
the deleted messages, and those copies later helped prove he was less than
forthcoming with the facts.
Before he withdrew, the media were probing Ross Perot's policies and positions
and questioning how informed he was on the issues. Perot built EDS into the
giant it is and became a billionaire. EDS is a leader in our industry. When
Perot was asked if his plans for an electronic national town meeting would use
technology similar to France's Minitel, he said he'd never heard of it. That's
how informed he is about his own business.
President Bush can't program his VCR and was impressed by the scanner at a
market check-out counter.
Dan Quayle's spell checker flunked the beta test.
Do you ever get the feeling that you're a little bit brighter than the people
in charge?


D-Flat Wrapup


This month concludes the series on D-Flat, the DOS text-mode CUA
user-interface library. Next month I'll begin D-Flat++, the rewrite in C++.
DF++ will not take the year-and-a-half that D-Flat did. I'm using DF++ to
examine the issues that surround a C++ rewrite and to compare the two
languages when applied to the same solution. Therefore, this column's DF++
coverage will include the source code that supports those discussions and will
not try to publish the whole enchilada. It will, however, be available for
download from CompuServe and M&T Online and under the DDJ "careware" program,
explained further on.
When I started D-Flat, there were only a few DOS text-mode options for
implementing the CUA interface. There are more now, including some new
surprises in the months to come. As Windows devours the PC
operating-environment marketplace, I wonder about the future of DOS text-mode
applications. Virtually every major PC application now has a Windows version,
which is stealing all the attention from its DOS ancestor. OS/2 is growling
from the sidelines, threatening to grab its share. There are other GUIs:
X-Window, Motif, NeXT, GeoWorks, OpenLook, GEM, AppleDOS, AmigaDOS. If you
could identify the lowest common denominator shared by all the GUIs, you could
write a C++ class library for each that would allow an application to compile
for any GUI platform without changes to the source code. The application
programs wouldn't be very interesting to the user because they wouldn't use
the slickest, unique qualities of any particular operating environment, but
they would be more or less portable.
To finish the D-Flat project, I will discuss the File Open and Save As dialog
boxes, the application window's status bar, and the text compression of the
D-Flat help database.


File Open Dialog Boxes


When operating an application, users specify existing filenames to open and
names of new files to create. D-Flat includes two dialog boxes for these
purposes: File Open and Save As dialog boxes. These dialog boxes allow the
user to select a file from any drive or subdirectory in the system. The format
includes a one-line edit box where the user types the file specification, a
text display that shows the current drive and subdirectory, and list boxes to
select existing files and change the drive and subdirectory.
Listing One, page 160, is direct.c, code that supports the open and save
dialog boxes for drive and directory processing. The CreatePath function
converts an ambiguous file specification into one where the path parts are
unambiguous and the filename and extension are represented at least by wild
cards. If the specification has no drive or subdirectory, the function adds
the currently logged-on drive and subdirectory to the path. If there is a
drive but no path, the function adds the path for that drive. If the
specification has no filename or extension, the function adds the * wild card
for them.
The DlgDirList function builds a list box with entries for all the disk drives
and subdirectories below the current one on the current drive. It also builds
a text-control display that shows the current drive and directory. The two
dialog boxes use this function to display where the user is in the file system
and give the choices for changing the drive and directory.
Listing Two, page 161, is fileopen.c, the program that implements the File
Open and Save As dialog boxes. The application calls either the
OpenFileDialogBox or SaveAsDialogBox function, depending on which dialog box
it wants. The former function accepts a wild-card parameter that specifies the
starting path and filename to search. If you passed the value "C:\DOS\*.EXE",
the dialog box would begin with the \DOS subdirectory on the C: drive and
would display all the *.EXE files in the filename list box. Both functions
accept a character-pointer parameter with an address to write the file
specification selected by the user.
The program begins by displaying the caller's filename specification in a
one-line edit box, the current drive and directory in a text control, a list
of filenames that match the file specification in a list box, and the
available drives and subdirectories in another list box. Users can type in a
new file specification or select one from the list box. Users can also select
a different drive or subdirectory, and the other displays on the dialog box
modify themselves to reflect the new path. When users choose the OK command,
the current path and filename are copied into the caller's string, and the
function returns.


The Status Bar


The D-Flat application window can have a status bar on the window's bottom
border. Listing Three, page 162, is statbar.c, code that implements the
window-processing module for the STATUS-BAR window class. The class captures
the clock for the window and displays the time for each CLOCKTICK event. When
the application wants to use the status bar to display a one-line text
message, it sends the ADDSTATUS message to the application window, which uses
the SETTEXT message to put the text into the window. The status bar's PAINT
message displays any text value the window contains.


Compressing Help


About two years ago, I published in this column the code that implements the
Huffman compression and decompression algorithms. I decided to adapt those
algorithms to compress the D-Flat help-text database for two reasons. First, a
compressed help file makes for a smaller distribution of your application.
Second, a compressed text file is protected from potential changes made by a
curious user.
Listings Four, Five, and Six, page 162, are htree.h, htree.c, and huffc.c,
code that compresses the help text. htree.h defines the structure of the
Huffman tree as it is built and its representation in the compressed file.
Huffman compression consists of reducing characters in the text to bit
strings. The more frequent characters are represented by shorter bit strings.
To build a Huffman tree, first read the text file and count the characters. At
the bottom of the tree is a spread of 256 nodes, each representing an
eight-bit value. At first, these nodes are all the tree has, and each one
records how many times the character it represents occurs in the text. The
first node represents the value 0, the second node represents 1, and so on.
None of these nodes has children or a parent. Next, pass through the nodes and
find the two that represent the characters having the least number of
occurrences in the text. For those two nodes, build a new node that is their
parent and that has as its frequency count the total of the two. That node
will point to its two child nodes. Repeat this scan, always by-passing nodes
that have grown parents, and include any new parent nodes that have been added
until there is only one node left that has no parent. That node will be the
root node of the tree.
After the tree is built, the program can write it to the compressed file. Not
all of the tree is needed, but the decompression algorithm needs enough of it
to decompress the text. You need the number of bytes in the text, the number
of parent nodes in the tree, and the offset to the root node. Then you need
only the child-node pointers for each of the nodes that are parents--the nodes
above the original 256.
To compress the text, the program rereads the file a byte at a time. For each
byte, the program traces a path from the root node down the right or left
children until it gets to the node that represents the character. For each
right path, the program writes a 0 bit. For each left path, the program writes
a 1 bit. The trick involves the program finding its way down the correct path.
To do that, the function begins at the base node that represents the character
and recursively calls itself until it gets to the root.
Listing Seven, page 163, is decomp.c, which contains the decompression code
for the help database. Decompression occurs twice. When the program loads the
help database, it reads the text and builds a table of help texts, specifying
the text identifier and its byte/bit offset into the compressed file. Later,
when the user asks for help, the program seeks the compressed text and
decompresses it for display.
The decompression algorithm is the inverse of the compression algorithm. As it
reads bits from the compressed stream, the program navigates from the root
node to the base, taking the left child path for a 1 bit and the right child
path for a 0 bit. When the path delivers a node number that is less than 256,
that number is the decompressed text character.


How to Get D-Flat Now



The D-Flat source code is on CompuServe in Library 0 of the DDJ Forum and on
M&T Online. If you cannot use either online service, send a formatted 360K or
720K diskette and an addressed, stamped diskette mailer to me in care of Dr.
Dobb's Journal, 411 Borel Ave., San Mateo, CA 94402. I'll send you the latest
version of D-Flat. The software is free, but if you'd care to, stuff a dollar
bill in the mailer for the Brevard County Food Bank. They help the homeless
and hungry. We call it DDJ's program of careware. If you want to discuss
D-Flat with me, use CompuServe. My ID is 71101,1262, and I monitor the DDJ
Forum daily.


What's Wrong with C++?


Its critics call it a "flawed language." Bjarne Stroustrup responds by saying,
"It works." To identify C++'s shortcomings, you must approach it with a
certain point of view. An OOP devotee and purist will observe that C++ permits
you to use the classic procedural approach--that it is a C superset with
object-oriented extensions, but one that does not enforce rigid
object-oriented design and programming. The procedural structured programming
gang will bash C++'s abilities to use the same function name for more than one
purpose, to overload operators to do nonintuitive things, and to invoke an
unseen barrage of hidden anonymous objects. The adherents to some--any--other
language will grouse that C++ is still C with all its attendant faults: The
identifiers are case-sensitive; the syntax is cryptic; everything is an
expression; you can stomp all over memory with pointers; and you can plough
outside the bounds of an array. Cobol, dBase, and Basic programmers will
complain that the priesthood is only getting more elite, elusive, and
exclusive. The C++ programmer will maintain that there is nothing at all to
criticize.
Obviously, those people are not qualified to critique C++. Who is? The C
programmer, that's who. You and me. But we need an open mind. We need to watch
for the thorns as we embrace the rose. In the months to come, I will discuss
some of the problems I perceive with C++. Not that I expect to influence the
language design or standardization. C++ is pretty much what it is going to be,
and its flaws are vastly overshadowed by its benefits. But, by understanding
its problems, we will know how to deal with them when we write code. And by
airing them, perhaps we can in some small way influence the next
programming-language designer.
There will be a tendency, particularly among compiler vendors, to respond to
such problems by explaining why the language must be that way. The compiler
would be difficult to write; the compiled code would be inefficient;
correcting the problem would disable some other feature or generate a new
problem. That's all fine. Although I am interested in such answers, learning
them is not my objective. What follows are C++ stumbling blocks, large and
small. We need to know what they are so we can step over or around them. The
following are just a few.
The sizeof operator is not polymorphic. If you apply it to an object from
within its base class, the operator delivers the size of the base class, not
the object. Now this isn't a big deal. It's just something that surprises you
if you didn't know about it. It also illustrates how defensive C++ programmers
can be about their language. Even programmers who didn't realize that sizeof
does that are quick to improvise all manner of extemporaneous reasons why it
simply must work that way. I don't care why it works that way. I don't even
care that it works that way. I just want to know about it, preferably before I
try to use it.
Base classes cannot construct new copies of objects that are of classes
derived from themselves. If an abstract base class is a disk-based container
class that holds objects of the derived class, the base class cannot construct
an empty object to read the next object into. It doesn't know the size of the
object. The derived class has to cooperate in its own persistence strategy.
The experts will tell you that templates are the answer, but...
Every use of a template compiles a new copy of the template code. If you use
the same container template for ints, longs, and complex structures, you will
get three copies of the code. All the code. Before you rush to write me a
letter about deriving the template from a general-purpose base class that uses
void pointers, remember two things. First, a purpose of templates is to do
away with inheritance solutions that hide types behind void pointers. Second,
in a class that really needs to be a template, most of the member functions
will be sensitive to the format of the template parameter.
One of the tenets of object-oriented design is that the class consists of two
parts--the interface and the implementation. The class user--the program that
creates an object of that class--sees the interface. The class itself sees the
implementation. The user does not need to see the implementation and does not
need to know its details. By hiding the implementation details, the class
protects the integrity of the object-oriented design.
A C++ programmer sees it all. There is no way to separate the details of the
interface from those of the implementation because the class definition
describes them both. The class member functions can be hidden in an
object-module library, but the data members are hanging out there for all the
world to see. The compiler needs to see them to know how to construct the
object. The public and private keywords protect the programmer from
inadvertently diddling with the implementation, but nothing stops you from
moving the public keyword in your own copy of the class to suit your own
needs. Benign protection at best.


A Book by Any Name...


Two years ago the first edition of my book Teach Yourself C++ came out, and I
mentioned it in this column. I cautioned readers that Herb Schildt would no
doubt soon have a book with the same title. Herb writes books with the Teach
Yourself title too, but for a different publisher. A laughing Herb called me
soon after the magazine hit the stands. He really enjoyed being needled in my
column, he said, but not to worry. Herb wasn't doing a Teach Yourself C++; I
would have full claim to that one. Well, Herb's Teach Yourself C++ just came
out--a mere two years later. I repeat what I said then. Don't be fooled by
imitations. Mine has a yellow cover; Herb's is purple, and I just got the old
Purple Herbie.


[LISTING ONE]

/* ---------- direct.c --------- */
#include "dflat.h"

static char path[MAXPATH];
static char drive[MAXDRIVE] = " :";
static char dir[MAXDIR];
static char name[MAXFILE];
static char ext[MAXEXT];

/* --- Create unambiguous path from file spec, filling in the drive and
directory if incomplete. Optionally change to new drive and subdirectory ---
*/
void CreatePath(char *path,char *fspec,int InclName,int Change)
{
 int cm = 0;
 unsigned currdrive;
 char currdir[64];
 char *cp;

 if (!Change) {
 /* ---- save the current drive and subdirectory ---- */
 currdrive = getdisk();
 getcwd(currdir, sizeof currdir);
 memmove(currdir, currdir+2, strlen(currdir+1));
 cp = currdir+strlen(currdir)-1;
 if (*cp == '\\')
 *cp = '\0';
 }
 *drive = *dir = *name = *ext = '\0';
 fnsplit(fspec, drive, dir, name, ext);
 if (!InclName)
 *name = *ext = '\0';

 *drive = toupper(*drive);
 if (*ext)
 cm = EXTENSION;
 if (InclName && *name)
 cm = FILENAME;
 if (*dir)
 cm = DIRECTORY;
 if (*drive)
 cm = DRIVE;
 if (cm & DRIVE)
 setdisk(*drive - 'A');
 else {
 *drive = getdisk();
 *drive += 'A';
 }
 if (cm & DIRECTORY) {
 cp = dir+strlen(dir)-1;
 if (*cp == '\\')
 *cp = '\0';
 chdir(dir);
 }
 getcwd(dir, sizeof dir);
 memmove(dir, dir+2, strlen(dir+1));
 if (InclName) {
 if (!(cm & FILENAME))
 strcpy(name, "*");
 if (!(cm & EXTENSION) && strchr(fspec, '.') != NULL)
 strcpy(ext, ".*");
 }
 else
 *name = *ext = '\0';
 if (dir[strlen(dir)-1] != '\\')
 strcat(dir, "\\");
 memset(path, 0, sizeof path);
 fnmerge(path, drive, dir, name, ext);
 if (!Change) {
 setdisk(currdrive);
 chdir(currdir);
 }
}
static int dircmp(const void *c1, const void *c2)
{
 return stricmp(*(char **)c1, *(char **)c2);
}
BOOL DlgDirList(WINDOW wnd, char *fspec,
 enum commands nameid, enum commands pathid, unsigned attrib)
{
 int ax, i = 0, criterr = 1;
 struct ffblk ff;
 CTLWINDOW *ct = FindCommand(wnd->extension,nameid,LISTBOX);
 WINDOW lwnd;
 char **dirlist = NULL;

 CreatePath(path, fspec, TRUE, TRUE);
 if (ct != NULL) {
 lwnd = ct->wnd;
 SendMessage(ct->wnd, CLEARTEXT, 0, 0);

 if (attrib & 0x8000) {

 union REGS regs;
 char drname[15];
 unsigned int cd, dr;

 cd = getdisk();
 for (dr = 0; dr < 26; dr++) {
 unsigned ndr;
 setdisk(dr);
 ndr = getdisk();
 if (ndr == dr) {
 /* ----- test for remapped B drive ----- */
 if (dr == 1) {
 regs.x.ax = 0x440e; /* IOCTL func 14 */
 regs.h.bl = dr+1;
 int86(DOS, &regs, &regs);
 if (regs.h.al != 0)
 continue;
 }
 sprintf(drname, "[%c:]", dr+'A');

 /* ---- test for network or RAM disk ---- */
 regs.x.ax = 0x4409; /* IOCTL func 9 */
 regs.h.bl = dr+1;
 int86(DOS, &regs, &regs);
 if (!regs.x.cflag) {
 if (regs.x.dx & 0x1000)
 strcat(drname, " (Network)");
 else if (regs.x.dx == 0x0800)
 strcat(drname, " (RAMdisk)");
 }
 SendMessage(lwnd,ADDTEXT,(PARAM)drname,0);
 }
 }
 setdisk(cd);
 }
 while (criterr == 1) {
 ax = findfirst(path, &ff, attrib & 0x3f);
 criterr = TestCriticalError();
 }
 if (criterr)
 return FALSE;
 while (ax == 0) {
 if (!((attrib & 0x4000) &&
 (ff.ff_attrib & (attrib & 0x3f)) == 0) &&
 strcmp(ff.ff_name, ".")) {
 char fname[15];
 sprintf(fname, (ff.ff_attrib & 0x10) ?
 "[%s]" : "%s" , ff.ff_name);
 dirlist = DFrealloc(dirlist,
 sizeof(char *)*(i+1));
 dirlist[i] = DFmalloc(strlen(fname)+1);
 if (dirlist[i] != NULL)
 strcpy(dirlist[i], fname);
 i++;
 }
 ax = findnext(&ff);
 }
 if (dirlist != NULL) {
 int j;

 /* -- sort file/drive/directory list box data -- */
 qsort(dirlist, i, sizeof(void *), dircmp);
 /* ---- send sorted list to list box ---- */
 for (j = 0; j < i; j++) {
 SendMessage(lwnd,ADDTEXT,(PARAM)dirlist[j],0);
 free(dirlist[j]);
 }
 free(dirlist);
 }
 SendMessage(lwnd, SHOW_WINDOW, 0, 0);
 }
 if (pathid) {
 fnmerge(path, drive, dir, NULL, NULL);
 PutItemText(wnd, pathid, path);
 }
 return TRUE;
}






[LISTING TWO]

/* ----------- fileopen.c ------------- */
#include "dflat.h"

static BOOL DlgFileOpen(char *, char *, DBOX *);
static int DlgFnOpen(WINDOW, MESSAGE, PARAM, PARAM);
static void InitDlgBox(WINDOW);
static void StripPath(char *);
static BOOL IncompleteFilename(char *);

static char *OrigSpec;
static char *FileSpec;
static char *FileName;
static char *NewFileName;

static BOOL Saving;
extern DBOX FileOpen;
extern DBOX SaveAs;

/* ---- Dialog Box to select a file to open ---- */
BOOL OpenFileDialogBox(char *Fpath, char *Fname)
{
 return DlgFileOpen(Fpath, Fname, &FileOpen);
}
/* ---- Dialog Box to select a file to save as ---- */
BOOL SaveAsDialogBox(char *Fname)
{
 return DlgFileOpen(NULL, Fname, &SaveAs);
}
/* --------- generic file open ---------- */
static BOOL DlgFileOpen(char *Fpath, char *Fname, DBOX *db)
{
 BOOL rtn;
 char savedir[80];
 char OSpec[80];

 char FSpec[80];
 char FName[80];
 char NewFName[80];

 OrigSpec = OSpec;
 FileSpec = FSpec;
 FileName = FName;
 NewFileName = NewFName;

 getcwd(savedir, sizeof savedir);
 if (Fpath != NULL) {
 strncpy(FileSpec, Fpath, 80);
 Saving = FALSE;
 }
 else {
 *FileSpec = '\0';
 Saving = TRUE;
 }
 strcpy(FileName, FileSpec);
 strcpy(OrigSpec, FileSpec);

 if ((rtn = DialogBox(NULL, db, TRUE, DlgFnOpen)) != FALSE)
 strcpy(Fname, NewFileName);
 else
 *Fname = '\0';

 setdisk(toupper(*savedir) - 'A');
 chdir(savedir);

 return rtn;
}
static int CommandMsg(WINDOW wnd, PARAM p1, PARAM p2)
{
 switch ((int) p1) {
 case ID_FILENAME:
 if (p2 != ENTERFOCUS) {
 /* allow user to modify the file spec */
 GetItemText(wnd, ID_FILENAME, FileName, 65);
 if (IncompleteFilename(FileName) || Saving) {
 strcpy(OrigSpec, FileName);
 StripPath(OrigSpec);
 }
 if (p2 != LEAVEFOCUS)
 SendMessage(wnd, COMMAND, ID_OK, 0);
 }
 return TRUE;
 case ID_OK:
 if (p2 != 0)
 break;
 GetItemText(wnd, ID_FILENAME,
 FileName, 65);
 strcpy(FileSpec, FileName);
 if (IncompleteFilename(FileName)) {
 /* no file name yet */
 InitDlgBox(wnd);
 strcpy(OrigSpec, FileSpec);
 return TRUE;
 }
 else {

 GetItemText(wnd, ID_PATH, FileName, 65);
 strcat(FileName, FileSpec);
 strcpy(NewFileName, FileName);
 }
 break;
 case ID_FILES:
 switch ((int) p2) {
 case ENTERFOCUS:
 case LB_SELECTION:
 /* selected a different filename */
 GetDlgListText(wnd, FileName, ID_FILES);
 PutItemText(wnd, ID_FILENAME, FileName);
 break;
 case LB_CHOOSE:
 /* chose a file name */
 GetDlgListText(wnd, FileName, ID_FILES);
 SendMessage(wnd, COMMAND, ID_OK, 0);
 break;
 default:
 break;
 }
 return TRUE;
 case ID_DRIVE:
 switch ((int) p2) {
 case ENTERFOCUS:
 if (Saving)
 *FileSpec = '\0';
 break;
 case LEAVEFOCUS:
 if (Saving)
 strcpy(FileSpec, FileName);
 break;
 case LB_SELECTION: {
 char dd[25];
 /* selected different drive/dir */
 GetDlgListText(wnd, dd, ID_DRIVE);
 if (*(dd+2) == ':')
 *(dd+3) = '\0';
 else
 *(dd+strlen(dd)-1) = '\0';
 strcpy(FileName, dd+1);
 if (*(dd+2) != ':' && *OrigSpec != '\\')
 strcat(FileName, "\\");
 strcat(FileName, OrigSpec);
 if (*(FileName+1)!=':'&&*FileName!='.') {
 GetItemText(wnd,ID_PATH,FileSpec,65);
 strcat(FileSpec, FileName);
 }
 else
 strcpy(FileSpec, FileName);
 break;
 }
 case LB_CHOOSE:
 /* chose drive/dir */
 if (Saving)
 PutItemText(wnd, ID_FILENAME, "");
 InitDlgBox(wnd);
 return TRUE;
 default:

 break;
 }
 PutItemText(wnd, ID_FILENAME, FileSpec);
 return TRUE;
 default:
 break;
 }
 return FALSE;
}
/* ---- Process File Open dialog box messages ---- */
static int DlgFnOpen(WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2)
{
 switch (msg) {
 case CREATE_WINDOW: {
 int rtn = DefaultWndProc(wnd, msg, p1, p2);
 DBOX *db = wnd->extension;
 WINDOW cwnd = ControlWindow(db, ID_FILENAME);
 SendMessage(cwnd, SETTEXTLENGTH, 64, 0);
 return rtn;
 }
 case INITIATE_DIALOG:
 InitDlgBox(wnd);
 break;
 case COMMAND:
 if (CommandMsg(wnd, p1, p2))
 return TRUE;
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
/* ---- Initialize the dialog box ---- */
static void InitDlgBox(WINDOW wnd)
{
 if (*FileSpec && !Saving)
 PutItemText(wnd, ID_FILENAME, FileSpec);
 if (DlgDirList(wnd, FileSpec, ID_FILES, ID_PATH, 0)) {
 StripPath(FileSpec);
 DlgDirList(wnd, "*.*", ID_DRIVE, 0, 0xc010);
 }
}
/* ---- Strip drive and path information from file spec ---- */
static void StripPath(char *filespec)
{
 char *cp, *cp1;
 cp = strchr(filespec, ':');
 if (cp != NULL)
 cp++;
 else
 cp = filespec;
 while (TRUE) {
 cp1 = strchr(cp, '\\');
 if (cp1 == NULL)
 break;
 cp = cp1+1;
 }
 strcpy(filespec, cp);
}

/* ---- test for an incomplete file name ---- */
static BOOL IncompleteFilename(char *s)
{
 int lc = strlen(s)-1;
 if (strchr(s, '?') || strchr(s, '*') || !*s)
 return TRUE;
 if (*(s+lc) == ':' || *(s+lc) == '\\')
 return TRUE;
 return FALSE;
}







[LISTING THREE]

/* ---------------- statbar.c -------------- */
#include "dflat.h"

int StatusBarProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 char *statusbar;
 switch (msg) {
 case CREATE_WINDOW:
 case MOVE:
 SendMessage(wnd, CAPTURE_CLOCK, 0, 0);
 break;
 case KEYBOARD:
 if ((int)p1 == CTRL_F4)
 return TRUE;
 break;
 case PAINT:
 if (!isVisible(wnd))
 break;
 statusbar = DFcalloc(1, WindowWidth(wnd)+1);
 memset(statusbar, ' ', WindowWidth(wnd));
 *(statusbar+WindowWidth(wnd)) = '\0';
 strncpy(statusbar+1, "F1=Help", 7);
 if (wnd->text) {
 int len = min(strlen(wnd->text),
 WindowWidth(wnd)-17);
 if (len > 0) {
 int off=(WindowWidth(wnd)-len)/2;
 strncpy(statusbar+off, wnd->text, len);
 }
 }
 if (wnd->TimePosted)
 *(statusbar+WindowWidth(wnd)-8) = '\0';
 SetStandardColor(wnd);
 PutWindowLine(wnd, statusbar, 0, 0);
 free(statusbar);
 return TRUE;
 case BORDER:
 return TRUE;
 case CLOCKTICK:
 SetStandardColor(wnd);

 PutWindowLine(wnd,(char *)p1,WindowWidth(wnd)-8,0);
 wnd->TimePosted = TRUE;
 return TRUE;
 case CLOSE_WINDOW:
 SendMessage(NULL, RELEASE_CLOCK, 0, 0);
 break;
 default:
 break;
 }
 return BaseWndProc(STATUSBAR, wnd, msg, p1, p2);
}







[LISTING FOUR]

/* ------------------- htree.h -------------------- */
#ifndef HTREE_H
#define HTREE_H

typedef unsigned int BYTECOUNTER;

/* ---- Huffman tree structure for building ---- */
struct htree {
 BYTECOUNTER cnt; /* character frequency */
 int parent; /* offset to parent node */
 int right; /* offset to right child node */
 int left; /* offset to left child node */
};
/* ---- Huffman tree structure in compressed file ---- */
struct htr {
 int right; /* offset to right child node */
 int left; /* offset to left child node */
};
extern struct htr *HelpTree;
void buildtree(void);
FILE *OpenHelpFile(void);
void HelpFilePosition(long *, int *);
void *GetHelpLine(char *);
void SeekHelpLine(long, int);

#endif






[LISTING FIVE]

/* ------------------- htree.c -------------------- */
#include "dflat.h"
#include "htree.h"

struct htree *ht;

int root;
int treect;

/* ------ build a Huffman tree from a frequency array ------ */
void buildtree(void)
{
 int i;
 treect = 256;
 /* ---- preset node pointers to -1 ---- */
 for (i = 0; i < treect; i++) {
 ht[i].parent = -1;
 ht[i].right = -1;
 ht[i].left = -1;
 }
 /* ---- build the huffman tree ----- */
 while (1) {
 int h1 = -1, h2 = -1;
 /* ---- find the two lowest frequencies ---- */
 for (i = 0; i < treect; i++) {
 if (i != h1) {
 struct htree *htt = ht+i;
 /* --- find a node without a parent --- */
 if (htt->cnt > 0 && htt->parent == -1) {
 /* ---- h1 & h2 -> lowest nodes ---- */
 if (h1 == -1 || htt->cnt < ht[h1].cnt) {
 if (h2 == -1 || ht[h1].cnt < ht[h2].cnt)
 h2 = h1;
 h1 = i;
 }
 else if (h2 == -1 || htt->cnt < ht[h2].cnt)
 h2 = i;
 }
 }
 }
 /* --- if only h1 -> a node, that's the root --- */
 if (h2 == -1) {
 root = h1;
 break;
 }
 /* --- combine two nodes and add one --- */
 ht[h1].parent = treect;
 ht[h2].parent = treect;
 ht = realloc(ht, (treect+1) * sizeof(struct htree));
 if (ht == NULL)
 break;
 /* --- the new node's frequency is the sum of the two
 nodes with the lowest frequencies --- */
 ht[treect].cnt = ht[h1].cnt + ht[h2].cnt;
 /* - the new node points to the two that it combines */
 ht[treect].right = h1;
 ht[treect].left = h2;
 /* --- the new node has no parent (yet) --- */
 ht[treect].parent = -1;
 treect++;
 }
}







[LISTING SIX]

/* ------------------- huffc.c -------------------- */
#include "dflat.h"
#include "htree.h"

extern struct htree *ht;
extern int root;
extern int treect;
static int lastchar = '\n';

static void compress(FILE *, int, int);
static void outbit(FILE *fo, int bit);

static int fgetcx(FILE *fi)
{
 int c;
 /* ------- bypass comments ------- */
 if ((c = fgetc(fi)) == ';' && lastchar == '\n')
 do {
 while (c != '\n' && c != EOF)
 c = fgetc(fi);
 } while (c == ';');
 lastchar = c;
 return c;
}
void main(int argc, char *argv[])
{
 FILE *fi, *fo;
 int c;
 BYTECOUNTER bytectr = 0;

 if (argc < 3) {
 printf("\nusage: huffc infile outfile");
 exit(1);
 }
 if ((fi = fopen(argv[1], "rb")) == NULL) {
 printf("\nCannot open %s", argv[1]);
 exit(1);
 }
 if ((fo = fopen(argv[2], "wb")) == NULL) {
 printf("\nCannot open %s", argv[2]);
 fclose(fi);
 exit(1);
 }
 ht = calloc(256, sizeof(struct htree));

 /* - read the input file and count character frequency - */
 while ((c = fgetcx(fi)) != EOF) {
 c &= 255;
 ht[c].cnt++;
 bytectr++;
 }
 /* ---- build the huffman tree ---- */
 buildtree();
 /* --- write the byte count to the output file --- */

 fwrite(&bytectr, sizeof bytectr, 1, fo);
 /* --- write the tree count to the output file --- */
 fwrite(&treect, sizeof treect, 1, fo);
 /* --- write the root offset to the output file --- */
 fwrite(&root, sizeof root, 1, fo);
 /* -- write the tree to the output file -- */
 for (c = 256; c < treect; c++) {
 int lf = ht[c].left;
 int rt = ht[c].right;
 fwrite(&lf, sizeof lf, 1, fo);
 fwrite(&rt, sizeof rt, 1, fo);
 }
 /* ------ compress the file ------ */
 fseek(fi, 0L, 0);
 while ((c = fgetcx(fi)) != EOF)
 compress(fo, (c & 255), 0);
 outbit(fo, -1);
 fclose(fi);
 fclose(fo);
 free(ht);
 exit(0);
}
/* ---- compress a character value into a bit stream ---- */
static void compress(FILE *fo, int h, int child)
{
 if (ht[h].parent != -1)
 compress(fo, ht[h].parent, h);
 if (child) {
 if (child == ht[h].right)
 outbit(fo, 0);
 else if (child == ht[h].left)
 outbit(fo, 1);
 }
}
static char out8;
static int ct8;
/* -- collect and write bits to the compressed output file -- */
static void outbit(FILE *fo, int bit)
{
 if (ct8 == 8 || bit == -1) {
 while (ct8 < 8) {
 out8 <<= 1;
 ct8++;
 }
 fputc(out8, fo);
 ct8 = 0;
 }
 out8 = (out8 << 1) | bit;
 ct8++;
}






[LISTING SEVEN]

/* ------------------- decomp.c -------------------- */

/* Decompress the application.HLP file or load the application.TXT file if
 * the .HLP file does not exist */

#include "dflat.h"
#include "htree.h"

static int in8;
static int ct8 = 8;
static FILE *fi;
static BYTECOUNTER bytectr;
static int LoadingASCII;
struct htr *HelpTree;
static int root;

/* ------- open the help database file -------- */
FILE *OpenHelpFile(void)
{
 char *cp;
 int treect, i;
 char helpname[65];
 /* -------- get the name of the help file ---------- */
 BuildFileName(helpname, ".hlp");
 if ((fi = fopen(helpname, "rb")) == NULL) {
 /* ---- no .hlp file, look for .txt file ---- */
 if ((cp = strrchr(helpname, '.')) != NULL) {
 strcpy(cp, ".TXT");
 fi = fopen(helpname, "rt");
 }
 if (fi == NULL)
 return NULL;
 LoadingASCII = TRUE;
 }
 if (!LoadingASCII && HelpTree == NULL) {
 /* ----- read the byte count ------ */
 fread(&bytectr, sizeof bytectr, 1, fi);
 /* ----- read the frequency count ------ */
 fread(&treect, sizeof treect, 1, fi);
 /* ----- read the root offset ------ */
 fread(&root, sizeof root, 1, fi);
 HelpTree = DFcalloc(treect-256, sizeof(struct htr));
 /* ---- read in the tree --- */
 for (i = 0; i < treect-256; i++) {
 fread(&HelpTree[i].left, sizeof(int), 1, fi);
 fread(&HelpTree[i].right, sizeof(int), 1, fi);
 }
 }
 return fi;
}
/* ----- read a line of text from the help database ----- */
void *GetHelpLine(char *line)
{
 int h;
 if (LoadingASCII) {
 void *hp;
 do
 hp = fgets(line, 160, fi);
 while (*line == ';');
 return hp;
 }

 *line = '\0';
 while (TRUE) {
 /* ----- decompress a line from the file ------ */
 h = root;
 /* ----- walk the Huffman tree ----- */
 while (h > 255) {
 /* --- h is a node pointer --- */
 if (ct8 == 8) {
 /* --- read 8 bits of compressed data --- */
 if ((in8 = fgetc(fi)) == EOF) {
 *line = '\0';
 return NULL;
 }
 ct8 = 0;
 }
 /* -- point to left or right node based on msb -- */
 if (in8 & 0x80)
 h = HelpTree[h-256].left;
 else
 h = HelpTree[h-256].right;
 /* --- shift the next bit in --- */
 in8 <<= 1;
 ct8++;
 }
 /* --- h <= 255 = decompressed character --- */
 if (h == '\r')
 continue; /* skip the '\r' character */
 /* --- put the character in the buffer --- */
 *line++ = h;
 /* --- if '\n', end of line --- */
 if (h == '\n')
 break;
 }
 *line = '\0'; /* null-terminate the line */
 return line;
}
/* --- compute the database file byte and bit position --- */
void HelpFilePosition(long *offset, int *bit)
{
 *offset = ftell(fi);
 if (LoadingASCII)
 *bit = 0;
 else {
 if (ct8 < 8)
 --*offset;
 *bit = ct8;
 }
}
/* -- position the database to the specified byte and bit -- */
void SeekHelpLine(long offset, int bit)
{
 int i;
 fseek(fi, offset, 0);
 if (!LoadingASCII) {
 ct8 = bit;
 if (ct8 < 8) {
 in8 = fgetc(fi);
 for (i = 0; i < bit; i++)
 in8 <<= 1;

 }
 }
}




October, 1992
STRUCTURED PROGRAMMING


Parts isn't Parts




Jeff Duntemann, KG7JF


Shakespeare is in pretty good shape, for an $1800, 23-year-old Chevelle. A
bash here, a crunch there--so Carol and I drove him down to Dagley's Auto
Wrecking, Specializing in Early GM Muscle Cars. As junkyards go, it was a
pretty tidy place. There were Chevelles and GTOs all over the place, mostly in
pieces, but the pieces were stacked up in something I'd even call order, and
most of them were marked as to what they had been in their previous lives.
Larry Dagley is a pleasant enough guy, about my age, built like a weight
lifter who spends his spare time bench pressing rear-axle assemblies. Guys
like that rarely treat ignorance with anything like respect, so I learned the
jargon before I went down there. You don't ask for a trunk lid; there's no
such thing. It's a deck lid. Sure, I knew that.
Dagley had a lot of deck lids. We went out into the yard to take a look. We
were followed by Jenny, Larry Dagley's junkyard...er...donkey. All the while
Carol and I turned over deck lids looking for rust holes and dents, Jenny
stayed just beyond reach, cropping the dry May grass that was growing up
through the holes in a big-block V8 crankcase. Neither of us was sure what a
junkyard donkey would do if we misbehaved, so we studiously behaved, and Jenny
did nothing to dispel the mystery.
The problem was, Dagley had no '69 deck lids worth buying. He had a very nice
'72 lid in regurgitated avocado green, which would bolt on and work just fine.
But...it was a '72. The insignia was different. I was faced with a decision I
hadn't had to face before: Did I want to build a show car, or did I just want
something to cruise to the hamburger stand in? Because if I wanted a show car,
I could not blithely bolt a '72 deck lid onto a '69 Malibu Sports Coupe.
Uh-uh, no way.
Parts is parts, right?
Well, that depends utterly on what you're building.


The Parts is Parts Fallacy


Which returns us to the software design issue that I impetuously stated I
would attack in detail lo these many months ago. I proposed a design project
based on Turbo Vision -- but once I came to understand Turbo Vision well
enough to use it, I realized that all my previous training in software design
was for nothing. I'll admit right now, I can't decide how to design an
application in Turbo Vision.
There's a fallacy in software design circles that I'll call the Parts is Parts
Fallacy. It holds that the sorts of tools and libraries you use don't have any
bearing on your design strategy; that a design should and must transcend such
gritty, low-level issues. I used to think that myself. Then I had to confront
this thing called an "application framework." An application using Turbo
Vision already has a design--and the design question that remains is a brand
new one: How do you define the event paths required to breathe life into Turbo
Vision? The answer, I think, is that we need a whole new design discipline
specifically for event-driven programming, and that such a design discipline
does not yet exist.
I said in an earlier column that software design is at the highest level a
process of defining your constraints and living within them. The tools and
libraries you use are, of course, among those constraints. What I didn't
realize at the time is that the design scope of our tools and libraries has
grown larger and larger over the years. What used to be a sackful of
relatively independent subroutines may now be an interlocking web of objects
that weaves itself into your application from the very highest to the very
lowest levels, in totally non-obvious ways.
Most of the traditional software design texts like Yourdon and Constantine
make the generally unstated assumption that the programmer has full control
over a software design at all levels above the level of simple subroutines.
This is how I learned software design--out of Ed Yourdon and Larry
Constantine's seminal 1975 text, Structured Design (Prentice Hall)--and it is
still the way that nearly all programmers pursue their craft.
Well, programming (like American politics) is becoming less homogeneous and
more tribal in nature as time goes on. One design strategy will no longer fit
all. There are design cultures now; lots of them, and you must choose the one
that works best, depending on what you're building.


Design Levels


Complicating the design equation a little is the fact that software comes in
many sizes, from single-purpose, filter-style utilities to massive,
multi-application systems like those I used to fight with on System/370
mainframes in the early '80s. Design methods that work well at one level work
poorly or won't work at all at other levels.
Having given it a lot of thought, I've drawn out a map of how I see the
software-design equation, as shown in Figure 1. Keep in mind that this is just
my view of things, based on my own experiences. You may see it differently --
but it works for me, and may help people who still don't have a clue about
this stuff.
Design problems have sorted themselves out for me over the years in terms of
the level of coupling of the components being used. The vertical axis of
Figure 1 relates to this level of coupling, with the greatest level of
coupling at the bottom of the map, and the least at the top.
Coupling can be tough to define if you aren't steeped in the lore of software
design. Coupling is the degree to which the individual components share
assumptions. The coupling between two adjacent statements in a program is 100
percent, because they share assumptions about scope, local and global
variables and the general mission of the code sequence that they're part of.
At the other end of the spectrum (and at the top of Figure 1), the coupling
between two applications in an information system is probably closer to 5
percent, maybe less. The two applications share only a handful of very
high-level assumptions about how they work together, and perhaps some
additional assumptions about how data passes between them. Aside from that,
they're highly independent entities, and don't even have to be running on the
same machine or even the same kind of machine.


The Other Meaning of "System"


There's a source of confusion here. The word "system" has two very different
meanings in programming parlance. In the PC world, "system-level programming"
means working right down at the metal, hacking things like drivers, BIOS
layers, and so on. Don't confuse this with what a lot of people call an
information system; that is, a coordinated, ongoing process that includes
multiple applications running sequentially or concurrently, on one or perhaps
many different machines, with manual operations, data entry, output reporting,
and perhaps several different levels of connectedness (does anyone aside from
me abhor that awful nonword, "connectivity"?) through different technologies
among the several host processors.
People who have worked in UNIX or mainframe shops know what an information
system is; many people who work solely on PCs do not. Much of UNIX
programming, even on a much more modest scale, is done on the information
system model; UNIX utilities can be strung together with one utility piping
data into another with very little coupling between the utilities. PC
platforms have lacked this level of operating-system intermediation until very
recently. However, Microsoft's object linking and embedding (OLE) API,
introduced with Windows 3.1, will allow serious application integration on the
information-system model, right there on your PC. But that's another column or
six; we'll get to it.
At the information-system level, I've seen nothing to match Ed Yourdon's
method of structured design. The Yourdon scheme focuses overwhelmingly on the
flows of data through an information system, and assumes an extremely low
level of coupling between a system's components.
Keeping coupling to a minimum is a good goal to have, as long as you know when
it simply isn't possible. Yourdon's structured design method breaks down when
you start working on a single application whose components, for efficiency's
sake or for other reasons (like the unavoidable internal coupling level of
Turbo Vision), are tightly coupled.
I'm not going to recap structured design, Yourdon-style, here. It works best
in massive systems running on several machines, and I don't think most of you
walk that path. For something like a modest vertical-market application, I
think the Yourdon scheme, while usable, quickly gets to be more trouble than
it's worth.


Procedure-level Design


Down at the other end of things is procedure-level design, which is quite
simply the design of program elements that do Just One Thing. This encompasses
typical Pascal procedures and functions, object methods, and some simple
filter-style utility programs.
In my experience, most people design a procedure in the following way: They
define in a paragraph or two what the procedure must do (often without ever
writing that definition down), then define the nature of the inputs and
outputs, and finally draw a flowchart that steps through the statements and
branches that implement the procedure's mission. When the time comes to
actually write the code, they write it right from the flowchart.
This works. I did it a lot while I was writing Cobol, Basic, and some of the
experimental in-house languages in use at Xerox in the late '70s. The
flowchart is the bulk of the design, and love 'em or hate 'em, flowcharts have
the advantage that they can be implemented in nearly any language, no matter
how primitive.
Flowcharts have the massive disadvantage that they come to us from the dawn of
time, and don't express the control-flow structures that define structured
programming today. You can fake a for loop in a flowchart with some care, but
there's no single symbol that represents a for loop, or a while loop, or
anything more than steps-and-branches. Flowcharts are assembly language tools,
and they have this nasty habit of making your Pascal code come out looking
like some weird variant of assembly language.



Successive Refinement


I stuck with flowcharts for procedure-level design for a long time because
they were what I had. Then I read a remarkable book called Programming
Proverbs, by Henry Ledgard (Hayden Books, 1975). It described a method for
designing procedures called stepwise refinement, a term I later learned was
coined by Niklaus Wirth himself, the man who designed Pascal, Modula-2, and
Oberon.
You may not be able to find this book anymore, but if you spot a copy down at
Just Used Books 'N Things Etc., grab it. It's not product specific because it
comes from a time when there were no products, and the perspective is
certainly refreshing.
Successive refinement substitutes pseudocode for flowcharts as the end-product
of a design task. Pseudocode is English-like verbiage that describes
statement-level program action in structured, language-independent fashion.
(At least for languages that implement the standard suite of control-flow
structures.) There's no standard definition for pseudocode, otherwise somebody
would write a compiler that compiled it directly to .EXE, and it wouldn't be
pseudo anymore. What matters is that it be both logically correct and
understandable.
As with flowcharts, pseudocode can be implemented in any structured language.
It's a much shorter trip to real code than from flowcharts, since all the
control-flow structures are there in the pseudocode in English-like form. In
fact, the biggest problem in writing pseudocode is resisting the temptation to
sprinkle it with actual program statements. If you really need genuine,
language-independent pseudocode (and if you ever in any possible world would
have to switch languages, it's a damned good thing to have in a drawer
somewhere), you'd better watch yourself pretty hard. On the other hand, if you
simply work in one language and that's all, you can make the transformation
from pseudocode to real code a gradual one, and drop in the actual code
statements at any point where they occur to you.


The Process


Successive refinement begins with a single, precise statement of what the
procedure must do, preferably written in one sentence. Why one sentence? It's
a trick I use to enforce a proper narrowness to the mission of the procedure.
A procedure should not try to do too much. A single procedure that is, in
truth, two or more procedures tightly coupled to one another inside a phony
single-procedure shell will cause you no end of trouble later on.
Let's pull a simple example together here. Suppose in your struggles you
unearth a need to determine how long the longest line in a given text file is
-- and say you're still green enough so that you can't just code it all
directly in the back of your head. Start with a concise statement of what the
procedure must do:
Return the length of the longest line in the opened text file passed as a
parameter.
They won't all be this crisp, and when they're not, I suggest suffering over
that initial statement a little. Mistakes made early in the process can't
always be corrected later. More often than not, a bad initial statement will
cause you to paint yourself into a corner later on and force you to start from
scratch.
Once you have an initial statement you can live with, begin to refine it. You
refine it by breaking it down into its major component actions. Work in
levels; that is, don't try to go from initial statement to finished pseudocode
in one swell foop unless the proc is totally trivial. The understanding of the
problem you gain in defining the pseudocode at each level will help you more
crisply define the next level. In other words, work it through. Like it says
on every paint can ever made, several thin coats are better than one thick
coat.
To continue, take a stab at refining our initial statement:
Position the file to its first record.
Scan the file, replacing a maximum-length value with a new length value each
time a longer one is found.
After the last record is read and tested, return the maximum-length value
found.
Our initial statement had at least three statements inside it. Examine each of
the new statements individually, to see if they make sense. If they do, refine
again:
Initialize the maximum-length variable to 0.
Position the file to its first record.
While records are in the file, repeat this:
Read a record.
If its length is greater than the value of the maximum-length variable,
replace the maximum-length variable's value with the new length.
After the last record has been read and tested, return the value of the
maximum-length variable.
Notice that during this refinement we've implicitly defined a variable. Some
purists have challenged me to define all my variables before I begin refining
the initial problem statement, since everybody knows that data drives good
design. Well ... not quite. At the procedure-design level, code and data are
peers. We're not fussing with Big Picture stuff here. We're zeroing in on
individual code statements. The refinement of the nature of the procedure's
data is as much a part of the process as the refinement of the nature of its
code. You should write down variables in some sort of a separate list as you
determine that you need them. "MaxValue; an integer" is all you need to say.
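At this point the pseudocode translates into real code almost mechanically. Here's a minimal C sketch of the finished procedure (C rather than Pascal, to match the listings elsewhere in this issue; the function name LongestLine and the 256-character record buffer are my own assumptions, not the column's):

```c
#include <stdio.h>
#include <string.h>

/* Return the length of the longest line in the opened
   text file passed as a parameter. */
int LongestLine(FILE *fp)
{
    char line[256];     /* assumed maximum record length */
    int MaxValue = 0;   /* initialize the maximum-length variable to 0 */

    rewind(fp);         /* position the file to its first record */
    /* while records are in the file, read and test each one */
    while (fgets(line, sizeof line, fp) != NULL) {
        int len = strlen(line);
        if (len > 0 && line[len-1] == '\n')
            len--;      /* don't count the newline itself */
        if (len > MaxValue)
            MaxValue = len;     /* replace with the new length */
    }
    /* after the last record, return the maximum-length value */
    return MaxValue;
}
```

Notice that each statement of the final refinement maps onto one or two lines of code, which is exactly the point: once the pseudocode is unambiguous, the coding step is nearly clerical.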


Pseudocode Tools


Demented writer/editor that I am, I do now and have always written my
pseudocode in my favorite word processor. (Heck, I used to write my Pascal/MT+
code itself in WordStar's nondocument mode.) My friend Chris Nelson pointed
out something that I am genuinely amazed not to have hit upon before now:
Outline processors are naturals for successive refinement.
If you write your pseudocode in an outline processor, you have the ability to
refine and retain each level of detail rather than simply expanding a level
and thereby losing the prior level. This gives you an intelligent way to
accomplish the "artful hiding of detail" that Niklaus Wirth says is the main
purpose of structured programming.
If you have an outline processor lying around or can find one, give it a try.
It still feels a little strange to me, but I can sense myself gradually
becoming addicted.


Where to Stop


Knowing how far to take pseudocode is a bit of an art, and again it depends on
what the pseudocode will eventually be used for. What I watch for is the point
when all ambiguity has left the pseudocode. That, too, is a judgment call.
Once you have your pseudocode, look it over with a critical eye. Most
importantly, see if you've left anything out. Real-world procedures that deal
with files should have some sort of error handling, and I haven't yet added
this to the pseudocode described earlier. There may be other things, too --
does the procedure have to set a help context somehow? Are all variables that
need initializing initialized? Does some aspect of the pseudocode's action
imply a variable that I haven't explicitly described and initialized? (I've
been stung on this one more than once....)
The overwhelming tendency among programmers would be to immediately take the
pseudocode for a procedure to real code once that procedure's pseudocode was
declared finished. It can be helpful to hold back and at least design any
related procedures before beginning coding. Defining the procs that work with
a proc you've already defined can spotlight conceptual errors in the first
design. You may think of some new task that has to be done somewhere, and the
best somewhere (after you've designed a half-dozen somewheres) may well be
within the first procedure you designed.
It works both ways. I'm not a purist but a realist, and I take some heat for
that occasionally. One of the heretical points I have made is that coding one
subsystem can shed certain kinds of light on the design of another subsystem
that no amount of analysis or deep thought can. This is especially true if
your tools are evolving faster than you can climb their learning curve to
genuine mastery. (This has been a growing problem in the last few years, as
machine performance and tool sophistication continue spiraling out of sight.)
This is another consequence of the Parts is Parts Fallacy: Like it or not,
your tools occasionally dictate to you, and sometimes you can do nothing but
bow and nod. You may not have time to get really good at a tool before
beginning work on a project. The project may be the only way to learn the
tool.
In a perfect world, where all programmers are full masters of their tools and
the tools sit still for years on end, you design fully before you begin
coding. In our world, you do whatcha gotta do to make things work.


Aiming for the Middle


I've recapped procedure-level design here because I can; it's well defined and
just about everybody can learn to do it well following the guidelines in this
column.
You'll notice, however, that we haven't yet touched on the middle of the
diagram in Figure 1. This is where all the neat stuff happens, and it is also
the toughest area in which to design. Designing at the information-system
level is messy simply because the system tends to be big. Making it work at
all is the realistic goal -- few teams that implement such big systems ever
bother to try to make them work efficiently or quickly. If the sole value for
a system is that it work, formal methods can serve you well, in that they can
guide you to a piece of code that produces the set of logical outputs for a
given set of logical inputs.
At the information-system level, flexibility is also an important value,
because when a system is spread out over a WAN or crosses the boundaries
between mainframe, mini, and PC, chunks of the system tend to be ripped out
and replaced regularly. Minimal coupling is thus essential, and performance
can only be tuned with minimal coupling enforced. Since information systems
are almost always custom software without competition on the open market, the
users are stuck with what they get and performance or usability is less of an
issue than with commercial applications.
The middle of the chart is the area where you're squeezed between the rocks
and the sky. It's all well and good to be a design purist and do things "by
the book" -- only to discover that the application works so badly that no one
will buy it. I've seen this happen -- almost always to innocents who are right
out of school and green enough to believe everything their design textbooks
tell them.
The thing to understand about application-level design, if you understand
absolutely nothing else, is this: You cannot substitute formal methods for a
thorough understanding of the problem and a creative enthusiasm for the task.
Parts is not parts.
Understanding is everything.

That's it for this issue. Slap down Figure 1 on your copier and tape the copy
to your wall. We'll come back to it next month.




October, 1992
GRAPHICS PROGRAMMING


How to Shear a Sheep, and Other Texture-mapping Niceties


 This article contains the following executables: XSHRP21.ZIP


Michael Abrash


I recently spent an hour or so learning how to shear a sheep. Among other
things, I learned -- in great detail -- about the importance of selecting the
proper comb for your shears, heard about the man who holds the world's record
for sheep sheared in a day (more than 600, if memory serves), and discovered,
Lord help me, the many and varied ways in which the New Zealand Sheep Shearing
Board improves the approved sheep-shearing method every year. The fellow
giving the presentation did his best, but let's face it, sheep just aren't
very interesting. If you have children, you'll know why I was there; if you
don't, there's no use explaining.
The chap doing the shearing did say one thing that stuck with me, although it
may not sound particularly profound. (Actually, it sounds pretty silly, but
bear with me.) He said, "You don't get really good at sheep shearing for ten
years, or 10,000 sheep." I'll buy that. In fact, to extend that morsel of
wisdom to the greater, non-ovine-centric universe, it actually takes a good
chunk of experience before you get good at anything worthwhile -- especially
graphics, for a couple of reasons. First, performance matters a lot in
graphics, and performance programming is largely a matter of experience. You
can't speed up PC graphics simply by looking in a book for a better algorithm;
you have to understand the code C compilers generate, assembly language
optimization, VGA hardware, and the performance implications of various
graphics-programming approaches and algorithms. Second, computer graphics is a
matter of illusion, of convincing the eye to see what you want it to see, and
that's very much a black art based on experience.
This month, experience figures into our current subject, real-time 3-D
animation, in several ways. Stay tuned.


Visual Quality: A Black Hole ... Er, Art


Pleasing the eye with real-time computer animation is something less than a
science, at least at the PC level, where there's no time for antialiasing and
a limited color palette; in fact, sometimes it can be more than a little
frustrating. For example, last month I implemented texture mapping in X-Sharp,
the 3-D animation package that's an ongoing project in this column. My first
implementation was disappointing; the texture maps shimmied and sheared badly,
like a loosely affiliated flock of pixels, each marching to its own drummer.
Then I added a control key to speed up the rotation; what a difference! The
aliasing problems were still there, but with the faster rotation, the pixels
moved too quickly for the eye to pick up on the aliasing; the rotating texture
maps, and the rotating ball as a whole, crossed the threshold into being
accepted by the eye as a viewed object, rather than a collection of pixels.
The obvious lesson here is that adequate speed is important to convincing
animation. There's another, less obvious side to this lesson though. I'd been
running the texture-mapping demo on a 20-MHz 386 with a slow VGA when I
discovered the beneficial effects of greater speed. When, some time later, I
ran the demo on a 33-MHz 486 with a fast VGA, I found that the faster rotation
was too fast! The ball spun so rapidly that the eye couldn't blend successive
images together into continuous motion, much like watching a badly flickering
movie.
So the second lesson is that either too little or too much speed can destroy
the illusion. Unless you're antialiasing, you need to tune the shifting of
your images so that they're in the "sweet spot" of apparent motion, in which
the eye is willing to ignore the jumping and aliasing, and blend the images
together into continuous motion. Only experience can give you a feel for that
sweet spot.


Fixed-point Arithmetic, Redux


Last month, I added texture mapping to X-Sharp, but lacked space to explain
some of the finer points. This month, I'll cover some of those points, and
discuss the visual and performance enhancements I've added since last month.
In the very first installment of this column, I spent a good bit of time
explaining exactly which pixels were inside a polygon and which were outside,
and how to draw those pixels accordingly. This was important, I said, because
only with a precise, consistent way of defining inside and outside would it be
possible to draw adjacent polygons without either overlap or gaps between
them.
As a corollary, I added that only an all-integer, edge-stepping approach would
do for polygon filling. Fixed-point arithmetic, although alluring for speed
and ease of use, would be unacceptable because round-off error would result in
imprecise pixel placement.
More than a year then passed, during which time my long-term memory apparently
suffered at least partial failure. When I went to implement texture mapping
last month, I decided that since transformed destination vertices can fall at
fractional pixel locations, the cleanest way to do the texture mapping would
be to use fixed-point coordinates for both the source texture and the
destination screen polygon. That way, there would be a minimum of distortion as
the polygon rotated and moved. Theoretically, that made sense; but there was
one small problem: gaps between polygons.
Yes, folks, I had ignored the voice of experience (my own voice, at that) at
my own peril. You can be assured I will not forget this particular lesson
again: Fixed-point arithmetic is not precise. That's not to say that it's
impossible to use fixed-point for drawing polygons; if all adjacent edges
share common start and end vertices and common edges are always stepped in the
same direction, then all polygons should share the same fixed-point
imprecision, and edges should fit properly (although polygons may not include
exactly the right pixels). What you absolutely cannot do is mix fixed point
and all-integer polygon-filling approaches when drawing, as shown in Figure 1.
Consequently, I ended up using an all-integer approach in X-Sharp for stepping
through the destination polygon. However, I kept the fixed point approach,
which is faster and much simpler, for stepping through the source. Why was it
all right to mix approaches in this case? Precise pixel placement only matters
when drawing, because otherwise we can get gaps, which are very visible. When
selecting a pixel to copy from the source texture, however, the worst that
happens is that we pick the source pixel next to the one we really want,
causing the mapped texture to appear to have shifted by one pixel at the
corresponding destination pixel; given all the aliasing and shearing already
going on in the texture-mapping process, a one-pixel mapping error is
insignificant.
Experience again: knowing which flaws (like small texture shifts) can
reasonably be ignored, and which (like those that produce gaps between
polygons) must be avoided at all costs.


Texture Mapping: Orientation Independence


Last month's double-DDA texture-mapping code worked adequately, but there were
two things about it that left me less than satisfied. One flaw was
performance; that's addressed shortly. The other flaw was the way textures
shifted noticeably as the orientations of the polygons they were mapped onto
changed.
Last month's code followed the standard polygon inside/outside rule for
determining which pixels in the source texture map were to be mapped: Pixels
that mapped exactly to the left and top destination edges were considered to
be inside, and pixels that mapped exactly to the right and bottom destination
edges were considered to be outside. That's fine for filling polygons, but
when copying texture maps, it causes different edges of the texture map to be
omitted, depending on the destination orientation, because different edges of
the texture map correspond to the right and bottom destination edges,
depending on the current rotation. Also, last month's code truncated to get
integer source coordinates. This, together with the orientation problem, meant
that when a texture turned upside down, it gained one extra row and one extra
column of pixels from the next row and column of the texture map. This
asymmetry was quite visible, and not at all the desired effect.
Listing One (page 164) is one solution to these problems. This code, which
replaces the equivalently named function from last month, makes no attempt to
follow the standard polygon inside/outside rules when mapping the source.
Instead, it advances a half-step into the texture map before drawing the first
pixel, so pixels along all edges are half included. Rounding rather than
truncation to texture-map coordinates is also performed. The result is that
the texture map stays pretty much centered within the destination polygon as
the destination rotates, with a much-reduced level of orientation-dependent
asymmetry.


Mapping Textures Across Multiple Polygons


One of the truly nifty things about double-DDA texture mapping is that it is
not limited to mapping a texture onto a single polygon. A single texture can
be mapped across any number of adjacent polygons simply by having polygons
that share vertices in 3-space also share vertices in the texture map. In
fact, the demonstration program DEMO1 in the X-Sharp archive maps a single
texture across two polygons; this is the blue-on-green pattern that stretches
across two panels of the spinning ball. This capability makes it easy to
produce polygon-based objects with complex surfaces (such as banding and
insignia on a spaceship). Just map the desired texture onto the underlying
polygonal framework of an object, and let double-DDA texture mapping do the
rest.


Fast Texture Mapping


Of course, there's a problem with mapping a texture across many polygons:
Texture mapping is slow. If you run DEMO1 and move the ball up close to the
screen, you'll see that the ball slows considerably whenever a texture swings
around into view. To some extent that can't be helped, because each pixel of a
texture-mapped polygon has to be calculated and drawn independently.
Nonetheless, we can certainly improve the performance of texture mapping a
good deal over last month.
By and large, there are two keys to improving PC graphics performance. The
first -- no surprise -- is assembly language. The second, without which
assembly language is far less effective, is understanding exactly where the
cycles go in inner loops; in our case, that means understanding where the
bottlenecks are in Listing One.
Listing Two (page 164) is a high-performance assembly language implementation
of Listing One. Apart from the conversion to assembly language, this
implementation improves performance by focusing on reducing inner loop
bottlenecks. In fact, the whole of Listing Two is nothing more than the inner
loop for texture-mapped polygon drawing; Listing Two is only the code to draw
a single scan line. Most of the work in drawing a texture-mapped polygon comes
in scanning out individual lines, though, so this is the appropriate place to
optimize.
Within Listing Two, all the important optimization is in the loop that draws
across each destination scan line, near the end of the listing. One
optimization is elimination of the call to the set-pixel routine used to draw
each pixel in Listing One. Function calls are expensive operations, to be
avoided when performance matters. Also, although mode X (the undocumented
320x240 256-color VGA mode X-Sharp runs in) doesn't lend itself well to
pixel-oriented operations like line drawing or texture mapping, the inner loop
has been set up to minimize mode X's overhead. A rotating plane mask is
maintained in AL, with DX pointing to the Map Mask register; thus, only a
rotate and an OUT are required to select the plane to which to write, cycling
from plane 0 through plane 3 and wrapping back to 0. Better yet, because we
know that we're simply stepping horizontally across the destination scan line,
we can use a clever optimization to both step the destination and reduce the
overhead of maintaining the mask. Two copies of the current plane mask are
maintained, one in each nibble of AL. (The Map Mask register pays attention
only to the lower nibble.) Then, when one copy rotates out of the lower
nibble, the other copy rotates into the lower nibble and is ready to be used.
This approach eliminates the need to test for the mask wrapping from plane 3
to plane 0, all the more so because a carry is generated when wrapping occurs,
and that carry can be added to DI to advance the screen pointer.

In all, the overhead of drawing each pixel is reduced from a call to the
set-pixel routine and full calculation of the screen address and plane mask to
five instructions and no branches. This is an excellent example of converting
full, from-scratch calculations to incremental processing, whereby only
information that has changed since the last operation (the plane mask moving
one pixel, for example) is recalculated.
Incremental processing and knowing where the cycles go are both important in
the final optimization in Listing Two, speeding up the retrieval of pixels
from the texture map. This operation looks very efficient in Listing One,
consisting of only two adds and the macro GET_IMAGE_PIXEL. However, those
adds are fixed-point adds, so they take four instructions apiece, and the
macro hides not only conversion from fixed-point to integer, but also a
time-consuming multiplication. Incremental approaches are excellent at
avoiding multiplication, because cumulative additions can often replace
multiplication. That's the case with stepping through the source texture in
Listing Two; ten instructions, with a maximum of two branches, replace all the
texture calculations of Listing One. Listing Two simply detects when the
fractional part of the source x or y coordinate turns over and advances the
source texture pointer accordingly.
As you might expect, all this optimization is pretty hard to implement, and
makes Listing Two much more complicated than Listing One. Is it worth the
trouble? Indeed it is. Listing Two is more than twice as fast as Listing
One, and the difference is very noticeable when large, texture-mapped areas
are animated. Whether more than doubling performance is significant is a
matter of opinion, I suppose, but imagine that you're in William Gibson's
Neuromancer, trying to crack a corporate database. Which texture-mapping
routine would you rather have interfacing you to Cyberspace?


Where to Get X-Sharp


The full source for X-Sharp is available in the file XSHRPn.ZIP in the DDJ
Forum on CompuServe, and as XSHARPn.ZIP in the programming/graphics conference
on M&T Online and the graphic.disp conference on Bix. (XSHARP21 is the first
version that includes fast, assembly language texture mapping.) Alternatively,
you can send me a 360K or 720K formatted diskette and an addressed, stamped
diskette mailer, care of DDJ, 411 Borel Ave., San Mateo, CA 94402; and I'll
send you the latest copy of X-Sharp. There's no charge, but it'd be very much
appreciated if you'd slip in a dollar or so to help out the folks at the
Vermont Association for the Blind and Visually Impaired.
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix; my
user name is mabrash in both cases. There is no truth to the rumor that I can
be reached under the alias "sheep-shearer," at least not for another 9999
sheep.
_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash


[LISTING ONE]

/* Texture-map-draw the scan line between two edges. Uses approach of
 pre-stepping 1/2 pixel into the source image and rounding to the nearest
 source pixel at each step, so that texture maps will appear
 reasonably similar at all angles. */
void ScanOutLine(EdgeScan * LeftEdge, EdgeScan * RightEdge)
{
 Fixedpoint SourceX;
 Fixedpoint SourceY;
 int DestX = LeftEdge->DestX;
 int DestXMax = RightEdge->DestX;
 Fixedpoint DestWidth;
 Fixedpoint SourceStepX, SourceStepY;

 /* Nothing to do if fully X clipped */
 if ((DestXMax <= ClipMinX) || (DestX >= ClipMaxX)) {
 return;
 }

 if ((DestXMax - DestX) <= 0) {
 return; /* nothing to draw */
 }
 SourceX = LeftEdge->SourceX;
 SourceY = LeftEdge->SourceY;

 /* Width of destination scan line, for scaling. Note: because this is an
 integer-based scaling, it can have a total error of as much as nearly
 one pixel. For more precise scaling, also maintain a fixed-point DestX
 in each edge, and use it for scaling. If this is done, it will also
 be necessary to nudge the source start coordinates to the right by an
 amount corresponding to the distance between the real (fixed-point)
 DestX and the first pixel (at an integer X) to be drawn */
 DestWidth = INT_TO_FIXED(DestXMax - DestX);

 /* Calculate source steps that correspond to each dest X step (across
 the scan line) */
 SourceStepX = FixedDiv(RightEdge->SourceX - SourceX, DestWidth);
 SourceStepY = FixedDiv(RightEdge->SourceY - SourceY, DestWidth);

 /* Advance 1/2 step in the stepping direction, to space scanned pixels
 evenly between the left and right edges. (There's a slight inaccuracy
 in dividing negative numbers by 2 by shifting rather than dividing,
 but the inaccuracy is in the least significant bit, and we'll just
 live with it.) */

 SourceX += SourceStepX >> 1;
 SourceY += SourceStepY >> 1;

 /* Clip right edge if necessary */
 if (DestXMax > ClipMaxX)
 DestXMax = ClipMaxX;

 /* Clip left edge if necessary */
 if (DestX < ClipMinX) {
 SourceX += FixedMul(SourceStepX, INT_TO_FIXED(ClipMinX - DestX));
 SourceY += FixedMul(SourceStepY, INT_TO_FIXED(ClipMinX - DestX));
 DestX = ClipMinX;
 }
 /* Scan across the destination scan line, updating the source image
 position accordingly */
 for (; DestX<DestXMax; DestX++) {
 /* Get the currently mapped pixel out of the image and draw it to
 the screen */
 WritePixelX(DestX, DestY,
 GET_IMAGE_PIXEL(TexMapBits, TexMapWidth,
 ROUND_FIXED_TO_INT(SourceX), ROUND_FIXED_TO_INT(SourceY)) );
 /* Point to the next source pixel */
 SourceX += SourceStepX;
 SourceY += SourceStepY;
 }
}






[LISTING TWO]

; Draws all pixels in the specified scan line, with the pixel colors
; taken from the specified texture map. Uses approach of pre-stepping
; 1/2 pixel into the source image and rounding to the nearest source
; pixel at each step, so that texture maps will appear reasonably similar
; at all angles. This routine is specific to 320-pixel-wide planar
; (non-Chain4) 256-color modes, such as mode X, which is a planar
; (non-chain4) 256-color mode with a resolution of 320x240.
; C near-callable as:
; void ScanOutLine(EdgeScan * LeftEdge, EdgeScan * RightEdge);
; Tested with TASM 3.0.

SC_INDEX equ 03c4h ;Sequence Controller Index
MAP_MASK equ 02h ;index in SC of Map Mask register
SCREEN_SEG equ 0a000h ;segment of display memory in mode X
SCREEN_WIDTH equ 80 ;width of screen in bytes from one scan line
 ; to the next

 .model small
 .data
 extrn _TexMapBits:word, _TexMapWidth:word, _DestY:word
 extrn _CurrentPageBase:word, _ClipMinX:word
 extrn _ClipMinY:word, _ClipMaxX:word, _ClipMaxY:word

; Describes the current location and stepping, in both the source and
; the destination, of an edge. Mirrors structure in DRAWTEXP.C.

EdgeScan struc
Direction dw ? ;through edge list; 1 for a right edge (forward
 ; through vertex list), -1 for a left edge (backward
 ; through vertex list)
RemainingScans dw ? ;height left to scan out in dest
CurrentEnd dw ? ;vertex # of end of current edge
SourceX dd ? ;X location in source for this edge
SourceY dd ? ;Y location in source for this edge
SourceStepX dd ? ;X step in source for Y step in dest of 1
SourceStepY dd ? ;Y step in source for Y step in dest of 1
 ;variables used for all-integer Bresenham's-type
 ; X stepping through the dest, needed for precise
 ; pixel placement to avoid gaps
DestX dw ? ;current X location in dest for this edge
DestXIntStep dw ? ;whole part of dest X step per scan-line Y step
DestXDirection dw ? ;-1 or 1 to indicate which way X steps (left/right)
DestXErrTerm dw ? ;current error term for dest X stepping
DestXAdjUp dw ? ;amount to add to error term per scan line move
DestXAdjDown dw ? ;amount to subtract from error term when the
 ; error term turns over
EdgeScan ends

Parms struc
 dw 2 dup(?) ;return address & pushed BP
LeftEdge dw ? ;pointer to EdgeScan structure for left edge
RightEdge dw ? ;pointer to EdgeScan structure for right edge
Parms ends

;Offsets from BP in stack frame of local variables.
lSourceX equ -4 ;current X coordinate in source image
lSourceY equ -8 ;current Y coordinate in source image
lSourceStepX equ -12 ;X step in source image for X dest step of 1
lSourceStepY equ -16 ;Y step in source image for X dest step of 1
lXAdvanceByOne equ -18 ;used to step source pointer 1 pixel
 ; incrementally in X
lXBaseAdvance equ -20 ;used to step source pointer minimum number of
 ; pixels incrementally in X
lYAdvanceByOne equ -22 ;used to step source pointer 1 pixel
 ; incrementally in Y
lYBaseAdvance equ -24 ;used to step source pointer minimum number of
 ; pixels incrementally in Y
LOCAL_SIZE equ 24 ;total size of local variables
 .code
 extrn _FixedMul:near, _FixedDiv:near
 align 2
ToScanDone:
 jmp ScanDone
 public _ScanOutLine
 align 2
_ScanOutLine proc near
 push bp ;preserve caller's stack frame
 mov bp,sp ;point to our stack frame
 sub sp,LOCAL_SIZE ;allocate space for local variables
 push si ;preserve caller's register variables
 push di
; Nothing to do if destination is fully X clipped.
 mov di,[bp].RightEdge
 mov si,[di].DestX
 cmp si,[_ClipMinX]

 jle ToScanDone ;right edge is to left of clip rect, so done
 mov bx,[bp].LeftEdge
 mov dx,[bx].DestX
 cmp dx,[_ClipMaxX]
 jge ToScanDone ;left edge is to right of clip rect, so done
 sub si,dx ;destination fill width
 jle ToScanDone ;null or negative full width, so done

 mov ax,word ptr [bx].SourceX ;initial source X coordinate
 mov word ptr [bp].lSourceX,ax
 mov ax,word ptr [bx].SourceX+2
 mov word ptr [bp].lSourceX+2,ax

 mov ax,word ptr [bx].SourceY ;initial source Y coordinate
 mov word ptr [bp].lSourceY,ax
 mov ax,word ptr [bx].SourceY+2
 mov word ptr [bp].lSourceY+2,ax
; Calculate source steps that correspond to each 1-pixel destination X step
; (across the destination scan line).
 push si ;push dest X width, in fixedpoint form
 sub ax,ax
 push ax ;push 0 as fractional part of dest X width
 mov ax,word ptr [di].SourceX
 sub ax,word ptr [bp].lSourceX ;low word of source X width
 mov dx,word ptr [di].SourceX+2
 sbb dx,word ptr [bp].lSourceX+2 ;high word of source X width
 push dx ;push source X width, in fixedpoint form
 push ax
 call _FixedDiv ;scale source X width to dest X width
 add sp,8 ;clear parameters from stack
 mov word ptr [bp].lSourceStepX,ax ;remember source X step for
 mov word ptr [bp].lSourceStepX+2,dx ; 1-pixel destination X step
 mov cx,1 ;assume source X advances non-negative
 and dx,dx ;which way does source X advance?
 jns SourceXNonNeg ;non-negative
 neg cx ;negative
 cmp ax,0 ;is the whole step exactly an integer?
 jz SourceXNonNeg ;yes
 inc dx ;no, truncate to integer in the direction of
 ; 0, because otherwise we'll end up with a
 ; whole step of 1-too-large magnitude
SourceXNonNeg:
 mov [bp].lXAdvanceByOne,cx ;amount to add to source pointer to
 ; move by one in X
 mov [bp].lXBaseAdvance,dx ;minimum amount to add to source
 ; pointer to advance in X each time
 ; the dest advances one in X
 push si ;push dest Y height, in fixedpoint form
 sub ax,ax
 push ax ;push 0 as fractional part of dest Y height
 mov ax,word ptr [di].SourceY
 sub ax,word ptr [bp].lSourceY ;low word of source Y height
 mov dx,word ptr [di].SourceY+2
 sbb dx,word ptr [bp].lSourceY+2 ;high word of source Y height
 push dx ;push source Y height, in fixedpoint form
 push ax
 call _FixedDiv ;scale source Y height to dest X width
 add sp,8 ;clear parameters from stack
 mov word ptr [bp].lSourceStepY,ax ;remember source Y step for

 mov word ptr [bp].lSourceStepY+2,dx ; 1-pixel destination X step
 mov cx,[_TexMapWidth] ;assume source Y advances non-negative
 and dx,dx ;which way does source Y advance?
 jns SourceYNonNeg ;non-negative
 neg cx ;negative
 cmp ax,0 ;is the whole step exactly an integer?
 jz SourceYNonNeg ;yes
 inc dx ;no, truncate to integer in the direction of
 ; 0, because otherwise we'll end up with a
 ; whole step of 1-too-large magnitude
SourceYNonNeg:
 mov [bp].lYAdvanceByOne,cx ;amount to add to source pointer to
 ; move by one in Y
 mov ax,[_TexMapWidth] ;minimum distance skipped in source
 imul dx ; image bitmap when Y steps (ignoring
 mov [bp].lYBaseAdvance,ax ; carry from the fractional part)
; Advance 1/2 step in the stepping direction, to space scanned pixels evenly
; between the left and right edges. (There's a slight inaccuracy in dividing
; negative numbers by 2 by shifting rather than dividing, but the inaccuracy
; is in the least significant bit, and we'll just live with it.)
 mov ax,word ptr [bp].lSourceStepX
 mov dx,word ptr [bp].lSourceStepX+2
 sar dx,1
 rcr ax,1
 add word ptr [bp].lSourceX,ax
 adc word ptr [bp].lSourceX+2,dx

 mov ax,word ptr [bp].lSourceStepY
 mov dx,word ptr [bp].lSourceStepY+2
 sar dx,1
 rcr ax,1
 add word ptr [bp].lSourceY,ax
 adc word ptr [bp].lSourceY+2,dx
; Clip right edge if necessary.
 mov si,[di].DestX
 cmp si,[_ClipMaxX]
 jl RightEdgeClipped
 mov si,[_ClipMaxX]
RightEdgeClipped:
; Clip left edge if necessary
 mov bx,[bp].LeftEdge
 mov di,[bx].DestX
 cmp di,[_ClipMinX]
 jge LeftEdgeClipped
; Left clipping is necessary; advance the source accordingly
 neg di
 add di,[_ClipMinX] ;ClipMinX - DestX
 ;first, advance the source in X
 push di ;push ClipMinX - DestX, in fixedpoint form
 sub ax,ax
 push ax ;push 0 as fractional part of ClipMinX-DestX
 push word ptr [bp].lSourceStepX+2
 push word ptr [bp].lSourceStepX
 call _FixedMul ;total source X stepping in clipped area
 add sp,8 ;clear parameters from stack
 add word ptr [bp].lSourceX,ax ;step the source X past clipping
 adc word ptr [bp].lSourceX+2,dx
 ;now advance the source in Y
 push di ;push ClipMinX - DestX, in fixedpoint form

 sub ax,ax
 push ax ;push 0 as fractional part of ClipMinX-DestX
 push word ptr [bp].lSourceStepY+2
 push word ptr [bp].lSourceStepY
 call _FixedMul ;total source Y stepping in clipped area
 add sp,8 ;clear parameters from stack
 add word ptr [bp].lSourceY,ax ;step the source Y past clipping
 adc word ptr [bp].lSourceY+2,dx
 mov di,[_ClipMinX] ;start X coordinate in dest after clipping
LeftEdgeClipped:
; Calculate actual clipped destination drawing width.
 sub si,di
; Scan across the destination scan line, updating the source image position
; accordingly.
; Point to the initial source image pixel, adding 0.5 to both X and Y so that
; we can truncate to integers from now on but effectively get rounding.
 add word ptr [bp].lSourceY,8000h ;add 0.5
 mov ax,word ptr [bp].lSourceY+2
 adc ax,0
 mul [_TexMapWidth] ;initial scan line in source image
 add word ptr [bp].lSourceX,8000h ;add 0.5
 mov bx,word ptr [bp].lSourceX+2 ;offset into source scan line
 adc bx,ax ;initial source offset in source image
 add bx,[_TexMapBits] ;DS:BX points to the initial image pixel
; Point to initial destination pixel.
 mov ax,SCREEN_SEG
 mov es,ax
 mov ax,SCREEN_WIDTH
 mul [_DestY] ;offset of initial dest scan line
 mov cx,di ;initial destination X
 shr di,1
 shr di,1 ;X/4 = offset of pixel in scan line
 add di,ax ;offset of pixel in page
 add di,[_CurrentPageBase] ;offset of pixel in display memory
 ;ES:DI now points to the first destination pixel

 and cl,011b ;CL = pixel's plane
 mov al,MAP_MASK
 mov dx,SC_INDEX
 out dx,al ;point the SC Index register to the Map Mask
 mov al,11h ;one plane bit in each nibble, so we'll get carry
 ; automatically when going from plane 3 to plane 0
 shl al,cl ;set the bit for the first pixel's plane to 1
; If source X step is negative, change over to working with non-negative
; values.
 cmp word ptr [bp].lXAdvanceByOne,0
 jge SXStepSet
 neg word ptr [bp].lSourceStepX
 not word ptr [bp].lSourceX
SXStepSet:
; If source Y step is negative, change over to working with non-negative
; values.
 cmp word ptr [bp].lYAdvanceByOne,0
 jge SYStepSet
 neg word ptr [bp].lSourceStepY
 not word ptr [bp].lSourceY
SYStepSet:
; At this point:
; AL = initial pixel's plane mask

; BX = pointer to initial image pixel
; SI = # of pixels to fill
; DI = pointer to initial destination pixel
 mov dx,SC_INDEX+1 ;point to SC Data; Index points to Map Mask
TexScanLoop:
; Set the Map Mask for this pixel's plane, then draw the pixel.
 out dx,al
 mov ah,[bx] ;get image pixel
 mov es:[di],ah ;set image pixel
; Point to the next source pixel.
 add bx,[bp].lXBaseAdvance ;advance the minimum # of pixels in X
 mov cx,word ptr [bp].lSourceStepX
 add word ptr [bp].lSourceX,cx ;step the source X fractional part
 jnc NoExtraXAdvance ;didn't turn over; no extra advance
 add bx,[bp].lXAdvanceByOne ;did turn over; advance X one extra
NoExtraXAdvance:
 add bx,[bp].lYBaseAdvance ;advance the minimum # of pixels in Y
 mov cx,word ptr [bp].lSourceStepY
 add word ptr [bp].lSourceY,cx ;step the source Y fractional part
 jnc NoExtraYAdvance ;didn't turn over; no extra advance
 add bx,[bp].lYAdvanceByOne ;did turn over; advance Y one extra
NoExtraYAdvance:
; Point to the next destination pixel, by cycling to the next plane, and
; advancing to the next address if the plane wraps from 3 to 0.
 rol al,1
 adc di,0
; Continue if there are any more dest pixels to draw.
 dec si
 jnz TexScanLoop
ScanDone:
 pop di ;restore caller's register variables
 pop si
 mov sp,bp ;deallocate local variables
 pop bp ;restore caller's stack frame
 ret
_ScanOutLine endp
 end




October, 1992
PROGRAMMER'S BOOKSHELF


Slaying the Dragon




Andrew Schulman


Is there any programmer who hasn't read the "Dragon Book," the classic text on
compiler design by Aho, Sethi, and Ullman? Or tried to read it? Or at least
bought it, and placed it in a prominent position on their bookshelf?
Certainly, you've at least seen the book: the cover (by Jean Depoian) shows a
dragon, representing "Complexity of Compiler Design," about to be slain by a
knight with "Data Flow Analysis" armor, a "Syntax Directed Translation"
shield, and a "LALR Parser Generator" sword.
You haven't tried to read the Dragon Book? Well, you should. Like every book
out of AT&T Bell Labs, it is beautifully written and organized. It is one of
those few books that can make you pleased to be involved in software, because
while reading the book you come to realize that our field--the messiness of
daily practice aside--really does have a solid mathematical foundation.
But in my more honest moments, I have to admit that, try as I might, I've
never really understood much in the Dragon Book past about page 200 (at least
that's where the underlining stops in my copy). The problem, for me at least,
is that it's one thing to read about a subject such as the equivalence of
languages and machines (a strikingly beautiful discovery) and another thing to
really see this equivalence.
For me, and I suspect for a lot of programmers, the only way to illustrate a
topic like this is with some code. You say that regular expressions and
finite-state machines are equivalent? I can appreciate that this is an
important result, but to understand it I need to see some C code that converts
a regular expression into a two-dimensional array that forms the program for
one of these machines.
This is where Allen Holub's book, Compiler Design in C, comes in. If you have
ever wanted to understand how your favorite compiler works, or if you have
ever needed to write some form of language processor (perhaps as simple as a
text-pattern searcher, or the macro or script language for a product), you
will want Holub's book. It is approachable by programmers in a way that the
Dragon Book just isn't.
Several good, readily understandable books on compiler design have been
available for years. Hendrix's A Small C Compiler (M&T Publishing, 1990), which
comes with complete C source code for an 8088-based C compiler, is the best
example. You walk away from Hendrix's book seeing exactly how his Small C
compiler is put together and feeling that you could "do it yourself."
But real compilers, such as Borland C++ or Microsoft C, aren't built using the
readily understandable, recursive-descent technique that Hendrix puts to such
good use. These compilers are built, in an initially very nonintuitive way,
using compiler-compiler tools such as LEX and YACC. LEX takes a set of regular
expressions and associated C code, and turns them into the function yylex(),
which tokenizes input (such as your .C file). The C code is essentially "event
driven": it is invoked whenever its associated pattern is recognized in the
input. YACC takes a grammar and associated C code, and turns them into the
function yyparse(), which parses the tokens generated for example by yylex().
The main() routine for a C compiler might consist of little more than a call
to yyparse().
What Holub does in this massive book is amazing. He presents the complete C
source code for a LEX clone, two different YACC clones (one of which, Llama,
builds a top-down LL(1) parser, while the other, occs--the "other
compiler-compiler system"--builds a bottom-up, LALR(1) parser like that built
by YACC), and a C compiler. The C compiler's source code consists largely,
not of .C files, but of a LEX file (c.lex) and a YACC file (c.y). The .C files
do symbol-table management, manipulation of lvalues and rvalues, code
generation, arithmetic operations, and the like.
DDJ readers may know Holub as this magazine's former C columnist. He brings to
Compiler Design in C the same attention to detail found in his earlier The C
Companion and On Command (a detailed presentation of the C source code for a
command interpreter). Above all, Holub excels at showing how things work. This
has always been my favorite kind of reading, escape reading almost. Such "how
it works" writing is quite different from "how to" writing. The pleasure is
not so much a "Hey, I could do that!" feeling (I know I couldn't), but rather
that of seeing a little bit of how the software I use every day actually does
its stuff. What's Borland C++ doing when it crunches through my .C files?
Having worked through Holub's book, I have a much better idea.
The source code itself is presented in an almost ideal way, with each .C file
broken up by just the right amount of text. The order in which code is
presented is crucial in a book like this, and Holub has made good use of
Arachne, a C preprocessor he wrote. Arachne is a version of Knuth's WEB system
(which I discussed in the August 1992 DDJ), allowing source code and
documentation to be put together in a single input file. Arachne is itself a
compiler and, as Holub notes, it stands as "an example of how you can apply
the techniques presented in this book to applications other than writing
compilers for standard programming languages." Compiler design is a very
general-purpose programming skill.
The chief advantage of Holub's book over the other books on this subject is
clearly its skillful presentation of a large amount of C code. But Holub does
not just fling gobs of source code at you, as so many programming books do.
Many authors discuss the implementation of their programs in loving detail,
but forget to describe what the program does, or what output it produces.
Revealing the workings of a machine, without disclosing what the machine does,
is a classic symptom of engineer's disease. Holub's book doesn't suffer from
this at all. He walks the reader through each stage of the compiler process,
always remembering to point out what the final goal is, what the program's
output will look like, and so on. The C code generated by LEX, Llama, and occs
is nicely commented and essentially self-documenting. He constantly shows the
input to the program, its output, and exactly how the program turned the
former into the latter.
Often, Holub shows the same thing from multiple angles. For example, he
presents a regular expression, a hand-written recognizer for that expression,
a diagram of the equivalent finite-state machine (FSM) to recognize that
expression, a two-dimensional state table representing the machine, and
finally the C version of the machine, both compressed and uncompressed, with
the driver function that uses the tables. Holub walks carefully through the
whole process, showing how a LEX tokenizer can take text such as " 1 + 2 * 3"
and turn it into a stream of tokens (input for a Yacc grammar) such as NUM
PLUS NUM STAR NUM.
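The tokenizing step described above can be sketched by hand in a few lines of C. This is an illustrative stand-in for a LEX-generated yylex(), not Holub's code; the token names and the next_token() interface are invented for the example.

```c
#include <ctype.h>

/* Token codes a grammar might use; the names are illustrative. */
enum { END = 0, NUM, PLUS, STAR };

/* Return the next token from *s, advancing the pointer past it. */
int next_token(const char **s)
{
    while (isspace((unsigned char)**s))
        (*s)++;                          /* skip whitespace */
    if (**s == '\0')
        return END;                      /* end of input */
    if (isdigit((unsigned char)**s)) {
        while (isdigit((unsigned char)**s))
            (*s)++;                      /* consume the whole number */
        return NUM;
    }
    switch (*(*s)++) {
    case '+': return PLUS;
    case '*': return STAR;
    }
    return END;                          /* unrecognized input: stop */
}
```

Calling next_token() repeatedly over " 1 + 2 * 3" yields NUM, PLUS, NUM, STAR, NUM, then END.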
One of the points that becomes clear as you read this book is the tremendous
generality of the finite-state machine as a way of programming. In essence, an
FSM consists of a two-dimensional table "next," with the rows holding states
and the columns holding input. A generic "driver" steps through the table; see
Example 1.
Example 1: Sample FSM.

 STATE next[NUM_STATES][NUM_INPUTS];

 while (state = next[state][input]) {
     if (state == ACCEPT)
         break;
     else
         do_action(state);
 }

The goal of programs like LEX and YACC is to produce the table "next." The
driver, which is the while (state = next[state][input]) loop, is generic and
changes little between programs. In other words, the "next" table really is a
program for a virtual machine.
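To make the table-plus-driver idea concrete, here is a minimal toy FSM in the same shape; it recognizes strings of one or more digits. The state and input-class names are invented for the example, not taken from the book.

```c
#include <ctype.h>

/* States are rows, input classes are columns. State 0 is the dead
   state; the machine accepts if it halts in state INNUM. */
enum { DEAD = 0, START = 1, INNUM = 2, NSTATES = 3 };
enum { DIGIT = 0, OTHER = 1, NINPUTS = 2 };

static const int next[NSTATES][NINPUTS] = {
    /* DIGIT  OTHER */
    {  DEAD,  DEAD  },   /* DEAD  */
    {  INNUM, DEAD  },   /* START */
    {  INNUM, DEAD  },   /* INNUM */
};

/* Generic driver: step the table over the input, one class per char. */
int matches_number(const char *s)
{
    int state = START;
    for (; *s && state != DEAD; s++)
        state = next[state][isdigit((unsigned char)*s) ? DIGIT : OTHER];
    return state == INNUM;
}
```

Changing what the machine recognizes means changing only the table, not the driver, which is the point of tools like LEX: they exist to compute such tables.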
Since LEX and YACC are themselves just compilers (they compile regular
expressions or grammars with associated C code into state tables), it's
interesting to see how they themselves are written. Holub builds LEX by hand,
using recursive descent. Once you have LEX, you can write LEX.LEX, which would
be a table-driven lexical analyzer. Holub leaves this as one of many excellent
exercises in the book. (I'm waiting for The Compiler Design in C Answer Book!)
With the Llama parser, Holub takes a classic bootstrap approach: First, he
uses a LEX file (with plenty of C actions) to implement a scaled-down version
of Llama; then he feeds the scaled-down version of Llama with a Llama input
file that creates a Llama parser, "like a snake eating its tail."
Even in a book close to a thousand pages long, many topics can't be covered in
such complete detail. Many efficiency considerations are avoided by the code,
though the text contains good pointers to the relevant literature. The chapter
on optimization is a good introduction to this subject, though the C compiler
itself is nonoptimizing. Many programmers are interested in compiler
optimizations (this, after all, is what compiler benchmarks measure, for want
of anything better), but popular topics such as peephole optimization and
common-subexpression elimination probably can't be fully understood without
first digesting the material in Holub's book.
The code generated by the C compiler is, oddly enough, not 80x86 or 680x0
assembler, much less binary object code, but something called C-Code. This is
an assembly language-like form of C. (For example, there are no if, while, or
for statements; everything is reduced to its ultimate goto form.) While it
would perhaps have been more satisfying to have the compiler generate actual
assembler, C-Code is just as good for this book's purpose.
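As an illustration of what "ultimate goto form" means, here is a structured loop and a goto-only equivalent. This is a sketch of the idea, not Holub's actual C-Code output.

```c
/* A structured loop... */
int sum_to(int n)
{
    int sum = 0, i;
    for (i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* ...and the same computation with the for statement reduced to
   gotos, in the spirit of (but not identical to) C-Code. */
int sum_to_goto(int n)
{
    int sum = 0, i = 1;
test:
    if (i > n) goto done;    /* loop condition */
    sum += i;                /* loop body */
    i++;                     /* increment clause */
    goto test;
done:
    return sum;
}
```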
A disk is not included with the book, but is available separately from the
author for $60. I had the usual problems of getting the subdirectories and
make files straight, but these seem unavoidable when working with large
quantities of someone else's source code, even source code as well commented
and well organized as this. In any case, ready-to-run DOS.EXEs are provided.
The highlight of this package is a fascinating full-screen "visible" C
compiler that shows the compiler's operations. I do wish some screen shots of
this program had been included with the book, perhaps instead of the over 50
pages on curses. The full-screen C compiler--actually it's occs whose
operations can be made full-screen, and the C compiler just inherits this like
any other occs program--is built using curses, for which Holub provides a
largely irrelevant (in this context) description.
Even if you have the commercially supported MKS LEX/YACC, or Gnu FLEX/BISON,
you will want to get Holub's book to see how these programs--or at least very
similar programs--actually work. The Gnu software comes with source code, but
I wouldn't want to tackle it without first having read Holub's book.
I had a strange experience after finishing this book. In preparation for this
review, I took my copy of the Dragon Book down from the shelf yet again, just
to see if it really is such hard going. Oddly enough, I found that I actually
understand a lot of it past page 200. Clearly this is because I've worked
through Holub's book. If you, too, have been frustrated trying to read the
Dragon Book, first read Compiler Design in C (whose cover, incidentally, shows
four mice operating some sort of cheese-grater/mailbox contraption). Then try
to slay the dragon.

















October, 1992
OF INTEREST





VMData from Pocket Soft is a suite of memory-management libraries for C and
C++ that provides a platform-independent, virtual-memory method of managing
dynamically allocated data. This eliminates DOS memory restrictions and the
design of complex data-management routines for Windows. The library allows you
to port applications between operating systems, including OS/2, without
in-depth knowledge of individual OS memory environments.
Developers make calls to VMData functions instead of the operating-system
interface. The program is then linked to the appropriate VMData library for
the intended operating system. VMData manages the operating-system interface
and manipulates the data between all memory resources available to that
platform. Data is managed according to priority, which is user specified or
determined by VMData routines that monitor recency and frequency.
High-priority data is kept first in addressable memory, then fast-access
memory. When these have been exhausted, it swaps to and from disk. All
available memory resources for DOS, Windows, and OS/2 are accessed.
VMData costs $495.00 for the first platform and $295.00 for each additional
platform. Reader service no. 20.
Pocket Soft Inc. P.O. Box 821049 Houston, TX 77282 713-460-5600
ParcPlace Systems is shipping Objectbuilder\C++, a user-interface builder, and
Objectkit\C++ OI, a C++ graphics class library. These user-interface
development tools for Sun SPARCstations work with Objectworks\C++, ParcPlace's
UNIX C++ development environment.
The toolset allows you to rapidly create GUIs that allow runtime selection
between industry-standard OSF/Motif and OPEN LOOK. Thus, a single version of
an application can support either environment's standard user interface.
Objectbuilder\C++ generates code for GUI development automatically, but allows
you to access classes in order to control their user interfaces. Objectbuilder
features include: class hierarchy, inheritance, subclassing, accelerators,
localization, and access to X resources. Drag-and-drop techniques let you
select from palettes of objects, such as scrollbars, check boxes, and button
menus, facilitating rapid prototyping, design, implementation, and testing of
GUIs.
Also available from ParcPlace is Objectworks\Smalltalk 4.1 (supporting Windows
3.1 and Macintosh System 7) and Objectkit\Smalltalk C Programming, a Smalltalk
extension that simplifies access to C libraries and applications.
Objectkit\Smalltalk's new object-oriented memory manager allows specification
of objects within applications that are not subject to change, allowing its
memory manager to focus on changing objects.
Objectbuilder\C++ costs $2995.00; Objectkit\C++, $995.00; Objectworks\C++,
$1995.00; Objectworks\Smalltalk, $3500.00; Objectkit\Smalltalk C Programming,
$500.00. Reader service no. 21.
ParcPlace Systems 1550 Plymouth Street Mountain View, CA 94043 415-691-6700
GAWindows from Applied Neurogenetic Computing is a genetic algorithm DLL for
Windows. It facilitates development of genetic algorithm-based applications
for performing searches and optimization in a wide variety of areas such as
scheduling, design, routing, neural-network architectures, and more. Thus,
instead of attempting to directly solve certain problems, you can use
GAWindows to create and evolve solutions.
Thirteen functions are included for population management, pairing of mates,
mating, and mutations. The functions may be called from any language that
supports Windows DLL access.
The introductory price for GAWindows is $49.95. Reader service no. 22.
Applied Neurogenetic Computing P.O. Box 1711 Maple Grove, MN 55311
612-750-9805
TSRs and More, a library of TSR management functions for C++, has been
announced by TurboPower Software. The TSR manager enables TSRs to unload
themselves, use configurable hot keys, prevent being loaded twice, and more.
Only a 6K kernel remains in DOS memory, and functions are included for
swapping conventional TSRs and for XMS and EMS 4.0 memory management. TSRs and
More supports 8087, huge arrays, DOS and BIOS access, and enhanced keyboard
support.
The price of TSRs and More, including source code, is $149.00. Reader service
no. 23.
TurboPower Software P.O. Box 49009 Colorado Springs, CO 80949-9009
719-260-6641
PowerBuilder 2.0, a client/server development environment for building
business applications, has been released by Powersoft. New to this version is
its object-oriented architecture, with features such as inheritance,
encapsulation, and user-defined objects. Other features include: an enriched
set of database portability and management functions; the ability to support
large-scale MIS projects, including report generation and object libraries
with check-in/check-out procedures; and a complete implementation of Windows
objects, events, functions, and communications, including OLE, MDI, DDE, and
DLL calls.
PowerBuilder runs under Windows and is server independent. It costs $1495.00
for SQLBase and XDB; $3395.00 for SQL Server, Oracle, and AllBase/SQL;
$3895.00 for DB2. Reader service no. 24.
Powersoft Corp. 70 Blanchard Road Burlington, MA 01803 617-229-2200
Geodyssey has released Hipparchus, a C and C++ object-module library for
developing applications that deal with geographic location. Data objects are
modeled as unrestricted sets of points, lines, or regions, and objects can be
related to one another using spatial operators.
Hipparchus works with direction cosines and three-dimensional vector algebra
to calculate true distances and locations on an ellipsoidal model of the
Earth. It can deal with objects below the surface, in the atmosphere, or in
near space. A third-order orbit modeler lets you accurately predict the path
and coverage of remote-sensing satellites.
Hipparchus' spatial indexing, based on a spherical interpretation of the
Voronoi polygon principle, gives you speed of access but permits localized
reorganization if the geographic distribution of your data changes. You can
modify the spatial indexes and use them to access distributed multitheme
geographic databases.
Hipparchus sells for $475.00. Reader service no. 25.
Geodyssey Limited 300, 815 Eighth Avenue SW Calgary, AB Canada T2P 3P2
403-234-9848
tvPAK is a new C++ class library from Faison Computing that extends the
functionality of Turbo Vision. Over 25 classes provide support for validated
data-entry fields, currency fields, picture fields with mask characters, line
and box drawing, time and date fields, clock and calendar animated objects,
and aligned-text fields. Most of these classes support property inspection, a
feature that allows certain object attributes to be changed at run time.
DDJ spoke with Mike Dickason, a civil engineer with ASCG Inc. in Anchorage,
Alaska, whose reaction was favorable. "This is the only product of its kind
that works directly with Turbo Vision, and I particularly liked using property
inspection with the date and time functions." He also pointed out the
validated input fields as a positive feature.
The retail price is $49.95; source code is included. Reader service no. 26.
Faison Computing P.O. Box 17722 Irvine, CA 92713-7722 714-833-8410
Cobalt Blue has announced FOR_C++, a conversion package that offers automated
code translation from Fortran to C++. FOR_C++ uses C++ objects to handle
complex data types, allowing manipulation of complex variables in algebraic
form. Advanced C++ constructs like enhanced support for call-by-reference,
declarations after executable code, overloaded functions, and translation of
statement functions as C++ inline functions are included.
FOR_C++ generates complete C++ function prototypes, and function calls are
checked for consistent usage during translation and passed either by address
or by reference in C++. FOR_C++ translates parameters as C++ constants to help
program debugging, and class and structure types are user defined.
FOR_C++ for DOS costs $975.00; for SPARCstations, $1350.00. Runtime library
source code is included. Reader service no. 27.
Cobalt Blue Inc. 875 Old Roswell Road, Suite D-400 Roswell, GA 30076
404-518-1116
FIGt is a 3-D toolkit from Liant that combines object-oriented programming
with PHIGS+ and PEX, the industry standards for creating 3-D graphics apps and
running them on a network. FIGt combines PHIGS+ API's openness and portability
with object-oriented code. Two- and three-dimensional graphics apps can be
written with any standard PHIGS+ API products and run across a variety of
systems.
FIGt has a library of preprogrammed objects that contain information about
generating and manipulating graphics objects. Software is built by combining
and associating objects. FIGt takes advantage of underlying PHIGS and X-Window
system integration, applying consistent X programming methodologies such as
color management and allocation. The library will also use the PHIGS APIs to
drive PEX protocol, enabling development of distributed graphics applications
with a higher-level programming interface than that of PEXlib alone.
The first release supports Sun PHIGS and FIGARO+; subsequent versions will
support PHIGS APIs from Digital, HP, and IBM.
Prices begin at $1245.00. Reader service no. 28.
Liant Software Corp. 959 Concord Street Framingham, MA 01701-4613 508-872-8700


















October, 1992
SWAINE'S FLAMES


Dog Day Afternoon




Michael Swaine


This summer I spend my leisure hours sitting on the deck watching the workers
build the pool and I muse on matters political, philosophical, psychological,
and so forth.
Is my building a pool an implicit endorsement of the current administration in
this election year? Am I, as the Acting President once asked, better off today
than I was four years ago?
But no, this pool has been underway since the Carter administration, if I
remember correctly. I'm not sure that I do. The heat is affecting my
circuitry, slowing my thoughts and curling them up like newsprint in the sun,
or like this fly spiraling lazily over that pool worker like a buzzard over a
desiccated corpse.
And the thermometer on the deck railing reads 100 degrees soporifically
Fahrenheit.
These are the days of summer reading, and the mail has brought mine. The
September issue of Scientific American has arrived, a special single-topic
issue on mind and brain. Not so many years ago it was still possible to
question whether mind and brain were a single topic. Now Francis Crick and
Christof Koch can present research findings on the most profound mental
puzzle, the nature of consciousness. Is consciousness a thing or a process?
Can there really be any objective data on consciousness? Is consciousness
observable? Another person's consciousness is not directly observable to me,
although I'm pretty sure that several of those workers down there are only
marginally conscious. But can I observe, or detect, my own consciousness? As
Jonathan Miller says in the same issue, "consciousness is not detected at all,
because that would imply that it could pass undetected, and that doesn't make
sense." Can you do science on something that is inherently undetectable?
The study of the mind is rife with paradox, in which fact I take paradoxical
comfort.
The fly lights on the flyswatter.
Other articles in the issue deal with mental disorders and the developing
brain and visual imagery. One article on sex differences in brain function
ought to be controversial, but my observations this summer tell me that
temperature is a much stronger influence than gender or even species or
natural vs. artificial being, as in Alfred Bester's classic science fiction
story, "Fondly Fahrenheit." If it weren't so hot I'd go get that Bester
collection and include an apt quote here. Geoffrey Hinton sketches an overview
of work in neural networks, but doesn't break any really new ground.
This issue of Scientific American is worth reading, a guide to the future, in
case you can't tell from my writing style, baked dry as it is of all
enthusiasm.
There is a lot of enthusiasm in Beyond Cyberpunk, another medium of summer
reading that the mail has brought. There is a lot of a lot of things in this
HyperCard stack set on five disks. It also is a guide to the future, but a
more twisted future than Scientific American's. Beyond Cyberpunk is, at least
in part, cultural criticism, touching, or rather landing with a grating noise,
on music, politics, sex, literature, technology, comic books, and, you know,
stuff like that.
The premise that its subjects, and its approaches to these subjects, are some
kind of "stuff like that" is what justifies taking Beyond Cyberpunk seriously.
Its authors are arguably dealing with one esthetic or ethic or style or let's
say cultural stance, and the essays, manifestos, reviews, and practical advice
in the stacks all, by that same argument, support or at least feed on, that
stance. Beyond Cyberpunk is a nonlinear, multimedia manifesto of a developing
cyberculture--an essay by Susan Sontag, if Susan Sontag were a character in a
William Gibson novel and her essay were an explication of the politics of a
plumbing manual written by Harry Tuttle.
Beyond Cyberpunk is from The Computer Lab, Rt. 4, Box 54C, Louisa, VA 23098.
As I sit here watching the workers take what I can only assume is a tea break,
I decide that I am working on writing the perfect C++ program, using Robert M.
Pirsig's method: "How to paint a perfect painting--make yourself perfect and
then just paint naturally."
I am perfecting myself.



































November, 1992
EDITORIAL


Much Ado Really About Nothing




Jonathan Erickson


It's not every day that a computer book--let alone a computer programming
book--gets bandied about on the pages of the Wall Street Journal. Undocumented
Windows, coauthored by David Maxey, Matt Pietrek, and DDJ contributing editor
Andrew Schulman, is a rare exception.
So what's the fuss all about? Not the accuracy or fairness of the book (which
Ray Duncan examines on page 179). No one--not even Microsoft--is questioning
the book's veracity. Instead, it's the confirmation that: 1. Microsoft
provides function calls in Windows that aren't acknowledged in the SDK; and 2.
that Microsoft application programs make use of these undocumented calls.
Not that any of this is a big surprise. When it comes to operating systems,
undocumented calls are pretty much the norm. (Michael Swaine recalls a
Software Entrepreneur's Forum a few years back when keynoter Bill Gates was
asked about undocumented DOS calls. Gates flatly stated there were no
undocumented DOS calls, then refused to discuss the subject further. For a
contrary point of view, you might take a peek at Schulman's earlier, 720-page
Undocumented DOS.)
As for the second item, anyone with a disassembler and a copy of Word for
Windows can determine whether or not Microsoft makes use of undocumented
calls. Ditto for software from Borland, Lotus, WordPerfect, and the like. It's
noteworthy that the authors of Undocumented Windows don't make a big deal of
Microsoft's use of these calls, noting that most of these functions serve
little purpose. The authors also stress that the book is not about whether or
not Microsoft apps use undocumented functions (only five of the book's 715
pages touch on this), but about undocumented functions and data in Windows
itself.
What's surprising, though, is Microsoft's swift reaction to the hubbub. At the
blink of an eye, the company let fly with an official corporate statement on
undocumented APIs, a Q&A paper discussing documented and undocumented APIs,
and a white paper entitled "Undocumented Functions." Although the Schulman et
al. book is central to the press release (the backgrounder actually starts
out, "This whitepaper [sic] discusses the 'undocumented' functions that author
Andrew Schulman in his book Undocumented Windows claims are called by various
versions ... of Microsoft Windows-based applications...."), Microsoft is
careful to neither dispute nor criticize the book.
Still, as Schulman points out in a response of his own, the Microsoft press
release isn't as complete as it probably should be, either. For example, while
Microsoft used to say that it "very consciously avoids" using undocumented
functions, the company now acknowledges that it uses such calls, but justifies
this by saying that other companies do, too. Nevertheless, Microsoft takes
pains to state that its "use of undocumented calls provides no advantage
whatsoever to an application."
The press statement goes on to say that Microsoft has given at least 26 ISVs
the information on undocumented Windows functions, even though "the use of
undocumented APIs in applications is innocuous and represents old, out of date
[sic] code or functions that can just as easily be performed with the
documented API." Is Microsoft acknowledging that its applications are written
with old, out-of-date code? Or is it saying that it is giving ISVs (its
competitors) useless information? Schulman agrees that some undocumented
functions are innocuous and others should never be called, then counters that
some are indeed extremely useful.
Another big question is why Microsoft only shared information on undocumented
calls with 26 ISVs when the company has sold tens of thousands of Windows SDKs
to programmers over the past couple of years. What's so special about these
developers? Schulman notes that he's aware of many developers who've tried to
no avail to get this material from Microsoft.
Is Microsoft that scared about the Federal Trade Commission investigation into
the company's business practices, especially when it comes to the relationship
between the systems and applications groups and what this means in terms of
antitrust laws? Perhaps so. The question the FTC is pondering (and which
Microsoft competitors and yellow journalists in the trade press are fueling)
is whether the Microsoft applications group has an unfair advantage over other
software developers because of the inside skinny on what the operating-system
group is up to.
The Microsoft statement further claims that some functions in Undocumented
Windows--DirectedYield, for instance--are indeed documented in the SDK.
DirectedYield, as it turns out, is in the 3.1 SDK, but for nearly two years it
didn't appear in the 3.0 SDK. Like any attempt at rewriting history,
retroactive documentation doesn't change the facts.
Undocumented Windows never set out to make a big deal of Microsoft's use of
undocumented Windows functions. For its part, Microsoft didn't do anything
wrong in creating undocumented calls or in using them in its applications.
Microsoft, spurred on by the WSJ and InfoWorld articles, created the furor by
issuing a Nixonian response to a problem that didn't exist, then by not
keeping the story straight and not getting the facts right. As Schulman
concludes, "That Microsoft resorts to such a shoddy practice as claiming that
documentation in 3.1 is somehow retroactive, or means that these functions
were all along documented, troubles me."
Me too.




































November, 1992
LETTERS







Fletcher Lives


Dear DDJ,
Per John Kodis's "Fletcher's Checksum" in the May 1992 DDJ, I ran Fletcher's
algorithm through an ancient error-simulator program of mine, DATAERRS.
DATAERRS tests various checksums and CRCs by finding whether or not they miss
errors put into data packets. DATAERRS puts the errors into the packets in a
fashion similar to that of a bad phone line, i.e., errors are weighted toward
small data changes. Table 1 tells how many bad-data packets per 100,000 are
missed by each of several methods.
Table 1

                          Packet size and type
                          37-byte     233-byte
   Method                 ASCII       3.25~
   -----------------------------------------
   Parity bit             4500        4400
   8-bit XOR-sum          1300        1350
   8-bit checksum          900         900
   16-bit checksum         600         700
   Fletcher's               70          50
   super_cs                 20          20
   16-bit CRC                1           1
   32-bit CRC                0           0
   Doubled 16-bit CRC        0           0

Whether the "super_cs" checksum's performance is only good under DATAERRS or
not, I don't know. Its grace is that on CPUs with parity flag bits (e.g. Intel
8008 through 486s), it is extremely easy to implement. By the way, slight
changes to it ruin its efficacy.
DATAERRS has never detected any missed errors (over a few hundred million
packets) when two different 16-bit CRCs (CCITT and CRC-16) are used in
combination. It has also never missed errors using 32-bit CRCs. Furthermore,
as the code in Example 1 suggests, DATAERRS has found the same "perfect" level
of error detection from a single type of 16-bit CRC done twice: first on the
data, then on the data shuffled (as you would shuffle a deck of cards). This
result may have implications with regard to digital signatures. Creating or
modifying a block of data to force a particular, known CRC should not be hard.
But creating one that, after shuffling, has a particular, other, known CRC
seems a bit more difficult--especially if the procedure is iterated a few
times.
Example 1

#define SWORD unsigned short
#define BYTE  unsigned char
#define WORD  unsigned int

/*** Do a Fletcher Checksum on a block of memory. ***/
SWORD mem_fletcher_cs(BYTE *mem, WORD how_many)
{
    BYTE s1, s2;

    s2 = s1 = 0;
    while (how_many--) {
        s1 += *mem++;
        if (s1 == 255) s1 = 0;
        s2 += s1;
        if (s2 == 255) s2 = 0;
    }
    s2 += s1;
    if (s2 == 255) s2 = 0;
    s2 = 255 - s2;

    s1 += s2;
    if (s1 == 255) s1 = 0;
    s1 = 255 - s1;

    return (s1 * 256 + s2);
}

/*** Do a 16-bit checksum that DATAERRS says is very effective.
     _rotl() and odd_parity() are assumed to be supplied by the
     compiler and elsewhere in the program, respectively. ***/
SWORD super_cs(BYTE *xm, WORD how_many, SWORD cs)
{
    while (how_many--) {
        cs += *xm;
        cs = _rotl(cs, 1) + odd_parity(*xm++);
    }
    return (cs);
}

B. Alex Robinson
Maple Valley, Washington
John responds: It's interesting to compare the results listed by Mr. Robinson
for the 16-bit checksum, Fletcher's checksum, the super_cs algorithm, and the
16-bit CRC. While these algorithms all have about a 1 in 65,000 chance of
missing an error when fed random data, the number of errors reported under
testing by the DATAERRS program varies by a factor of 700. The error-detection
ability of any algorithm of this type depends on the types of errors that
occur most frequently in practice. Mr. Robinson's results dramatically show
the extent to which this is the case.
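For readers who want to see one of the compared algorithms in full, here is a CCITT-style 16-bit CRC (polynomial 0x1021, initial value 0xFFFF, no bit reflection). The parameters DATAERRS uses may differ, so treat this as one common illustrative variant rather than the letter's exact code.

```c
#include <stddef.h>

/* Bitwise CCITT-style CRC-16: for each byte, fold it into the top of
   the register, then shift out eight bits, XORing in the polynomial
   whenever the high bit falls off. */
unsigned short crc16_ccitt(const unsigned char *data, size_t len)
{
    unsigned short crc = 0xFFFF;   /* initial value */
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= (unsigned short)(data[i] << 8);
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000)
                  ? (unsigned short)((crc << 1) ^ 0x1021)
                  : (unsigned short)(crc << 1);
    }
    return crc;
}
```

For this variant, the CRC of the ASCII string "123456789" is the standard check value 0x29B1. The doubled scheme Mr. Robinson describes simply applies such a function twice: once to the data, once to a shuffled copy.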


Detecting Outliers


Dear DDJ,
Statistics and data analysis require careful thought. Unfortunately, Roy
Kimbrell's "Finding Significance in Noisy Data" (June 1992) repeats some
common myths, fallacies, and pitfalls usually dealt with in beginning
statistics courses.
First, he states, "It won't hurt to assume a normal distribution for the
errors." The so-called "normal" distribution referred to here is the Gaussian
distribution with density proportional to exp(-((x-m)/s)^2/2), where m and s
are the mean and standard deviation. Falsely assuming this distribution
matters a great deal when we test individual values and cannot appeal to the
central-limit theorem. Measurements with a minimum of 0 and an effective
maximum at least several times the typical value are often badly skewed.
Taking square roots of counts often gives data with a distribution closer to
Gaussian. Concentrations are often improved by logarithms, as with pH values.
Second, the article mislabels column 1 of Table 1 as the "probability that the
value is significant." Each entry in this column is actually the probability
(expressed as a percent) that a nonsignificant random value from a standard
normal distribution will fall within the interval defined by the value in the
second column. A probability of significance for individual values requires
specification of both the fraction and distribution of values affected by the
putative nonrandom cause.
Third, it ignores the effect of multiple tests. If the residuals ("errors")
are independent, then random variation will result in an average of 8.4 days
(365 *.023) falsely declared significant when 365 days are tested against two
standard deviations. Serial correlation changes all probabilities in a
difficult-to-calculate manner. Standard time-series analysis uses lagged
correlations to model the data and give independent residuals.
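The lagged correlations Mr. Reedy mentions are straightforward to compute. This is a minimal sketch of our own (the function name is invented), not code from either article:

```cpp
#include <numeric>
#include <vector>

// Sample autocorrelation of x at lag k.  Values near zero at all lags
// suggest the residuals are close to independent; large values signal
// serial correlation that a time-series model should absorb first.
double autocorr(const std::vector<double>& x, int k) {
    int n = (int)x.size();
    double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; ++i)
        den += (x[i] - mean) * (x[i] - mean);
    for (int i = 0; i + k < n; ++i)
        num += (x[i] - mean) * (x[i + k] - mean);
    return num / den;
}
```

A strictly alternating series, for instance, shows a lag-1 autocorrelation near -1, the signature of strong negative serial correlation.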
Fourth, it blindly applies the proposed method to the flight data and fails to
examine the predictions for sensibility. The operation of lattice filtering is
opaque even after looking at the results in Roy's Figure 2. The predictions
are sometimes more jagged than the raw data instead of smoother, as from
December 2 to December 10. December 2 is predicted to be lower than any day
previous. December 3 (a Sunday, I believe) is called significant for being
medium instead of high, but a better prediction is that it should be
relatively low, as are December 10, 17, 24, and 31. There is no attempt to
show that these predictions are any better, in any sense, than the raw data
for this series.
Fifth, Roy pretends that an after-the-fact search for causes of "significant"
events has much of any meaning. One can almost always find something, which is
why astrology and the like are still alive. To properly investigate the
relationship between weather and plane flights, one must first code the
weather for each day by either formula or a human who is ignorant of the
flight data.
In summary, the contention that these adaptive filters "select significant
events with precision" is highly questionable. The dependence of
"significance" on the number of history days itself makes this clear. The
reader should be wary before using this method as presented.
Terry J. Reedy
Newark, Delaware
Roy responds: Terry's first point is that we should take care when assuming a
normal distribution. I do agree. However, he might have noticed that the
results of the experiment depend only on the mean and standard
deviation--which are distribution independent. The normal distribution was
only used as an example illustrating the idea of significance. Though I
neglected to say so in the article, I had done some checking. The differences
between the expected and the actual counts of aircraft flights did seem to fit
a normal distribution. (It is common for differences in values to fit a normal
distribution even though the values come from some other distribution.) Formal
tests of goodness of fit were unnecessary because the results of the
experiment didn't depend on the data fitting a normal distribution.
I also agree with the second point--that the table labeled "confidence values
from the normal distribution" could have been better labeled. But then, it was
only an example showing how the standard deviation is related to significance.
And to the third and fourth points: Terry is, understandably, approaching
adaptive filtering from a statistician's viewpoint. The process is amenable to
statistical analysis, but it must be approached in a different way. The point
of the adaptive filter discussed in the article is that large differences
between the expected and actual counts are correlated with changes in the
weekly pattern of high and low counts--not with the actual values of the
counts. Another point is that these changes in the weekly pattern are often
difficult to find using simple statistics.
Terry also suggests that I should have tried to rigorously prove that the
technique works as advertised. I'm afraid that would have been flogging a dead
horse. Adaptive filters have been in active use for at least ten years (in
speech processing, communications filtering, and the like). I was only
reporting on one of their applications.
Terry is correct in pointing out that after-the-fact searches for
"significance" can prove that virtually any technique works. But there is a
difference between searches for significance and a search for the reasons a
technique is showing certain results. I looked at the weather and other events
trying to find reasons for the technique's indications of significance. To do
this, I read the Omaha World-Herald for the 365 days corresponding to the
data. Where there were events that should have affected aircraft operations, I
checked for indications of significance. Where there were indications of
significance, I looked for events. (Though I didn't report them in the
article, there were weak correlations with problems United Airlines was having
with their aircraft during the spring of that year. Unfortunately, it's hard
to quantify such events so they can be analyzed statistically.) The look at
the weather and other events that might have affected aircraft operations was
casually reported because it was a matter of curiosity only. I said as much in
the article, though I suppose I should have been clearer.
I guess if this was to be a careful study of the application of a newly
invented technique, I'd want to perform the kind of study you suggest. But
because it would be boring to read about and would prove little, I am sure Dr.
Dobb's would hesitate to publish such a report. The idea is to bring new
techniques and tools to programmers. In this we've succeeded. Because of the
article, the technique is being investigated at the Centers for Disease Control
in Atlanta to signal outbreaks of diseases, and at Texas A&I University to
help project carrying capacities of range land for early stock adjustments.


Just Checking


Dear DDJ,
You printed a letter in the May issue titled, "A Little History" from Denys
Tull. I am skeptical of several of Denys's conclusions, but take exception to
one in particular.
If C compilers are recompiling every object module by using the #include
statement, we had better tell the ANSI committee and the compiler
manufacturers! This will take them by surprise. I do not know of any compilers
that recompile object libraries during each compilation. The existence of such
compilers is doubtful.
Clearly, Denys should find out what C compilers are all about--especially the
uses of the #include statement.
Gregory L. Filter
Battle Creek, Michigan


Swap Macro


Dear DDJ,
Regarding the running discussion of the ideal swap macro for C, we have been
debating this issue in the C User's Group (U.K.) journal recently. The
storage-free swap method using XORs, as described by Bill Wilder in the July
1992 issue, is given in Glassner's excellent Graphics Gems (Academic Press,
1990). However, the author erroneously states that the macro works for any
data type.
This is true in theory, but unfortunately ANSI C as it stands doesn't allow
you to XOR noninteger data types, so the macro will work with char, int, and
long, for example, but not with float or double.
Graham Mudd
Beckenham, England
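For reference, the storage-free swap under discussion is usually written like this (the macro name is ours; this is a sketch, not Mr. Wilder's exact listing):

```cpp
// Swap two integer lvalues without a temporary, via three XORs.
// As Mr. Mudd notes, ^= is defined only for integer types, so this
// compiles for char, int, and long but not for float or double.  It
// also silently zeroes both operands when a and b are the same object.
#define SWAP(a, b) ((a) ^= (b), (b) ^= (a), (a) ^= (b))
```

So `int x = 3, y = 7; SWAP(x, y);` leaves x == 7 and y == 3, while `SWAP(x, x)` zeroes x, which is one more reason a plain temporary variable is usually the safer choice.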


November, 1992
SIZING UP GUI TOOLKITS


Multiplatform, multilanguage, and more!


 This article contains the following executables: GUIXVT.ARC GUIISL.ARC
GUIAUT.ARC GUIWNX.ARC


Ray Valdes


Ray is senior technical editor at DDJ. He can be reached at 76704,51 on
CompuServe or at rayval@well.sf.ca.us on the Internet.


This article picks up where we left off last month. In October, my article
"Sizing Up Application Frameworks and Class Libraries" presented the rationale
for comparing the "apples, oranges, and bananas" of application frameworks,
class libraries, and GUI toolkits.
Last month's article described in detail the challenge that Dr. Dobb's posed
to a number of vendors, so I'll only recap the highlights here. DDJ asked each
tool vendor to implement the same graphics application using their toolkit. My
previous article presented results using Borland's ObjectWindows Library
(OWL), Inmark's zApp application framework, Island Systems' object-Menu,
Liant's C++/Views, and Microsoft's Foundation Classes (MFC). This month
showcases Autumn Hill's Menuet/CPP, Island Systems' graphics-Menu, WNDX
Corp.'s WNDX, and XVT Software's XVT toolkit. Also mentioned here is DDJ's
homegrown version, which was implemented on both the DOS and Windows
platforms; see Figure 1.
The packages in this issue are a bit more diverse than those of October. The
tools in the October issue were all object oriented, all C++, and almost all
Windows based (although available for other platforms as well). This month's
selection includes two multiplatform-based GUI toolkits in C (XVT and WNDX),
one DOS-based class library in C++ (Menuet/CPP), and a DOS-based GUI toolkit
for C and Pascal (Island's graphics-Menu).
Our varied toolkit choices reflect the situation many programmers find
themselves in: a program must get done, and there's no pre-existing
dogmatic preference for a methodology. All levels of abstraction above the API
are fair game--from the ground-level approach of a graphics library (such as
our DOS-based implementation in C, which uses Borland's BGI library), to the
middle-level elevation of a GUI toolkit, to the higher altitudes of the
application frameworks discussed in our last issue. The toolkit selection is
meant to be representative rather than comprehensive. Our emphasis on concrete
results rather than methodological dogma is validated by today's application
marketplace, in which application packages implemented using a wide range of
tools compete head-to-head for buyers. Users neither know nor care if you used
C++ or assembler, as long as your software is ready, bug-free, and packs the
necessary features.
Our goal is to provide you with information necessary for choosing between
these various tools and technologies for program construction. The rationale
is that conventional product reviews can only go so far. To properly evaluate
a complex tool, it's essential for you to see what it can do in the hands of
an expert programmer familiar with that toolkit. By examining the complete
code of a nontrivial graphics application and comparing the code to similar
implementations using other packages, you can gain insight available in no
other way.
Many packages use lines of code as a metric to distinguish themselves from
competitors. Such rudimentary measurements are useful, but can be misleading
unless you examine the actual code in question to get a feeling for the
density and texture of the source material. You can obtain both the executable
and the full source code for each implementation from DDJ; see "Availability"
on page 5.
Also available electronically is the complete program spec, along with the
programmer's notes for each implementation. Briefly, the DDJ sample program,
known as HWX Browser, allows for interactive viewing and selection of samples
of digitized handwriting data. The data is basically a collection of vectors
(one group per letter) that can be displayed with MoveTo(), LineTo(), or
PolyLine() primitives. Our spec provided plenty of leeway for implementors to
use in making trade-offs between interface design, program functionality, and
ease of implementation. If nothing else, the results are fascinating from a
user-interface design point of view. Figure 1 through Figure 5 show the
different interpretations of our DDJ UI spec. Tables 1, 2, and 3 show which
features were implemented, the implementation sizes in lines of code, and the
size of the executables, respectively. The following sections discuss each of
this month's implementations, in alphabetical order.
Table 1: Feature sets in the different implementations of the DDJ HWX Browser
(missing features do not imply lack of support by product).

                                       Autumn Hill   DDJ        Island Systems   WNDX Corp.   XVT Software
                                       Menuet/CPP    with BGI   graphics-Menu    WNDX         XVT Toolkit

File-open dialog                            x           --            x              x             x
Data window is resizable                   --           --           --              x             x
Data window is scrollable                  --           --           --              x             x
Access commands via menu                    x           --            x              x             x
Access commands via toolbar or button       x           --           --             --             x
Show menu help in status pane              --           --           --             --            --
Show general help in help window           --           --           --             --             x
Select instance by pointing to cell        --           --           --              x             x
Select letter by pointing to cell          --           --           --              x            --
Select instance by keyboard                --            x            x             --            --
Select letter by keyboard                  --            x            x             --             x
Select instance by scrollbars               x           --           --              x            --
Select letter by scrollbars                 x           --           --              x            --
Show all letters and instances             --           --            x              x            --
Multiple kinds of views of instance        --           --           --              x             x
Display custom sequence of letters          x           --           --             --
Letters in scrolling graphic list           x           --           --              x            --
Change line color of letter                 x           --            x              x             x
Change background color of letter           x           --            x              x             x
Change color ensemble (palette)             x           --           --             --            --
Change line thickness of letter             x           --            x              x            --
Change scaling of letter                   --           --           --              x             x
Print letter                                x           --           --             --            --
MDI-style child windows                    --           --           --              x             x
Tear-off menus                             --           --           --             --            --

Table 2: Source-code size (in lines of code) of different implementations of
the DDJ HWX Browser.

 Lines        Menuet/CPP   DDJ   graphics-Menu   WNDX    XVT
 ------------------------------------------------------------

 CPP             1672       --        --           --      --
 HPP              142       --        --           --      --
 C                 --      1474      2215         1565    2257
 H                 --       239       278          270     924
 RC                --        --        --           --      --
 DEF               --        --        --           --      --
 Total           1814      1713      2493         1835    3181

Table 3: Size of executable files (in bytes) of different implementations of
the DDJ HWX Browser.

 Autumn Hill Menuet/CPP 355,846
 DDJ with BGI 55,216
 Island Systems graphics-Menu 192,496
 WNDX Corp. WNDX 383,840{*}
 XVT Software XVT Toolkit 49,152{*}


{*} The WNDX and XVT implementations on Microsoft Windows require the usual
amount of support from the Windows environment (900K in runtime DLLs and 3-5
Mbytes of additional data and resources on disk). In addition, the XVT
executable requires approximately 200K of proprietary DLL support at runtime,
while the WNDX executable requires a proprietary runtime file (an RSC file)
approximately 37K in size.


Autumn Hill's Menuet/CPP


Autumn Hill's Menuet/CPP is a GUI class library written in C++ for the DOS
platform. It represents the next-generation version of an earlier package,
Menuet, which was written in C (and which is also still available). Like the
original, Menuet/CPP supports the usual range of CUA-compliant widgets or
controls: windows, menus, buttons, scrolling lists, combo boxes, spin buttons,
gauges, and so on. There are standard dialogs for file selection, color
selection, alerts, and queries. Figure 2 shows Autumn Hill's implementation of
our specification. Listing One (page 113) shows the code that creates the main
window and the string window.
Unlike the original, Menuet/CPP is object-oriented in design. It also relies
on C++ constructs such as operator overloading. For example, to attach
controls to a window, you use the left-shift operator as in the following
sequence: aWindow << aButton << anotherButton << aTextField. You also use the
same syntax to send a message to a window: aWindow << someMessage. The
compiler disambiguates between the two by means of the different argument
types. While these coding puns may jar the sensibilities of longtime C
programmers, they are considered elegant in the C++ world. Fortunately, those
whose sensibilities are offended can choose Autumn Hill's companion C-based
toolkit. Operator overloading is found in many other C++ frameworks, not just
Menuet/CPP. For example, object-Menu uses the + operator to add a window to
the event-responder queue, and to add a menu item to a menu.
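The double duty of << can be sketched in a few lines. The type and member names below are ours, invented for illustration, and not the actual Menuet/CPP classes:

```cpp
#include <string>
#include <vector>

struct Control { std::string name; };
struct Message { int id; };

struct Window {
    std::vector<Control*> children;   // attached controls
    std::vector<int>      delivered;  // message ids seen, for illustration

    // Same operator, two meanings: overload resolution picks one
    // by the type of the right-hand argument.
    Window& operator<<(Control& c) {          // attach a control
        children.push_back(&c);
        return *this;
    }
    Window& operator<<(const Message& m) {    // dispatch a message
        delivered.push_back(m.id);
        return *this;
    }
};
```

With this in place, `w << okBtn << cancelBtn;` attaches controls and `w << Message{42};` sends a message, mirroring the idiom described above.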
One interesting class in Menuet/CPP is mQuilt, which encapsulates the behavior
of composite rectangles. As you add an overlapping rectangle to an mQuilt (a
set of nonoverlapping rectangles), the class transforms the new addition into
the equivalent set of nonoverlapping rectangles. This class is used by
Menuet's window manager to compute the viewable and nonviewable regions
created by overlapping windows on the display.
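The idea behind such a class can be sketched as follows. This is our own illustrative code under assumed names, not the actual mQuilt API: trim the incoming rectangle against each rectangle already in the quilt, keeping only the fragments that lie outside.

```cpp
#include <algorithm>
#include <vector>

struct Rect {
    int x0, y0, x1, y1;  // half-open box [x0,x1) x [y0,y1)
};

// Emit the (up to four) pieces of r that lie outside cut.
static void subtract(const Rect& r, const Rect& cut, std::vector<Rect>& out) {
    if (cut.x1 <= r.x0 || cut.x0 >= r.x1 ||
        cut.y1 <= r.y0 || cut.y0 >= r.y1) {    // no overlap: keep r whole
        out.push_back(r);
        return;
    }
    if (cut.y0 > r.y0) out.push_back({r.x0, r.y0, r.x1, cut.y0});  // top band
    if (cut.y1 < r.y1) out.push_back({r.x0, cut.y1, r.x1, r.y1});  // bottom band
    int top = std::max(r.y0, cut.y0), bot = std::min(r.y1, cut.y1);
    if (cut.x0 > r.x0) out.push_back({r.x0, top, cut.x0, bot});    // left sliver
    if (cut.x1 < r.x1) out.push_back({cut.x1, top, r.x1, bot});    // right sliver
}

// Add a rectangle to a quilt (a set of nonoverlapping rectangles),
// preserving the invariant by trimming the newcomer first.
void quilt_add(std::vector<Rect>& quilt, const Rect& add) {
    std::vector<Rect> pieces{add};
    for (const Rect& q : quilt) {
        std::vector<Rect> next;
        for (const Rect& p : pieces) subtract(p, q, next);
        pieces.swap(next);
    }
    quilt.insert(quilt.end(), pieces.begin(), pieces.end());
}
```

Adding two overlapping 10x10 squares this way yields three nonoverlapping rectangles whose areas sum to the area of the union, which is exactly the bookkeeping a window manager needs for visible regions.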
Unlike standard Menuet, which runs on both DOS and UNIX, at the moment
Menuet/CPP is primarily for DOS, and there is a version that supports CIC's
pen extensions to DOS (known as PenDOS).
Regarding the level of abstraction, Menuet/CPP is located somewhere between an
application framework and a GUI class library. As defined in the October 1992
DDJ, an application framework is distinguished from a class library in that a
framework facilitates reuse not just of pieces of code and user-interface
components, but also of program design and overall structure. In addition, a
full-fledged application framework provides all the general-purpose
functionality common to most applications -- not just UI components, but also
support for undoing commands, printing, debugging, memory management, and so
on.
Unlike those application frameworks directly influenced by the MVC paradigm in
Smalltalk-80, Menuet/CPP seems to have a unique ancestry. There are also signs
of an X Window System influence, for example, in the way the terms "window
manager" and "widgets" are used.
Compared to graphics environments such as Microsoft Windows, Menuet/CPP does
not provide the bottom-most layer of graphics-display primitives -- draw a
line, a polygon, bitblt, and so on -- equivalent to the GDI layer in Windows.
Rather, your program must link to a DOS-based graphics library such as
Borland's BGI or one of the optimized third-party products such as
FlashGraphics, Genus GX, or MetaWindow. These graphics libraries usually just
handle visual displays rather than hardcopy devices. In order to implement
printing in the DDJ sample application, Autumn Hill used a companion package,
BabyDriver, which is a printer-interface library that supports over 300
different printers.
Finally, it's important to note that, unlike some other toolkits, Autumn Hill
does not routinely license the source code to Menuet/CPP. It does, however,
make it available under special contract.


Island Systems' graphics-Menu


Island's graphics-Menu is a DOS-based GUI toolkit that is the precursor to
Island's object-Menu, the C++ DOS-based application framework covered in last
month's article. Figure 3 illustrates how Island Systems implemented the
specification. You can use graphics-Menu with either C or Pascal programs.
Like other GUI toolkits, graphics-Menu is at roughly the same level of
abstraction as pre-3.x Microsoft Windows. That is, there are the usual menus,
check lists, dialogs, buttons, icons, and so on -- without the Windows 3.x
non-GUI, system-level enhancements such as VMM, OLE, and DDE.
Listing Two (page 113) shows the C code for the main function and for drawing
the main window.
The UI components in graphics-Menu have a beveled, 3-D look, reminiscent of
but not the same as the look of object-Menu. The graphics-Menu package
includes two interactive resource-design tools, one for menus and the other
for icons. The menu designer generates C or Pascal code (as opposed to binary
resource files, as with Windows' RES format).
A big difference between graphics-Menu and other GUI toolkits is the extensive
support for forms-based data entry. An interactive tool (Data Entry Designer)
allows creation of forms with range checking, context-sensitive help, shortcut
keys, field-specific pop-up menus, and dBase-style data-entry templates
(picture strings).
As do other GUI toolkits, graphics-Menu requires the use of a graphics
library: either that native to your compiler (Borland's BGI or Microsoft's
graphics library) or one of the third-party libraries (Metagraphics MetaWindow
or Genus GX).
One nice feature of graphics-Menu, for those programs that have to permanently
inhabit real mode, is the automatic detection and use of expanded memory. One
disadvantage compared to object-Menu, is that it doesn't provide support for
32-bit compilers, such as Metaware C/C++.
With regard to speed, Island's implementation of the DDJ sample application in
C, using graphics-Menu, felt much faster and more responsive than last month's
C++ implementation using object-Menu( also a real-mode program). Also, the C++
implementation's memory requirements were such that we could only run it with
most of our usual TSRs unloaded. By contrast, this month's non-OOP version had
no such obstacles. The performance of last month's object-Menu implementation
can be improved by using a DOS extender or 32-bit compiler, or even a DOS
memory manager such as QEMM. But if you don't have your heart set on using
C++, it may just be simpler and easier to use graphics-Menu instead.


WNDX's WNDX



WNDX, from WNDX Corp., is a multiplatform GUI toolkit available for Windows,
DOS, Motif, OpenLook, and the Macintosh. The DOS-based version of WNDX
includes Metagraphics' MetaWindow as the underlying layer; this package is
known as MetaWNDX. As Figure 4 shows, we used the Windows-hosted version of
WNDX for this article.
In the arena of portable C-based GUI toolkits, WNDX takes a middle ground
between the close-to-the-natives approach of XVT and the completely virtual
approach of Neuron Data's Open Interface. Each of these approaches has its
trade-offs. Open Interface supposedly reconstructs almost all GUI elements
(such as scroll-bars and buttons) from a small set of graphics primitives.
This "thick layer" approach, among other things, allows your program to sport
the look and feel of one platform, say the Macintosh, on another, such as
Windows. On the "thin layer" side, XVT's Portability Toolkit tries to remain
close to the underlying API; in some cases, the result is that certain native
functionality available on a given platform (for example, the Macintosh's List
Manager API) is not part of the abstract portable API. By contrast, WNDX
provides its own implementations of common functions such as the file-open
dialog, using where possible lower-level native elements such as scroll bars.
This provides your programs with a UI consistent across all supported
platforms but which may differ from the Windows API.
For example, in WNDX's implementation of the DDJ sample application, the WNDX
file-selection dialog, while similar to the common file dialog in Windows,
works a little differently--enough to disconcert or irritate the habituated
user. (See the accompanying textbox "How Does it Drive?") There also appears
to be a small performance penalty associated with this approach (on my
386/33). On the other hand, the payoff of WNDX's approach is shown by the
sample application, which packs a lot of functionality using a modest amount
of source. Using a UI component that is an extension of the Macintosh List
Manager functionality, the WNDX implementation displays a spreadsheet-like
matrix of cells, each of which contains a sample glyph. The relevant C code is
shown in Listing Three (page 113). The user can select a cell and resize its
width or depth; the corresponding column or row is scaled appropriately. By
double-clicking on a cell, the user can open any number of windows to display
individual characters. These windows can be resized to scale the characters
also. An additional wrinkle to the implementation is that the initial dialog
for selecting glyphs uses two scrolling lists of graphical objects (one
horizontal and the other vertical).


XVT Software's XVT


XVT Software's Portability Toolkit makes its purpose quite clear from the
start, by means of its plain-spoken name. Designed for multiplatform graphics
applications, the XVT Portability Toolkit is currently available for Windows,
Presentation Manager, Macintosh, X/Motif, X/OpenLook, and also in
character-mode versions for DOS, OS/2, UNIX, and VMS. (Figure 5 shows the
Windows version.) The range is so impressive you almost don't notice that
there is no support for graphics-mode DOS. Presumably this is next on the
list.
XVT was, for a long time, the only commercial multiplatform tool to support
the Macintosh. This is now changing; Neuron Data and WNDX are in the game,
with bigger players to follow (Symantec's Bedrock and Microsoft's Alar),
perhaps over the next year.
Now at version 3.0, XVT has evolved over the last five years from a paper-thin
portable layer of abstraction over multiple native APIs to a more
full-featured but still efficient toolkit that better covers all the corners
of an abstract GUI API -- for example, by adding a platform-independent
resource language and enhanced support for printing, debugging, and text
editing. The code in Listing Four (page 114) shows some of the principal event
handlers for the browser window.
Unlike WNDX, which attempts to offer the best of each native platform on all
platforms via emulation, XVT is less ambitious, opting for native
functionality where possible. An example of the file-selection dialog has
already been mentioned. Another example is the resource-definition language.
XVT uses URL (Universal Resource Language) files to specify application
resources such as menus and dialogs. You use the CURL utility to translate URL
source text files into native resource scripts. These native resource files
are then processed by native tools (such as the Windows resource compiler) to
produce binary resource files that can be bound to your executable. By
contrast, WNDX defines application resources using RSC files, text files in
WNDX's own format that are processed at run time by your executable.
Even with XVT's recent enhancements, its level of abstraction remains at a
middle level (roughly equivalent to pre-3.x Windows), compared to a class
library or application framework. As Island Systems and Autumn Hill have done,
XVT Software has added a C++ package to its product line. The XVT++ class
library, in keeping with the company's approach, is a thin layer between a
C-based GUI toolkit and a C++-based GUI application.


Conclusion


Bob Metcalfe, the inventor of Ethernet, said recently, "The operating system
of the mid-to-late 1990s will be somebody's class library." If you append the
words "or application framework" to that sentence, it sounds like a plausible
prediction. But right now, the operating system of the early '90s seems to be
somebody's GUI toolkit (namely, Microsoft Windows), competing with a number of
other toolkits that provide what Microsoft left out -- among other things,
portability, thrifty use of resources, and/or being able to run programs
directly on DOS. If these qualities are important to your application, you can
use the sample implementation here to help you choose among the alternatives.


How Does it Drive?


The challenge DDJ posed to GUI vendors resulted in some interesting lessons in
practical UI design. You're no doubt familiar with the usual tenets of
user-interface design--make the layout uncluttered, the functions transparent,
usage consistent, state visible, feedback immediate, and so on. But set all
this aside for the moment, and consider the question of how programs feel on
the first test drive.
By launching the many different implementations of the same DDJ sample program
for the first time, we've discovered a few simple rules of thumb to help with
the fact that you never get a second chance to make a first impression. First,
a little bit of color seems to go a long way in making that first impression
favorable. Your subjective mileage may of course vary, but to my eyes a color
background (for example, Liant's simple blue expanse in the last issue)
conveys the feeling of a substantial program, much more than the empty white
space which may be the only visible manifestation of thousands of lines of
clever programming.
As you place your hands on the wheel (or mouse, as the case may be), how do
the controls feel when they are moved? Silky smooth and directly connected to
what's on the screen? Or more like manipulating an object with a pole through
ten feet of water? Alas, this is not always something your application program
can directly control, but often a result of (in)efficiencies in the low-level
graphics library.
In buying a car, you can look at consumer reports all you want, but all too
often when you get in and drive the thing, your gut makes the actual decision,
bypassing the brain's careful deliberation. In the early years of Microsoft
Windows, any number of benchmarks showed the same or better elapsed times for
actions such as opening a file or drawing a line of text, compared to the
Macintosh versions of the same program (PageMaker or Microsoft Word).
Yet anyone who actually used the programs on both platforms can tell you that
the Windows version felt jerky and slow, even on a CPU twice as fast as that
of the Mac. (The situation has now changed, as a result of much effort in
optimizing the GDI.)
In the DDJ sample implementations, the controls that felt the smoothest to my
hands were those in Autumn Hill's version. I don't know if this results from
Menuet/CPP's use of the FlashGraphics library or whether it comes from clever
programming at higher levels of the system, but it bears further
investigation. At a minimum, I'd like to link the code with other graphics
libraries that have a reputation for speed, such as Metagraphics MetaWindow.
A final realization from working with the many implementations is how
irritating small discrepancies can be. For example, every implementation used
a file-selection dialog, which worked basically the same way. However, there
were tiny differences from one to the other, not always consciously perceived
until later, that often contributed to an overall prickly feeling about the
implementation.
For example, as a result of being drummed into my fingers by many Windows
programs, I'm now accustomed to the convention that the escape key is
equivalent to the Cancel button, and that the spacebar selects whatever button
has the focus. I don't know whether these particular choices make sense in the
realm of UI design theory. But when using a custom-made file dialog such as in
WNDX, Autumn Hill, or Island, there are little pinpricks of frustration when
these components don't work as expected. Moral: If your program is targeting a
particular population of users, it's worth spending time to nail down every
last one of the UI conventions.
--R.V.


_SIZING UP GUI TOOLKITS_
by Ray Valdes

[LISTING ONE]

//=========AUTUMN HILL'S MENUET/CPP Excerpt=========================

mWindow * stview_window( mFont * sysfnt )
{
 mWindow *wn = new mWindow( "View String", 0, sysfnt,
 560, 370, wBDRFIXED );
 wn->setstatus( wsMODAL|wsDESTROY, 1 );

 mRect r( 20, 145, 510, 320 );
 mWnCtlAperture *ap = new mWnCtlAperture( r );
 ap->getnodes()->rgn->setbrush( hwxbrush );
 ap->settask( draw_st_task );
 ap->setstatus( xPOSTDRAW, 1 );
 *wn << *ap;

 r.ymax = r.ymin - 10;
 r.ymin = r.ymax - 20;

 mWnCtlHScrollBar
 *hsb = new mWnCtlHScrollBar( r, butRIGHTRIGHT );
 hsb->settask( st_hsb_task );
 hsb->set( st_hsb_reading );
 *wn << *hsb;

 r.set( 520, 145, 540, 320 );
 mWnCtlVScrollBar
 *vsb = new mWnCtlVScrollBar( r, butDOWNDOWN );
 vsb->settask( st_vsb_task );
 vsb->set( st_vsb_reading );
 *wn << *vsb;

 r.set( 25, 60, 95, 90 );
 mWnCtlSpinBut
 *sp = new mWnCtlSpinBut( r, sysfnt, "Instance",
 instance_selector );
 sp->settask( inst_st_task );
 *wn << *sp;

 r += mPoint( 146, 0 );
 sp = new mWnCtlSpinBut( r, sysfnt, "Scale (PC)",
 scale_selector );
 sp->settask( scale_st_task );
 *wn << *sp;

 r += mPoint( 146, 0 );
 mWnCtlButton
 *bt = new mWnCtlButton( r, sysfnt, "Print" );
 bt->settask( print_task );
 *wn << *bt;

 r += mPoint( 146, 0 );
 bt = new mWnCtlButton( r, sysfnt, "Exit" );
 bt->settask( exit_task );
 *wn << *bt;

 r.set( 105, 25, 455, 45 );
 mWnCtlField
 *fi = new mWnCtlField( r, sysfnt, 0, 40 );
 fi->settask( st_fld_task );
 fi->put( view_str );
 CurHwxStrField = fi->get();
 *wn << *fi;

 return wn;
}

//---------------------------------------------------------//

// create alphabet view window

mWindow * alview_window( mFont * sysfnt )
{
 mWindow *wn = new mWindow( "View Alphabet", 0, sysfnt,
 560, 350, wBDRFIXED );
 wn->setstatus( wsMODAL|wsDESTROY, 1 );

 mRect r( 20, 105, 510, 300 );

 mWnCtlAperture *ap = new mWnCtlAperture( r );
 ap->getnodes()->rgn->setbrush( hwxbrush );
 ap->settask( draw_al_task );
 ap->setstatus( xPOSTDRAW, 1 );
 *wn << *ap;

 r.ymax = r.ymin - 10;
 r.ymin = r.ymax - 20;
 mWnCtlHScrollBar
 *hsb = new mWnCtlHScrollBar( r, butRIGHTRIGHT );
 hsb->settask( al_hsb_task );
 hsb->set( al_hsb_reading );
 *wn << *hsb;

 r.set( 520, 105, 540, 300 );
 mWnCtlVScrollBar
 *vsb = new mWnCtlVScrollBar( r, butDOWNDOWN );
 vsb->settask( al_vsb_task );
 vsb->set( al_vsb_reading );
 *wn << *vsb;

 r.set( 25, 20, 95, 50 );
 mWnCtlSpinBut
 *sp = new mWnCtlSpinBut( r, sysfnt, "Instance",
 instance_selector );
 sp->settask( inst_al_task );
 *wn << *sp;

 r += mPoint( 146, 0 );
 sp = new mWnCtlSpinBut( r, sysfnt, "Scale (PC)",
 scale_selector );
 sp->settask( scale_al_task );
 *wn << *sp;

 r += mPoint( 146, 0 );
 mWnCtlButton
 *bt = new mWnCtlButton( r, sysfnt, "Print" );
 bt->settask( print_task );
 *wn << *bt;

 r += mPoint( 146, 0 );
 bt = new mWnCtlButton( r, sysfnt, "Exit" );
 bt->settask( exit_task );
 *wn << *bt;

 return wn;
}

//---------------------------------------------------------//

// create character view window

mWindow * chview_window( mFont * sysfnt )
{
 mWindow *wn = new mWindow( "View Character", 0, sysfnt,
 530, 330, wBDRFIXED );
 wn->setstatus( wsMODAL | wsDESTROY, 1 );

 mRect r( 25, 25, 280, 280 );

 mWnCtlAperture *ap = new mWnCtlAperture( r );
 ap->getnodes()->rgn->setbrush( hwxbrush );
 ap->settask( draw_ch_task );
 ap->setstatus( xPOSTDRAW, 1 );
 *wn << *ap;

 r.set( 310, 210, 390, 250 );
 mWnCtlSpinBut
 *sp = new mWnCtlSpinBut( r, sysfnt, "ASCII Code",
 ascii_code_selector );
 sp->settask( asc_ch_task );
 *wn << *sp;

 r -= mPoint( 0, 85 );
 sp = new mWnCtlSpinBut( r, sysfnt, "Instance",
 instance_selector );
 sp->settask( inst_ch_task );
 *wn << *sp;

 r -= mPoint( 0, 85 );
 sp = new mWnCtlSpinBut( r, sysfnt, "Scale (PC)",
 scale_selector );
 sp->settask( scale_ch_task );
 *wn << *sp;

 r.set( 425, 125, 490, 150 );
 mWnCtlButton
 *bt = new mWnCtlButton( r, sysfnt, "Print" );
 bt->settask( print_task );
 *wn << *bt;

 r -= mPoint( 0, 85 );
 bt = new mWnCtlButton( r, sysfnt, "Exit" );
 bt->settask( exit_task );
 *wn << *bt;

 return wn;
}

//---------------------------------------------------------//

// create "about" window

mWindow * about_window( mFont * sysfnt )
{
 mWindow *wn = new mWindow( "About HWX Browser", 0,
 sysfnt, 350, 250, wBDRFIXED );
 wn->setstatus( wsMODAL | wsDESTROY, 1 );

 mWnCtlIcon *logoicon = new mWnCtlIcon( mPoint(20,130),
 60, 60, ahs_logo,
 rgnFLAT );
 *wn << *logoicon;

 mRect r = mRect( 145, 25, 215, 48 );
 mWnCtlButton
 *bt = new mWnCtlButton( r, sysfnt, "Exit" );
 bt->settask( exit_task );
 *wn << *bt;


 mWnCtlText
 *tx = new mWnCtlText( mPoint(135, 175), sysfnt,
 "H W X B r o w s e r" );
 *wn << *tx;
 tx = new mWnCtlText( mPoint(215, 145), sysfnt,
 "by" );
 *wn << *tx;
 tx = new mWnCtlText( mPoint(125, 115), sysfnt,
 "Autumn Hill Software, Inc." );
 *wn << *tx;
 tx = new mWnCtlText( mPoint(150, 100), sysfnt,
 "1145 Ithaca Drive" );
 *wn << *tx;
 tx = new mWnCtlText( mPoint(132, 85), sysfnt,
 "Boulder, Colorado 80303" );
 *wn << *tx;

 return wn;
}

//---------------------------------------------------------//

// create main application window and its menu system

mWindow * app_window( mFont * sysfnt )
{
 // create main window
 mRect r = mGdMgr::getdisprect();
 int w = r.delx() + 1;
 int h = r.dely() + 1;
 mWindow *wn = new mWindow( "HWX Browser", &main_menu,
 sysfnt, w, h, wBDRSIZABLE );

 // get window's main menu
 mWnCtlBarMenu *mm = wn->getwnmenu();

 // attach pulldowns
 mWnCtlBoxMenu *
 sm = new mWnCtlBoxMenu( sysfnt, &file_menu );
 sm->settask( file_task );
 mm->setpulldown( sm, 1 );

 sm = new mWnCtlBoxMenu( sysfnt, &view_menu );
 sm->settask( view_task );
 mm->setpulldown( sm, 2 );

 sm = new mWnCtlBoxMenu( sysfnt, &optn_menu );
 sm->settask( optn_task );
 mm->setpulldown( sm, 3 );

 return wn;
}

//---------------------------------------------------------//

// perform app initialization

void init_app( void )

{
 SetDefaultPalette( pSky );
 strcpy( hwxpath, getcurrentdir() );
 strcat( hwxpath, "\\*.DAT" );
 memset( hwxfile, 0, PATHSPECLENGTH );
 CurHwxTbl = 0;
 CurHwxChar = 0;
 CurHwxStr = 0;
 CurHwxStrLen = 0;
 CurHwxCharSet = 0;
 CurHwxAlphabet = 0;
}

//---------------------------------------------------------//

// perform app termination

void term_app( void )
{
}

//---------------------------------------------------------//

int main( int argc, char *argv[] )
{
 init_app();
 mWindowManager *WM = new mWindowManager;
 MainWn = app_window( &WM->systemfont() );
 *WM << *MainWn;
 WM->run();
 term_app();
 delete WM;
 printf( "Availble memory = %ld bytes\n", farcoreleft() );
 return 0;
}



[LISTING TWO]

//==========================ISLAND SYSTEMS GRAPHICS-MENU=============

int main(int argc, char *argv[])
{
 GM_init(argc, argv);

 InitMenus();
 PrepareW1( &HM ); // W1 is what we call the main window
 userproc1 = DisplayTime; // this procedure displays the time
 DrawW1();
 FreezeWin( &W1 );
 do {
 PollUser();
 DoW1();
 } while (err==err); /* forever */
 GM_close();
 return 0;
}
//------------------------------------------------draw W1

void DrawW1( void )
{
 HideCursor();
 DrawWindow( &W1,
 true /*DoBevel*/, gSaveWindow /*DoSave*/, true /*HasHmenu*/ );
 setcolor(gBackColor);
 PaintRect (&W1.WorkingR);

 gDrawArea.Xmin = W1.WorkingR.Xmin+8; // set inner drawing rectangle
 gDrawArea.Xmax = W1.WorkingR.Xmax-8;
 gDrawArea.Ymin = W1.WorkingR.Ymin;
 gDrawArea.Ymax = W1.WorkingR.Ymax;

 setScrollRange();

 DrawScrollRect( &(W1.SbarR) );
 DrawDragBoxV( &(W1.SbarR), &(W1.DragBoxR), W1.PcntVert, W1.PcntR );
 DrawScrollRect( &(W1.SbarB) );
 DrawDragBoxH( &(W1.SbarB), &(W1.DragBoxB), W1.PcntHorz, W1.PcntB );
 ShowCursor();
}
//--------------------------------------------------------draw sample
void drawSample(int xOffset,
 int yOffset, boolean holdposx, boolean holdposy)
{
 Rect tmpR;
 static int xLastOffset=0, yLastOffset=0;

 if (holdposx==true) xOffset=xLastOffset;
 if (holdposy==true) yOffset=yLastOffset;

 HideCursor();
 tmpR.left = gDrawArea.Xmin; tmpR.top =gDrawArea.Ymin;
 tmpR.right= gDrawArea.Xmax; tmpR.bottom=gDrawArea.Ymax;
 setcolor(gBackColor);

 PaintRect(&W1.WorkingR);

 setviewport( gDrawArea.Xmin, gDrawArea.Ymin,
 gDrawArea.Xmax, gDrawArea.Ymax, 1 );

 setlinestyle(gLineStyle, 1, gThickness);

 if (gDrawWhat==INSTANCE_DRAW) //display one sample
 {
 omShowSetOfInstances(tmpR, gCurrChar, gCurrScale, LIGHTGRAY,
 gDrawColor, (int)xOffset, (int)yOffset);
 }
 else if (gDrawWhat==ALPHABET_DRAW) // display alphabet for a sample
 {
 omShowAlphabet( tmpR, gCurrSample-1, gCurrScale, LIGHTGRAY,
 gDrawColor, (int)xOffset, (int)yOffset);
 }
 else
 {
 omShowAll( tmpR, gCurrScale, LIGHTGRAY, gDrawColor,
 (int)xOffset, (int)yOffset);
 }
 setlinestyle(SOLID_LINE, 1, NORM_WIDTH);

 xLastOffset = xOffset; yLastOffset = yOffset;
 setviewport( sR.Xmin, sR.Ymin, sR.Xmax, sR.Ymax, 1 );
 ShowCursor();
}
//-----------------------------------------------------display about box
int displayAbout( void )
{
 rect R;
 MoveTo(MidPt.X-(12*StringWidthX), MidPt.Y-(5*FontHeight)); /*screen ctr*/
 HideCursor();
 DrawTextRect(10,26,10,10,LIGHTGRAY,LIGHTRED,true,&R,&err);
 PenColor(BLUE); /*text color */
 BackColor(LIGHTGRAY); /*same as fill color*/
 DrawStringLN(" graphics-Menu");
 DrawStringLN(" Island Systems");
 DrawStringLN(" (617) 273-0421");
 ShowCursor();
 WaitForUser();
 if (Button)
 WaitForNot(Button);
 PopRect(&err);
 return 0;
}


[LISTING THREE]


//============= WNDX CORP.'S WNDX Excerpt ================================

void HWB_BigBrowseList( int lMessage ,
 int lSelect ,
 WX_rect *lRect ,
 WX_point *lCell ,
 int lType ,
 void *lData ,
 int lLen ,
 LST_Ptr lHandle )

 {
 WX_rect R;
 lpList pInstance;

 if ( lMessage == LST_DrawMsg )
 {
 WX_Mode( WX_REP );
 R = *lRect;
 WX_InsetRect( &R , 2 , 0 );
 R.top++;
 R.left++;

 pInstance = HWX_GetInstanceData( ( lpList * ) lHandle->userHandle , lCell->Y , lCell->X );

 HWB_RenderInstance( lHandle->window , pInstance , &R );

 WX_MoveTo( lRect->left , lRect->bottom );
 WX_LineTo( lRect->right , lRect->bottom );
 WX_LineTo( lRect->right , lRect->top );


 if ( lSelect & LST_Maybe )
 WX_FrameRect( lRect );
 }
 }

int CloseBigBrowser(DLG_Ptr dp)
 {
 free(DLG_GetDp(dp, HWB_DRAWOPTIONS));
 return TRUE;
 }

int HWB_BigBrowserView( struct WND_Record *dp , int itemNo, struct EVNT_Record *ev,
 int action, int msg, void *data)
 {
 char *fileName;
 WX_point whichOne;

 if ( action == DLG_hadDoubleClick )
 {
 DLG_Get( dp , dlg_filename , NULL , &fileName );

 DLG_GetItemCopy( dp , itemNo , itmList_lastClick , sizeof( WX_point ) , &whichOne );

 HWB_Instance( (lpList*) DLG_GetDp(dp, HWB_CHARDATA),
 fileName ,
 DLG_GetDp(dp, HWB_DRAWOPTIONS),
 whichOne.Y , whichOne.X );

 DLG_SetIItem( dp, itemNo, itm_highlight, FALSE );
 }
 return TRUE;
 }

DLG_Ptr HWB_BigBrowser( lpList CharData[] , char *fileName , DrawOptions *draw_opt , int defChar )

 {
 DLG_Ptr dp;
 LST_Handle LH;
 WX_rect R;
 DrawOptions *drw_opt;
 WX_point defCell;

 dp = DLG_New( dlg_title , fileName ,
 dlg_centered_xy , 215 , 215 ,
 dlg_visible , FALSE ,
 dlg_filename , fileName ,
 dlg_close, CloseBigBrowser,
 dlg_menuname , "Options" ,
 0 );

 DLG_SetDp( dp , HWB_CHARDATA , CharData );

 drw_opt = malloc(sizeof( DrawOptions ));
 *drw_opt = *draw_opt;
 DLG_SetDp(dp, HWB_DRAWOPTIONS, drw_opt);

 DLG_GetCopy( dp , dlg_content , sizeof( WX_rect ) , &R );

 DLG_AddItem( dp ,

 listItm , "test" ,
 itm_action , HWB_BigBrowserView ,
 itm_placement , 0 , 0 , 1000 , 1000 ,
 itmList_top , 0 ,
 itmList_left , 0 ,
 itmList_defWidth , 50 ,
 itmList_defHeight , 50 ,
 itmList_verticalScroll , TRUE ,
 itmList_horizontalScroll , TRUE ,
 itmList_ReframeCellV , TRUE ,
 itmList_ReframeCellH , TRUE ,
 itmList_active , TRUE ,
 itmList_DoDraw , TRUE ,
 itmList_OnlyOne , TRUE ,
 itmList_rows , 128 , /* the defined maximum number of characters */
 itmList_columns , 20 ,
 0 );

 DLG_SetPpItem( dp , 1 , itmList_defProc , ( iFunc ) HWB_BigBrowseList );

 LH = ( LST_Handle ) DLG_GetDpItem( dp , 1 , itmList_handle );

 LH->userHandle = CharData;
 LST_CalcVisible( LH );

 if ( ( defCell.Y = defChar ) >= 0 )
 {
 defCell.X = 0;

 LST_MakeVisible( &defCell , TRUE , FALSE , LH );
 }

 DLG_Draw( dp );

 return dp;
 }



[LISTING FOUR]

//=======================XVT SOFTWARE'S XVT Excerpt =======================
PRIVATE void BW_ev_register (BW_DATA *BW_data_p)
{
 EV *ev;
 int count = 0;
 ev = EV_create (BW_SIZE_EV);
 BW_data_p->ev = ev;
 EV_set_eh (ev, count++, E_CREATE, 0, BW_create_eh);
 EV_set_eh (ev, count++, E_FOCUS, 0, BW_focus_eh);
 EV_set_eh (ev, count++, E_UPDATE, 0, BW_update_eh);
 EV_set_eh (ev, count++, E_DESTROY, 0, BW_destroy_eh);
 EV_set_eh (ev, count++, E_COMMAND, M_FILE_OPEN, BW_propagate_eh);
 EV_set_eh (ev, count++, E_COMMAND, M_FILE_QUIT, BW_propagate_eh);
 EV_set_eh (ev, count++, E_COMMAND, M_FILE_PRINT,BW_command_file_print_eh);
 EV_set_eh (ev, count++, E_COMMAND, M_OPT_ATTRS, BW_command_opt_attrs_eh);
 EV_set_eh (ev, count++,E_CONTROL,BW_BUTTON_ZOOMIN,BW_control_zoomin_eh);
 EV_set_eh (ev, count++,E_CONTROL,BW_BUTTON_ZOOMOUT,BW_control_zoomout_eh);

 EV_set_eh (ev, count++,E_CONTROL,BW_EDIT_CHAR, BW_control_editchar_eh);
}
//-----------------------------------------------------handle create event
PRIVATE long BW_create_eh (WINDOW win, EVENT *ev_p)
{
 FILE_SPEC fs;
 FILE *file_p;
 char title[SZ_TITLE], path[SZ_PATHNAME];
 BW_DATA *BW_data_p = (BW_DATA*)get_app_data (win);
 lplpList hwx_data;

 fs = BW_data_p->fs;
 file_p = BW_data_p->file_p;

 // Set the window title using the data file name.
 dir_to_str (&(fs.dir), path, sizeof (path));
 sprintf (title, "Browser - %s\\%s", path, fs.name);
 set_title (win, title);

 if ((hwx_data = BW_load_data (win, file_p)) == NULL) {
 close_window (win);
 return (EV_CONSUMED);
 }
 BW_init (win, hwx_data, 'A');

 // Enable/disable BW-specific menu items.
 win_menu_enable (win, M_FILE_OPEN, TRUE);
 win_menu_enable (win, M_FILE_PRINT, TRUE);

 /* Clear the window, set the starting values of the controls
 * in the BW window, and create the 3 child windows for the
 * graphics output.
 */
 clear_window (win, (COLOR)get_value (NULL_WIN, ATTR_BACK_COLOR));
 update_window (win);

 BW_data_p->BWZ_win = BWZ_new (win, 'A', 0);
 BW_data_p->BWI_win = BWI_new (win, 'A', 0);

 BW_do_status_update (win);

 return (EV_CONSUMED);
}
//---------------------------------------------------handle focus event
PRIVATE long BW_focus_eh (WINDOW win, EVENT *ev_p)
{
 BW_DATA *BW_data_p = (BW_DATA*)get_app_data (win);

 if ((ev_p->v.active == TRUE) &&
 (BW_data_p->init_flag == TRUE))
 set_front_window (get_ctl_window (win, BW_EDIT_CHAR));
 else
 BW_data_p->init_flag = TRUE;

 return (EV_CONSUMED);
}
//-----------------------------------------------------handle update event
PRIVATE long BW_update_eh (WINDOW win, EVENT *ev_p)
{

 BW_paint (win);
 return (EV_CONSUMED);
}
//-----------------------------------------------------handle destroy event
PRIVATE long BW_destroy_eh (WINDOW win, EVENT *ev_p)
{
 BW_DATA *BW_data_p = (BW_DATA*)get_app_data (win);

 /* Free the HWX data structure (if it exists), and then free the
 * BW data object itself.
 */
 BW_free_data (win);
 xvt_free ((char*)BW_data_p);

 return (EV_CONSUMED);
}
//-----------------------------------------------------handle propagate event
PRIVATE long BW_propagate_eh (WINDOW win, EVENT *ev_p)
{
 return (EV_PROPAGATE);
}
//------------------------------------------------handle attribute set evt
PRIVATE long BW_command_opt_attrs_eh (WINDOW win, EVENT *ev_p)
{
 AD_new (win);
 return (EV_CONSUMED);
}
//------------------------------------------------handle zoomin control evt
PRIVATE long BW_control_zoomin_eh (WINDOW win, EVENT *ev_p)
{
 BW_DATA *BW_data_p = (BW_DATA*)get_app_data (win);

 enable_window (get_ctl_window (win, BW_BUTTON_ZOOMIN), FALSE);
 enable_window (get_ctl_window (win, BW_BUTTON_ZOOMOUT), TRUE);

 BWZ_do_zoom (BW_data_p->BWZ_win, BWZ_ZOOMIN_FACTOR);
 BW_do_status_update (win);

 return (EV_CONSUMED);
}
//----------------------------------------------handle zoomout control evt
PRIVATE long BW_control_zoomout_eh (WINDOW win, EVENT *ev_p)
{
 BW_DATA *BW_data_p = (BW_DATA*)get_app_data (win);

 enable_window (get_ctl_window (win, BW_BUTTON_ZOOMIN), TRUE);
 enable_window (get_ctl_window (win, BW_BUTTON_ZOOMOUT), FALSE);

 BWZ_do_zoom (BW_data_p->BWZ_win, BWZ_ZOOMOUT_FACTOR);
 BW_do_status_update (win);

 return (EV_CONSUMED);
}

November, 1992
PIE MENUS FOR WINDOWS


Circular menus give a new look to old Windows


 This article contains the following executables: PIEMENU.ARC


Carl Rollo


Carl is a freelance programmer in the Washington, DC area. He can be contacted
at 4431 Lehigh Road, Suite 104, College Park, MD 20740.


Last year in Dr. Dobb's, Don Hopkins described a form of circular menus dubbed
"pie menus" that he originally developed at the University of Maryland's
Human-Computer Interaction Laboratory. (See "The Design and Implementation of
Pie Menus," DDJ, December 1991.) Although circular menus were around before
pie menus, Don increased the selection area for a single item from a small
target in a circle of small targets to the much larger wedge of a pie diagram.
This is not a trivial improvement. According to "An Empirical Comparison of
Pie and Linear Menus" (see "References"), pie menus are faster (15-20 percent)
than linear menus and generally produce fewer errors when used to make
selections. A subsequent study by Mills and Prime found that, "The fastest and
least error prone amongst the menu styles proved to be the circular menu." As
you may remember, Don committed his concept to the public domain.
In this article, I describe an implementation of pie menus for Microsoft
Windows. I'm using this code as a test-bench to compare the use of pie menus
to the conventional menus of a current (and very popular) GUI design. The
basic aims of the work are to keep the implementation simple while conforming
as closely as possible to Windows' existing menu logic. While this discussion
is limited to what's involved in implementing pie menus for Windows, I've also
written three Windows programs which demonstrate the use of pie menus: One
uses only Windows menu styles; another has a layer of conventional Windows
menus that call a layer of pie menus; and the third is the same application,
using only pie menus. Figure 1 shows the output for this last program. These
programs, along with detailed notes about them, are available electronically;
see "Availability," page 5.


A Pie for Your Window


If we are going to add a new menu concept to Windows, we want the technique to
be quick to implement, while not adding a large amount of overhead to the
existing storage requirements. Since storage and speed are often mutual
trade-offs, this implementation leans in the direction of speed at the cost of
some extra storage.
What do we need to bake a pie menu for our Window? First, look at piemenu.h
(Listing One, page 118), which lists the header file, and piemenu.c (Listing
Two, page 118), which lists the five new functions.
The header file consists of two structure definitions, the function
prototypes, and a few miscellaneous constant definitions. The PIE_OPTION
structure definition defines data on each selection option in a pie menu. This
includes the option's label (if any), the value to be returned if the option
is selected, the background color for the option's pie wedge, the angle (from
0) of the wedge's upper boundary, the rectangle enclosing the option label, a
switch indicating if the option is currently selected, and so on. All this
data is stored, rather than calculated when needed, in order to make drawing a
pie menu as speedy as possible.
The structure definition PIE_MENU defines an array of eight PIE_OPTION
structures plus parameters common to a single pie menu. These include the
radius of the pie chart in the menu, the value to be returned if no selection
is made (the user hits Escape), the number of options in the menu (1-8), and
switches to indicate whether labels are to be inside or outside the pie as
well as whether or not to check the currently selected option. Eight is the
maximum number of options offered in a single pie menu. You can change this by
modifying the source code, but be aware that experts in human-computer
interaction have determined that eight options is the optimum. After eight,
you should split the choices up into multiple or cascading menus. The pie-menu
header file also defines a far pointer to a PIE_MENU structure in the Windows
style of long pointers as LPPIEMENUSTRUCT. We can use this definition when
supplying our pie-menu definitions to the functions.
To create a pie menu, call CreatePieMenuIndirect, which is modeled after such
Windows functions as CreateBitmapIndirect. The idea is to let the applications
programmer create a PIE_MENU structure wherever it is most convenient in
storage, load those elements of the structure needed to define the menu, and
let the function fill out the rest of the parameters. CreatePieMenuIndirect
takes a far pointer to a PIE_MENU structure (LPPIEMENUSTRUCT) and completes
the structure based on the information already loaded. We need to tell the
function how many options we want in the pie menu, what value is to be
returned for each option, what value to return if no choice was made, whether
we want labels and where to put them, and whether to check the current
selection. If we want labels and colors for each option, we will also have to
load these before calling CreatePieMenuIndirect.
The other two arguments to CreatePieMenuIndirect are the width and height of
the window used to display our pie menu. If you look at regular Windows menus,
you'll see they are actually displayed in a rectangular area with a border,
and that the cursor responds to the menu only in this area. This is a
specialized child window created and operated by Windows. Normally, we define
our menu using a script file for the resource compiler, and Windows determines
how big this window will be. But the method we will use to implement our pie
menus requires that we specify our window size beforehand. (The dimensions are
in pixels.)
Let's take a quick look at what CreatePieMenuIndirect does to complete our
pie-menu structure. First, the function performs some basic error checking on
the values we have supplied. Then it calculates the angular displacement of
each wedge in our pie by dividing 360 degrees by the number of options we have
specified. The end coordinates of the upper bound of each wedge are calculated
at the same time. Why these parameters?
If you look at the description of the Windows Pie function in the SDK, you'll
see that these parameters are exactly what is needed to make a call to this
function. With everything calculated in advance, the function to draw our pie
menu is mostly a call to the Windows Pie function. The next bit of code builds
brushes to fill in the background of each wedge in our pie and set labels on
or off, depending upon our original choices. The remainder of the code in
CreatePieMenuIndirect concerns itself with placing labels for the options.
Properly labeling a pie chart is more a matter of aesthetics than programming.
A pie with only two divisions is labeled differently than one with six. Also,
our pie menus can be labeled either inside or outside the pie, and the
problems involved in placing labels are completely different between the two
formats.
Since we normally label pie charts inside the wedges, why provide the ability
to label outside? Remember, we are trying to create a menu that looks like a
pie chart, not a pie chart itself. We don't have or want exploded segments in
our pie--they would just confuse the user. Short labels inside the pie and
longer labels outside have the same problem. UI designers tell us that such
changes of menu style are a basic no-no.
So where do the outside labels come in? If you have several very long labels,
you will find that a standard Windows menu can cover quite a bit of your
client window. Because Windows uses only horizontal text for menus and because
outside labeling on a pie menu is clustered around a central region (the pie),
the resulting pie menu will be wider but take less vertical space in the
client window. If you implement the code for pie menus on your PC, you can try
this out, comparing pie menus to regular menus when they both contain long
labels.
By now experienced Windows programmers will have said, "Wait a minute, it's
fine to calculate coordinates for your arcs and labels, but what coordinate
system are you using? In what mapping mode?" The answer is that the arc
end-points don't have to be in any coordinate system as yet because they are
functions of the radius, which--we will assume--fits into the window
requested. The labels are another matter, however. Here, we create a memory
device context (DC), set up the MM_ISOTROPIC mapping mode, and scale our
window-viewport coordinate systems contingent on whether the programmer wants
inside or outside labels. For inside labels, we scale the pie menu to the size
of the menu's window dimensions; for outside labels, we determine the leftmost
and rightmost label positions and scale that range to the window dimensions.
Note that all this positioning must be done with regard to a DC.
CreatePieMenuIndirect creates a DC compatible with the display, but you might
ask yourself "How do we know where the menu is going to show up when it gets
displayed?" we don't. The technique that we use to implement pie menus in
Windows actually relies on a DC created by Windows when the user requests the
menu. At that point we scale and redraw everything.
A final word on CreatePieMenuIndirect and labels. If labels are requested to
be inside the pie and a label won't fit in its wedge, the label is shut off.
So, if you get a pie menu with some labels missing, either expand your menu
dimensions or shorten the offending labels.
The DrawPieMenu function takes a DC and our far pointer to a PIEMENUSTRUCT and
calls the Windows Pie function for each wedge in our pie menu. The code is
straightforward, and only two points are really worth noting. First, if we
have requested a check mark to be displayed by the current selection, it
appears that DrawPieMenu inverts the rectangle around the label instead of
writing a check mark. This is because pie menus generally use a lot more color
than Windows' linear menus, and the check mark gets lost in that much color.
Inverting the label area is much more noticeable.
Second is the business of centering the mouse cursor at the center of the pie.
You have to know that, if you are going to take over the cursor in Windows,
you both get and set its current position in screen coordinates. This is fine,
but we are going to draw pie menus using logical coordinates and a DC that we
know next to nothing about. Fortunately, the Windows function GetDCOrg enables
us to get the origin of our DC in device coordinates (relative to the screen
origin); another function, LPtoDP, lets us convert logical coordinates to
device coordinates. We can then use the device coordinates to set the cursor
in the center of our pie menu.
PieMenuProc is basically the inverse function to DrawPieMenu. We get the
current location of the cursor, convert the screen coordinates to logical
coordinates, and calculate the central angle of the radius that runs through
this point. If the distance to the cursor is not greater than the radius (the
cursor is still in the pie), we hunt through the list of angles for each wedge
until we find the wedge in which the cursor rests. This is the option value
that will be returned to the calling code. There is a simple matter of setting
the rectangle-inversion switch if required, and we are done.
Two other functions are needed to provide information for PieMenuProc about
the device context of the menu. It turns out that by the time we are ready to
convert the cursor location, we will have lost this information. DP_to_LP is a
function that replaces the Windows function DPtoLP, which converts device
coordinates to logical coordinates, given a device context. DP_to_LP performs
the same task with information supplied by the remaining function,
GetViewportWindowParms. This function simply collects information from a DC
and stores it in a privately defined data structure.
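The conversion DP_to_LP performs is the standard window/viewport mapping. A minimal sketch follows, with a hypothetical MapParms structure standing in for the VIEWWINDATA fields (the field names here are illustrative, not the real ones):

```c
/* Hypothetical stand-in for the mapping parameters captured from a DC
   by GetViewportWindowParms. */
typedef struct {
    long winOrgX, winOrgY, winExtX, winExtY;      /* logical (window) side */
    long viewOrgX, viewOrgY, viewExtX, viewExtY;  /* device (viewport) side */
} MapParms;

/* Device-to-logical conversion: undo the viewport offset, rescale by
   the extent ratio, then apply the window origin -- the same affine
   transform Windows' DPtoLP applies from a live DC. */
static void dp_to_lp(const MapParms *m, long *x, long *y)
{
    *x = (*x - m->viewOrgX) * m->winExtX / m->viewExtX + m->winOrgX;
    *y = (*y - m->viewOrgY) * m->winExtY / m->viewExtY + m->winOrgY;
}
```

Caching these six numbers is exactly why the information must be collected while the menu's DC is still valid.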


Implementation


To implement pie menus in our Windows program we need to accomplish three
tasks:
Attach the pie menus to the existing Windows menu system.
Display the pie menus when requested.
Obtain and process the user's selection from a menu.
Attaching pie menus to the Windows menu system is not complex. For one pie
menu, simply:
Declare a data structure of type PIE_MENU.
Load the data structure with the parameters of the pie menu, including: number
of options, "no-selection" value, option names, codes, colors, and so on.
Call CreatePieMenuIndirect to finish creating the menu data structure.
Link the pie menu into the Windows menu system.
To link the pie menu into the Windows-menu system, we design our resource file
as usual for a Windows program. Lay out POPUP menus and put INACTIVE MENUITEMs
into the POPUP menus in which we want to install pie menus. Then load the
POPUP menus in WinMain using the LoadMenu function, and append the pie menus
to the INACTIVE items using AppendMenu. At this point we use the "Owner Draw"
menu feature. In the call to AppendMenu, declare the menu style to be
MF_OWNERDRAW. When Windows requires our pie menu to be displayed, it will call
on us to provide the graphics.
Displaying an Owner Draw menu requires that our Windows program process two
additional messages: WM_MEASUREITEM and WM_DRAWITEM. To handle WM_MEASUREITEM,
we provide Windows with dimensions of a rectangle that declare how much screen
space the pie menu requires. The message WM_DRAWITEM is passed when Windows
wants us to draw the pie menu (using DrawPieMenu) or when Windows lets us know
that the pie menu has been selected and needs processing. If the menu has been
selected, we save its menuID.
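The WM_MEASUREITEM arithmetic amounts to reserving a square for the pie plus room for any outside labels. The helper below is hypothetical (the real handler stores the results in the MEASUREITEMSTRUCT Windows passes with the message), and MARGIN is an assumed border allowance:

```c
/* Assumed border allowance around the pie, in pixels. */
#define MARGIN 4

/* Compute the owner-draw rectangle for a pie menu: with inside labels
   the pie alone fills the window; with outside labels we must also
   leave room for the widest label on each side and one label row
   above and below. */
static void measure_pie_menu(int radius, int widest_label_px,
                             int label_height_px, int inside_labels,
                             int *width, int *height)
{
    if (inside_labels) {
        *width  = 2 * radius + 2 * MARGIN;
        *height = 2 * radius + 2 * MARGIN;
    } else {
        *width  = 2 * radius + 2 * widest_label_px + 2 * MARGIN;
        *height = 2 * radius + 2 * label_height_px + 2 * MARGIN;
    }
}
```

This is where the "wider but shorter" trade-off of outside labels shows up directly: long labels grow the width, while the height grows only by two text rows.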
Obtaining and processing the user's selection from a pie menu requires our
program to process two messages: WM_COMMAND and WM_MENUSELECT. The WM_COMMAND
message is standard for processing menus in Windows. Pie menus simply require
additional layers of C switch statements to handle each layer of menus. For
example, to handle a two-level pie-menu system we must have two levels of case
analysis. At the upper level, options that have a second level of pie menus
must call the function TrackPopupMenu to display the appropriate second-level
pie menu. Options without dependent menus must be processed with code to
perform their function.
The WM_MENUSELECT message is required to process a pie menu but is not usually
required to process a regular Windows menu. In handling this message we should
first be sure that the menu called was a pie menu and that an option was
selected. We then get the code associated with the user's choice by calling
PieMenuProc. Pie menus require that we process the menuID value. If no further
values depend on this value, we simply assign the value returned from
PieMenuProc to the appropriate variable and continue.

If other menus are dependent on the menuID, the logic must be split on the
basis of the value obtained from PieMenuProc. If this choice has no further
menus under it, we post ourselves a WM_COMMAND message with the value from
PieMenuProc as the selection value (wParam) and process the choice as usual.
If menus are dependent on this choice, we let Windows send us a WM_COMMAND
with the menuID. This message will trigger our call to TrackPopupMenu and take
us to the next level of pie menus.
If this quick description leaves any questions unanswered, look at the sample
code that's available electronically, which includes a step-by-step
implementation of pie menus starting with a simple Windows program with no pie
menus (chords.c), followed by the same program with only a bottom layer of pie
menus (piedemo1.c), and finally the "chords" program with two layers of pie
menus (piedemo2.c). The extrapolation from two to three or more layers of pie
menus follows the same pattern.


Summary


When designing pie menus for an application, you'll find that they're
particularly suitable to option suites already connected in the human mind
with circular designs. Presenting colors in the form of a color wheel was used
long before the digital computer came around, making PieDemo's menus seem very
appropriate. Other option suites that come to mind are: options that deal with
orientation around the compass, options connected with time, and any
business-graphics pie chart. (Use a pie menu to present tax-dollar breakdowns
and explode each selection into a detailed report.)
In using pie menus, take this bit of advice: Don't chase multiple pie menus.
What do I mean? Well, if you are used to regular Windows menus, you have
probably gotten into the habit of moving the mouse as the next menu in a
sequence is displayed. But pie menus move the cursor for you, right to the
center of the menu, after which it is only a short move to pick your choice.
You might consider using pie menus that pop up wherever the cursor is when,
say, the right mouse button is pushed. If you are interested in implementing
this technique, take a look at the POPMENU program in Petzold's Programming
Windows.
Finally, the real benefit of pie menus is that they are faster and more
accurate to use when selecting up to eight options (which means for most
menus). On top of that, all the work is in the public domain, free of
copyright restrictions, available for any developer to use.


Acknowledgments


A special note of thanks to Ben Shneiderman, director of the Human-Computer
Interface Laboratory at the University of Maryland, College Park, for
reviewing this article.


References


Callahan, J., D. Hopkins, M. Weiser, and B. Shneiderman. "An Empirical
Comparison of Pie and Linear Menus." Human Factors in Computing Systems,
SIGCHI Proceedings: ACM Press, 1988.
Kurtenbach, G. and W. Buxton. "Issues in Combining Marking and Direct
Manipulation Techniques." UIST '91 Proceedings: ACM Press, 1991.
Mills, Z. and M. Prime. "Are All Menus the Same?--An Empirical Study."
Human-Computer Interaction--INTERACT '90 Proceedings: Elsevier Science
Publishers, 1990.
Newman, W.M. and R.F. Sproull. Principles of Interactive Computer Graphics,
first edition. Berkeley, CA: McGraw-Hill, 1973.
Petzold, Charles, Programming Windows, second edition. Redmond, WA: Microsoft
Press, 1990.
_PIE MENUS FOR WINDOWS_
by Carl Rollo


[LISTING ONE]


/* --PIEMENU.H-header file for pie menu routines-Carl C. Rollo, 1991,1992 --
*/

#define MAXOPTIONS 8
#define PI 3.14159265
#define TWO_PI (2 * PI)

#define WHITE RGB (255, 255, 255)
#define BLACK RGB (0, 0, 0)

typedef struct
 {
 char *name; /* label for this option */
 BOOL showlabel; /* indicates whether label is to be displayed */
 BOOL checked; /* indicates when option is to be checked */
 short value; /* value to be returned if option is chosen */
 COLORREF color; /* color of pie wedge for this option, default = WHITE */
 HBRUSH brush; /* handle to brush for filling wedge with color */
 double maxangle; /* angle of upper bound of option wedge */
 POINT endarc; /* coordinates of upper bound of option wedge arc */
 RECT txt; /* coords. of rectangle bounding text label */
 } PIE_OPTION;

typedef struct
 {
 short radius; /* radius of pie */

 short no_choice; /* value to be returned if no choice is made */
 short n; /* number of options in menu */
 PIE_OPTION option[MAXOPTIONS]; /* data structures for each option */
 BOOL InsideLabels; /* switch to indicate if labels are to be
 ** placed inside or outside the wedge */
 BOOL UseChecks; /* indicates when selection is to be checked */
 } PIE_MENU;
typedef PIE_MENU FAR * LPPIEMENUSTRUCT;

typedef struct
 {
 DWORD DCOrg;
 RECT ClientRect;
 DWORD ViewportExt;
 DWORD ViewportOrg;
 DWORD WindowExt;
 DWORD WindowOrg;
 } VIEWWINDATA;
typedef VIEWWINDATA FAR * LPVIEWWINDATA;

extern BOOL CreatePieMenuIndirect (LPPIEMENUSTRUCT, short *, short *);
extern void DrawPieMenu (HDC, LPPIEMENUSTRUCT);
extern short PieMenuProc (LPVIEWWINDATA, LPPIEMENUSTRUCT);
extern LPVIEWWINDATA GetViewportWindowParms (HDC hDC);
extern BOOL DP_to_LP(LPVIEWWINDATA, LPPOINT, int);





[LISTING TWO]

#include <windows.h>
#include <math.h>
#include <stdarg.h>
#include "piemenu.h"
/* ----- Functions for creating, drawing, and operating Pie Menus ------
 CreatePieMenuIndirect() - creates a Pie Menu data structure
 DrawPieMenu() - draws the Pie Menu given a Device Context
 PieMenuProc() - determines what option was selected
 GetViewportWindowParms() - gets Device Context parameters for Menu
 DP_to_LP() - converts cursor coordinates without a Device Context
 ----- copyright 1991, 1992 by Carl C. Rollo ----- */

BOOL CreatePieMenuIndirect (LPPIEMENUSTRUCT p, short *pwidth, short *pheight)
 /* Initializes a PIE_MENU structure. Several more time-consuming calculations
 ** are made in this routine to ensure that the pie menu is displayed with
 ** minimum delay. This routine must be called before a pie menu is displayed!
 ** The allocation of the PIE_MENU structure is left to code outside this
 ** routine so that the programmer may choose to store the structure in local
 ** heap, data segment, or global memory. Routine returns TRUE if no error
 ** was encountered, FALSE otherwise. */
 {
 short i, j, xc, yc;
 TEXTMETRIC tm;
 HDC hdc, hdcMem;
 double incvalue, angle, halfangle, h, rsquare;
 DWORD dwsize;
 short tx[4], ty[4], halfwid, halfht;


 if ( (p != NULL) && (0 < p->n) && (p->n <= MAXOPTIONS) &&
 (pwidth != NULL) && (pheight != NULL) && (*pwidth > 0) &&
 (*pheight > 0) )
 {
 if (p->radius <= 0)
 if (p->InsideLabels)
 p->radius = min (*pwidth, *pheight)/2;
 else
 return (FALSE); /* must have a radius for outside labels */
 rsquare = p->radius * p->radius;
 incvalue = TWO_PI / p->n; /* value in radians of each wedge */
 halfangle = incvalue/2.0;

 /* Calculate the angle of the upper bound of each wedge
 ** and the endpoint coordinates of the radius at this angle. */
 for (i = 0; i < p->n; i++)
 {
 p->option[i].maxangle = incvalue * (i + 1);
 p->option[i].endarc.x = p->radius * cos (p->option[i].maxangle) + .5;
 p->option[i].endarc.y = p->radius * sin (p->option[i].maxangle) + .5;
 }
 /* Build brush handles needed to fill the pie menu. */
 for (i = 0; i < p->n; i++)
 p->option[i].brush = CreateSolidBrush(p->option[i].color);
 /* Determine the label positions for each wedge. */
 hdc = CreateIC ("DISPLAY", NULL, NULL, NULL);
 hdcMem = CreateCompatibleDC (hdc);
 SetMapMode (hdcMem, MM_ISOTROPIC);
 if (p->InsideLabels)
 SetWindowExt (hdcMem, p->radius, p->radius);
 else
 SetWindowExt (hdcMem, *pwidth/2, *pheight/2);
 SetViewportExt (hdcMem, *pwidth/2, -(*pheight)/2);
 SetViewportOrg (hdcMem, *pwidth/2, *pheight/2);

 SelectObject (hdcMem, GetStockObject(SYSTEM_FONT));
 GetTextMetrics (hdcMem, &tm);
 /* Determine label positions for each wedge. */
 if (p->InsideLabels)
 {
 /* Labels are to be positioned inside the pie diagram.
 ** Calculate the label position for each wedge. */
 h = 0.7 * (p->radius * cos (halfangle));
 for (i = 0; i < p->n; i++)
 {
 if (p->option[i].name == NULL)
 {
 p->option[i].showlabel = FALSE;
 continue;
 }
 /* Positioning varies with number of wedges. */
 switch (p->n)
 {
 case 1 :
 xc = 0; yc = 0;
 break;
 case 2 : if (i == 0)
 {

 xc = 0; yc = p->radius/2;
 }
 else
 {
 xc = 0; yc = -(p->radius)/2;
 }
 break;
 case 3 : angle = p->option[i].maxangle - halfangle;
 xc = p->radius/2 * cos (angle);
 yc = p->radius/2 * sin (angle);
 break;
 default :
 angle = p->option[i].maxangle - halfangle;
 xc = h * cos (angle);
 yc = h * sin (angle);
 }
 /* Determine the dimensions of the label string using SYSTEM_FONT. */
 dwsize = GetTextExtent (hdcMem, p->option[i].name,
 lstrlen (p->option[i].name));
 /* Get coordinates of corners of the rectangle bounding the label */
 halfwid = (double) (LOWORD (dwsize))/2.0 + 0.5;
 halfht = (double) (HIWORD (dwsize))/2.0 + 0.5;
 tx[0] = xc - halfwid;
 ty[0] = yc + halfht;
 tx[1] = xc + halfwid; ty[1] = ty[0];
 tx[2] = tx[1]; ty[2] = yc - halfht;
 tx[3] = tx[0]; ty[3] = ty[2];
 /* Calculate the counter-clockwise total angles for lines connecting
 ** the center of the pie to each of the corners of rectangle bounding
 ** label, and make sure that each line lies within wedge boundaries. */
 for (j = 0; j < 4; j++)
 {
 if ( (tx[j] * tx[j] + ty[j] * ty[j]) > rsquare)
 break; /* point is outside the pie */
 if (tx[j] == 0)
 if (ty[j] > 0)
 angle = PI / 2.0;
 else
 angle = -PI / 2.0;
 else
 angle = atan ( (double) ty[j] / (double) tx[j] );
 if ( (tx[j] >= 0) && (ty[j] < 0) )
 angle += TWO_PI;
 else
 if (tx[j] < 0)
 angle += PI;
 if ( (angle <= ((i == 0) ? 0.0 : p->option[i-1].maxangle)) ||
 (angle >= p->option[i].maxangle) )
 break;
 }
 /* If one of the corners lies outside the wedge, set switch to
 ** indicate label is not to be shown. */
 if (j < 4)
 p->option[i].showlabel = FALSE;
 else
 {
 /* Set the location for the label */
 p->option[i].txt.left = tx[0];
 p->option[i].txt.top = ty[0];

 p->option[i].txt.right = tx[2];
 p->option[i].txt.bottom = ty[2];
 p->option[i].showlabel = TRUE;
 }
 }
 }
 else
 {
 /* Labels are to be positioned outside the pie diagram. */
 for (i = 0; i < p->n; i++)
 {
 if (p->option[i].name == NULL)
 {
 p->option[i].showlabel = FALSE;
 continue;
 }
 /* Get the midpoint of the wedge. */
 if (i == 0) angle = 0.0;
 else angle = p->option[i-1].maxangle;
 angle = (p->option[i].maxangle - angle) / 2.0 + angle;
 xc = p->radius * cos (angle);
 yc = p->radius * sin (angle);
 /* Determine the dimensions of the label string using SYSTEM_FONT. */
 dwsize = GetTextExtent (hdcMem, p->option[i].name,
 lstrlen (p->option[i].name));
 /* Position text label based upon quadrant in which wedge is located */
 if (angle >= 0.0 && angle < PI/2.0)
 {
 /* First quadrant */
 p->option[i].txt.left = xc;
 p->option[i].txt.top = yc + HIWORD(dwsize);
 p->option[i].txt.right = xc + LOWORD(dwsize);
 p->option[i].txt.bottom = yc;
 }
 else if (angle >= PI/2.0 && angle < PI)
 {
 /* Second quadrant */
 p->option[i].txt.left = xc - (LOWORD(dwsize) + tm.tmAveCharWidth);
 p->option[i].txt.top = yc + HIWORD(dwsize);
 p->option[i].txt.right = xc;
 p->option[i].txt.bottom = yc;
 }
 else if (angle >= PI && angle < 3.0 * PI/2.0)
 {
 /* Third quadrant */
 p->option[i].txt.left = xc - (LOWORD(dwsize) + tm.tmAveCharWidth);
 p->option[i].txt.top = yc;
 p->option[i].txt.right = xc;
 p->option[i].txt.bottom = yc - HIWORD(dwsize);
 }
 else
 {
 p->option[i].txt.left = xc;
 p->option[i].txt.top = yc;
 p->option[i].txt.right = xc + (LOWORD(dwsize) + tm.tmAveCharWidth);
 p->option[i].txt.bottom = yc - HIWORD(dwsize);
 }
 p->option[i].showlabel = TRUE;


 }
 }
 DeleteDC (hdc);
 DeleteDC (hdcMem);
 return (TRUE);
 }
 return (FALSE);
 }
/* ------------------------------------------------------------------ */
void DrawPieMenu (HDC hdc, LPPIEMENUSTRUCT p)
 /* Draws a Pie Menu on the Device Context pointed to by "hdc", using
 ** information given in the structure pointed to by "p". */
 {
 short i;
 POINT center;
 DWORD dwOrg;
 HFONT hCurrFont;
 if (p != NULL)
 {
 hCurrFont = SelectObject (hdc, GetStockObject (SYSTEM_FONT));
 SetTextColor (hdc, BLACK);

 for (i = 0; i < p->n; i++)
 {
 SelectObject (hdc, p->option[i].brush);
 Pie (hdc, -(p->radius), p->radius, p->radius, -(p->radius),
 (i == 0) ? p->radius : p->option[i-1].endarc.x,
 (i == 0) ? 0 : p->option[i-1].endarc.y,
 p->option[i].endarc.x, p->option[i].endarc.y);
 if (p->option[i].showlabel)
 {
 if (p->option[i].color == BLACK)
 SetBkColor (hdc, WHITE);
 else
 SetBkColor (hdc, p->option[i].color);
 DrawText (hdc, p->option[i].name, -1, &(p->option[i].txt), DT_CENTER);
 }
 if (p->UseChecks)
 if (p->option[i].checked)
 InvertRect (hdc, &(p->option[i].txt));
 }
 /* Center the cursor in the Pie Menu */
 dwOrg = GetDCOrg(hdc);
 center.x = 0; center.y = 0;
 LPtoDP (hdc, (LPPOINT) &center, 1);
 center.x += LOWORD (dwOrg); center.y += HIWORD (dwOrg);
 SetCursorPos (center.x, center.y);

 SelectObject (hdc, hCurrFont);
 }
 }
/* --------------------------------------------------------------------- */
short PieMenuProc (LPVIEWWINDATA vwd, LPPIEMENUSTRUCT p)
 {
 /* Retrieves the position of the cursor from within the Pie Menu and converts
 ** this into a selection value. If position can't be converted, "no_choice"
 ** value is returned. */
 short selection;
 short x, y, i, j;

 POINT position;
 double angle;
 RECT rect;

 /* Try to convert position coordinates into a selection */
 GetCursorPos( (LPPOINT) &position);
 position.x -= LOWORD(vwd->DCOrg); position.y -= HIWORD(vwd->DCOrg);
 DP_to_LP (vwd, (LPPOINT) &position, 1);
 x = position.x; y = position.y;
 selection = p->no_choice; /* set result at "no selection" */
 if ( ((double)x * (double)x + (double)y * (double)y) >
 ((double)p->radius * (double)p->radius) )
 return (selection); /* outside the circle */
 if (x == 0)
 if (y > 0)
 angle = PI/2.0;
 else
 angle = -PI/2.0;
 else
 angle = atan( (double) y / (double) x);
 if ((x >= 0) && (y < 0))
 angle += TWO_PI;
 else if (x < 0)
 angle += PI;
 for (i = 0; i < p->n; i++)
 if ( (angle >= ((i == 0) ? 0.0 : p->option[i-1].maxangle)) &&
 (angle < p->option[i].maxangle))
 {
 selection = p->option[i].value;
 break;
 }
 /* If we are using labels and inverting the previously selected
 ** option, shut the previously inverted label off here. */
 if (p->UseChecks)
 {
 for (j = 0; j < p->n; j++)
 if (p->option[j].checked)
 {
 p->option[j].checked = FALSE;
 }
 /* ...and turn the new one on here */
 if (selection != p->no_choice)
 p->option[i].checked = TRUE;
 }
 return(selection);
 }
/* --------------------------------------------------------------------- */
 BOOL DP_to_LP (LPVIEWWINDATA vd, LPPOINT p, int ncount)
 /* A private version of the Windows SDK function, "DPtoLP", that does not
 ** use a Device Context to convert points from device coordinates to
 ** logical coordinates. DP_to_LP requires a pointer to a structure of type,
 ** VIEWWINDATA, that has been previously loaded with the window-viewport
 ** parameters required to convert the points. The remaining arguments are
 ** the same as those for "DPtoLP". The function returns TRUE if the points
 ** can be converted, FALSE otherwise. */
 {
 if (vd != NULL && ncount > 0)
 {
 while (ncount--)

 {
 p->x = (p->x - LOWORD((vd->ViewportOrg)) ) *
 (double) ((int) LOWORD((vd->WindowExt))) /
 (double) ((int) LOWORD((vd->ViewportExt)))
 + LOWORD((vd->WindowOrg));
 p->y = (p->y - HIWORD((vd->ViewportOrg)) ) *
 (double) ((int) HIWORD((vd->WindowExt))) /
 (double) ((int) HIWORD((vd->ViewportExt)))
 + HIWORD((vd->WindowOrg));
 p++;
 }
 return (TRUE);
 }
 else return (FALSE);
 }
/* --------------------------------------------------------------------- */
 LPVIEWWINDATA GetViewportWindowParms (HDC hDC)
 /* Gets the origins and extents of the viewport and window definitions
 ** specified for the Device Context whose handle is passed in hDC. Also
 ** stores the origin of the Device Context. All results are stored in a
 ** structure of type, VIEWWINDATA, whose address is returned. A NULL
 ** handle passed to the routine returns a NULL pointer. */
 {
 static VIEWWINDATA vwd;
 if (hDC != NULL)
 {
 vwd.DCOrg = GetDCOrg(hDC);
 vwd.ViewportExt = GetViewportExt(hDC);
 vwd.ViewportOrg = GetViewportOrg(hDC);
 vwd.WindowExt = GetWindowExt(hDC);
 vwd.WindowOrg = GetWindowOrg(hDC);
 return ( (LPVIEWWINDATA) &vwd);
 }
 else
 return (NULL);
 }
/* --------------------------------------------------------------------- */




November, 1992
DYNAMIC DIALOG BOXES AND C++


Making complex objects accessible to Windows applications




Robert Sardis


Bob is a professional C++ programmer with a PhD in mathematics and experience
in business and manufacturing applications. He can be reached at 20 E. Scott,
Chicago, IL 60610.


Recently, I had to write a Windows front end for a collection of database
tables, with records consisting of varying numbers of strings of varying
lengths. A dialog box was to be used to edit selected records of these tables.
Because the record specification was not constant, and in fact could only be
determined while the program was running, the dialog box had to be dynamic;
that is, it had to create itself at run time, rather than be created from a
resource file.
My solution starts with a published example of programming a dynamic dialog
box in C, and consists of two C++ classes: DialogTemplate, a class for a
Windows dialog-box template, and StringsBox, a class for a dialog box used to
edit strings, as required by my application. The result demonstrates how even
a complicated Windows object, once incorporated into a C++ class, can be
easily used by an application program. It also demonstrates a new technique
for associating an instance of a predefined Windows class--in this case a
dialog box--with a C++ object.


The Dialog Template Class


To create a dialog box, it is first necessary to allocate a block of memory
called a "dialog template," described in the Microsoft Windows Programmer's
Reference (Microsoft Press, 1990) on pages 7-31 through 7-35, and to fill it
with information defining the dialog box and its controls. Once the dialog
template is prepared, a handle to it is passed to DialogBoxIndirect() or
DialogBoxIndirectParam(), which realizes the information in the dialog
template as a modal dialog box. In most applications, the two steps of
preparing the dialog template and realizing the dialog box occur together in a
single call to DialogBox(), which creates a dialog box specified as a
resource. To create a dialog box dynamically, the application must follow
these two steps explicitly; Jeffrey Richter shows how to do this in a C
program in his Windows 3: A Developer's Guide (M&T Books, 1991) on pages
159-170.
The DialogTemplate class in Listing One (page 122) is a straightforward
adaptation of Richter's technique, although the added power of C++ makes the
process somewhat simpler. For example, Richter temporarily stores the size of
the dialog template inside the template itself; with C++, it is more natural
to define a field nbytes inside the DialogTemplate class instead.
The dialog template is built in steps, with one call to the class constructor,
one call to SpecifyDialogBox(), and, optionally, one call to SpecifyFont();
omitting this last step causes the dialog box to use the system font.
AddItem() is then called once for each control to be added. Finally, the
dialog box is realized with a call to DialogTemplate::DialogBoxIndirect() or
DialogTemplate::DialogBoxIndirectParam().
A pointer to a DialogTemplate object is included in the class definition of
StringsBox, discussed next.


The StringsBox Class


Listing Two (page 122) contains StringsBox, a class used by the example program
in Listing Three (page 124) to create the dialog box. To create the dialog
box, an application first fills in an array of StringSpec structures to define
the title and maximum length of each string. It then passes this array and the
number of its elements to SetUp(), which creates a dialog template for the
dialog box. Calling GetStrings() displays the dialog box.
StringsBox stores the strings for its edit windows contiguously in a block of
memory; the public field hStrings is a handle to this block, which the
application uses to initialize or access the strings. To display the dialog
box with edit windows initialized, the application locks hStrings, fills the
block, unlocks hStrings, and calls GetStrings() with parameter InitializedFlag
set to True; to display the box with edit windows initially blank, the
application calls GetStrings() with InitializedFlag set to False. GetStrings()
returns True if the user selects OK and False if the user selects "Cancel";
on the return of True, the application can access the user's string values by
locking hStrings, reading the block, and unlocking hStrings again. (Note that
the hStrings block is allocated by StringsBox rather than by the application,
and that its contents are only changed when the user selects OK or when
altered by the application; successive calls to GetStrings() with
InitializedFlag set to True will continue to display the dialog box with edit
windows as last set by the user.)


Two Friend Functions and a Linked List


A universal problem in writing Windows applications in C++ is that, since
Windows is written in C, a Windows procedure can be at best a friend or static
member of a C++ class, rather than a true member. This means that, during run
time, an instance of a Windows procedure does not automatically possess a this
pointer to its corresponding C++ object. In the case of StringsBox, the dialog
procedure StringsBox_DlgProc needs a pointer to the proper StringsBox object
in order to access the strings in its hStrings block.
For Windows classes defined from scratch, the standard solution is to allocate
enough extra bytes in the Window class to hold the this pointer. However, this
technique will not work for predefined Windows classes, such as a dialog box.
StringsBox uses a different technique: a linked list of pointers to
StringsBox objects, with its first element embedded in the class as a static
member; being static means that there will be one copy of this element, shared
by all instances of the StringsBox class. Each object contains two additional,
nonstatic pointers to the objects preceding and succeeding it in the list. The
class constructor and destructor manage the list by resetting these pointers
appropriately.
The lookup field for the linked list, the handle hDlg, can only be determined
once the dialog box is displayed. To do this,
DialogTemplate::DialogBoxIndirectParam() is sent the StringsBox object's this
pointer as its last parameter, which allows the dialog-box procedure
StringsBox_DlgProc() to recover it (but only during WM_INITDIALOG).
StringsBox_DlgProc() then sets the hDlg field of the object referenced by this
pointer. The lookup function GetStringsBox() is provided to let
StringsBox_DlgProc() find the this pointer on subsequent calls;
GetStringsBox() is also a friend rather than a member function, since it is
called from a nonmember function.


Example Program


Listing Three contains dboxdemo.cpp, a simple program that demonstrates the
StringsBox class. This program declares a global StringsBox object EditBox,
which it displays when the user double-clicks on the main window.
StringsBox::GetStrings() is called with the InitializedFlag set to True, so
the contents of the dialog box's edit controls are preserved between calls.
These strings are also displayed in the main window by the WM_PAINT command to
demonstrate how the application accesses the contents of the StringsBox's
hStrings block.


Conclusion


The two classes and example presented here demonstrate how C++ can make
complex objects easily accessible from a Windows application; and the
technique of embedding a linked list in the StringsBox class shows how to make
the predefined Windows class for a dialog box available to a C++ object.
All of the code can be made more object oriented. The dboxdemo program in
particular looks more like a C program than a C++ program; this is mainly to
show how a library of C++ classes can add power even to a traditional-style
Windows program.
It would be natural to expand the StringsBox class into a collection of many
customized or dynamic dialog-box classes. In this case, it would be appropriate
to embed the linked-list members in a base class, and make StringsBox and the
other dialog-box classes into derived classes of this base class. However, it
is probably not a good idea to include the dialog template in the base class.
Leaving DialogTemplate as a separate class and having StringsBox contain a
pointer to a DialogTemplate object allows a constructor of the form
StringsBox(const StringsBox &x), which would let several StringsBox objects
use the same dialog template.
DialogTemplate and StringsBox were written with the future possibility of a
DLL in mind, so all pointers passed to public member functions are far. The
second parameter of StringsBox::Setup() is a far pointer to a structure
containing a near pointer as one of its fields; the near pointer is converted
to a far pointer by the MakeFarPointer macro, defined at the top of Listing
Two.
_DYNAMIC DIALOG BOXES IN C++_

by Robert Sardis


[LISTING ONE]

/***** DialogTemplate class ******/

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mem.h>

int ErrorMessage(LPSTR);
class DialogTemplate
{
 public:
 DialogTemplate(void);
 ~DialogTemplate(void);
 void SpecifyDlgBox(long Style, int x, int y, int width, int height,
 LPSTR MenuName, LPSTR ClassName, LPSTR CaptionText);
 void SpecifyFont(short int PointSize, LPSTR TypeFace);
 void AddItem(int x, int y, int width, int height, int ID, long Style,
 LPSTR Class, LPSTR Text, BYTE DataBytes, LPBYTE Data);
 int DialogBoxIndirect(HANDLE hInstance, HWND hWndParent,
 FARPROC lpDialogProc);
 int DialogBoxIndirectParam(HANDLE hInstance, HWND hWndParent,
 FARPROC lpDialogProc, DWORD dwInitParam);
 private:
 HANDLE hMem;
 int nBytes;
 int nItems;
};
typedef struct
{
 long Style;
 BYTE nItems;
 int x;
 int y;
 int width;
 int height;
 // char MenuName[];
 // char ClassName[];
 // char CaptionText[];
} DLGTEMPLATE, FAR *LPDLGTEMPLATE;
typedef struct
{
 short int PointSize;
 // char TypeFace[];
} FONTINFO, FAR *LPFONTINFO;
typedef struct
{
 int x;
 int y;
 int width;
 int height;
 int ID;
 long Style;
 // char Class[];

 // char Text[];
 // BYTE Info;
 // PTR Data;
} DLGITEMTEMPLATE, FAR *LPDLGITEMTEMPLATE;
DialogTemplate::DialogTemplate(void)
{
 nBytes = 0;
 nItems = 0;
 hMem = 0;
}
DialogTemplate::~DialogTemplate(void)
{
 GlobalFree(hMem);
}
void DialogTemplate::SpecifyDlgBox(long Style, int x, int y,
 int width, int height, LPSTR MenuName, LPSTR ClassName, LPSTR CaptionText)
{
 LPDLGTEMPLATE lpDlg;
 LPSTR lpText;
 int MenuNameBytes = lstrlen(MenuName) + 1; // sizes of strings,
 int ClassNameBytes = lstrlen(ClassName) + 1; // including null
 int CaptionTextBytes = lstrlen(CaptionText) + 1; // terminator

 nBytes = sizeof(DLGTEMPLATE) + MenuNameBytes + ClassNameBytes +
 CaptionTextBytes;

 nItems = 0;
 GlobalFree(hMem);
 hMem = GlobalAlloc(GHND, nBytes);
 if (hMem == NULL)
 {
 ErrorMessage("Memory allocation error creating dialog template");
 return;
 }
 lpDlg = (LPDLGTEMPLATE) GlobalLock(hMem); // add the "fixed size"
 lpDlg->Style = Style; // fields of the template
 lpDlg->nItems = 0;
 lpDlg->x = x;
 lpDlg->y = y;
 lpDlg->width = width;
 lpDlg->height = height;

 lpText = ((LPSTR) lpDlg) + sizeof (DLGTEMPLATE); // append the three
 _fmemcpy(lpText, MenuName, MenuNameBytes); // null-terminated text
 lpText += MenuNameBytes; // strings
 _fmemcpy(lpText, ClassName, ClassNameBytes);
 lpText += ClassNameBytes;
 _fmemcpy(lpText, CaptionText, CaptionTextBytes);
 GlobalUnlock(hMem);
}
void DialogTemplate::SpecifyFont(short int PointSize, LPSTR TypeFace)
{
 LPDLGTEMPLATE lpDlg;
 LPFONTINFO lpFont;
 LPSTR lpText;
 int OldnBytes = nBytes;
 int TypeFaceBytes = lstrlen(TypeFace) + 1;

 nBytes += sizeof(FONTINFO) + TypeFaceBytes;

 hMem = GlobalReAlloc(hMem, nBytes, GHND);
 if (hMem == NULL)
 {
 ErrorMessage("Memory allocation error adding dialog font");
 return;
 }
 // add DS_SETFONT to style to indicate font template is being added
 lpDlg = (LPDLGTEMPLATE) GlobalLock(hMem);
 lpDlg->Style |= DS_SETFONT;
 // append font template to dialog template
 lpFont = (LPFONTINFO) (((LPSTR) lpDlg) + OldnBytes);
 lpFont->PointSize = PointSize; // append fixed-size field of font info
 lpText = ((LPSTR) lpFont) + sizeof(FONTINFO); // append null-termi-
 _fmemcpy(lpText, TypeFace, TypeFaceBytes); // nated text string
 GlobalUnlock(hMem);
}
void DialogTemplate::AddItem(int x, int y, int width, int height,
 int ID, long Style, LPSTR Class, LPSTR Text, BYTE DataBytes, LPBYTE Data)
{
 LPDLGTEMPLATE lpDlg;
 LPDLGITEMTEMPLATE lpItem;
 LPSTR lpText;
 int OldnBytes = nBytes;
 int ClassBytes = lstrlen(Class) + 1;
 int TextBytes = lstrlen(Text) + 1;

 nBytes += sizeof(DLGITEMTEMPLATE) + ClassBytes + TextBytes + sizeof(BYTE)
 + DataBytes;
 hMem = GlobalReAlloc(hMem, nBytes, GHND);
 if (hMem == NULL)
 {
 ErrorMessage("Memory allocation error adding dialog item");
 return;
 }
 nItems++;
 lpDlg = (LPDLGTEMPLATE) GlobalLock(hMem);
 lpDlg->nItems = nItems; // update # items
 // append item template to template block
 lpItem = (LPDLGITEMTEMPLATE) (((LPSTR) lpDlg) + OldnBytes);
 lpItem->x = x; // append fixed-size
 lpItem->y = y; // fields
 lpItem->width = width;
 lpItem->height = height;
 lpItem->ID = ID;
 lpItem->Style = Style | WS_CHILD | WS_VISIBLE;

 lpText = ((LPSTR) lpItem )+ sizeof(DLGITEMTEMPLATE); // append variable
 _fmemcpy(lpText, Class, ClassBytes); // length portion:
 lpText += ClassBytes; // two strings,
 _fmemcpy(lpText, Text, TextBytes); // one byte, and a
 lpText += TextBytes; // data block.
 *lpText = DataBytes;
 lpText += sizeof(BYTE);
 _fmemcpy(lpText, Data, DataBytes);

 GlobalUnlock(hMem);
}
int DialogTemplate::DialogBoxIndirect(HANDLE hInstance, HWND hWndParent,
 FARPROC lpDialogProc)

{
 return ::DialogBoxIndirect(hInstance, hMem, hWndParent, lpDialogProc);
}
int DialogTemplate::DialogBoxIndirectParam(HANDLE hInstance, HWND hWndParent,
 FARPROC lpDialogProc, DWORD dwInitParam)
{
 return ::DialogBoxIndirectParam(hInstance, hMem, hWndParent,
 (FARPROC) lpDialogProc, dwInitParam);
}




[LISTING TWO]

/****** StringsBox class ******/

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mem.h>
#include <math.h>

int ErrorMessage(LPSTR);
#define max(A,B) ((A) > (B) ? A : B)

#define MakeFarPointer(N,F) \
 (sizeof(PSTR)==sizeof(LPSTR) ? (LONG)N : MAKELONG((WORD)N,HIWORD((LONG)F)))
 // macro to convert near pointer "N" to far -- "F" is a
 // reference far pointer, in the same segment as N. Used for DLLs
struct StringSpec
{
 char *Title;
 int MaxLength;
};
class StringsBox
{
 public:
 StringsBox(void);
 ~StringsBox(void);
 void SetUp(LPSTR Caption, struct StringSpec far *Items, int NumItems);
 int GetStrings(HANDLE hInstance, HWND hwnd, BOOL InitializedFlag);
 friend BOOL FAR PASCAL _export
 StringsBox_DlgProc(HWND hDlg, WORD msg, WORD wParam, LONG lParam);
 HANDLE hStrings;
 private:
 DialogTemplate *DT;
 int nItems;
 int *ItemLengths;
 static StringsBox *FirstBox; // linked list parameters so the
 StringsBox *PrevBox; // function DlgProc() can
 StringsBox *NextBox; // determine the StringsBox
 HWND hDlg; // corresponding to the handle hDlg
 friend StringsBox *GetStringsBox(HWND hDlg);
 BOOL InitFlag;
};
StringsBox *StringsBox::FirstBox = NULL; // initialize linked list
StringsBox::StringsBox(void)

{
 DT = new DialogTemplate;
 ItemLengths = NULL;
 nItems = 0;
 hStrings = 0;
 hDlg = 0;
 if (FirstBox == NULL) // insert new object in the linked list--
 { // either at the beginning, if the list
 FirstBox = this; // is empty ...
 PrevBox = NULL;
 NextBox = NULL;
 }
 else // or else at the end, after the first
 { // StringsBox whose NextBox pointer is
 StringsBox *pBox; // NULL
 for (pBox = FirstBox; pBox->NextBox != NULL; pBox = pBox->NextBox);
 PrevBox = pBox;
 pBox->NextBox = this;
 NextBox = NULL;
 }
}
StringsBox::~StringsBox(void)
{
 delete DT;
 GlobalFree(hStrings);
 delete [] ItemLengths;
 if (this == FirstBox) // take object out of linked list
 FirstBox = NextBox;
 if (PrevBox != NULL)
 PrevBox->NextBox = NextBox;
 if (NextBox != NULL)
 NextBox->PrevBox = PrevBox;
}
#define LINE_HEIGHT 2*cy
#define LINE_TEXT_HEIGHT 1.5*cy
#define HSPACE 2*cx
#define IDD_ITEM(A) A+101
void StringsBox::SetUp(LPSTR Caption, struct StringSpec far *Items,
 int NumItems)
{
 int i, y;
 LPSTR lpStringsBlock;
 int MaxTitleWidth = 0;
 int MaxEditWidth = 0;
 int hStringSize = 0;
 int cx = 4; // character average width and height,
 int cy = 8; // in dialog box units
 // get max dimensions of titles and edit windows -- hdc needed for call to
 // GetTextExtent(), tm needed to convert return value of GetTextExtent()
 // from logical units to dialog box units
 HDC hdc = CreateDC("DISPLAY", NULL, NULL, NULL);
 TEXTMETRIC tm;
 HFONT hFont = GetStockObject(SYSTEM_FONT);
 SelectObject(hdc, hFont);
 GetTextMetrics(hdc, &tm);

 for (i = 0; i < NumItems; i++)
 {
 MaxTitleWidth = max(MaxTitleWidth, LOWORD(GetTextExtent(hdc,
 (LPSTR)MakeFarPointer(Items[i].Title, Items),

 lstrlen((LPSTR)MakeFarPointer(Items[i].Title, Items)))));
 MaxEditWidth = max(MaxEditWidth, cx*(Items[i].MaxLength+1));
 }
 // convert MaxTitleWidth from logical units to dialog box units:
 // multiply by ratio of (ave char width in dialog box units) to
 // (ave char width in logical units), and round up to next integer
 MaxTitleWidth = ceil(((double)MaxTitleWidth * (double)cx) /
 (double)tm.tmAveCharWidth);
 DeleteDC(hdc);
 // calculate locations of controls
 int ItemTitleX = HSPACE;
 int ItemEditX = ItemTitleX + MaxTitleWidth + HSPACE;
 int FirstItemY = LINE_HEIGHT;
 int ButtonWidth = 10*cx;
 int ButtonY = FirstItemY + NumItems*LINE_HEIGHT + LINE_HEIGHT;

 int BoxX = 1;
 int BoxY = 1;
 int BoxWidth =
 max(MaxTitleWidth + MaxEditWidth + HSPACE + 2*HSPACE,
 2*ButtonWidth + 4*HSPACE);
 int BoxHeight = ButtonY + LINE_HEIGHT;
 int CenterX = BoxWidth / 2;
 // set up dialog template
 DT->SpecifyDlgBox(WS_CAPTION | WS_SYSMENU | WS_VISIBLE | WS_POPUP,
 BoxX, BoxY, BoxWidth, BoxHeight, "", "", Caption);
 // DT->SpecifyFont() not called -- use default font
 for (i = 0; i < NumItems; i++)
 {
 y = FirstItemY + i*LINE_HEIGHT;
 // Item title
 DT->AddItem(ItemTitleX, y, MaxTitleWidth, LINE_TEXT_HEIGHT,
 -1, SS_LEFT | WS_GROUP, "STATIC",
 (LPSTR) MakeFarPointer(Items[i].Title, Items), 0, NULL);
 // Item edit
 DT->AddItem(ItemEditX, y, cx*(Items[i].MaxLength+1), LINE_TEXT_HEIGHT,
 IDD_ITEM(i), ES_LEFT | ES_AUTOHSCROLL | WS_BORDER | WS_TABSTOP, "EDIT",
 "", 0, NULL);
 }
 // 'OK' button
 DT->AddItem(CenterX - ButtonWidth - HSPACE, ButtonY, ButtonWidth,
 LINE_TEXT_HEIGHT, IDOK, BS_DEFPUSHBUTTON | WS_TABSTOP | WS_GROUP,
 "BUTTON", "OK", 0, NULL);
 // 'CANCEL' button
 DT->AddItem(CenterX + HSPACE, ButtonY, ButtonWidth,
 LINE_TEXT_HEIGHT, IDCANCEL, BS_PUSHBUTTON | WS_TABSTOP | WS_GROUP,
 "BUTTON", "Cancel", 0, NULL);
 // set class parameters
 nItems = NumItems;
 if (ItemLengths != NULL)
 delete[] ItemLengths;
 ItemLengths = new int[nItems];
 hStringSize = 0; // reset in case SetUp() is called more than once
 for (i = 0; i < nItems; i++)
 hStringSize += (ItemLengths[i] = Items[i].MaxLength);
 // allocate hStrings block and initialize it to nulls
 if (hStrings == 0)
 hStrings = GlobalAlloc(GHND, hStringSize);
 else
 hStrings = GlobalReAlloc(hStrings, hStringSize, GHND);

 lpStringsBlock = GlobalLock(hStrings);
 _fmemset(lpStringsBlock, '\0', hStringSize);
 GlobalUnlock(hStrings);
}
int StringsBox::GetStrings(HANDLE hInstance, HWND hwnd, BOOL InitializedFlag)
{
 FARPROC lpDialogProc;
 int RetVal;
 InitFlag = InitializedFlag;

 lpDialogProc = MakeProcInstance((FARPROC) StringsBox_DlgProc, hInstance);
 // pass "this" pointer so dialog box procedure
 // can recover it during WM_INITDIALOG
 RetVal = DT->DialogBoxIndirectParam(hInstance, hwnd, (FARPROC)lpDialogProc,
 (DWORD) this);
 FreeProcInstance(lpDialogProc);
 return RetVal;
}
BOOL FAR PASCAL _export
StringsBox_DlgProc(HWND hDlg, WORD msg, WORD wParam, LONG lParam)
{
 int i;
 LPSTR lpText;
 // find "this" pointer of corresponding StringsBox object --
 // on WM_INITDIALOG, it is passed as lParam; on subsequent
 // calls, use GetStringsBox() to look it up in the linked list
 StringsBox *pBox =
 (msg == WM_INITDIALOG ? (StringsBox *) lParam : GetStringsBox(hDlg));
 switch(msg)
 {
 case WM_INITDIALOG:
 pBox->hDlg = hDlg; // insert "this" pointer in linked list -- set
 // Box->hDlg so GetStringsBox() can find it on
 // subsequent calls from this dialog procedure
 if (pBox->InitFlag)
 {
 lpText = GlobalLock(pBox->hStrings);
 for (i = 0; i < pBox->nItems; i++)
 {
 SetDlgItemText(hDlg, IDD_ITEM(i), lpText);
 lpText += pBox->ItemLengths[i];
 }
 GlobalUnlock(pBox->hStrings);
 }
 return TRUE;
 case WM_COMMAND:
 switch(wParam)
 {
 case IDOK:
 lpText = GlobalLock(pBox->hStrings);
 for (i = 0; i < pBox->nItems; i++)
 {
 GetDlgItemText(hDlg, IDD_ITEM(i), lpText,
 pBox->ItemLengths[i]);
 lpText += pBox->ItemLengths[i];
 }
 GlobalUnlock(pBox->hStrings);
 pBox->hDlg = 0; // set pBox->hDlg back to
 // an invalid value

 EndDialog(hDlg, TRUE);
 return TRUE;
 case IDCANCEL:
 pBox->hDlg = 0;
 EndDialog(hDlg, FALSE);
 return TRUE;
 }
 break;
 }
 return FALSE;
}
StringsBox * GetStringsBox(HWND hDlg)
{
 StringsBox *pBox;
 for (pBox = StringsBox::FirstBox; pBox != NULL; pBox = pBox->NextBox)
 {
 if (pBox->hDlg == hDlg)
 return pBox;
 }
 return NULL;
}



[LISTING THREE]

/***** dboxdemo.cpp -- C++ dynamic dialog box example *****/

#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "wbdialog.hpp" // header file containing DialogTemplate and
 // StringsBox class definitions
StringsBox EditBox;
struct StringSpec StringFieldSpec[] = // field spec for EditBox
{
 "Name", 20,
 "Address", 30,
 "Telephone", 15,
};
int NumStrings = sizeof(StringFieldSpec) / sizeof(struct StringSpec);
struct
{
 HANDLE hInstance;
} InsGlobs;
struct
{
 HWND hwnd;
 short cxChar;
 short cyChar;
} WndGlobs;
long FAR PASCAL _export WndProc(HWND, WORD, WORD, LONG);
int ErrorMessage(LPSTR msg);
char szAppName[] = "dboxdemo";
int PASCAL WinMain(HANDLE hInstance, HANDLE hPrevInstance,
 LPSTR lpszCmdLine, int nCmdShow)
{
 HWND hwnd;

 MSG msg;
 WNDCLASS wndclass;
 if(!hPrevInstance)
 {
 wndclass.style = CS_HREDRAW | CS_VREDRAW | CS_DBLCLKS;
 wndclass.lpfnWndProc = WndProc;
 wndclass.cbClsExtra = 0;
 wndclass.cbWndExtra = 0;
 wndclass.hInstance = hInstance;
 wndclass.hIcon = LoadIcon(NULL, IDI_APPLICATION);
 wndclass.hCursor = LoadCursor(NULL, IDC_ARROW);
 wndclass.hbrBackground = GetStockObject(WHITE_BRUSH);
 wndclass.lpszMenuName = NULL; // demo has no menu resource
 wndclass.lpszClassName = szAppName;
 RegisterClass(&wndclass);
 }
 hwnd = CreateWindow(szAppName, "C++ Dynamic Dialog Box Demo",
 WS_OVERLAPPEDWINDOW | WS_CLIPCHILDREN,
 CW_USEDEFAULT, CW_USEDEFAULT,
 CW_USEDEFAULT, CW_USEDEFAULT,
 NULL, NULL, hInstance, NULL);
 WndGlobs.hwnd = hwnd;
 InsGlobs.hInstance = hInstance;
 ShowWindow(WndGlobs.hwnd, nCmdShow);
 UpdateWindow(WndGlobs.hwnd);
 while (GetMessage(&msg, NULL, 0, 0))
 {
 TranslateMessage(&msg);
 DispatchMessage(&msg);
 }
 return msg.wParam;
}
long FAR PASCAL _export WndProc(HWND hwnd, WORD message, WORD wParam,
 LONG lParam)
{
 HDC hdc;
 TEXTMETRIC tm;
 PAINTSTRUCT ps;
 LPSTR lpString;
 int i;
 switch (message)
 {
 case WM_CREATE:
 hdc = GetDC(hwnd);
 GetTextMetrics(hdc, &tm);
 WndGlobs.cxChar = tm.tmAveCharWidth;
 WndGlobs.cyChar = tm.tmHeight + tm.tmExternalLeading;
 ReleaseDC(hwnd, hdc);
 EditBox.SetUp("Edit Fields", StringFieldSpec, NumStrings);
 return 0;
 case WM_LBUTTONDBLCLK: // display dialog box on double-click
 // StringsBox::GetStrings() returns TRUE if
 // user selects OK button; in this case,
 // cause window to be repainted to display
 // latest contents of EditBox.hStrings block
 if ((EditBox.GetStrings(InsGlobs.hInstance, WndGlobs.hwnd, TRUE)
 == TRUE))
 InvalidateRect(WndGlobs.hwnd, NULL, TRUE);
 return 0;

 case WM_PAINT:
 hdc = BeginPaint(WndGlobs.hwnd, &ps);
 if (EditBox.hStrings != 0)
 {
 lpString = GlobalLock(EditBox.hStrings);
 for (i = 0; i < NumStrings; i++)
 {
 TextOut(hdc, 0, i*WndGlobs.cyChar, lpString,
 _fstrlen(lpString));
 lpString += StringFieldSpec[i].MaxLength;
 }
 GlobalUnlock(EditBox.hStrings);
 }
 EndPaint(WndGlobs.hwnd, &ps);
 return 0;
 case WM_DESTROY:
 PostQuitMessage(0);
 return 0;
 }
 return DefWindowProc(hwnd, message, wParam, lParam);
}
int ErrorMessage(LPSTR msg)
{
 MessageBox(WndGlobs.hwnd, msg, szAppName, MB_ICONINFORMATION | MB_OK);
 return 0;
}


November, 1992
GARBAGE COLLECTION FOR C PROGRAMS


There's no need to explicitly free memory


This article contains the following executables: GARBAGE.ARC


Giuliano Carlini and Susan Rendina


Giuliano and Susan, the authors of Alloc-gc, run the Codewright's Toolworks,
where they can be contacted at 310-514-3151.


Garbage collection liberates you from needing to explicitly free memory. This
leads to faster development cycles, cleaner code, and (hopefully) fewer bugs.
Garbage collection has been used for years by languages that depend on
interpreters (Lisp and Smalltalk), specialized hardware (the Lisp machine), or
a carefully controlled runtime environment (Algol-68). However, "conservative"
collection techniques make it possible for you to use garbage collection even
when the environment does not provide support.
Programs that allocate blocks of memory must return unused blocks so that they
may be used again. Most often, they do so by explicitly deallocating blocks
with calls to a deallocation routine. An alternative is implicit deallocation
using a garbage collector. In this case, a block is implicitly available to be
returned whenever there are no references to it. From time to time, a garbage
collector scans memory, looking for unreferenced blocks and returning them.
The primary impediment to using garbage collection is that most garbage
collectors require help from the programming language, operating system,
and/or runtime system. Many environments don't (or can't) provide such help.
Conservative garbage collectors, however, require no help. The allocator of an
existing program may be replaced with a conservative collector, usually with
no other changes required.
Another reason for the infrequent use of garbage collectors is the mistaken
belief that they are too slow. While once true, this is no longer the case.
The very best garbage collectors use about 3 to 5 percent of the CPU.
Conservative collectors use more of the CPU, but are still reasonably
efficient.
The design and code of explicitly deallocating programs are convoluted by the
need to deallocate blocks both under normal conditions and when erroneous or
unusual conditions happen. The design of a garbage-collected program is
simple, and the code is clear. The result is an easier-to-understand program
which takes less time to design, code, and debug. The program is more likely
to be correct, since fewer errors will have been made, debugging them will
have been easier, and formal proofs (if used) are easier to construct. Lastly,
maintenance will be easier.
The basic elements of this garbage-collecting replacement for C's malloc()
will run on 80x86s under DOS. The code, which has been compiled with Microsoft
C 5.1, works only with small model programs. However, we do discuss how to
extend it to work with large-model programs.


Theory


An allocator satisfies dynamic requests for memory by returning the address of
a block of memory, which can be used by the client to store data. A
deallocator returns a block of memory to the allocator, which may reuse the
memory to satisfy a later request. A garbage collector determines which
previously allocated memory blocks are still in use and returns the rest to
the allocator.
There are two principal methods of garbage collection: Mark and Sweep, and
Stop and Copy. Both begin collecting by starting at locations known to contain
references to allocated blocks. These locations are called "roots," and
usually include the stack and hardware registers.
Mark and Sweep examines the roots. When it finds a reference to an allocated
block, it marks the block as referenced. If the block was unmarked, the block
is recursively examined for references. When all referenced blocks have been
marked, a linear scan of all allocated memory is made, sweeping unreferenced
blocks into the allocator's free list(s).
Stop and Copy compacts memory by copying referenced blocks to lower memory
locations that were occupied by unreferenced blocks. It then updates
references to point to the new locations for the allocated blocks.
Nonconservative garbage collectors receive help in recognizing a value in
memory as a reference to a block. Often this is done by tagging values. Some
subset of the bits in a value indicates the type of the value. For example,
the low-order two bits might be used for the tag, with the value 00
representing a memory reference, 01 an integer, and 10 a float.
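A two-bit tag scheme like the one just described might look as follows in C. The helper names are ours; the tag values follow the text, with the payload shifted up to make room for the tag in the low bits.

```c
/* Two-bit type tags in the low bits of a value, as a nonconservative
   collector might use: 00 = pointer, 01 = integer, 10 = float. */
enum tag { TAG_PTR = 0, TAG_INT = 1, TAG_FLOAT = 2 };

static unsigned make_tagged(unsigned payload, enum tag t)
{
    return (payload << 2) | (unsigned)t;  /* payload loses its top 2 bits */
}

static enum tag tag_of(unsigned v)     { return (enum tag)(v & 3); }
static unsigned payload_of(unsigned v) { return v >> 2; }
```

The cost is visible here: every integer gives up two bits of range, and every arithmetic operation must shift the tag out and back in, which is exactly the help a conservative collector does without.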
A conservative garbage collector requires no such help. Rather, it slithers
through memory looking at values, operating on the "conservative" assumption
that if a value, when interpreted as an address, refers to an allocated block,
then the block must be treated as referenced and not be collected.
Occasionally, random values may be incorrectly interpreted as a block
reference. In these cases, an unreferenced block is not collected. An example
may be helpful. On the PC, a C integer is two bytes, and a long address is two
bytes of segment plus two bytes of offset. Suppose two successive integers in
memory contain the following as their values: an integer which happens to be a
valid segment, and an integer which happens to be the offset within the
segment of an allocated block. When the garbage collector examines memory, the
two integers appear to be a reference to the block, and so the block is
treated as referenced. Such cases of mistaken identity are rare, and their
effects usually innocuous: A limited amount of memory is not reclaimed.
Sometimes the effects are more severe, but generally there are workarounds.
Usually, however, as the misidentified value changes, subsequent collections
will pick up the previously uncollected block.
Because random values are being examined, the conservative collector must be
careful. It has the following constraints:
- The garbage collector must not change a value. It may not actually be a
pointer, but instead a value of some other type. This constraint eliminates
Stop and Copy as the basis of conservative garbage collection.
- Before treating the value as a pointer, the garbage collector must validate
the value. On 80x86s this means verifying that the segment of a far pointer is
valid, and that the offset corresponds to a previously allocated block. If the
segment check isn't done, we might generate an address error in a
protected-mode program.
- Until the value is validated, the garbage collector may only change its own
data, and not, for example, the data at the value interpreted as an address.
The offset check implies that the allocator's clients must also be careful.
They must keep a pointer to the block's start address until they are done with
the block. Example 1, for instance, could lead to disaster. If a garbage
collection occurs during process(), there is no reference to the start of the
allocated block, and so the collector will move the block to the allocator's
free list. If process() allocates memory, the block may end up being
reallocated. Very likely the strain would be too much for process(), which
would soon die.
Example 1: A pointer to the front of the block must be maintained.

 for ( p = malloc(NbrOfBytes); NbrOfItems-- > 0; p++ )
 {
 process ( p );
 }
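The repair is simply to keep a second pointer to the block's start for the life of the loop. The sketch below (our illustration; process() is a hypothetical per-item worker) walks the items through p while base keeps the block reachable, so a collection inside process() cannot sweep the block away.

```c
#include <stdlib.h>

/* Hypothetical per-item worker, standing in for the article's process(). */
static void process(int *item) { *item += 1; }

/* Safe variant of Example 1: 'base' still points at the block start while
   'p' walks across the items. Returns the sum of the items so the effect
   is observable. */
static int process_all(int nbr_of_items)
{
    int *base = malloc(nbr_of_items * sizeof *base);
    int i, sum = 0;
    if (base == NULL) return -1;
    for (i = 0; i < nbr_of_items; i++)
        base[i] = 0;
    for (int *p = base; p < base + nbr_of_items; p++)
        process(p);               /* base still references the block */
    for (i = 0; i < nbr_of_items; i++)
        sum += base[i];
    free(base);                   /* under Alloc-gc, free() is a no-op */
    return sum;
}
```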



Design


In designing the garbage-collection library presented here, we've followed an
object-oriented design philosophy. The two principal classes are the
allocation segment and garbage collector. The segment class knows about how to
manage memory blocks in an 80x86 segment. Since the code runs only under small
model, there is only a single-segment object. This segment class must
therefore be aware that global data and the stack are also located within the
segment. The garbage collector is responsible for marking all blocks reachable
from the root. The garbage collector knows nothing about the internals of an
allocation segment. The allocation segment provides routines to: validate that
a value represents a valid block; return the length of a block; and sweep all
unused blocks within it, so they can be reused.
Other major parts of the library are the replacements for malloc() and its
related routines. The malloc() function (see Listing Five, page 129) asks the
segment for a block. If there is no free block large enough, the segment will
return NULL; malloc() will subsequently request the garbage collector to run,
then again ask the segment for a block. If a large enough unused block now
exists, the segment will return some part of it. If not, it will again return
NULL, which malloc() will in turn return to its client.
In the segment, memory is allocated in units of eight bytes. In addition to
the memory used for blocks to return to clients, free list heads and a flag
vector are stored at the end of the segment. Each element in the free
list-head array points to the first free block of a particular size. The flag
vector maintains two flags per unit: allocated block start and referenced.
Allocated is set True for the unit which starts a block returned by the
allocator. Allocated is cleared if the block starting at the unit is swept up.
Referenced is set True if allocated is True, and the garbage collector asks
the segment to validate a value which points to the unit. When sweeping, the
segment looks for allocated blocks with clear referenced flags. Rather than
free each such block in turn, the allocator merges contiguous free and
unreferenced blocks, and then places the merged blocks onto their free lists.
The allocation segment uses a simple buddy-system allocator. All blocks --
both allocated and free -- are a power of two in size. A separate free list is
kept for each block size. At startup, the free space in the segment is broken
up into blocks of the appropriate size, and these are placed onto the free
lists. To allocate we find the first free block of size greater or equal to
that requested. We then split off any unneeded space from the end of this
block, break it up into power-of-two sized pieces, and attach these to the
free lists.
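The sizing arithmetic behind this scheme is short enough to show directly. The helpers below are our sketch, not the library's routines (which use lookup tables and also add a 4-byte header): one rounds a request up to the next power of two, the other counts how many buddy pieces get split off when a larger free block is cut down to the requested size.

```c
/* Round a request up to the first power of two >= n (n >= 1). */
static unsigned round_up_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Splitting a free block of size 'have' down to 'want' (both powers of
   two, want <= have): each halving splits off one buddy piece that goes
   onto a free list. Returns the number of pieces split off. */
static int split_count(unsigned have, unsigned want)
{
    int pieces = 0;
    while (have > want) {
        have >>= 1;
        pieces++;
    }
    return pieces;
}
```

For example, satisfying a 20-byte request from a 64-byte free block rounds up to 32 and splits off one 32-byte buddy.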


Code Discussion



The garbage collector's header files do the following: array.h defines some
useful macros for dealing with arrays; bool.h defines an enumeration to
represent Booleans rather than the usual macros for True and False;
power2.h declares some tables used to quickly perform calculations, the result
or operand of which is a power of two. The input of the function is used as
the array's index. The value of that array element is the function's result.
The input must be in the range 1 to 255. The file pwr2gen.c is the source of a
program which generates power2.h; BumpUp computes the first power of two
greater or equal to its input, and Log2 computes the ceiling of the log base 2
of its input.
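The table construction can be sketched as follows. This is our run-time rendering of what pwr2gen.c could generate into power2.h; the array names follow the article (Log2Tbl stands in for its Log2), and the entries hold, for each n from 1 to 255, the first power of two greater than or equal to n and the ceiling of its base-2 logarithm.

```c
/* BumpUp[n]: first power of two >= n.  Log2Tbl[n]: ceil(log2(n)).
   BumpUp is unsigned short because BumpUp[129..255] is 256, which would
   not fit in a byte. */
static unsigned short BumpUp[256];
static unsigned char  Log2Tbl[256];

static void build_tables(void)
{
    unsigned n;
    for (n = 1; n < 256; n++) {
        unsigned p = 1;
        unsigned char lg = 0;
        while (p < n) { p <<= 1; lg++; }
        BumpUp[n]  = (unsigned short)p;
        Log2Tbl[n] = lg;
    }
}
```

Indexing a 256-entry table is much cheaper on a 1992-era 80x86 than looping over bits at every allocation, which is why the library precomputes these values.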
The source code that implements the garbage collector consists of Gc.h and
Gc.c ( Listings One and Two, page 128). Calling GcPickup() runs the garbage
collector. It in turn calls ASegClearMarks() to clear marks, GcMark() to set
the mark of used blocks, followed by ASegSweep() to sweep up unused blocks. To
start, GcMark() is called first to mark the global data and then again to mark
the stack. It steps through the segment, passing each value in memory to
ASegMarkValue(). If the value corresponds to a block that should be examined
for pointers, ASegMarkValue() returns the size of that block. In this case,
GcMark() calls itself recursively.
Aseg.h ( Listing Three, page 128) and ASeg.c ( Listing Four, page 128) are the
interface and implementation of allocation segments, respectively. These
functions are all prefixed with ASeg. Most of them are straightforward; only
ASegVegamatic() is tricky. A free block may not be just any power of two. It
must be less than or equal to the greatest power of two that can evenly divide
the start address. The expression Size = FreeOffset & ( -(int)FreeOffset ); sets
Size to the maximum size of a block starting at FreeOffset. On a two's
complement machine, -X = ~X + 1 (the definition of two's complement). Due to this
definition, X and -X have the same least significant bits up to and including
the first 1 bit. More significant bits are cleared. ASegMarkValue() first
validates that its argument value could be a pointer returned by the
allocator. If it is, and the block pointed to has not been marked, then it
marks it and returns the size of the block. Otherwise it returns 0 for the
size.
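The X & -X trick is worth seeing in isolation. The function below (our wrapper around the expression ASegVegamatic() uses) isolates the lowest set bit of its argument, which is the largest power of two that evenly divides it, and hence the largest properly aligned buddy block that can start at that offset.

```c
/* Since -X = ~X + 1 on a two's complement machine, X and -X agree on all
   bits up to and including the lowest 1 bit, and differ above it; ANDing
   them therefore isolates that lowest set bit. */
static unsigned max_pow2_divisor(unsigned x)
{
    return x & (unsigned)(-(int)x);
}
```

For an offset of 24 (binary 11000) this yields 8, so at most an 8-byte block may start there; for an offset of 0x50 it yields 16.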
Supplemental test files that put the system through its paces, as well as
array.h, bool.h, power2.h, and pwr2gen.c are available electronically; see
page 5.


Enhancements


While the code is limited to running under small model with a single segment,
we've designed the interfaces to support multiple segments so that compact,
large, and mixed-memory model programs can be supported. The major problem is
that when presented with a value, we must validate that its segment portion
corresponds to a valid segment before calling ASegMarkValue(). The easiest way
to do this is to maintain a bit table in Gc.c. When we get a segment from the
operating system, we turn on the corresponding bit in the table. To validate a
value we extract the segment portion and look it up in the table. If the bit
is off, we know the value can't be a pointer and proceed. Otherwise, we pass
the value on to ASegMarkValue() for further validation and processing. Logic
must be added in several other places. We need to allocate and initialize
segments. We need to traverse all allocated segments to clear their marks and
sweep them. Lastly, malloc() becomes more complicated since it needs to have a
policy for deciding when to garbage collect to try to free up some space vs.
allocating another segment to create more space.
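The proposed bit table is small and simple. Here is one way it might look (our sketch; the names are not part of the library): one bit for each possible 16-bit segment value, set when a segment is obtained from the operating system and tested before a candidate value is passed on for offset validation.

```c
#include <limits.h>

/* One bit per possible 16-bit segment value: 64K bits = 8K bytes. */
static unsigned char seg_table[0x10000 / CHAR_BIT];

/* Record that 'seg' was obtained from the operating system. */
static void seg_register(unsigned seg)
{
    seg_table[seg / CHAR_BIT] |= (unsigned char)(1u << (seg % CHAR_BIT));
}

/* Does 'seg' belong to one of our allocation segments? A cheap test that
   avoids dereferencing (and faulting on) arbitrary segment values. */
static int seg_is_valid(unsigned seg)
{
    return (seg_table[seg / CHAR_BIT] >> (seg % CHAR_BIT)) & 1;
}
```

Because the table answers with a pure memory lookup, no candidate value is ever dereferenced before it is known to name one of our segments, which is precisely what the protected-mode constraint demands.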
We've argued that garbage collecting is usually better than explicit
deallocation -- and it is. Just as you often program in a high-level language
while using assembler to write critical code, you can use garbage collection
in most places, but when necessary, call a deallocator. But to do this, the
garbage collector must supply one. While most deallocators try to coalesce
adjacent blocks, it's probably better to just add the block to its free list
and let the next garbage collection handle the coalescing.
No single allocation strategy is best all the time. Buddy-system allocators
are fast, but they waste memory. Next fit is usually slower, but doesn't waste
as much memory. Best fit is slow, but wastes very little memory. If we support
multiple segments, we can have each support a different allocation scheme.
Then, based on requested block size, or some other hint, malloc() can ask the
segment with the best strategy for a block.
It's sometimes useful for the client program to know when a block is about to
be reclaimed. The garbage collector could do this by calling a client callback
function when it finds an unused block. For example, if the file I/O library
stored the state data for open files in dynamic memory, then I/O clients would
not need to call close. If the client ever let go of all references to a file,
it would eventually be garbage collected. This would trigger a call of the
callback function, which could then close the file.
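A finalization hook of this kind could be layered on the sweeper with a small registry. Everything below is hypothetical (the registry, the names, and the fake_close() callback standing in for a file-close routine); the point is only the shape of the mechanism: the sweeper, on finding an unreferenced block, consults the registry and fires the client's callback once before reclaiming the block.

```c
#include <stddef.h>

#define MAX_FINAL 16

/* Hypothetical registry pairing a block with its cleanup callback. */
static struct { void *block; void (*fn)(void *); } finals[MAX_FINAL];
static int nfinals;

static void on_reclaim(void *block, void (*fn)(void *))
{
    if (nfinals < MAX_FINAL) {
        finals[nfinals].block = block;
        finals[nfinals].fn = fn;
        nfinals++;
    }
}

/* The sweeper would call this when it finds 'block' unreferenced. */
static void run_finalizer(void *block)
{
    int i;
    for (i = 0; i < nfinals; i++)
        if (finals[i].block == block) {
            finals[i].fn(block);
            finals[i].block = NULL;   /* fire at most once */
        }
}

/* Example client callback: stands in for closing a file. */
static int closed_count;
static void fake_close(void *p) { (void)p; closed_count++; }
```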


_GARBAGE COLLECTION FOR C PROGRAMS_
by Giuliano Carlini and Susan Rendina


[LISTING ONE]

/* Garbage Collector - Free memory blocks that aren't referenced. A garbage
collector deduces which memory blocks are in use. When memory runs low, it
reclaims those blocks which are no longer used. That memory can then be reused
to satisfy further requests for memory. */

#ifndef GC_Defn
#define GC_Defn

void GcPickUp( void );
/* Reclaim blocks of unused memory from the segments being watched. */

#endif




[LISTING TWO]

/* GC - Garbage Collector */

#include "aseg.h"
#include "array.h"
#include "bool.h"
#include "gc.h"

#ifndef BitsPerByte
#define BitsPerByte 8
#endif

void GcMark(
 unsigned* Block,
 unsigned Length
) {
 unsigned Bit;
 unsigned Idx;
 unsigned* Last;
 unsigned* Next;

 unsigned Value;
 Last = (unsigned*)((char*)Block + Length ) - 1;
 for ( Next = Block; Next <= Last; Next = (unsigned*)((char*)Next+1) ) {
 Value = *Next;
 Length = ASegMarkValue(
 0,
 (unsigned*)Value
 );
 if ( Length != 0 ) {
 GcMark( (unsigned*)Value, Length );
 }
 }
}
void GcPickUp()
{
 extern unsigned end;
 AllocSeg Seg;
 /* Algorithm: Clear the mark bits; mark blocks reachable from roots
 (the stack and global data); Sweep all watched segments. */
 Seg = 0;
 ASegClearMarks(Seg);
 GcMark( 0, (unsigned)&end );
 GcMark(
 (unsigned*)&Seg,
 (unsigned)&Seg - (unsigned)Seg->FirstFreeOfSize[0]
 );
 ASegSweep(Seg);
}




[LISTING THREE]

/* AllocSeg - Segments used for memory allocation. Allocation segments are
80x86 segments used for memory allocation. Clients may request or return
pointers to blocks of memory. */
#ifndef ASeg_Defn
#define ASeg_Defn
#include "stdio.h"
typedef struct AllocSegTg* AllocSeg;
void* ASegAllocBlock(
 AllocSeg Seg,
 unsigned Size
);
/* Returns block of <Length> bytes from <Seg>. Returns NULL if no block. */
void ASegDumpSeg(
 AllocSeg Seg,
 FILE* F
);
/* Write a debugging dump of <Seg> to <F>. */
void ASegInitSeg(
 AllocSeg Seg
);
/* Initialize <Seg> for use. */
/* GARBAGE COLLECTOR INTERFACE: Only a garbage collector should use this. */
void ASegClearMarks(
 AllocSeg Seg
);

/* Clear all marks from <Seg>. */
unsigned ASegMarkValue(
 AllocSeg Seg,
 void* Value
);
/* If <Value> corresponds to a block allocated from <Seg> mark it as being in
use. Return the size of the block (not the actual size, but the size requested
by creator). Returns 0 if Value isn't valid, or if block was already marked.
*/
void ASegSweep(
 AllocSeg Seg
);
/* Sweep all unmarked blocks in <Seg> into <Seg's> free lists. */
/* INTERNALS: An Allocation Segment is an 80x86 segment. The program's global
data and its stack are at the start. The middle is used for client memory
requests. Its tail is used for bookkeeping. A client may request a block of
any size. Internally however, all block lengths are a power of two between 8
bytes and 32K bytes. Block lengths are represented by their log base 2, which
is the number of 0 bits to the right of the 1 bit in the length. */
typedef unsigned UnsignedLog2;
 /* UnsignedLog2 - An unsigned power of 2 represented by its log base 2. */
#define BitsPerByte 8
#define SegSize 0x10000L /* 64K byte segments */
#define UnitSize 8 /* Allocations are in units of 8 bytes */
#define UnitMask 0xFFF8
#define UnitsPerSeg (SegSize/UnitSize) /* Nbr of Units/segment */
#define FlagsPerUnit 2 /* 2 flags for each unit */
 #define UnitAlloc 1 /* Unit is the start of an alloc'd block */
 #define UnitMark 2 /* Mark bit for garbage collection */
#define FlagBytesPerSeg (UnitsPerSeg * FlagsPerUnit / BitsPerByte)
#define ASegOverhead FlagBytesPerSeg
#define UnitsPerFlagByte ( BitsPerByte / FlagsPerUnit )
#define BFUnused (FlagBytesPerSeg / UnitSize * FlagsPerUnit / BitsPerByte)
 /* Flag bytes are at tail of the segment; will never be allocated. Flags that
 represent flag bytes are needed. Calculate how much of the tail isn't needed
 for flag bytes. UnitsInFlagBytes = FlagBytesPerSeg/UnitSize = 512;
 BFUnused = UnitsInFlagBytes * FlagsPerUnit/BitsPerByte */
typedef unsigned BlockFlags[ (ASegOverhead - BFUnused) / sizeof(int) ];
#define NbrOfBlockSizes 15
 /* Number of block lengths supported. */
#define ASegPadSize ( BFUnused - NbrOfBlockSizes * sizeof(FreePtr) - 2 )
 /* Number of bytes to pad segment structure to 2 bytes less than 64K. */
typedef struct FreeBlockTg* FreePtr;
 /* Pointer to a free block */
/* Representation of an AllocSeg */
typedef struct AllocSegTg {
 int Well[ (SegSize - ASegOverhead) / sizeof(int) ];
 BlockFlags Flags;
 #define ASegFlagIdx(S, P) ( (unsigned)P / ( UnitSize * \
 BitsPerByte * sizeof(S->Flags[0]) / FlagsPerUnit ) )
 #define ASegFlagAddr(S, P) \
 ((unsigned*)&S->Flags[ ASegFlagIdx(S, P) ])
 #define ASegFlagShift(S, P) ( ( (unsigned)P / UnitSize ) % \
 ( UnitsPerFlagByte * sizeof(S->Flags[0]) ) * FlagsPerUnit)
 #define ASegFlagAllocBits 0x5555
 #define ASegFlagMarkBits 0xAAAA
 #define ASegAllocClr(S, P) \
 *ASegFlagAddr(S, P) &= ~( UnitAlloc << ASegFlagShift(S, P) )
 #define ASegAllocSet(S, P) \
 *ASegFlagAddr(S, P) |= ( UnitAlloc << ASegFlagShift(S, P) )

 #define ASegIsAllocSet(S, P) \
 ( *ASegFlagAddr(S, P) & ( UnitAlloc << ASegFlagShift(S, P) ) )
 #define ASegMarkClr(S, P) \
 *ASegFlagAddr(S, P) &= ~( UnitMark << ASegFlagShift(S, P) )
 #define ASegMarkSet(S, P) \
 *ASegFlagAddr(S, P) |= ( UnitMark << ASegFlagShift(S, P) )
 #define ASegIsMarkSet(S, P) \
 ( *ASegFlagAddr(S, P) & ( UnitMark << ASegFlagShift(S, P) ) )
 FreePtr FirstFreeOfSize[ NbrOfBlockSizes ];
 char Pad[ ASegPadSize - 1 ];
};
/* FirstFreeOfSize[0] is not used. */
/* Representation of an allocated block */
typedef struct BlockTg* BlockPtr;
typedef struct BlockTg {
 UnsignedLog2 Size; /* The actual size */
 unsigned Used; /* The size requested */
} AllocBlock;
/* Note: Block returned by ASegAllocBlock must have struct BlockTg. */
/* Representation of a free block */
typedef struct FreeBlockTg {
 UnsignedLog2 Size;
 FreePtr Next;
};
#endif




[LISTING FOUR]

/* AllocSeg -- Allocation Segment -- ALGORITHM & DATA STRUCTURES: This is a
buddy system allocator; see KNUTH, Vol 1, pg 442. */

unsigned Junk;

#include "array.h"
#include "aseg.h"
#include "bool.h"
#include "power2.h"
#include "stdio.h"

#ifndef NULL
#define NULL 0
#endif

#define NULL_OFS 0xFFFF
void* ASegAllocBlock(
 AllocSeg Seg,
 unsigned Size
) {
 BlockPtr Block;
 UnsignedLog2 BlockSize;
 FreePtr Buddy;
 FreePtr Free;
 UnsignedLog2 FreeSize;
 unsigned long* ZeroPtr;
 Size += 4;
 /* Add in the space for the block header */

 /* Set BlockSize to the first power of 2 equal or larger to Size */
 if ( Size < 256 ) {
 BlockSize = Log2[ BumpUp[Size] ];
 } else {
 BlockSize = (Size + 255) >> 8;
 BlockSize = Log2[ BumpUp[BlockSize] ] + 8;
 }
 /* Set Free to the first free block of length >= Size */
 /* If there is no such block return NULL */
 Free = NULL;
 FreeSize = BlockSize;
 while (True) {
 if (NbrOfBlockSizes < FreeSize) {
 return NULL;
 }
 Free = Seg->FirstFreeOfSize[FreeSize];
 if (Free != NULL) break;
 FreeSize++;
 }
 Block = (BlockPtr)Free;
 /* Returned block will be split from Free. */
 /* Unlink Free from its list */
 Seg->FirstFreeOfSize[FreeSize] = Free->Next;
 /* Split Free until it is of the requested size. */
 while (FreeSize != BlockSize) {
 FreeSize--;
 Buddy = (FreePtr)( (char*)Free + (1 << FreeSize) );
 Buddy->Size = FreeSize;
 Buddy->Next = Seg->FirstFreeOfSize[FreeSize];
 Seg->FirstFreeOfSize[FreeSize] = Buddy;
 }
 Free->Size = BlockSize;
 ASegAllocSet(Seg, Block);
 Block->Used = Size - 4; /* Subtract off header length that we added */
 /* Zero out memory before returning it */
 for (
 ZeroPtr = (unsigned long*)(Block + 1);
 ZeroPtr < (unsigned long*)( (char*)Block + (1 << BlockSize) );
 ZeroPtr++
 ) {
 *ZeroPtr = 0L;
 }
 return Block + 1;
}
void ASegVegamatic(
 AllocSeg Seg,
 FreePtr First,
 FreePtr End
) {
 FreePtr Free;
 unsigned FreeSize;
 unsigned Size;
 UnsignedLog2 SizeLog2;
 for (
 Free = First;
 Free < End;
 Free = (FreePtr)( (char*)Free + Size )
 ) {
 /* Calculate size for block. */

 FreeSize = (char*)End - (char*)Free;
 Size = MaxPwr2Div((unsigned)Free);
 if ( (unsigned)Free & 0xFF ) {
 SizeLog2 = Log2[Size];
 } else {
 SizeLog2 = Log2[Size>>8];
 SizeLog2 += 8;
 }
 while ( FreeSize < Size ) {
 Size >>= 1;
 SizeLog2--;
 }
 Free->Size = SizeLog2;
 /* Link Free into the free list for blocks of Size */
 Free->Next = Seg->FirstFreeOfSize[SizeLog2];
 Seg->FirstFreeOfSize[SizeLog2] = Free;
 }
}
void ASegInitSeg(
 AllocSeg Seg
) {
 extern unsigned _atopsp; /* Start of heap and stack */
 UnsignedLog2 BlockSize; /* The size of Free */
 FreePtr Free; /* A free block */
 unsigned Idx;
 /* Notes: atopsp is set by C startup to start of stack. The stack
 grows down from there, and the heap (that's us) up. */
 _atopsp = (_atopsp + UnitSize - 1) & UnitMask;
 Seg->FirstFreeOfSize[0] = (FreePtr)_atopsp;
 for (Idx = 1; Idx < ArrayLength(Seg->FirstFreeOfSize); Idx++) {
 Seg->FirstFreeOfSize[Idx] = 0;
 }
 for (Idx = 0; Idx < ArrayLength(Seg->Flags); Idx++) {
 Seg->Flags[Idx] = 0;
 }
 ASegVegamatic( Seg, (FreePtr)_atopsp, (FreePtr)(Seg->Flags) );
}
/* GARBAGE COLLECTING OPERATIONS */
void ASegClearMarks(
 AllocSeg Seg
) {
 int Pos;
 /* Notes: This is faster than traversing chain of blocks in segment, and
 then computing the bit to clear. This executes loop about 2K times; block-
 by-block could take 8K times. */
 for (Pos = 0; Pos <= ArrayLast(Seg->Flags); Pos++) {
 Seg->Flags[Pos] &= ~ASegFlagMarkBits;
 }
}
unsigned ASegMarkValue(
 AllocSeg Seg,
 void* Value
) {
 unsigned* FlagPtr;
 unsigned Shift;
 unsigned* SizePtr;
 if ( (FreePtr)Value < Seg->FirstFreeOfSize[0] ) return 0;
 /* Value represents an address in the global data or stack, and
 couldn't have been returned by the allocator. */

 if ( (unsigned)Value % UnitSize != 4 ) return 0;
 /* Allocator always returns address 4 bytes after unit which starts
 block. If value doesn't start 4 bytes after a Unit, it can't have
 been returned from allocator. */
 FlagPtr = ASegFlagAddr(Seg, Value);
 Shift = ASegFlagShift(Seg, Value);
 if (
 (
 ( (*FlagPtr) >> Shift ) & (UnitAlloc | UnitMark)
 ) != UnitAlloc
 ) {
 /* Unit is Unallocated or already marked */
 return 0;
 }
 *FlagPtr |= UnitMark << Shift;
 /* HACK: Assumption that requested size is 1 word before the value. */
 SizePtr = (unsigned*)Value - 1;
 return *SizePtr;
}
void ASegSweep(
 AllocSeg Seg
) {
 FreePtr End;
 FreePtr First;
 UnsignedLog2 SizeLog2;
 FreePtr WellWall;
 WellWall = (FreePtr)&Seg->Well[ ArrayLength(Seg->Well) ];
 /* Zap the free list headers */
 for ( SizeLog2 = 1; SizeLog2 <= NbrOfBlockSizes; SizeLog2++) {
 Seg->FirstFreeOfSize[ SizeLog2 ] = NULL;
 }
 First = (FreePtr)Seg->FirstFreeOfSize[0];
 while ( True ) {
 /* Find the start of a free region */
 while (
 First < WellWall
 /* end the loop when we get past the segment's memory well */
 && ASegIsMarkSet(Seg, First)
 /* or when we find an unreferenced block */
 ) {
 SizeLog2 = First->Size;
 First = (FreePtr)( (char*)First + ( 1 << SizeLog2 ) );
 }
 if ( WellWall <= First ) break;
 /* end the loop when we get past the segment's memory well */
 ASegAllocClr(Seg, First);
 /* First is unreferenced, but may be allocated. It's about
 to be swept into the free list, so clear its alloc flag. */
 /* Find the end of the free region */
 SizeLog2 = First->Size;
 End = (FreePtr)( (char*)First + ( 1 << SizeLog2 ) );
 while (
 End < WellWall
 /* end the loop when we get past the segment's memory well */
 && ! ASegIsMarkSet(Seg, End)
 /* or when we find a referenced block */
 ) {
 ASegAllocClr(Seg, End);

 /* About to be swept up. Clear alloc flag */
 SizeLog2 = End->Size;
 End = (FreePtr)( (char*)End + ( 1 << SizeLog2 ) );
 }
 ASegVegamatic( Seg, First, End );
 /* Split free region into free blocks and put into free lists */
 First = End;
 /* Set First to End rather than block following End. */
 }
}




[LISTING FIVE]

/* Alloc -- A garbage collecting memory allocator. */
#include <stdio.h>

#include "array.h"
#include "aseg.h"
#include "alloc.h"
#include "gc.h"

void free(
 void* Ptr
) {
 /* Storage is reclaimed by the garbage collector, so free is a no-op. */
 return;
}
void* malloc(
 unsigned Length
) {
 void* Result;
 Result = ASegAllocBlock( 0, Length );
 if ( Result == NULL ) {
 /* Out of space: collect garbage, then retry once. */
 GcPickUp();
 Result = ASegAllocBlock( 0, Length );
 }
 return Result;
}



November, 1992
DESIGNING C++ CLASSES


C++ behind your back




Steven Sinofsky


Steven is a software-design engineer at Microsoft and project leader for the
Microsoft Foundation Classes C++ Application Framework for Windows. He can be
contacted at One Microsoft Way, Redmond, WA 98052 or via
stevesi@microsoft.com.


C++ is rapidly gaining acceptance as the language of choice among professional
software developers. Unlike the transition we made from Pascal to C about ten
years ago (which was essentially a syntactic change), the transition from C to
C++ poses a unique set of challenges, one of which is the understanding of
what the C++ compiler does behind your back. For instance, to maintain the
consistent semantics of the C++ language, a number of functions are required
for each C++ class. If you do not provide an implementation for each of these
functions, a C++ compiler is required to generate default implementations.
Unfortunately, most compilers generate these functions silently, and often the
default behavior is not adequate for a user-defined class. This article
details the four functions that C++ generates when your program does not
provide definitions for them: default constructor, copy constructor (or copy
initializer), destructor, and assignment operator.
As you know, a constructor for a class is a special function with the same
name as its class that is used to initialize the member variables of a class,
for example CMyClass::CMyClass(). A constructor is called whenever an object
needs to be created (such as for a global variable, a local variable, a
dynamically created variable, or an implicit or explicit temporary object, as
well as when an object is part of another object via membership or
inheritance). A constructor can have any number of formal parameters. The
special case of no formal parameters (or any number of formals, each with a
default value) is called the default constructor. If you do not supply any
user-defined constructors, the compiler will generate a public default
constructor for your class. This constructor does nothing, but it will invoke
the constructors for any base classes and embedded objects. Usually, a
compiler-generated default constructor is harmless. If you supply a
constructor with arguments, and you wish consumers of your class to be able to
create arrays of objects, then you'll need to supply a default constructor.
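The array requirement above can be sketched as follows. CPoint is a hypothetical class, not one from the article: once a constructor with arguments is supplied, the compiler generates no default constructor, so one must be written by hand before arrays of the class will compile.

```cpp
#include <cassert>

// Hypothetical CPoint: because a constructor with arguments is supplied,
// the compiler no longer generates a default constructor, so one must be
// written by hand before "CPoint pts[8];" will compile.
class CPoint {
public:
    CPoint() : m_x(0), m_y(0) {}            // user-supplied default constructor
    CPoint(int x, int y) : m_x(x), m_y(y) {}
    int X() const { return m_x; }
    int Y() const { return m_y; }
private:
    int m_x, m_y;
};
```

Remove the zero-argument constructor and the array declaration becomes a compile-time error, since array elements are built with the default constructor.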
Along with a constructor, your classes will also get a destructor, a special
function with the same name as your class but prefaced with a tilde (~), for
example CMyClass::~CMyClass(). As with the default constructor, the compiler
will generate a default-destructor implementation that will invoke the
destructors for any embedded objects and base classes. If you implement a
destructor, you should reverse the effects of your constructor. The tricky
part about destructors is that your destructor should almost always be made
virtual. If you don't make your destructor virtual, then, as Example 1
illustrates, bad things will happen. In Example 1, the destructor is not
virtual, and as a result, the wrong destructor is invoked when the object is
deleted.
Example 1: If destructors aren't virtual, bad things will happen.

 class CBase {
 public:
 CBase (); // user-supplied constructor
 ~CBase (); // user-supplied destructor

 // other functions and variables
 };
 class CMyClass : public CBase {
 public:
 CMyClass (); // user-supplied constructor
 ~CMyClass(); // user-supplied destructor
 // other functions and variables
 };
 int main () {
 CBase* pBase = new CMyClass;
 delete pBase;
 // wrong destructor invoked because ~CBase () is not virtual
 }
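A minimal sketch of the fix, using hypothetical names and a counter so the effect is observable: declaring ~CBase virtual ensures the derived destructor runs when deleting through a base pointer.

```cpp
#include <cassert>

static int g_derived_dtor_runs = 0;   // counts ~CMyClass invocations

struct CBase {
    virtual ~CBase() {}               // virtual: dispatch to the derived dtor
};

struct CMyClass : public CBase {
    ~CMyClass() { ++g_derived_dtor_runs; }  // implicitly virtual via CBase
};

// Delete a CMyClass through a CBase*. With the virtual destructor,
// ~CMyClass runs first, then ~CBase; returns the derived-dtor count.
int destroy_through_base() {
    CBase* pBase = new CMyClass;
    delete pBase;
    return g_derived_dtor_runs;
}
```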

The assignment operator CMyClass& CMyClass::operator=(const CMyClass& src) and
copy constructor CMyClass::CMyClass(const CMyClass& src) are similar and often
confused. To illustrate the difference between them, the code in Example 2
shows where each is called. Even though the declaration in line 4 looks like
an assignment, it differs from the true assignment operation on line 3 because
it is a variable initialization (mInit is being initialized to m0). The
default implementation of a compiler-generated assignment operator or copy
constructor is to perform a member-wise assignment or copy of the object. In
the case of the assignment operator, the default implementation returns a
non-const reference to the destination object so that assignments may be
chained together. (You may return a const reference, or even have a void
function if you desire.) If your class has any dynamically allocated data in
it, then you must always implement both an assignment operator and a copy
constructor. If you don't, then again, strange things will happen.
Example 2: Differentiating between the assignment operator and copy
constructor.

 /* 1 */ extern CMyClass m0;
 /* 2 */ CMyClass m1; // invokes default constructor
 /* 3 */ m1 = m0; // invokes assignment operator
 /* 4 */ CMyClass mInit = m0; // invokes copy constructor

For example, consider a string class that maintains a char* pointer to
dynamically allocated information. If you assign one string object to another
and rely on the default implementation of the assignment operator, the char*
member variable will be copied to the destination. The result will be two
objects with character pointers that point to the same string. Eventually, one
will be destroyed and free the string memory (in its destructor), while the
other string will continue to live, causing memory-trashing bugs at run time!
The correctly implemented string class will make a copy of the string data (by
dynamic allocation and performing strcpy, for example) in a user-defined
assignment operator.
One other thing to consider when writing these functions is to be sure to test
for assigning to yourself, as in m1 = m1. Though most of us do not write code
like that, it can often result from a complex expression.
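A minimal sketch of such a string class (a hypothetical CString, not MFC's) combines the deep copy described above with the self-assignment test:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical CString: owns a heap copy of its text, so it must supply
// a copy constructor, assignment operator, and destructor of its own.
class CString {
public:
    CString(const char* s) { m_p = dup(s); }
    CString(const CString& src) { m_p = dup(src.m_p); }  // deep copy
    CString& operator=(const CString& src) {
        if (this != &src) {          // guard against m1 = m1
            delete[] m_p;
            m_p = dup(src.m_p);
        }
        return *this;                // non-const ref allows a = b = c
    }
    ~CString() { delete[] m_p; }
    const char* c_str() const { return m_p; }
private:
    static char* dup(const char* s) {
        char* p = new char[strlen(s) + 1];
        strcpy(p, s);
        return p;
    }
    char* m_p;
};
```

Without the `this != &src` test, assigning an object to itself would free the buffer and then copy from the freed memory.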
Each of these four functions can be modified by the standard C++
access-protection keywords: public, protected, and private. (The generated
functions are always public.) Thus, if you wish to prevent assigning one object
to another, all you need to do is make the assignment operator and copy
constructor private members. One useful trick, which we used in the Microsoft
Foundation Classes (MFC, the C++ application framework for Windows) was to
make the copy constructor and assignment operator private members in the
common base class. This will always prevent the compiler from generating these
functions in all derived classes. As a matter of fact, a compile-time error
will be generated, which is always preferable to a runtime error.
This brings us to what is called the "canonical class form." In MFC, all our
classes follow a standard template, unless we have a reason to limit the
functionality of a class. If you follow this canonical class form, then
classes you write will behave much like intrinsic types (int, char, and so
on), since they can be created, assigned, and copied just like variables of
intrinsic types.
But as stated, MFC implements a private assignment operator and a private copy
constructor in the CObject class; see Example 3. This means that derived
classes can safely omit implementations of these functions, especially since
it often doesn't make sense to permit one object to be copied to another (for
example, when a class maintains a reference to some system-allocated resource,
such as a file).
Example 3: A private assignment operator and a private copy constructor in the
CObject class.

 class CAnyClass : public CObject {
 public:
 CAnyClass ();
 virtual ~CAnyClass ();
 private:
 CAnyClass (const CAnyClass& src);
 CAnyClass& operator=(const CAnyClass& src);
 };

C++ is a great engineering tool and most certainly makes it easier to write
more type-safe and maintainable programs. Unlike its predecessor C, however,
C++ does a number of things behind your back. If you don't take this into
account when designing your classes, they'll be less useful and not as robust
as they could be.



November, 1992
DESIGNING PORTABLE USER INTERFACES


Moving from DOS to UNIX




John L. Bradberry


John is development manager at Scientific Concepts, 1033 Franklin Road, Suite
11-295, Marietta, GA 30067.


Numerous software-design issues must be considered when porting applications
from one platform to another, among them the question of how to handle the
user interface. Luckily, software-engineering techniques exist to help reduce
the cost, effort, and maintenance of user-interface (UI) designs and ports. To
illustrate some of these techniques, I discuss in this article an application
originally written in C for DOS using a text-based menu package I also
authored. Eventually, I had to port the program to UNIX, using the XView
toolkit to do so. Here I'll focus on how the design of the application's UI
made such a port feasible.
If you take an idealized approach to moving application software across
platforms, the portability solution seems obvious:
1. Move the source code for the application from platform A to platform B.
2. Compile and link the application on platform B.
3. Run the new application on platform B.
While this type of software "reuse" can be done in almost any language, the
implicit requirement is that applications not communicate with a device nor
perform any action not completely specified by the language. This usually
means your code can't stray outside strict ANSI specifications.
A more realistic view of portability is the higher-level, port-and-fix method
shown in Figure 1. The dotted box (device-dependency region) contains all the
low-level device- and machine-specific code that must be reproduced (in
principle) for the new environment. The size of the device-dependency region
varies from platform to platform. If you're porting in the right direction,
you may actually have less work to do!
Moving from DOS to UNIX involves other issues aside from application-code
reuse and device interfaces. In going from the single-user/single-task world
to the multiuser/multitask world, for instance, you must adapt to significant
differences in programming mentality. In the DOS world, you can communicate
directly (almost) with any device by reading and writing from the
physical-address map. Other than an occasional conflict with TSRs, tasks can
assume uncontested control over devices and memory regions in fixed locations.
For example, many DOS window libraries (like the one we're porting to UNIX)
bypass BIOS services and read/write directly to the video pages.
Also, the idea of protected memory is, in many DOS configurations, wishful
thinking. Contrast this to what can happen in a system where your task is one
of dozens competing for the same resources. In multitasking environments, the
OS must protect tasks from each other as well as from themselves. Figure 2
shows the contrast between the task control offered in single-tasking DOS to
the extended environment of multitasking in operating systems such as UNIX. As
the left portion of the figure illustrates, DOS programmers can access
device-memory maps by simply writing to a fixed location in memory. A single
task can even write to areas reserved for DOS itself. The "static" nature of
address locations is both a blessing and a curse to many developers.
In contrast, the right side of Figure 2 illustrates a drastically different
task-environment structure. Not only are multiple tasks "sharing" one or more
CPUs, but a common network-protocol mechanism such as remote procedure calls
(RPCs) can allow tasks to communicate from one location to another. It's
possible for more than one task to communicate with more than one workstation
screen at a time. The nature of this virtual mapping of video services renders
hardcoded techniques both undesirable and dangerous in this type of
environment.
This contrast of environments seems to make our chances of porting the
application without difficulty quite slim indeed. Fortunately, many of the
added (implied) "requirements" in the multitasking environment are
automatically "handled" for us.


Overview of the DOS-based Library


A few years ago, I wrote a custom, text-based UI library for DOS-based
application work. (Because this article focuses on porting the application,
the library is not presented here. If you're interested in a copy, contact me
at the address on the first page of this article.) Like many other library
tools, including a wide variety of X-Window look-alikes developed since then,
my text-based window package follows a common basic design.
Text-based systems use video pages residing at fixed addresses, B0000h
through B0F9Fh for one 80x25 text page. Using text video maps readily
exploits the advantages of relative speed and simplicity (provided your
language allows you to write/read directly from memory locations). The effect
of real-time window popping is achieved by simply switching between video
planes.
You can expand the basic text-based, window-support library from block
read/write operations by adding low-level functions like the following:
1. GetVideoPage()/SetVideoPage() form the foundation for hiding the details
of addressing from the application.
2. GetSetAttributeByte() gets or sets the attribute of a character at some
row or column location on the current page.
3. GetSetAttributeCharacter() reads or writes a character at some row or
column location on the current page.
4. ReadWriteString() uses the previous lower-level calls to read or write a
character string at a specified row and column on the current video page.
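Since direct writes to the video buffer only work on real DOS hardware, the layering can be sketched portably by modeling one text page as an ordinary array (names hypothetical; a real DOS library would instead point the page at the video address):

```cpp
#include <cassert>

// Model of one 80x25 text page: each cell is a character byte plus an
// attribute byte, exactly as laid out in the PC text-mode video buffer.
// Here it is an ordinary array so the layering can be shown portably.
unsigned char Page[25][80][2];

// Lowest layer: write one character/attribute cell.
void WriteCell(int row, int col, char ch, unsigned char attr) {
    Page[row][col][0] = (unsigned char)ch;   // character byte
    Page[row][col][1] = attr;                // attribute byte
}

// ReadWriteString-style helper built on the lower-level cell call.
void WriteString(int row, int col, const char* s, unsigned char attr) {
    for (int i = 0; s[i] != '\0'; ++i)
        WriteCell(row, col + i, s[i], attr);
}
```

Higher-level box-drawing and menu routines then call WriteString, never touching addresses themselves, which is what makes a later port feasible.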
Built on top of these routines are higher-level calls for drawing boxes using
DOS graphics characters, color control, and so on. To complete the DOS-based
UI, add routines for mouse control, keyboard monitoring, and extended prompt
routines that allow keyboard editing. Finally--several thousand lines of code
later--you have your text-based UI!


The Application


As a test of the issues discussed up to this point, let's examine a personal
phone-directory application I ported from DOS to UNIX. Using an address-field
layout common to many wordprocessors and labeling packages, the program reads
in a database file and displays records that allow the user to "flip" through
the files in either direction. In addition, the user can enter characters in
the name, phone number, or address fields to search for a particular record.
The DOS version of this program is divided into two small modules: one
containing the "generic" portion of the C code for manipulating the record
data, the other containing the DOS-specific menu information we hope to
replace with its UNIX counterpart later on. Listings One (page 130) and Two
(page 132) illustrate the contents of the modules. At this point, I won't
bother with the header-file contents that are custom or non-ANSI standard C
since they'll be replaced in the UNIX version.
Note that Listings One and Two follow the high-level portability model
introduced in Figure 1. Both listings contain some of the device- and
machine-specific implementations of the DOS video services. However, Listing
Two is much more closely bound to the DOS-based library.
In Listing Two, the MENU... keywords represent macros used by lower-level
video functions in defining the size and number of lines required for the
window box. By definition, the first line following the keyword MENUITEM
defines a scroll bar, and the second line represents the help message to be
scrolled at the bottom of the display. The quoted character at the end of the
second line represents a keyboard character that can be recognized instead of
a mouse click to "select" the menu operation. Therefore, this window package
will work regardless of whether or not a mouse is present (unlike some other
commercially available packages).
You could insist that this menu representation be maintained in the UNIX
environment and write lower-level support to "attach" it to the X-Window
package. However, this would be equivalent to putting one overcoat on top of
another. The calls box_menu_start("Directory Utility", PhoneMain, VFBRWHITE,
VFCYAN, VFBRWHITE, DOUBLEBAR, MenuLines, VFBLUE); and
Midx=box_select(PhoneMain, VFBRWHITE, VFCYAN, LBUTTON, MenuLines, Marker,
VFBLUE); from Listing One illustrate the setup and processing preamble for the
window system.
In the first call to box_menu_start(), we specify a title for the menu header,
the name of the menu structure, color assignments for the box sides, and the
number of lines to be displayed in the menu. If you specify fewer lines to
display than defined by the structure, the menu is scrolled within the box.
While the first call to box_menu_start() is displaying the information, a
second call deals with event-handling issues. The call to box_select() tells
the lower-level event-handling routines which structure to examine and what
type of user action other than a key-press to signal the caller about. In this
call, the LBUTTON parameter specifies that the left mouse button be recognized
along with the specific letter keys noted in the structure.


Overview of the XView Toolkit


In principle, DOS text-based window libraries and X-Window systems share
common elements. X applications are much more extensive, however. Here is a
summary of the issues pointed out in Dan Heller's book, XView Programming
Manual (O'Reilly & Associates, 1992).
In X, a display is not a fixed-size entity residing at one location in memory.
Instead, the X server receives the protocol requests necessary to control one
or more screens at any location. Rather than writing to a memory location to
render a graphics or text box, clients communicate with the server over these
protocols. Xlib is the lowest-level library, translating between C data
structures and the protocol's requests and events.
XView, short for "X Window-System-based Visual/Integrated Environment for
Workstations," is a UI toolkit developed by Sun Microsystems as a higher-level
interface into the Xlib library. It enables inexperienced X-Window programmers
to develop interfaces compatible with the OPEN LOOK GUI, so that all window
operations have a similar standard look-and-feel. The form of window structure
used in XView is slightly different. Instead of simply forming a box of
scrolling lines and waiting for a key-press, you have hundreds of options for
configuring buttons, boxes, lists, text, and so on. Consequently, the first
thing a new user may notice about XView is that finding an option and setting
the associated attributes correctly can be a frustrating experience.
Configuring a menu in this context means picking from a shopping list of
objects--windows, panels, frames, and the like.
Part of the attribute setting for objects involves specifying the
event-handling sequence. This usually translates into setting up a function to
process the result of an event registered with another task on your behalf.
This may all sound a little confusing, but for the most part you simply use
the format of the functions used in the manual examples.



DOS-to-UNIX Portability in C


A couple of important differences between DOS and UNIX must be addressed
before restructuring the code for XView. First, the compiler differences
between the PC and UNIX can be quite significant. Many DOS programmers assume
that code that compiles without errors or warnings will automatically work on
UNIX and other environments. This is a mistaken assumption for a variety of
reasons, among them compiler warning level and header-file structure conflicts
between platforms.
In addition, code that compiles and links without errors or warnings is more
likely to fail at run time on UNIX than on DOS. Why? Because, as illustrated
in Figure 2,
your DOS task is allowed to write almost anywhere (even if you didn't plan it
that way). However, memory-protection schemes of OSs such as UNIX deal with
such rude program behavior by aborting the task, not rebooting the OS.
The second difference deals with the issue of function prototypes. Many
programmers use the ANSI function declarations and prototypes exclusively for
functions such as int foo(char c, int Val, char *Mstring);. The older style,
quite common in UNIX or with older-generation C programmers, leads to limited
error checking and a myriad of other problems. Sun workstations provide two
versions of the C compiler (cc or gcc) as choices for the old and new style,
respectively. UNIX does provide the lint code-verifier utility for finding
pointer and type-coercion bugs. I used the old style (as in the examples in
the XView book).


Making the Port


A part of Listing One, which is available electronically (see "Availability"
on page 5), contains the code modified to utilize the features of XView. Note
that the generic module of Listing One is represented in the XView version
with very few alterations. The routines BOOL ReadList(), BOOL SearchField(),
and BOOL NextLine() ported without changes.
Also note the XView-specific header files and the definition of global XView
"objects" (in the Window Related Control section) used to build the main-menu
structure. In the main program section, xv_create is used to register (using
the PANEL_NOTIFY_PROC attribute) the menu items with our application by
specifying the object type, attributes, and functions used to service the
events.
After the panel items are registered with the event handler (notifier), the
functions wait patiently to be informed of an event. In this case, unlike in
the DOS version, we don't have to worry about actually detecting the event;
the call near the end of main, xv_main_loop(PhoneFrame);, does this for us.
The PhoneSelect function handles any of the "text" events by asking for the
name of the event (xv_get) and checking the string returned. In a similar
fashion, the ChoiceSelect function gets an integer code representing a button
pushed to execute the desired function.


Conclusions


While the basic principles of window operations remained fairly common
between the two platforms, the results obtained were drastically different. We
didn't write significantly more code in the move to UNIX, but made use of more
of the OS's built-in features. Although I barely scratched the surface in
terms of what could be done in XView (albeit painfully), it should be clear
that this type of effort is at least feasible.
_DESIGNING PORTABLE USER INTERFACES_
by John L. Bradberry


[LISTING ONE]

/*+=========================================================================
== Personal Phone Directory Utility ==
== author: john l. bradberry creation date: jan 30,1992 ==
== e-mail: jbrad@cc last modified: ==
==========================================================================*/

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <search.h>
#include <dos.h>
#include <io.h>
#include <math.h>
#include <time.h>

#include "mstrlib.h"
#include "doskbio.h"
#include "filelib.h"
#include "genlib.h"
/*-------------------------- macros / constants --------------------------*/
#define MAXLABELS 50 /* maximum address records allowed*/
typedef enum {VOID, NAME, ADDRESS, PHONE} KEYTYPE;
typedef enum {OFF, ON} ONOFF;
typedef enum {FALSE, TRUE} BOOL;
/*--------------------------- global variables ---------------------------*/
 static char *MARKER = "[]" ;/* record separator */
 static char PHONEFILE[30] ;/* current data base file name */
/*------------------------------ structures ------------------------------*/
typedef struct
 {

 char Name[40] ;/* name of this person */
 char Address[4][40] ;/* maximum of four address fields */
 char PhoneNumber[40] ;/* phone number (parse later) */
 char Greeting[20] ;/* as in Dear Mr/Ms: */
 }PREC;
struct
 {
 PREC Info[MAXLABELS] ;/* data base to be read in */
 int Size ;/* number of records in phone list*/
 char SearchString[40] ;/* string used in search */
 int SearchKey ;/* search index into phone list */
 } List;
/*------------------------ function prototypes ---------------------------*/
BOOL ReadList();
BOOL SearchField();
BOOL NextLine();
void DispRecord();
void DispLine();
/*----------------------- WINDOW RELATED CONTROL -------------------------*/
#include "dosmenu.h"
/*+=========================================================================
==program main: Phone Directory Utility... ==
==========================================================================*/
int main()
{
 int Ival ;/* temporary variable */
 BOOL NoExit ;/* indicates end of menu mode */

 char Stemp[80] ;/* temporary string */
 char NameSearch[40] ;/* string used in name search */
 char AddressSearch[40] ;/* string used in address search */
 char PhoneSearch[40] ;/* string used in phone search */
 KEYTYPE LastType ;/* last type of search performed */

 List.SearchKey = 0;
 strcpy(PHONEFILE,"genlist.dat");
 List.SearchString[0]='\0';
 NameSearch[0]='\0';
 AddressSearch[0]='\0';
 PhoneSearch[0]='\0';
/*++++ Display main menu mask... ++++*/
 ClearMain();
/*++++ Sub menu control loop... ++++*/
 NoExit=TRUE;
 if (ReadList(PHONEFILE) == FALSE)
 errout("Data Base File Read Error!");
 DispLine(PHONEFILE, 5, 39, 28, VFBRWHITE, VFBLUE<<4);
 if (List.Size > 0) DispRecord();
 Midx=0;
 while (NoExit)
 {
 Midx=box_select(PhoneMain,VFBRWHITE,VFCYAN,LBUTTON,MenuLines,
 Marker,VFBLUE);
 switch (Midx)
 {
 case 0:
 List.SearchKey = 0;
 get_sval("Enter Database File Name: ",PHONEFILE);
 strim(PHONEFILE);

 DispLine(PHONEFILE, 5, 39, 28, VFBRWHITE, VFBLUE<<4);
 if (ReadList(PHONEFILE) == FALSE)
 errout("Data Base File Read Error!");
 else
 DispRecord();
 break;
 case 1:
 if (List.Size > 0)
 {
 List.SearchKey = (List.SearchKey < List.Size - 1 ?
 List.SearchKey + 1 : 0);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 break;
 case 2:
 if (List.Size > 0)
 {
 List.SearchKey = (List.SearchKey > 0 ?
 List.SearchKey - 1 : List.Size - 1);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 break;
 case 3:
 if (List.Size > 0)
 {
 List.SearchKey = 0;
 get_sval("Enter Name Search Chars: ",NameSearch);
 strcpy(List.SearchString,NameSearch);
 LastType = NAME;
 SearchField(LastType, List.SearchString);
 DispLine(NameSearch, 8, 39, 28, VFBRWHITE, VFBLUE<<4);
 DispLine(List.SearchString, 11, 39, 28,
 VFBRWHITE, VFBLUE<<4);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 break;
 case 4:
 if (List.Size > 0)
 {
 List.SearchKey = 0;
 get_sval("Enter Phone Search Chars: ",PhoneSearch);
 strcpy(List.SearchString,PhoneSearch);
 LastType = PHONE;
 SearchField(LastType, List.SearchString);
 DispLine(PhoneSearch, 9, 39, 28, VFBRWHITE, VFBLUE<<4);
 DispLine(List.SearchString, 11, 39, 28,
 VFBRWHITE, VFBLUE<<4);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 break;
 case 5:

 if (List.Size > 0)
 {
 List.SearchKey = 0;
 get_sval("Enter Address Search Chars: ",AddressSearch);
 strcpy(List.SearchString,AddressSearch);
 LastType = ADDRESS;
 SearchField(LastType, List.SearchString);
 DispLine(AddressSearch, 10, 39, 28,
 VFBRWHITE, VFBLUE<<4);
 DispLine(List.SearchString, 11, 39, 28,
 VFBRWHITE, VFBLUE<<4);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 break;
 case 6:
 if (List.Size > 0)
 {
 SearchField(LastType, List.SearchString);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 break;
 case 7:
 Ival=question("Exit this program to DOS: Y(es)");
 if ((Ival=='y') || (Ival=='Y') || (Ival=='\r'))
 {
 NoExit = FALSE;
 }
 break;
 default:
 cur_posit(MenuRow,MenuCol);
 box_menu_start("Directory Utility",PhoneMain,VFBRWHITE,
 VFCYAN,VFBRWHITE,DOUBLEBAR,MenuLines,VFBLUE);
 break;
 }
 }
/*+++++ Exit and restore CRT to main video page... +++++*/
 setvpage(0);
 clear();
 cur_posit(21,0);
}/* end of main */
/*+=========================================================================
==BOOL ReadList: Open user phone data base and read into structure... ==
==========================================================================*/
BOOL ReadList(Dbase)
char *Dbase;
{
 BOOL Stcode ;/* status code returned */
 char Stemp[80] ;/* temporary string */
 BOOL NewRecord ;/* indicates beginning new field */
 FILE *FileHandle ;/* pointer to pipe file */
 List.Size = -1;
 Stcode = FALSE;
 NewRecord = FALSE;
 FileHandle=fopen(Dbase,"rb");


 if (FileHandle != NULL)
 {
 while ((NextLine(Stemp, FileHandle)) && (List.Size < MAXLABELS -1))
 {
 if (spos(Stemp, MARKER) > 0)
 {
 List.Size++;
 if ((NextLine(List.Info[List.Size].Name, FileHandle)) &&
 (List.Size < MAXLABELS -1))
 {
 NextLine(List.Info[List.Size].Address[0], FileHandle);
 NextLine(List.Info[List.Size].Address[1], FileHandle);
 NextLine(List.Info[List.Size].Address[2], FileHandle);
 NextLine(List.Info[List.Size].Address[3], FileHandle);
 NextLine(List.Info[List.Size].PhoneNumber, FileHandle);
 NextLine(List.Info[List.Size].Greeting, FileHandle);
 }
 }
 }
 }
 if (List.Size > 0) Stcode = TRUE;
 return(Stcode);
}/* end of ReadList */
/*+=========================================================================
==BOOL NextLine: Read next line in file... ==
==========================================================================*/
BOOL NextLine(String, FileHandle)
char *String;
FILE *FileHandle;
{
 BOOL Stcode ;/* status code returned */
 char Stemp[80] ;/* temporary string */
 char *Sptr ;/* pointer to string */
 Stcode = FALSE;
 String[0] = '\0';
 if (fgets(Stemp, sizeof Stemp, FileHandle) != NULL)
 {
 Sptr = strrchr(Stemp,'\r');
 if (Sptr != NULL) *Sptr = ' ';
 Sptr = strrchr(Stemp,'\n');
 if (Sptr != NULL) *Sptr = ' ';
 strim(Stemp);
 strcpy(String,Stemp);
 Stcode = TRUE;
 }
 return(Stcode);
}/* end of NextLine */
/*+=========================================================================
==BOOL SearchField: Search phone for data using key to select field.. ==
==========================================================================*/
BOOL SearchField(Key, Sdata)
KEYTYPE Key;
char *Sdata;
{
 BOOL Stcode ;/* status code returned */
 int OldSearchKey ;/* copy of search key returned */
 BOOL NoMatch ;/* indicates search data found */
 Stcode = FALSE;
 NoMatch = TRUE;

 OldSearchKey = List.SearchKey;
 if (List.SearchKey != 0) List.SearchKey++;
 while((NoMatch) && (List.SearchKey < List.Size))
 {
 switch (Key)
 {
 case NAME:
 if (spos(List.Info[List.SearchKey].Name , Sdata) > 0)
 NoMatch = FALSE;
 break;
 case PHONE:
 if (spos(List.Info[List.SearchKey].PhoneNumber , Sdata) > 0)
 NoMatch = FALSE;
 break;
 case ADDRESS:
 if (spos(List.Info[List.SearchKey].Address[0] , Sdata) > 0)
 NoMatch = FALSE;
 if (spos(List.Info[List.SearchKey].Address[1] , Sdata) > 0)
 NoMatch = FALSE;
 if (spos(List.Info[List.SearchKey].Address[2] , Sdata) > 0)
 NoMatch = FALSE;
 if (spos(List.Info[List.SearchKey].Address[3] , Sdata) > 0)
 NoMatch = FALSE;
 break;
 default:
 puts("Error - bad key");
 }
 if (NoMatch)
 List.SearchKey++;
 else
 Stcode = TRUE;
 }
 if (NoMatch) List.SearchKey = OldSearchKey;
 return(Stcode);
}/* end of SearchField */
/*+=========================================================================
==void DispRecord: Display record on video box of main menu... ==
==========================================================================*/
void DispRecord()
{
 char Stemp[80] ;/* temporary string */
 sprintf(Stemp,"[%3d] ",List.SearchKey);
 sjoin(Stemp,List.Info[List.SearchKey].Name);
 DispLine(Stemp, TableRow+1,
 TableCol+2, 37, VFBRWHITE, VFMAGENTA<<4);
 DispLine(List.Info[List.SearchKey].Address[0], TableRow+2,
 TableCol+8, 32, VFBRWHITE, VFMAGENTA<<4);
 DispLine(List.Info[List.SearchKey].Address[1], TableRow+3,
 TableCol+8, 32, VFBRWHITE, VFMAGENTA<<4);
 DispLine(List.Info[List.SearchKey].Address[2], TableRow+4,
 TableCol+8, 32, VFBRWHITE, VFMAGENTA<<4);
 DispLine(List.Info[List.SearchKey].Address[3], TableRow+5,
 TableCol+8, 32, VFBRWHITE, VFMAGENTA<<4);
 DispLine(List.Info[List.SearchKey].PhoneNumber, TableRow+6,
 TableCol+8, 32, VFBRWHITE, VFMAGENTA<<4);
}/* end of DispRecord */
/*+=========================================================================
==void DispLine: Display single line on video of main menu... ==
==========================================================================*/

void DispLine(String, Row, Col, Width, ForColor, BackColor)
char *String;
int Row;
int Col;
int Width;
int ForColor;
int BackColor;
{
 char Stemp[80] ;/* temporary string */
 char Bline[80] ;/* blank line */
 memset(Bline,' ',Width + 1);
 Bline[Width + 1]='\0';
 cputmemstr(Bline,Row,Col,ForColor,BackColor);
 strcpy(Stemp,String);
 Stemp[Width + 1]='\0';
 cputmemstr(Stemp,Row,Col,ForColor,BackColor);
}/* end of DispLine */



[LISTING TWO]

/*+=========================================================================
== Personal Phone Directory Utility ==
== author: john l. bradberry creation date: jan 30,1992 ==
== e-mail: jbrad@cc last modified: ==
==========================================================================*/
#include <stdio.h>

/*-------------------------- macros / constants --------------------------*/
/*----------------------- WINDOW RELATED CONTROL -------------------------*/
/*--------------------------- window globals -----------------------------*/
 static int TableRow = 14 ;/* table display row position */
 static int TableCol = 15 ;/* table display row position */
 static int MenuRow = 4 ;/* menu display row position */
 static int MenuCol = 10 ;/* menu display column position */
 static int Marker = 15 ;/* menu indicator token */
 static int Midx ;/* index counter into menu table */
 static int MenuLines = 8 ;/* number of lines in menu */
/*------------------------- window structures ----------------------------*/
static DEFINEMENU PhoneMain[] =
 {
 MENUITEM("c) Change Data Base : ",
 "Change name of data base file.", 'c')
 MENUITEM("f) Forward (Next Record) : ",
 "Display all fields in next record index.", 'f')
 MENUITEM("b) Backward (Prev Record): ",
 "Display all fields in previous record index.", 'b')
 MENUITEM("n) Name Search :",
 "Search name fields in all records for match.", 'n')
 MENUITEM("p) Phone Search :",
 "Search phone number fields in all records for match.", 'p')
 MENUITEM("a) Address Search :",
 "Search address fields in all records for match.", 'a')
 MENUITEM("r) Repeat Last Search :",
 "Repeat last search attempted and find next matching key.", 'r')
 MENUITEM("x) Return to DOS :",
 "Exit program and return to DOS operating system.", 'x')
 MENUITEMEND

 };
static DEFINETABLE PhoneTable[] =
 {
 TABLEITEM(" ")
 TABLEITEM(" ")
 TABLEITEM(" ")
 TABLEITEM(" ")
 TABLEITEM(" ")
 TABLEITEM(" ")
 TABLEITEMEND
 };
/*------------------------ function prototypes ---------------------------*/
void ClearMain();
/*+=========================================================================
==void ClearMain: Clear screen and display main menu... ==
==========================================================================*/
void ClearMain(void)
{
 int Row; /* row position */
 int Col; /* column position */
/*++++ Set menu related global variables... ++++*/
 BLINKOFF(atbyte);
 pagenum=PAGE0;
 setvpage(pagenum);
 colclear(VBCYAN);
 boxtype=1;
 shadow=0;
 Row=23;
 Col=0;
 cur_posit(Row,Col); /* standard prompt position */
 coleraselin(Row-1,VBBLUE);
 coleraselin(Row,VBBLUE);
 coleraselin(Row+1,VBBLUE);
 cur_posit(1,0);
 coceprt("PHONEMATE - Telephone Directory Utility (Ver 3.1)",
 VFBRWHITE,VBBLUE);
 cur_posit(TableRow,TableCol);
 table_display(" ", PhoneTable,VFLICYAN,VFMAGENTA,VFYELLOW,DOUBLEBAR);
 mensel = MenuLines-1;
 cur_posit(MenuRow,MenuCol);
 box_menu_start("Directory Utility",PhoneMain,VFBRWHITE,
 VFCYAN,VFBRWHITE,DOUBLEBAR,MenuLines,VFBLUE);
}/* end of ClearMain */


[LISTING THREE]

/*+=========================================================================
== Personal Phone Directory Utility ==
== author: john l. bradberry creation date: jan 30,1992 ==
== e-mail: jbrad@cc last modified: ==
==========================================================================*/

#include <stdio.h>

#include <xview/xview.h>
#include <xview/frame.h>
#include <xview/panel.h>
#include <xview/notice.h>

#include <xview/cms.h>
#include <xview/tty.h>
#include <xview/font.h>

#include "syshead.h"
/*-------------------------- macros / constants --------------------------*/
#define MAXLABELS 50 /* maximum address records allowed*/
typedef enum {VOID, NAME, ADDRESS, PHONE} KEYTYPE;
typedef enum {OFF, ON} ONOFF;
typedef enum {FALSE, TRUE} BOOL;
/*--------------------------- global variables ---------------------------*/
 static char *MARKER = "[]" ;/* record separator */
 static char PHONEFILE[80] ;/* current data base file name */
 static KEYTYPE LastType ;/* last type of search performed */
 static char NameSearch[80] ;/* string used in name search */
 static char AddressSearch[80] ;/* string used in address search */
 static char PhoneSearch[80] ;/* string used in phone search */
/*------------------------------ structures ------------------------------*/
typedef struct
 {
 char Name[80] ;/* name of this person */
 char Address[4][80] ;/* maximum of four address fields */
 char PhoneNumber[80] ;/* phone number (parse later) */
 char Greeting[80] ;/* as in Dear Mr/Ms: */
 }PREC;
struct
 {
 PREC Info[MAXLABELS] ;/* data base to be read in */
 int Size ;/* number of records in phone list */
 char SearchString[80] ;/* string used in search */
 int SearchKey ;/* search index into phone list */
 } List;
/*------------------------ function prototypes ---------------------------*/
BOOL ReadList();
BOOL SearchField();
BOOL NextLine();
void DispRecord();
void DispLine();
/*----------------------- WINDOW RELATED CONTROL -------------------------*/
 static int LineRow = 200 ;/* table display row position */
 static int LineCol = 80 ;/* table display col position */
 Frame PhoneFrame ;/* base frame for phone menu */
 Panel PhonePanel ;/* base panel for phone menu */
 Menu PhoneMenu ;/* base menu for phone menu */
 Panel_item PhoneFile ;/* Phone data base file - handle.*/
 Panel_item PhoneNameSearch ;/* Phone name search - handle.*/
 Panel_item PhoneNumberSearch ;/* Phone number search - handle.*/
 Panel_item PhoneAddressSearch ;/* Phone address search - handle.*/
 Panel_item PhoneChoice ;/* Phone search options - handle.*/
/*------------------------ function prototypes ---------------------------*/
void PhoneQuit();
int PhoneSelect();
int ChoiceSelect();
int PhoneForward();
int PhoneBackward();
int RepeatSearch();
/*+=========================================================================
== program main: Phone Directory Utility... ==
==========================================================================*/

int main()
{
 List.SearchKey = 0;
 strcpy(PHONEFILE,"genlist.dat");
 List.SearchString[0]='\0';
 NameSearch[0]='\0';
 AddressSearch[0]='\0';
 PhoneSearch[0]='\0';
/*+++++ Display main menu mask...++++*/
 PhoneFrame = (Frame)xv_create(NULL, FRAME,
 FRAME_NO_CONFIRM, TRUE,
 FRAME_INHERIT_COLORS, TRUE,
 FRAME_LABEL,
 "PHONEMATE - Telephone Directory Utility (Ver 3.1)",
 NULL);
 PhonePanel = (Panel) xv_create(PhoneFrame, PANEL, NULL);
 PhoneFile = xv_create(PhonePanel, PANEL_TEXT,
 PANEL_NEXT_ROW, -1,
 PANEL_LABEL_STRING, "Change Data Base",
 PANEL_VALUE, PHONEFILE,
 PANEL_VALUE_DISPLAY_LENGTH, 50,
 PANEL_VALUE_X, 150,
 PANEL_NOTIFY_PROC, PhoneSelect,
 NULL);
 PhoneNameSearch = xv_create(PhonePanel, PANEL_TEXT,
 PANEL_NEXT_ROW, -1,
 PANEL_LABEL_STRING, "Name Search",
 PANEL_VALUE, NameSearch,
 PANEL_VALUE_DISPLAY_LENGTH, 50,
 PANEL_VALUE_X, 150,
 PANEL_NOTIFY_PROC, PhoneSelect,
 NULL);
 PhoneNumberSearch = xv_create(PhonePanel, PANEL_TEXT,PANEL_NEXT_ROW, -1,
 PANEL_LABEL_STRING, "Number Search",
 PANEL_VALUE, PhoneSearch,
 PANEL_VALUE_DISPLAY_LENGTH, 50,
 PANEL_VALUE_X, 150,
 PANEL_NOTIFY_PROC, PhoneSelect,
 NULL);
 PhoneAddressSearch = xv_create(PhonePanel, PANEL_TEXT,
 PANEL_NEXT_ROW, -1,
 PANEL_LABEL_STRING, "Address Search",
 PANEL_VALUE, AddressSearch,
 PANEL_VALUE_DISPLAY_LENGTH, 50,
 PANEL_VALUE_X, 150,
 PANEL_NOTIFY_PROC, PhoneSelect,
 NULL);
 PhoneChoice = xv_create(PhonePanel, PANEL_CHOICE,
 PANEL_LABEL_STRING, "Search Options",
 PANEL_NEXT_ROW, 40,
 PANEL_CHOICE_STRINGS,"Repeat Last Search",
 "Next Record",
 "Previous Record",
 NULL,
 PANEL_NOTIFY_PROC, ChoiceSelect,
 NULL);
 (void) xv_create(PhonePanel, PANEL_BUTTON,
 PANEL_LABEL_STRING, "Exit Phone Menu System",
 XV_X, 225,

 XV_Y, 450,
 PANEL_NOTIFY_PROC, PhoneQuit,
 NULL);
 if (ReadList(PHONEFILE) == FALSE)
 errout("Data Base File Read Error!");
 if (List.Size > 0) DispRecord();
/*++++ Sub menu control loop...++++*/
 window_fit(PhoneFrame);
 xv_main_loop(PhoneFrame);
/*++++ Exit and restore CRT to main video page...++++*/
 return 0;
}/* end of main */
/*+=========================================================================
== int PhoneSelect: Process event from phone menu selection... ==
==========================================================================*/
int PhoneSelect(item, event)
Panel_item item;
Event *event;
{
 char ItemName[82] ;/* name of item event */
 strcpy(ItemName , (char *)xv_get(item, PANEL_LABEL_STRING));
 if (spos(ItemName, "Data Base") > 0)
 {
 strcpy(PHONEFILE , (char *)xv_get(item, PANEL_VALUE));
 strim(PHONEFILE);
 if (ReadList(PHONEFILE) == FALSE)
 errout("Data Base File Read Error!");
 else
 DispRecord();
 }
 else if (spos(ItemName, "Name") > 0)
 {
 strcpy(NameSearch , (char *)xv_get(item, PANEL_VALUE));
 if (List.Size > 0)
 {
 List.SearchKey = 0;
 strcpy(List.SearchString,NameSearch);
 LastType = NAME;
 SearchField(LastType, List.SearchString);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 }
 else if (spos(ItemName, "Number") > 0)
 {
 strcpy(PhoneSearch , (char *)xv_get(item, PANEL_VALUE));
 if (List.Size > 0)
 {
 List.SearchKey = 0;
 strcpy(List.SearchString,PhoneSearch);
 LastType = PHONE;
 SearchField(LastType, List.SearchString);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 }
 else if (spos(ItemName, "Address") > 0)
 {

 strcpy(AddressSearch , (char *)xv_get(item, PANEL_VALUE));
 if (List.Size > 0)
 {
 List.SearchKey = 0;
 strcpy(List.SearchString,AddressSearch);
 LastType = ADDRESS;
 SearchField(LastType, List.SearchString);
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 }
 return XV_OK;
}/* end of PhoneSelect */
/*+=========================================================================
==int ChoiceSelect: Call search option function... ==
==========================================================================*/
int ChoiceSelect(item, event)
Panel_item item;
Event *event;
{
 int ChoiceVal ;/* value of keypress */
 ChoiceVal = (int )xv_get(item, PANEL_VALUE);
 if (List.Size > 0)
 {
 switch (ChoiceVal)
 {
 case 0:
 SearchField(LastType, List.SearchString);
 break;
 case 1:
 List.SearchKey = (List.SearchKey < List.Size - 1 ?
 List.SearchKey + 1 : 0);
 break;
 case 2:
 List.SearchKey = (List.SearchKey > 0 ?
 List.SearchKey - 1 : List.Size - 1);
 break;
 }
 DispRecord();
 }
 else
 errout("No Valid Data Base!");
 return XV_OK;
}/* end of ChoiceSelect */
/*+=========================================================================
==void PhoneQuit: Destroy frame and exit menu... ==
==========================================================================*/
void PhoneQuit()
{
 xv_destroy_safe(PhoneFrame);
}/* end of PhoneQuit */
/*+=========================================================================
==BOOL ReadList: Open user phone data base and read into structure... ==
==========================================================================*/
BOOL ReadList(Dbase)
char *Dbase;
{
 BOOL Stcode ;/* status code returned */

 char Stemp[80] ;/* temporary string */
 BOOL NewRecord ;/* indicates beginning new field */
 FILE *FileHandle ;/* pointer to pipe file */
 List.Size = -1;
 Stcode = FALSE;
 NewRecord = FALSE;
 FileHandle=fopen(Dbase,"rb");
 if (FileHandle != NULL)
 {
 while ((NextLine(Stemp, FileHandle)) && (List.Size < MAXLABELS -1))
 {
 if (spos(Stemp, MARKER) > 0)
 {
 List.Size++;
 if ((NextLine(List.Info[List.Size].Name,FileHandle)) &&
 (List.Size < MAXLABELS -1))
 {
 NextLine(List.Info[List.Size].Address[0],
 FileHandle);
 NextLine(List.Info[List.Size].Address[1],
 FileHandle);
 NextLine(List.Info[List.Size].Address[2],
 FileHandle);
 NextLine(List.Info[List.Size].Address[3],
 FileHandle);
 NextLine(List.Info[List.Size].PhoneNumber,
 FileHandle);
 NextLine(List.Info[List.Size].Greeting,
 FileHandle);
 }
 }
 }
 }
 if (List.Size > 0) Stcode = TRUE;
 return(Stcode);
}/* end of ReadList */
/*+=========================================================================
==BOOL NextLine: Read next line in file... ==
==========================================================================*/
BOOL NextLine(String, FileHandle)
char *String;
FILE *FileHandle;
{
 BOOL Stcode ;/* status code returned */
 char Stemp[80] ;/* temporary string */
 char *Sptr ;/* pointer to string */
 Stcode = FALSE;
 String[0] = '\0';
 if (fgets(Stemp, sizeof Stemp, FileHandle) != NULL)
 {
 Sptr = strrchr(Stemp,'\r');
 if (Sptr != NULL) *Sptr = ' ';
 Sptr = strrchr(Stemp,'\n');
 if (Sptr != NULL) *Sptr = ' ';
 strim(Stemp);
 strcpy(String,Stemp);
 Stcode = TRUE;
 }
 return(Stcode);

}/* end of NextLine */
/*+=========================================================================
==BOOL SearchField: Search phone for data using key to select field.. ==
==========================================================================*/
BOOL SearchField(Key, Sdata)
KEYTYPE Key;
char *Sdata;
{
 BOOL Stcode ;/* status code returned */
 int OldSearchKey ;/* copy of search key returned */
 BOOL NoMatch ;/* indicates search data found */
 Stcode = FALSE;
 NoMatch = TRUE;
 OldSearchKey = List.SearchKey;
 if (List.SearchKey != 0) List.SearchKey++;
 while((NoMatch) && (List.SearchKey < List.Size))
 {
 switch (Key)
 {
 case NAME:
 if (spos(List.Info[List.SearchKey].Name , Sdata) > 0)
 NoMatch = FALSE;
 break;
 case PHONE:
 if (spos(List.Info[List.SearchKey].PhoneNumber , Sdata) > 0)
 NoMatch = FALSE;
 break;
 case ADDRESS:
 if (spos(List.Info[List.SearchKey].Address[0] , Sdata) > 0)
 NoMatch = FALSE;
 if (spos(List.Info[List.SearchKey].Address[1] , Sdata) > 0)
 NoMatch = FALSE;
 if (spos(List.Info[List.SearchKey].Address[2] , Sdata) > 0)
 NoMatch = FALSE;
 if (spos(List.Info[List.SearchKey].Address[3] , Sdata) > 0)
 NoMatch = FALSE;
 break;
 default:
 puts("Error - bad key");
 }
 if (NoMatch)
 List.SearchKey++;
 else
 Stcode = TRUE;
 }
 if (NoMatch) List.SearchKey = OldSearchKey;
 return(Stcode);
}/* end of SearchField */
/*+=========================================================================
==void DispRecord: Display record on video box of main menu... ==
==========================================================================*/
void DispRecord()
{
 char Stemp[80] ;/* temporary string */
 DispLine(" ", LineRow+20, 10);
 DispLine(" ", LineRow+40, LineCol);
 DispLine(" ", LineRow+60, LineCol);
 DispLine(" ", LineRow+80, LineCol);
 DispLine(" ", LineRow+100, LineCol);

 DispLine(" ", LineRow+120, LineCol);
 sprintf(Stemp,"[%3d]",List.SearchKey);
 DispLine(Stemp, LineRow+20, 10);
 DispLine(List.Info[List.SearchKey].Name, LineRow+20, LineCol);
 DispLine(List.Info[List.SearchKey].Address[0], LineRow+40, LineCol);
 DispLine(List.Info[List.SearchKey].Address[1], LineRow+60, LineCol);
 DispLine(List.Info[List.SearchKey].Address[2], LineRow+80, LineCol);
 DispLine(List.Info[List.SearchKey].Address[3], LineRow+100, LineCol);
 DispLine(List.Info[List.SearchKey].PhoneNumber, LineRow+120, LineCol);
}/* end of DispRecord */
/*+=========================================================================
==void DispLine: Display single line on video of main menu... ==
==========================================================================*/
void DispLine(String, Row, Col)
char *String;
int Row;
int Col;
{
 int Idx ;/* index into array */
 char Slabel[160] ;/* string label */
 memset(Slabel,' ',158);
 Slabel[158]='\0';
 Idx = slen(String);
 while (Idx >=0)
 {
 Slabel[Idx] = String[Idx];
 Idx--;
 }
 (void) xv_create(PhonePanel, PANEL_MESSAGE,
 PANEL_LABEL_STRING, Slabel,
 PANEL_LABEL_BOLD, TRUE,
 XV_X, Col,
 XV_Y, Row,
 NULL);
}/* end of DispLine */



November, 1992
 CONVERTING DITHERED IMAGES BACK TO GRAY SCALE


Undithering algorithms for image enhancement




Allen Stenger


Allen is a programmer for a large radar house. He may be reached on CompuServe
(70401, 1171) or on AppleLink (Stenger).


Most printing uses only black ink, which does not allow shades of gray to be
represented directly. Therefore, pictures in traditional halftoning are
represented by a grid of variable-sized black dots, with thicker dots used for
the darker areas. The eye blurs the dots and the white area between them, so
we perceive shades of gray, though the picture still contains only black and
white.
Computer printers and monitors usually don't have variable-sized dots, so
digital halftoning (colloquially called "dithering") uses the cruder method of
representing shades of gray by various patterns of fixed-sized black dots. For
example, 50 percent black might be shown by a checkerboard pattern of black
and white dots or pixels. Because of the loss of information in going from
8-bit pixels to 1-bit pixels, dithering is intended to be the last step before
printing or displaying. Image-enhancement techniques, even the simplest ones,
such as contrast enhancement, usually do not work on dithered images.

Dithered images are widely available on BBSs and networks, but the original
image (which you might like to have for display on a gray-scale monitor or to
enhance in some way) is usually not available. The loss of information in
dithering generally makes it impossible to recover the original image from a
dithered image, but there are ways to "undither" a dithered image into a
reasonable 8-bit reconstruction of the original image. Figure 1(a), for
example, shows a 256-gray-scale photograph; Figure 1(b) shows it in
Floyd-Steinberg dither (discussed below); and Figure 1(c) shows a
reconstruction from Figure 1(b) using the methods described in this article.


Dithering Methods


The two most common methods are ordered dither and Floyd-Steinberg dither.
Ordered dither uses a cleverly chosen set of black-and-white patterns (usually
8x8 squares of pixels) to represent the different gray-scale levels. (The use
of ordered dither can be recognized by the characteristic crosshatch patterns
that this set generates.) The dithering algorithm first divides the image into
a grid of 8x8 squares. If each square had a constant gray level, it would be
easy to dither. (Just use the pattern having the closest average gray value.)
Since the squares are rarely constant, the algorithm uses a thresholding
scheme to replace each gray pixel with a black or white pixel: For each pixel,
imagine that the whole square did have that gray level, look up the correct
pattern, and replace the pixel with the same-position pixel from the pattern.
OrderedDither in Listing One (page 133) implements ordered dithering.
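The thresholding scheme can be sketched in a few lines of C. This is a hypothetical illustration, not the article's listing: it uses a 4x4 Bayer matrix for brevity (Listing One builds the 8x8 version the article describes), and `ordered_dither_pixel` is a name invented here. The convention 255 = black matches NIH Image.

```c
/* Sketch of ordered-dither thresholding (assumption: 4x4 Bayer matrix
   rather than the article's 8x8; same recursive construction).
   Convention: 0 = white, 255 = black, as in NIH Image. */
static const int Bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Return 1 (print a black dot) when the pixel is at least as dark as
   the threshold for its position in the repeating 4x4 grid. */
int ordered_dither_pixel(int gray, int row, int col)
{
    int threshold = Bayer4[row & 3][col & 3] * 16 + 8; /* scale 0..15 to 8..248 */
    return gray >= threshold;
}
```

A constant 50 percent gray (128) trips exactly half of the 16 thresholds in each tile, which is what produces the characteristic crosshatch patterns.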
While ordered dither processes each pixel independently (so in theory all
pixels could be processed in parallel), Floyd-Steinberg (F-S) dither (see
Floyd-SteinbergDither in Listing One) is a serial method that avoids the
artifacts (characteristic crosshatches) of ordered dither and does a better
job of representing fine lines. (The use of F-S dither can be recognized by
the characteristic snaky patterns occurring in almost-black or almost-white
areas.) F-S is an error-diffusion method that processes the pixels of each
scan line from left to right and top to bottom. Each pixel is examined and
rounded to black or white. For example, if it was rounded to black, then the
picture is now too black, and we compensate for this by making the neighboring
gray pixels a little lighter (we diffuse the error) so that the sum of all the
pixels' gray values is unchanged. Specifically, the pixels neighboring to the
east, south, southeast, and southwest are adjusted. Floyd and Steinberg
empirically distributed more of the error to some pixels than others; they
used the weights:
      0    x   7/16
    3/16 5/16  1/16
where x is the pixel under consideration.
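The diffusion step described above can be sketched as follows. This is a simplified stand-in for the Floyd-SteinbergDither routine in Listing One; the flat int buffer and in-place update are assumptions of this sketch.

```c
/* Sketch of Floyd-Steinberg error diffusion over a width*height buffer
   of ints (0 = white, 255 = black).  Each pixel is rounded to black or
   white and the rounding error is spread east (7/16), southwest (3/16),
   south (5/16), and southeast (1/16), per the weights above. */
void fs_dither(int *img, int width, int height)
{
    int row, col;
    for (row = 0; row < height; row++)
        for (col = 0; col < width; col++) {
            int old = img[row * width + col];
            int rounded = (old >= 128) ? 255 : 0;
            int err = old - rounded;          /* picture is now off by err */
            img[row * width + col] = rounded;
            if (col + 1 < width)
                img[row * width + col + 1] += err * 7 / 16;
            if (row + 1 < height) {
                if (col > 0)
                    img[(row + 1) * width + col - 1] += err * 3 / 16;
                img[(row + 1) * width + col] += err * 5 / 16;
                if (col + 1 < width)
                    img[(row + 1) * width + col + 1] += err * 1 / 16;
            }
        }
}
```

Because the error is conserved, a uniform mid-gray input comes out roughly half black and half white, with the dots arranged in the snaky patterns the article mentions.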


Computerized Blurring


The reason halftones and dithering work is that the eye blurs the dots into
shades of gray, so the first step in undithering is to mechanically smooth or
blur the image. The human eye does this without using any complicated
mathematics, but computers need an algorithm.
I processed the images in Figure 1 with NIH Image, a widely available
public-domain image processing and analysis program written and distributed in
source form (Think Pascal) by Wayne Rasband at the National Institutes of
Health. NIH Image, which runs on the Mac II, provides most needed transforms,
and allows the addition of user-written subroutines for specialized
processing. (Because of its size, NIH Image is available only electronically;
see "Availability" on page 5.)
Computerized blurring is done by replacing each pixel's value with the average
of the pixels in a small region around it. (In NIH Image each pixel's gray
value is represented by a number from 0 to 255, with 255 representing black.)
NIH Image has a built-in smooth function, which averages a 3x3 square centered
at the pixel. The important grays of 50 and 25 percent black dither into
checkerboard patterns; applying the smooth function to them produces a gray
checkerboard instead of a constant gray, because an odd number of pixels is
being averaged. A 4x4 square averages an even number of pixels and produces
exactly the right gray level from these checkerboards. It also seems to
produce better results in undithering, so the first step is to blur the
dithered image with a 4x4 square.
To get a 4x4 average, you can use the convolve function. The word "convolve"
literally means "roll together," and a convolution rolls together the pixels
in a region as a weighted average and places it in the center pixel. The
user-provided convolution kernel is just an array of weights for the pixels.
For a 4x4 average, we therefore use the convolution kernel:
    1 1 1 1
    1 1 1 1
    1 1 1 1
    1 1 1 1
Smoothing also has the desirable feature of increasing the number of gray
levels present. From a gray-scale viewpoint, the largest deficiency of a
dithered image is that it has a drab two levels of gray, not the rich 256
levels of the original picture. It is this lack of variety that prevents image
enhancements from working on dithered images.
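The averaging step can also be written directly in C (NIH Image's convolve function does this generically). The edge clamping below is an assumption of this sketch, as is the "loosely centered" window, since an even-sized kernel has no exact center pixel.

```c
/* Sketch of the first undithering step: replace each pixel with the
   average of a roughly centered 4x4 neighborhood, i.e. convolve with
   the all-ones 4x4 kernel above.  Out-of-range coordinates are clamped
   to the image border (an assumption; NIH Image handles edges itself). */
void box_blur4(const unsigned char *src, unsigned char *dst,
               int width, int height)
{
    int row, col, dr, dc;
    for (row = 0; row < height; row++)
        for (col = 0; col < width; col++) {
            int sum = 0;
            for (dr = -1; dr <= 2; dr++)
                for (dc = -1; dc <= 2; dc++) {
                    int r = row + dr, c = col + dc;
                    if (r < 0) r = 0;
                    if (r >= height) r = height - 1;
                    if (c < 0) c = 0;
                    if (c >= width) c = width - 1;
                    sum += src[r * width + c];
                }
            dst[row * width + col] = (unsigned char)(sum / 16);
        }
}
```

A 50 percent checkerboard averages to a flat mid-gray, which is exactly the property that motivates using an even-sized window instead of the built-in 3x3 smooth.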


Adaptive Smoothing


For Floyd-Steinberg dither, the first step by itself usually produces an
almost-realistic gray-scale image, and you could stop there. The two most
obvious imperfections of the first step are that the picture often looks
mottled and that the smoothing has blurred some details. The mottling occurs
because there are not enough grays in the gray scale (only 17 shades compared
to the original 256), causing a continuous shading of gray to be mapped into
discrete and visibly different grays.
Smoothing blurs indiscriminately. We would like to smooth out the mottled
areas without disturbing the high-contrast areas that represent the edges of
objects. We can do this with an optimal filtering technique, Lee's
local-statistics method (see "Optimal Estimation"). This method does what we
want by providing a variable amount of smoothing in different regions of the
picture. This is implemented as a user subroutine in NIH Image
(LeeLocalStatistics in Listing One). For F-S dither, a 3x3 smoothing works
well. To obscure the more blatant patterns in ordered dither, a 5x5 smoothing
works better, although it also blurs more details of the picture.


Finishing Touches


The third enhancement is to recover some details that have been blurred by the
previous two stages. For this we apply a mild sharpening filter. Sharpening
filters are also convolutions; the convolution kernel we use is:
    -1 -1 -1
    -1 50 -1
    -1 -1 -1
(Be sure to turn off Scale Convolutions in the Preferences to prevent the
contrast from being reduced.) Sharpening is the opposite of smoothing, and we
can only do a little without undoing the effects of the previous smoothings.
The result is shown in Figure 1(c). It is a little blurred, but is otherwise a
good likeness of the original.


References



Bryson, Arthur E., Jr. and Yu-Chi Ho. Applied Optimal Control. Waltham, MA:
Blaisdell, 1969.
Floyd, Robert and Louis Steinberg. "An Adaptive Algorithm for Spatial Gray
Scale." Society for Information Display Digest (1975).
Judice, C.N., J.F. Jarvis, and W.H. Ninke. "Using Ordered Dither to Display
Continuous Tone Pictures on an AC Plasma Panel." Proceedings of the Society
for Information Display (1974).
Knuth, Donald E. "Digital Halftones by Dot Diffusion." ACM Transactions on
Graphics (October, 1987).
Lee, Jong-Sen. "Digital Image Enhancement and Noise Filtering by Use of Local
Statistics." IEEE Transactions on Pattern Analysis and Machine Intelligence
(March, 1980).


Optimal Estimation


Optimal estimation is a group of statistical methods for solving the problem
of making a "best guess" of a measured parameter's true value when the
measurement is corrupted by "noise" (a random, undesired addition to the value
of interest). It is widely used in control applications like radar tracking.
With dithering, you can use it to estimate the true gray levels of a picture
that has been corrupted by dithering.
After the first stage of our undithering process we have a picture with 17
gray levels, and we know the original picture had 256 gray levels. The most
optimistic interpretation of the 17-level picture would be that each original
gray level had been rounded to the nearest of the 17 gray levels. (The picture
has been "posterized.") The difference between the original value and the
rounded value is the noise.
Lee's local statistics method uses a least-mean-squares estimator. To apply
this method, we need estimates of the mean and variance of both the true
signal (the original picture) and the noise (the error introduced by dithering
and blurring). The optimal estimate (see Bryson and Ho) is:
    estimate = mean + gain*(observed value - mean)
             = gain*(observed value) + (1 - gain)*mean
where the gain is calculated as:
    gain = local variance/(local variance + noise variance)
None of these means or variances is known to us directly; we can estimate the
mean and variance of the original picture at each point by calculating the
mean and variance of a small square centered at the point in the blurred
picture. (These are the "local statistics," as opposed to "global statistics"
that would be calculated for the entire picture.) We can estimate the mean and
variance of the noise as follows: Since the 17 gray levels are spaced every 16
grays, the maximum error from rounding would be 8, and the error function
would be a uniform distribution from -8 to +8. The mean is therefore 0 and the
variance is 16**2/12 = 256/12, or about 21. Since the picture really went
through the even more corrupting process of dithering, we would expect the
true variance to be somewhat larger, and experimentation shows that 150
produces good results in the algorithm.
Note that the gain is always between 0 and 1 (inclusive), so the estimate is a
weighted average of the observation and the mean. If the local variance is
small compared to the noise, the gain is close to 0 and we use mostly the mean
value. (This is because a small local variance suggests that the noise is the
source of most of the variation, and we therefore wish to cancel it out by
averaging.) If the local variance is large compared to the noise, the gain is
close to 1, and we use mostly the observed value. (This suggests that most of
the variation is due to true differences in the gray levels and little is due
to noise, so we essentially ignore the noise and believe the observed value is
very close to the true value.) --A.S.
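The estimator itself reduces to a few lines. This sketch (with the article's empirical noise variance of 150) assumes the local mean and variance have already been computed over a small square around the pixel; `lee_estimate` is a name invented here, not a routine from the listings.

```c
/* Sketch of Lee's local-statistics estimate for a single pixel:
       gain     = local variance / (local variance + noise variance)
       estimate = gain * observed + (1 - gain) * local mean
   NOISE_VARIANCE is the article's empirical value of 150. */
#define NOISE_VARIANCE 150.0

double lee_estimate(double observed, double local_mean, double local_variance)
{
    double gain = local_variance / (local_variance + NOISE_VARIANCE);
    return gain * observed + (1.0 - gain) * local_mean;
}
```

In flat (mottled) regions the local variance is small, the gain is near 0, and the pixel is pulled toward the local mean; near edges the variance is large, the gain is near 1, and the observed value is kept, which is why the method smooths without blurring object boundaries.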



[LISTING ONE]

unit User;
{ This is an addition to, and incorporates parts of, the NIH Image program. }
{ NIH Image is written by Wayne Rasband at the National Institutes of Health }
{ and is in the public domain. This addition was written by Allen Stenger, }
{ March 1992. Written in THINK Pascal version 4.0.1. }
{ Replace the User.p supplied with Image with this one. Be sure to uncomment }
{ the call to InitUser in Image.p. If you have a small display you may need }
{ to use ResEdit to shorten the names of the other menu items in Image.rsrc }
{ so the User menu (which comes last) won't be pushed off the end. Use }
{ ResEdit to modify the User Menu in Image.rsrc to make the items Lee Local }
{ Statistics, Ordered Dither, Floyd-Steinberg Dither. }
{ Algorithm references: }
{ Ordered dither: C.N. Judice, J.F. Jarvis, and W.H.Ninke, "Using }
{ Ordered Dither to Display Continuous Tone Pictures on an AC Plasma }
{ Panel." Proceeding of the Society for Information Display v. 15 }
{ no. 4 (Fourth Quarter 1974), not paged. Reprinted in: John C. }
{ Beatty and Kellogg S. Booth (editors), Tutorial: Computer }
{ Graphics, 2nd edition. Silver Spring, MD: IEEE Computer Society }
{ Press, 1982, pp. 220-228.}
{ Lee local statistics: Jong-Sen Lee, "Digital Image Enhancement and }
{ Noise Filtering by Use of Local Statistics." IEEE Transactions on }
{ Pattern Analysis and Machine Intelligence, v. PAMI-2, no. 2 (March }
{ 1980), pp. 165-168. Reprinted in: Rama Chellapa and Alexander A. }
{ Sawchuk (eds.),Digital Image Processing and Analysis v. 1. Silver }
{ Spring, MD: IEEE Computer Society Press, 1985, pp. 440-443. }

interface
 uses
 QuickDraw, Palettes, PrintTraps, globals, Utilities, Graphics, Analysis,
 Camera, Functions;
 procedure InitUser;
 procedure DoUserMenuEvent (MenuItem: integer);
implementation
 type
 UserFilterType = (LeeLocalStats, OrderedDither, FloydSteinbergDither);
 procedure InitUser;
 begin
 UserMenuH := GetMenu(UserMenu);

 InsertMenu(UserMenuH, 0);
 DrawMenuBar;
 end;
{ Most of UserFilter is copied with minor modifications from Image }
{ (procedure Filter in Functions.p). The new parts are the Lee local }
{ statistics and ordered dither code. Floyd-Steinberg dither is copied }
{ from Filter. }
 procedure UserFilter (filterType: UserFilterType);
 const
 PixelsPerUpdate = 5000; { controls screen updating }
{ constants for Lee local statistics method }
 NoiseVariance = 150; { empirical value for Lee method }
{ constants for ordered dither }
 DitherSize = 8; { dimensions of ordered dither matrix }
 DitherSizeMinus1 = 7; { ditto minus 1 }
 type
 DitherPattern = array[0..DitherSizeMinus1, 0..DitherSizeMinus1] of 0..255;
 var
{ general variables for this procedure }
 row, width, r1, r2, r3, c, value, error, sum, tmp, center: integer;
 mark, NewMark, LinesPerUpdate, LineCount: integer;
 MaskRect, frame: rect;
 L1, L2, L3, result: LineType;
 pt: point;
 AutoSelectAll, UseMask: boolean;
 StartTicks: LongInt;
{ variables for Lee local statistics method }
 localVariance: longint;
 localMean: longint;
 gain: real;
 i: integer; { loop control }
{ variables for ordered dither }
 thePattern: DitherPattern;
 procedure PutLineUsingMask (h, v, count: integer;
 var line: LineType);
 var
 aLine, MaskLine: LineType;
 i: integer;
 SaveInfo: InfoPtr;
 begin
 if count > MaxPixelsPerLine then
 count := MaxPixelsPerLine;
 GetLine(h, v, count, aline);
 SaveInfo := Info;
 Info := UndoInfo;
 GetLine(h, v, count, MaskLine);
 for i := 0 to count - 1 do
 if MaskLine[i] = BlackIndex then
 aLine[i] := line[i];
 info := SaveInfo;
 PutLine(h, v, count, aLine);
 end;
 procedure MakeDitherPattern (var p: DitherPattern);
 var
 row: 0..DitherSizeMinus1;
 column: 0..DitherSizeMinus1;
 halfsize: 1..DitherSize;
 scaleFactor: 1..256;
 begin
 { The pattern is defined recursively; we implement the recursion }
 { as an iteration. }
 p[0, 0] := 0;
 halfsize := 1;
 while halfsize < DitherSize do begin
 for row := 0 to halfsize - 1 do
 for column := 0 to halfsize - 1 do begin
 p[row, column] := 4 * p[row, column];
 p[row, column + halfsize] := p[row, column] + 2;
 p[row + halfsize, column] := p[row, column] + 3;
 p[row + halfsize, column + halfsize] := p[row, column] + 1;
 end;
 halfsize := halfsize * 2;
 end;
 { adjust scaling for pixel ranges 0..255 }
 scaleFactor := 256 div SQR(DitherSize);
 for row := 0 to DitherSizeMinus1 do
 for column := 0 to DitherSizeMinus1 do
 p[row, column] := scaleFactor * p[row, column] + scaleFactor div 2;
 end; {MakeDitherPattern}
 begin
 if NotinBounds then
 exit(UserFilter);
 StopDigitizing;
 AutoSelectAll := not Info^.RoiShowing;
 if AutoSelectAll then
 with info^ do begin
 SelectAll(false);
 SetPort(wptr);
 PenNormal;
 PenPat(pat[PatIndex]);
 FrameRect(wrect);
 end;
 if TooWide then
 exit(UserFilter);
 ShowWatch;
 if info^.RoiType <> RectRoi then
 UseMask := SetupMask
 else
 UseMask := false;
 WhatToUndo := UndoFilter;
 SetupUndoFromClip;
 ShowMessage(CmdPeriodToStop);
 frame := info^.RoiRect;
 StartTicks := TickCount;
 {Set up for ordered dither }
 if filterType = OrderedDither then
 MakeDitherPattern(thePattern);
 with frame, Info^ do begin
 changes := true;
 RoiShowing := false;
 if left > 0 then
 left := left - 1;
 if right < PicRect.right then
 right := right + 1;
 width := right - left;
 LinesPerUpdate := PixelsPerUpdate div width;
 GetLine(left, top, width, L2);
 GetLine(left, top + 1, width, L3);
 Mark := RoiRect.top;

 LineCount := 0;
 for row := top + 1 to bottom - 1 do begin
 {Move Convolution Window Down}
 BlockMove(@L2, @L1, width);
 BlockMove(@L3, @L2, width);
 GetLine(left, row + 1, width, L3);
 {Process One Row}
 if CommandPeriod then
 exit(UserFilter);
 case filterType of
 LeeLocalStats:
 for c := 1 to width - 2 do begin
 localMean := (L1[c] + L1[c + 1] + L1[c + 2]
 + L2[c] + L2[c + 1] + L2[c + 2]
 + L3[c] + L3[c + 1] + L3[c + 2]) div 9;
 localVariance := 0;
 for i := 0 to 2 do begin
 localVariance := localVariance + SQR(L1[c + i]
 - localMean);
 localVariance := localVariance + SQR(L2[c + i]
 - localMean);
 localVariance := localVariance + SQR(L3[c + i]
 - localMean);
 end;
 localVariance := localVariance div (3 * 3);
 if OptionKeyWasDown then { do extra smoothing }
 gain := localVariance /
 (localVariance + NoiseVariance * 16.0)
 else
 gain := localVariance / (localVariance + NoiseVariance);
 result[c - 1] :=
 round(localMean + gain * (L2[c + 1] - localMean));
 if result[c - 1] > 255 then
 result[c - 1] := 255;
 if result[c - 1] < 0 then
 result[c - 1] := 0;
 end; {LeeLocalStats}
 OrderedDither:
 for c := 1 to width - 2 do begin
 if L2[c + 1] >=
 thePattern[row mod DitherSize, c mod DitherSize] then
 result[c - 1] := 255 { dither to black pixel }
 else
 result[c - 1] := 0; { dither to white pixel }
 end; {OrderedDither}
 FloydSteinbergDither:
 for c := 1 to width - 2 do begin
 value := L2[c + 1];
 if value < 128 then begin
 result[c - 1] := 0;
 error := -value;
 end
 else begin
 result[c - 1] := 255;
 error := 255 - value
 end;
 tmp := L2[c + 2]; {A}
 tmp := tmp - (7 * error) div 16;
 if tmp < 0 then
 tmp := 0;
 if tmp > 255 then
 tmp := 255;
 L2[c + 2] := tmp;
 tmp := L3[c + 2]; {B}
 tmp := tmp - error div 16;
 if tmp < 0 then
 tmp := 0;
 if tmp > 255 then
 tmp := 255;
 L3[c + 2] := tmp;
 tmp := L3[c + 1]; {C}
 tmp := tmp - (5 * error) div 16;
 if tmp < 0 then
 tmp := 0;
 if tmp > 255 then
 tmp := 255;
 L3[c + 1] := tmp;
 tmp := L3[c]; {D}
 tmp := tmp - (3 * error) div 16;
 if tmp < 0 then
 tmp := 0;
 if tmp > 255 then
 tmp := 255;
 L3[c] := tmp;
 end; {FloydSteinbergDither}
 end; {case filterType}
 if UseMask then
 PutLineUsingMask(left + 2, row, width - 3, result)
 else
 PutLine(left + 2, row, width - 3, result);
 LineCount := LineCount + 1;
 if LineCount = LinesPerUpdate then begin
 pt.h := RoiRect.left;
 pt.v := row + 1;
 NewMark := pt.v;
 with RoiRect do
 SetRect(MaskRect, left, mark, right, NewMark);
 UpdateScreen(MaskRect);
 LineCount := 0;
 Mark := NewMark;
 if magnification > 1.0 then
 Mark := Mark - 1;
 if CommandPeriod then begin
 UpdatePicWindow;
 beep;
 if AutoSelectAll then
 KillRoi;
 exit(UserFilter)
 end;
 end;
 end; {for row:=...}
 trect := frame;
 InsetRect(trect, 1, 1);
 ShowTime(StartTicks, trect, '');
 end; {with}
 if LineCount > 0 then begin
 with frame do
 SetRect(MaskRect, left, mark, right, bottom);

 UpdateScreen(MaskRect)
 end;
 SetupRoiRect;
 if AutoSelectAll then
 KillRoi;
 end;
 procedure DoUserMenuEvent (MenuItem: integer);
 begin
 case MenuItem of { User menu must be set up in this order }
 1:
 UserFilter(LeeLocalStats);
 2:
 UserFilter(OrderedDither);
 3:
 UserFilter(FloydSteinbergDither);
 end;
 end;
end.












































November, 1992
DESIGNING A REAL-TIME DEBUGGER


The best of both worlds




David Potter


David is the president of Concurrent Sciences, developer of the Soft-Scope
debugger. He can be contacted at 530 S. Asbury, Moscow, ID 83843.


Theoretically, designing preemptive multitasking programming tools for
single-tasking operating environments is an academic exercise. However,
commercial operating systems like Intel's iRMX for Windows make it possible to
write real-time, deterministic applications for environments such as
single-tasking DOS or cooperative-multitasking Microsoft Windows. As Table 1
illustrates, Windows and iRMX for Windows are each complex operating systems
in their own right, with their respective strengths and weaknesses
complementing each other.
Table 1: Windows vs. iRMX.

 Microsoft Windows                      iRMX Operating System
 -------------------------------------------------------------------------

 Graphical user interface (+)           Line-oriented interface (-)
 Runs standard Windows and
   DOS applications (+)                 Runs only iRMX applications (-)
 Nonpreemptive, cooperative
   multitasking (-)                     Preemptive, priority-based
                                          multitasking (+)
 Not real-time, not
   deterministic (-)                    Real-time and deterministic (+)

But designing programming tools--particularly a debugger--for this "mixed"
world is more complex than building tools for a single environment because the
tools must draw on the best of both worlds. This article describes the design
and implementation of a debugger we wrote for the iRMX for Windows
environment. Building the debugger posed numerous challenges. For one thing,
the debugger required a graphical, windowed interface and it had to cooperate
with other Windows and DOS applications, while understanding and taking
advantage of iRMX operating-system features. What we learned in the process
should be valuable to anyone writing iRMX for Windows applications.


Partitioning the Debugger


Our first step was to partition debugger functions into two groups: those
performed under Windows and those performed under iRMX; see Table 2. To take
advantage of the graphical interface, all user interaction needed to be on the
Windows side, while task control and status had to be on the iRMX side. We
wrote the Windows component (the "debugger") using the Microsoft Windows SDK;
the iRMX component (the "kernel") was written using Intel's iRMX development
tools.
Table 2: Partitioning the debugger workload.

 Under Microsoft Windows Under iRMX Operating System
 -----------------------------------------------------------------

 User interface Loading and running iRMX applications
 Program symbolics decoding Task control (breakpoints, stepping,
 switching tasks)
 Memory and source display Task status (at breakpoint, GP fault,
 running, and so on)
 Information about iRMX system objects
 (mailboxes, tasks, and so on)

Designing this debugger was more like writing a cross-debugger than a native
debugger, since the host environment (Microsoft Windows) and the target
environment (Intel's iRMX) are so different. In a typical cross-debugger, the
host and target environments are totally separate, run on separate CPUs in
different enclosures, and communicate through a serial connection. In fact,
the functional partitioning for this debugger and much of the source code came
directly from Soft-Scope III/CSiMON, our existing cross-debugger, which is
PC-hosted and communicates serially to a target-resident monitor, CSiMON
(normally embedded in EPROM).
In the case of the iRMX for Windows debugger, the host and target systems run
on the same hardware and, instead of communicating serially, they communicate
via iRMX mailboxes between the Windows-resident debugger and the iRMX-resident
kernel; see Figure 1. When the debugger needs information about the state of a
given task or its current register set and execution location, for instance,
it sends a request to the command mailbox, where the kernel waits. The kernel
then satisfies the request, sending the results back through a response
mailbox. Similarly, the debugger sends a command to the kernel to start and
stop execution of the iRMX task being debugged, and information about the
success or failure of the operation returns through the response mailbox.
To accomplish this, we used Intel's Real-Time Extension (RTE) library (a set
of iRMX functions that operate via a software interrupt), switching context to
the iRMX environment, performing the operation, then returning to the caller.
The RTE library gave us a way of sending and receiving iRMX messages, reading
and writing iRMX memory (not directly addressable by a Windows or DOS
program), and cataloging and looking up system objects (such as mailboxes) in
an iRMX object directory.
Under Windows, for instance, if the debugger must send a message through an
iRMX mailbox to a real-time task running concurrently on the iRMX side, we can
use the rqsendmessage system call. The request begins under Windows (or DOS)
and, through the RTE library, is sent via a software interrupt into iRMX. For
example, when the debugger requires the current execution point of the task
being debugged, it formats a request for the contents of the CS and IP
registers. It sends this request to the debugger kernel waiting on the other
side at its command mailbox. The debugger then waits at another mailbox for a
response to its request.
This sounds easy enough, but it turns out that a message sent to an iRMX
mailbox must be in the form of an iRMX object, normally an iRMX memory
segment. Furthermore, filling the iRMX memory segment with the necessary data
requires the rqewritesegment RTE library function, which was created
specifically to be called from Windows. Because the memory spaces of Windows
and iRMX are entirely separate, a segment:offset reference made from Windows
is incompatible with a segment:offset reference made from the iRMX side, and
vice versa. Also, the iRMX memory segment to be sent must be created
explicitly. This makes sending a message a three-step process:
1. Create an iRMX segment with rqcreatesegment.
2. Write the message into the iRMX segment with rqewritesegment.
3. Send the message using rqsendmessage.
These functions can be performed from Windows using RTE calls. Example 1 shows
the contents of cmdbuff being copied to the iRMX cmd_seg segment and sent to
the iRMX sskernel_cmdmbx mailbox. To receive and read the return message, the
process is reversed:

Example 1: Contents of cmdbuff being copied to the iRMX cmd_seg segment and
sent to the iRMX sskernel_cmdmbx mailbox.

 . . .

 cmd_seg = rqcreatesegment (strlen (cmdbuff), &status);
 rqewritesegment (cmdbuff, cmd_seg, 0, strlen (cmdbuff), &status);
 rqsendmessage (sskernel_cmdmbx, cmd_seg, 0, &status);

 .
 .
 .

1. Receive the message using rqreceivemessage.
2. Read the message from iRMX segment into Windows-addressable memory.
3. Delete the iRMX segment with rqdeletesegment.
Example 2 shows the segment resp_seg being received from the sskernel_respmbx
mailbox and the contents of the segment being written to resp_buff.
Example 2: resp_seg being received from the sskernel_respmbx mailbox, and the
contents of the segment being written to resp_buff.

 . . .

 resp_seg = rqreceivemessage (sskernel_respmbx, 0, &dummbx, &status);
 rqereadsegment (resp_seg, 0, resp_buff, count, &status);
 rqdeletesegment (resp_seg, &status);

 .
 .
 .



Multiple, Real-time Tasks


Once the debugger is communicating with the kernel, the next problem concerns
the multiple, real-time tasks to be debugged on the iRMX side. How do we keep
track of and control these iRMX tasks, which march to the beat of a completely
different drummer than the debugger?
Remember those carnival performers who place spinning plates on top of a
series of poles and keep them all spinning by frantically running from one to
the next? If they concentrated on only one plate, the others would topple off
their supports and clutter the stage with broken dishes.
We faced a similar situation. While it is normally the job of the operating
system to transparently switch from one task to another, in this case the
debugger needed to move quickly and easily from task to task. Furthermore, it
had to single-step through the code of one task while other tasks operated
concurrently in the background. Thus, the operation of one task could be
isolated from that of another, and the system-critical tasks could function
while debugging other tasks in the application. Also important was the
capability to simultaneously set breakpoints in multiple tasks, so that one
task doesn't get too far ahead when the user stops to examine another.
Finally, the debugger needed to display all the tasks currently at break, so
the user could select any given task and make that the execution environment.
Because the debugger focuses on a single task at a time, its basic operation
is, on the Windows side, similar to that of a single-tasking debugger. The
Windows side involved considerations such as making task information available
and providing an interface so the user could switch from one task to another,
but our biggest challenge was in the kernel on the iRMX side. We needed to
allow several tasks to be simultaneously at their breakpoints, and had to be
able to switch from one task to another when the debugger commanded. Here, we
definitely had to run under iRMX in order to take over any task encountering a
breakpoint, save its current register state, and communicate through a set of
internal mailboxes managed by the kernel. Any task at break would dutifully
wait at a mailbox for a command from the debugger, and respond with
information about its register set, point of execution, and task state. It is
the kernel's job to manage this so that, for example, if three tasks are all
at break, the kernel can channel communication to and from each of the tasks
on demand from the debugger. Suppose the debugger had a request for the
register contents of task #2. The order of events would be as follows:
1. Request for register contents of task #2 is sent from the debugger to the
kernel.
2. Kernel receives the message, determines the proper mailbox for task #2, and
passes the request along.
3. In task #2's context, the register request is received, and a response is
sent back to the debugger.
Figure 2 shows what this might look like. On the Windows side, the debugger
must be aware of any changes in the tasks under debug, but its only knowledge
of them is through the kernel. What would happen if, for example, a task that
the debugger showed at break was deleted or suspended on the iRMX side?
Our solution was to have the debugger send commands to the kernel requesting
current information on the user tasks and jobs currently under investigation
(and displayed in the Soft-Scope Tasks window). The kernel responds with a
list of tasks currently at break, including where they are broken and how they
got there.


How Not to Receive an iRMX Message


The iRMX receive-message call (rqreceivemessage) has a parameter that allows
the caller to wait for a message for a specified length of time. We
discovered, however, that this parameter must be used with care. This
timed-wait capability is important for a real-time application, since you can
request a message even before it arrives and let the operating system wake you
either when the message arrives or when the requested time has elapsed,
whichever comes first. This way, synchronization with a cooperating task is
simple and straightforward. And, even under Windows, this works fine, as long
as nothing important must be done on the Windows side while you're waiting.
Remember that, while Windows is a multitasking operating system, iRMX thinks
of it as a single task, so a request to be put to sleep for a specified time
is tantamount to freezing all of Windows for the duration. Other iRMX tasks
can run while you're waiting, but Windows is dead! This is especially
problematic if a Windows process needs to complete before a task on the iRMX
side can send the response we're waiting for.
This is exactly the situation we were in when the debugger initiated the
loading of an iRMX application. A separate Windows application was spawned
(via WinExec) which created a window for the iRMX application to use and then
communicated to the iRMX operating system to load the application. After the
application load completed, we expected a message to be sent back to the
debugger via an iRMX mailbox. Immediately after the WinExec, we tried an
rqreceivemessage with a reasonable time-out and waited for a response. No
matter how long the time-out was, we never got our message back. Worse than
that, Windows was dead for the entire duration of the time-out!
The problem was that immediately after starting the Windows application, our
rqreceivemessage put the debugger and Windows to sleep, waiting for
completion. Deadlock. We never received the message, no matter how long we
waited!
The solution was to allow not only iRMX tasks to run while the debugger was
waiting (this is actually quite easy, since the debugger and Windows are
running at the lowest possible priority in the iRMX operating system), but to
let Windows run as well. We accomplished this by:
1. Not using the timed-wait with rqreceivemessage(), so control immediately
returned.
2. Waiting under Windows for a specified time, say one second.
3. Going back to #1 to check again, continuing for as many seconds as desired.
The iRMX sleep function allows an iRMX task to go to sleep for a certain
amount of time while other iRMX tasks run, but Windows doesn't have an exact
analog. And calling the iRMX rqsleep function after checking the mailbox only
re-created the problem we were trying to avoid. Windows does provide a Yield
function, which yields to other Windows applications, but not a way to wait
for a specified amount of time.
A function can be created, however, which does wait for at least the specified
time by using Yield in conjunction with GetCurrentTime. The newly created
function calls Yield repeatedly until the requested amount of time has
elapsed. The time elapsed since the last call to Yield can be calculated by
comparing successive results from GetCurrentTime. When the requested amount of
time has elapsed, then the function returns. This has the desired effect of
allowing all other Windows or iRMX tasks to run while we wait for the event.
In Example 3, the win_sleep function provides a sleep function which allows
other Windows applications to run, and returns control after the requested
number of seconds have passed.
Example 3: win_sleep provides a sleep function which allows other Windows
applications to run and returns control after the requested number of seconds
have passed.

 void win_sleep(
  signed int requested) /* Number of seconds to sleep. */
 {
  auto signed long sleep; /* Units of 1/1000th of a sec. */

  d_time (); /* Uses GetCurrentTime for elapsed msecs. */
  sleep = 1000 * requested; /* Seconds * 1000 = msecs. */

  while (sleep > 0) {
  Yield();
  sleep -= d_time(); /* Subtract msecs elapsed since last call. */
  }
 }



Conclusion


Not all iRMX for Windows applications will be as complex as this one, but
there will probably be many similarities between our experience in designing
the Soft-Scope debugger and the design and implementation of other iRMX for
Windows applications. To utilize the real-time features of the iRMX system and
combine them with the Microsoft Windows interface, partition your application
into a Windows component and an iRMX component. Then design a method of
communication and synchronization. In our case, we used iRMX mailboxes, but
other methods are possible, including dynamic data exchange (DDE), which is
supported by the iRMX for Windows operating system.
_CONVERTING DITHERED IMAGES BACK TO GRAY SCALE_
by Allen Stenger


_DESIGNING A REAL-TIME DEBUGGER_
by David Potter

[EXAMPLE 1]

 .
 .
 .

cmd_seg = rqcreatesegment (strlen (cmdbuff), &status);
rqewritesegment (cmdbuff, cmd_seg, 0, strlen(cmdbuff), &status);
rqsendmessage (sskernel_cmdmbx, cmd_seg, 0, &status);

 .
 .
 .


[EXAMPLE 2]


 .
 .
 .

resp_seg = rqreceivemessage(sskernel_respmbx, 0, &dummbx, &status);
rqereadsegment (resp_seg, 0, resp_buff, count, &status);
rqdeletesegment (resp_seg, &status);

 .
 .
 .



[EXAMPLE 3]


void win_sleep(
 signed int requested) /* Number of seconds to sleep. */
{
 auto signed long sleep; /* Units of 1/1000th of a second. */

 d_time (); /* Uses GetCurrentTime for elapsed msecs. */
 sleep = 1000 * requested; /* Seconds * 1000 = msecs. */

 while (sleep > 0) {
 Yield();
 sleep -= d_time(); /* Subtract msecs elapsed since last call. */
 }
}














































November, 1992
TIME DILATION AND RELATIVISTIC DEBUGGING


Looking for patterns in the real-time behavior of code




Edward N. Adams III


Ed is an independent contractor specializing in analysis and synthesis of
microprocessor software. He can be contacted at 2782 Waverley Street, Palo
Alto, CA 94306.


Time is Nature's way of keeping everything from happening at once.
-- Woody Allen
The fundamental question related to debugging is, "What happened, in what
order, and when?" To answer this, you need to represent the time relations
among events. Breakpoints and debug print statements suffice for a short
problem, but they demand a lot of your time.
Some bugs, however, are manifest in a deterministic manner after thousands of
events. Others are part of a nondeterministic trajectory through the code. In
either case, you can debug much quicker using compact graphic displays. In a
fraction of a second, you can read hundreds of events from such a display.
Thus you can see both patterns and exceptions, and in many cases keep up with
the real-time behavior of the code. Time dilates when you're debugging this
fast, and you are doing relativistic debugging.
This article addresses the challenge of portraying time, both in the large and
in the small. Several methods are presented for representing a single stream
and multiple streams of events. The examples come from two different
real-world applications. The first, represented in Figure 1 through Figure 6,
is an embedded controller for a logic-speed data-acquisition device. It runs
under pSOS, a real-time, multitasking operating system from the Software
Components Group (San Jose, California), on a Motorola 68000 processor. It
communicates with a host computer via SCSI. Although data is transmitted via DMA,
a few of the protocol bytes are transmitted one at a time, using the
_scsi_send_char routine.
The second system, illustrated in Figure 7-Figure 9, is an avionics display
subsystem running on a Performance Semiconductor 1750. The only task consists
of a top-level dispatcher and 12 major subroutines. The scheduling algorithm
is based on time and availability of inputs. Thus, the system can be viewed as
a multitasking system with round-robin scheduling. In this system, the
variable Process_phase acts as the current task index. The Process_6 routine
is expected to have work to do about every 30-33 milliseconds.
All figures were captured and displayed on Biomation's (Milpitas, California)
Variable Value Monitor (VVM-1), a logic analyzer-like instrument designed to
track and debug code.


One Variable vs. Time


Sometimes you can see both the erroneous and the normal behavior of a software
system in a single stream of data. Examples are:
The changes in variable values (Figure 1).
An instruction trace (Figure 2).
Time differences between points A and B (Figure 3).
Ordinal Time. Figure 1-Figure 3 represent ordinal time. This means that each
represents the order of events, without showing the amount of time between
them. So, in Figure 1, the variable Current_task points to one of three tasks:
RAM01, CHILD, or SYSMGR. From left to right, the values are CHILD, SYSMGR,
RAM01, CHILD, RAM01, and CHILD. This chart shows the pattern of task-switching
behavior during a fraction of a second of execution.
Figure 2 shows an Address Scope monitoring the instructions of the
_scsi_send_char subroutine. The y coordinate is the relative address in the
subroutine; the x coordinate is ordinal time. A descending diagonal line
represents one linear execution of the subroutine. Horizontally arrayed line
segments indicate separate iterations of one loop. This picture shows eight
calls to the subroutine, with two loops. The first loop iterates a variable
number of times. The C code for this subroutine (see Example 1) clarifies the
reason for the looping.
Example 1: Code for subroutine _scsi_send_char.

 scsi_send_char(data)
 byte data;
 {
 int tries;

 scsi->write.data = data;

 /* Wait for ACKNOWLEDGE to go away. check for timeout */
 while (scsi->read.bus_and_status & BIT (0))
 ;

 /* Assert REQ so that MAC can accept data. */
 scsi->write.target_command = BIT(3);

 /* Wait till ACK is asserted. */
 while(!(scsi->read.bus_and_status & BIT (0)))
 ;

 /* DeAssert REQ now that ACK has been asserted. */
 scsi->write.target_command &= ~BIT(3);

 /* Wait for ACKNOWLEDGE to go away. check for timeout */
 while(scsi->read.bus_and_status & BIT (0))
 ;

 return 0;
 }

Both in debugging and in performance analysis, measurement of time difference
between two related events can be informative. In Figure 3, a Point A to Point
B Strip Chart plots the execution time of subroutine _scsi_send_char. When the
program gets to A, we note the time. When it gets to B, we note the time,
subtract the two, and plot a point logarithmically on the y axis. Once again,
the subroutine calls are displayed from left to right. This chart compactly
represents over 70 calls to the subroutine. It's easy to see the usual
behavior (Δt = 2000 ticks) and a few outliers (Δt > 5000 ticks).
The major strength of an ordinal time display is that it portrays the fine
detail of the pattern of behavior.
Cardinal Time. Many people request to see the events in Figure 1-Figure 3
spaced out in time. When Δx on the display is proportional to Δt, the
display is said to have a cardinal time scale. Figure 4(a) shows, in cardinal
time, the data for the first part of Figure 2. In cardinal time, you can see
the large gaps in time between a small number of clusters of events. Changing
the time scale, as in Figure 4(b), uncovers the fine detail.
ILTI Time. How do you show both the size of the time differences and the fine
structure of individual bursts in one display? One method is the Integrated
Log Time Interval, or ILTI (rhymes with "guilty") scale. It is similar to
ordinal time, except that adjacent events are separated by a distance
proportional to log(Δt). Compare Figure 5, an ILTI Address Scope, with
Figure 2 and Figure 4. The fine structure is as visible as in the ordinal time
display, but high-order time differences are also noticeable. For example, you
can see the time gaps between calls (the largest one being before the fifth
call). Furthermore, the lack of horizontal gaps elsewhere indicates that no
subroutine calls or interrupts intruded.
Plus deltaT. An alternative method of representing both the fine and global
structures of time is to add a parallel deltaT Scope, as in Figure 6. The
lower chart plots the time differences between successive events
logarithmically on the y axis. Each event on the Address Scope corresponds to
the time difference directly below it on the deltaT Scope. In this chart, most
intervals are near 1 μs, corresponding to uninterrupted instructions.
The seven much longer intervals correspond exactly to the gaps between the
finish and start of the routine. Most of these gaps are 100 μs, but a
couple are much longer (2 ms and 100 ms).
This chart allows more horizontal compactness, and thus more events per
horizontal inch, than the ILTI display. Furthermore, it shows the time
intervals in much more detail.


Time Correlation


Debugging frequently requires investigating the correlations between
hypothesized causes (or enabling conditions) and observed effects, both good
and bad. Such investigations are aided by viewing parallel streams of events,
and clear representation of time becomes even more important.
Figure 7-Figure 9 correlate an Address Scope and a Data Scope, but the same
principles apply to any combination of events.
Ordinal Time. The Address Scope in Figure 7 shows the beginning of the
Process_6 subroutine. The Data Scope shows the value of the Process_phase
variable, similar to a current task id. Both use the ordinal time scale. In
this case, I wanted to see the behavior before and after Process_phase takes
on the value 6. In particular, is Process_6 called exactly whenever
Process_phase takes on value 6?
In the real world, the events of the two streams occur in interleaved order.
We need to preserve this order, while separating the two streams. Whenever the
subroutine is executing, we display the Process_phase variable value as
constant. Conversely, when the variable value is changing, we display the
subroutine address off the screen (-1).
The ordinal display shows the fine structure, both within and between the
streams.
Cardinal Time. Once again, when you want to see one level of a time interval
and don't care about individual events, use a cardinal time scale. Compare
Figure 8 with Figure 7. Now it becomes perfectly clear that an event in the
Address Scope coincides with the data variable taking on the value 6. If the
data variable were merely suspected of having an effect on the subroutine,
this chart would make the coincidence obvious. Then you could investigate
further, perhaps with the ordinal time chart.
Plus deltaT. In order to see both the fine and the coarse structure, add a
deltaT Scope, as shown in Figure 9. Here the Data Scope, Address Scope, and
deltaT Scope charts are all displayed in ordinal time. This allows immediate
correlation between the streams, and between the streams and the time
differences down below.
The sub-microsecond intervals correspond to instructions of the subroutine.
The multi-millisecond intervals correspond to changes to Process_phase. The
1-millisecond peak at address 11 of Process_6 indicates that Process_6
called a subroutine. If the peak weren't always at address 11, it would
indicate an interrupt instead.


Conclusion


Relativistic debugging requires efficient display of events, their relative
orders, and their absolute times. Different graphic representations of event
streams give these relationships different emphases. But they all share the
property of compressing hundreds or thousands of events into a comprehensible
whole.
Ordinal time emphasizes the fine structure to the total exclusion of time
intervals. Cardinal time gives a rapid view of one order of magnitude of time
interval. The ILTI time scale retains the fine structure, while adding a
flavor of the largest of time intervals. For display of all orders of
magnitude of time, the correlated ordinal deltaT graph is most effective.
In short, the ideal method of dilating time depends on the observer's state of
mind.






























November, 1992
DEBUGGING MOTIF WIDGETS


A test driver makes the job easier


 This article contains the following executables: WIDGET.ARC


Kamran Husain


Kamran is a software consultant with Mentor Programming Services in Sugarland,
Texas. He has experience in telecommunications, graphics, real-time systems
programming, Windows, and UNIX system-software development. He can be reached
at 713-265-1539 or at khx@se44.wg2.waii.com.


Debugging Motif widgets can be a daunting task because each Motif application
brings its own unique set of problems. One approach to debugging a widget is
to use a test driver that dissects the problem into individual parts so you
can analyze them one at a time. The driver treats each function as a black
box, confirming that its outputs behave as expected over your range of inputs.
You could use a test driver, for example, to place a widget on a form by
itself and see whether it shows up with the right resources and behaves
correctly. If the widget appears on the screen correctly, you can feel
confident about incorporating the test code into your application. If not,
check the error messages from the server before proceeding.
In this article, I'll present a test driver and discuss some pitfalls commonly
encountered when debugging Motif widgets. Much of my discussion is based on a
recent project involving a bar chart that needed changing at the last minute
-- the client wanted the data to be displayed as a line chart. My solution was
to extend the BarChart widget to a LineChart widget, which inherits all but
the drawing functions of the BarChart widget. This included showing all forms
of graphical data on a FIFO (first-in, first-out) basis (to display readings
received from a sensor), scaling the widget's vertical axes, and increasing the
number of bars to a predefined maximum. Since I was working from the BarChart
widget's basic framework, most of my efforts dealt with debugging the
LineChart widget. The source for the BarChart and LineChart widgets is
available electronically; see "Availability," page 5.


The Test Driver


The test driver presented here puts one BarChart and one LineChart on a form
as an example. Three push buttons are also provided: Toggle, Advance, and
Done. The view toggles from a BarChart to a LineChart when you click the mouse
button on Toggle. Clicking on Advance simulates a sensor feeding a canned set
of values into both widgets and sets the view to the LineChart widget. If you
press the button several times, you'll see a jagged waveform. The Done button
simply ends the application. See Listing One (page 135) for the test-driver
code.
The main program initializes the toolkit, creates an ApplicationShell, places
a Form on itself, arranges the widgets on the Form, and finally goes into the
main loop. During this process, it sets up resources for the widgets and
creates dummy data.
The arguments for the LineChart and BarChart widgets themselves take up most
of the code listing because of the large number of available options. The bulk
of the arguments to the BarChart and LineChart widgets set the parameters for
the data and its display function. Some parameters, such as foreground and
background color, don't have to be set; they can be defaulted. However, I have
included them to illustrate the flexibility in setting up these widgets.
I commented out the debug preprocessor flags that I used to debug the test
driver. You can re-enable them to get an idea of the program's flow. These
flags are: DEBUG_SWAPPING_SCREENS, DEBUG_SI_COUNTER, and DEBUG_LIMITS.
I use the global variable siCounter to cycle through the arrays to produce the
jagged waveform when the Advance button is pressed.
Note that I have used macros to locate push buttons on the Form. This
greatly improves the listing's readability, but a syntax error inside a macro
produces confusing compiler errors. Even so, calling the macro is a lot
simpler than block-copying the same section of code several times.
The str2pix function provides a quick way to allocate colors in Motif: It
accepts the color name as a string and returns the pixel value for it. I
usually include this function in the set of library functions I use for
development and testing.
The callback function ToggleCB() manages and unmanages the BarChart and
LineChart widgets. If you put enough drawing-area widgets with pixmaps at the
same location on a Form, and at the press of a button manage and unmanage
these widgets in a predefined manner, you will be able to achieve graphics
animation. You could drive this process with a timer, using
XtAppAddTimeOut().


Look at All Variables


Start testing widget-display code by debugging it in test-driver routines
before inserting it into a widget. This may seem obvious, but it's not easy
to convince yourself to write drivers for every display function.
Nevertheless, doing so will save time later on.
In most cases, running the dbx debugger shouldn't cause problems, and it's
helpful to see a stack trace after a core dump. Sometimes you can write and
link in your own exit() function and set a breakpoint in it. Even if your
toolkit is not generated with the debugging option on, you can still look at a
widget's internal instance values. Just include the widget's private header
file instead of its public file.
To step through the test code in a debugger and examine the internals of the
widgets themselves, replace the line #include "LineChart.h" with the code in
Example 1. Remember to avoid setting a breakpoint in a routine which might
stop the server. This would usually be in a drawing routine or sometimes even
a callback routine. This is because expose events are sent when a widget's
drawing area is uncovered, and the widget is set to receive such events. In
many cases, my server locked up when I tried to set a breakpoint in a redraw
function. Use printf statements judiciously in such cases. Also, remember to
terminate output strings with "\n" or you may not see any debug output until
the system has to flush its output buffer.
Example 1: Use these lines to step through the source code.

 #include "BarChartP.h"
 #include "BarChart.h"
 #include "LineChartP.h"
 #include "LineChart.h"

You may even use the XtWarning function call to display the status of
variables during execution. This function takes a string as input, so it's
usually best to sprintf any nonstring variables into a local string first.
The declaration for XtWarning is void XtWarning(String message);.
My debugging solution was to make extensive use of multiple printf statements
and the preprocessor's #define statements. For clarity and brevity, I've
removed the long lines of such statements from the listings. For example, I
used a #defined switch DEBUG_LIMITS for printing status messages for all
limits on the current drawing statements.
For severe errors which caused my application to die without even a decent
error message, I used the XtSetErrorHandler() function. This function returns
nothing and takes as input a pointer to a function, which in turn receives
the error-message string and returns nothing. The declaration is void
XtSetErrorHandler(void (*ErrorFunction)(String));.
This ErrorFunction can be set to display the status of all global variables
whenever an error occurs. Keep in mind though that errors are fatal and that
the application exits after such an occasion, whereas warnings do not
terminate program execution. I've found this function to be less useful than a
set of well-placed printf and XtWarning statements.
It's up to the application programmer to confirm that the widget's parent is
willing to accept children; the Intrinsics do not require that a parent
accepting children be a composite widget. Adding a widget to a noncomposite
parent without following the instructions in the parent's documentation will
not cause any errors, but the widget will simply never become visible.
Another annoying, though not fatal bug is the following error message, which
is issued when you try to place a widget on a Form widget: "Bailed out of edge
synchronization after 10,000 iterations. Check for contradictory constraints
on children of this form." The exact message may be different, depending on
how your libraries have been compiled. Also, the widgets you attempt to place
on the screen may not be where you want them. This error indicates that you
may be requesting incorrect attachments of this widget to the Form.
If an application generates only Xlib errors, try to isolate error messages by
running it with the -synchronous switch. This will make the application run slower,
but you'll see error messages as they happen and not when the server flushes
its queue.
Also, callback functions that appear more than once in a widget's callback
list are called the number of times they appear. Just make sure you know the
number of times you've added a callback function to a widget, and you should
be all right.
Sometimes an application requires the input focus from the pointer or the
keyboard using XSetInputFocus(). This will cause an error if the widget is
invisible by the time the server gets the message. An example is the rare
(but still possible) case of a widget checking whether it is visible during the
call to set the input focus, and the user covering the widget up by the time
the server gets the message. The only real cure is to grab the server, get the
focus, then ungrab the server. This is done via the XtAcceptFocus() call which
calls XSync(), installs a dummy error handler which ignores any BadMatch
calls, calls the handler, then finally calls XSync() again to reinstall the
previous error handler.
The drawing area for a normal widget always has its (x = 0, y = 0) origin at
the top-left corner. Most end users expect the origin to be on the bottom-left
side. Be sure to document which convention you use and stick to it. To
transpose the y coordinate to the bottom-left corner, simply subtract the y
value from the maximum height of the widget.
The listing for the BarChart widget (available electronically) includes the
code for displaying negative and positive data, and we can set the resources
to specify what limits to impose on either axis. See the Resize method in the
listings. Keep in mind that each widget is actually an X Window; therefore,
all rules that apply to an X Window generally apply to a widget as well. For
example, a widget of 0 width or height will crash an application, so always
give a minimum default height and width. Note that in the Initialization and
SetValues functions, I explicitly force a minimum height and width on the
widget and just issue a warning.

Most XtGet/SetValue parameters are documented ambiguously. In some cases, you
get a copy of the data in a widget; in others, you get a pointer to "sacred"
data in the widget. Be careful. Print out the pointer addresses returned to
you by successive calls to the widget and compare them. If the addresses are
different, then a copy is being returned, and the data pointed to will have to
be freed. If the addresses are the same, chances are that the value returned
is a pointer to some internal structure within the widget, and should not be
freed.
By the way, the Intrinsics do not perform any type checking or conversion on
the parameters passed to XtSetValues or XtGetValues. The functions simply
copy bytes to and from the addresses pointed to. If the destination cannot
hold the number of bytes that the Intrinsics thinks it has to write,
then data around the fields may be corrupted. If your machine supports only
longword boundary packing of members, you may sometimes be spared this
problem.
By the same token, Dimension and Cardinal types are unsigned. Do not attempt
to subtract two Dimensions without checking for wraparound.


Mind Your Memory


As I tested the LineChart widget, I realized that the XtRealloc function
sometimes muddled up the pointers in an array of pointers. It turned out that
if I called XtRealloc once to get some pointers, then called XtMalloc for the
data, and then called XtRealloc again to reallocate two or more elements to an
existing array, some of my pointers in the older array were not updated. The
older array now pointed to the newly reallocated array of pointers! For
reasons still unknown to me, this didn't occur if I reallocated in multiples
of 16. So now, I check whether I have enough space. If not, I reallocate 16
more elements and adjust the size.
It's important to free up memory used in the widget. If your application runs
out of memory for some innocuous call, see if you are using GetAttributes (for
example, a GetText() function) and not freeing the data pointer's data.
My code also confirmed that the number of calls to the server is inversely
proportional to the program's execution speed. Therefore, you should attempt
to use as few calls to the server as possible. For example, I use one call to
XDrawLines() instead of multiple calls to XDrawLine(). It's a bit of a pain to
set up, but the gains in execution are well worth it.
When writing a widget, you can either make a copy of the data and save it
internally or simply point to the data being passed in. I prefer copying the
data in and saving it, even though this requires you to copy and maintain a
copy of this data. Of course, the merit of using a pointer is that it
accommodates volatile data: the pointer might refer to an area in shared
memory (perhaps where another process is writing in values). The savings in
overhead are debatable, and keeping the data available is definitely a
problem.
Destruction of a widget is a two-phase process. First, the Intrinsics marks
the widgets to be destroyed. When all the references to the widget(s) waiting
to be destroyed have been removed, the storage for these widgets is
deallocated. Thus, the areas for a widget that have been malloced are not
necessarily freed between the time the widget is destroyed and the time
execution reaches the top of the loop. In other words, do not reuse data
pointed to by -- or located in -- a widget after calling a destroy function and
before returning to the top of the loop. Doing so may cause you to see widget
data still present after a destroy call when single stepping through code in a
debugger.
After a widget is destroyed, your pointers to the widget will not necessarily
point to NULL; you must reset the pointers yourself.


Further Changes to the Listings


The BarChart and LineChart code is only a scaffolding for you to work from and
build on. You could always add scales and crossbars to further annotate the
data or use more than one auxItems array to display more than two lines. This
would be particularly useful if Boolean attributes were added for checking
whether the data is to be displayed or not.
Alternatively, the redraw routine could be modified to display the LineCharts
as a scatter plot, with a Boolean attribute of, say, ConnectDots. If
ConnectDots is True, lines are drawn between all data points via the
XDrawLines function; if it is False, only points are shown via the XDrawPoints
function call and the points are therefore not connected. Similarly, we could
use inheritance to display data using the values in the Aux array.
The convenience functions could be rewritten so as not to assume small arrays
of data. The current approach is inefficient for large arrays, and I will probably use a
circular buffer in the future. However, the drawing and resizing functions
will then have to work off the head and tail pointers.
Should you decide to write more convenience functions, use static variables
for all convenience-function data that you do not want the world to see. Using
globals in widgets is not a good idea, because widgets generally reside in
libraries, and it's easy to accidentally link in another variable of the same
name instead.
The LineChart and BarChart widgets could inherit from a base "meta class" -- a
container class not used for displaying. The BarChart and LineChart widgets
could then inherit the common elements of this meta class.

_DEBUGGING MOTIF WIDGETS_
by Kamran Husain


[LISTING ONE]

/* This code is given as an example of sorts for you to write your own Motif
** application. Compile this program with the line:
** acc -g test.c -o testme LineChart.o BarChart.o -lXm -lXt -lX11 -lm
** where
** acc - is the ANSI compiler at your site
** -g - is the debug option (optional)
** -o - specifies the output executable filename
** -lXm - is the Motif Library
** -lXt - is the Intrinsics Library
** -lX11 - is the X11 Library
** -lm - math library.
** LineChart.o and BarChart.o are object files from the
** sources available electronically from Dr. Dobb's.
** If you want to step thru this code in a debugger and examine the internals
** of the Widgets themselves, replace the next #include "LineChart.h" line
** outside the comments with
** #include "BarChartP.h"
** #include "BarChart.h"
** #include "LineChartP.h"
** #include "LineChart.h"
*/

/* Some standard include files here. */
#include <X11/X.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <X11/StringDefs.h>

#include <X11/Intrinsic.h>
#include <X11/Shell.h>
#include <Xm/Xm.h>
#include <Xm/PushB.h>
#include <Xm/Form.h>
#include "LineChart.h"

/* A macro to facilitate button placement on a form. The x,y pair is the top
** left corner on the form; w is the right position relative to the form's
** left side, and h is the bottom position relative to the form's top side. */
#define MAKE_BTN_ON_FORM(form,btn,x,y,w,h,str) \
 n = 0; \
 XtSetArg(wars[n], XmNleftAttachment, XmATTACH_POSITION ); n++; \
 XtSetArg(wars[n], XmNtopAttachment, XmATTACH_POSITION ); n++; \
 XtSetArg(wars[n], XmNbottomAttachment, XmATTACH_POSITION ); n++; \
 XtSetArg(wars[n], XmNrightAttachment, XmATTACH_POSITION ); n++; \
 XtSetArg(wars[n], XmNrightPosition, (x+w)); n++; \
 XtSetArg(wars[n], XmNleftPosition, (x)); n++; \
 XtSetArg(wars[n], XmNtopPosition, (y)); n++; \
 XtSetArg(wars[n], XmNbottomPosition, (h+y)); n++; \
 btn = XmCreatePushButton(form,str,wars,n); \
 XtManageChild(btn)
/* Remove the comments below to enable some debug levels.
 #define DEBUG_SWAPPING_SCREENS
 #define DEBUG_LIMITS
 #define DEBUG_SI_COUNTER
*/
#define ScrnWidth 500
#define ScrnHt 400
#define MAXITEMS 10
Widget mainShell, /* Application shell */
 mainForm, /* Master form */
 plotLChart, /* Line Chart Widget */
 plotBChart, /* Bar Chart Widget */
 otherPBtn, /* Toggle Push Button */
 advancePBtn, /* Advance Data in Lists Push Button */
 donePBtn; /* Exit Push Button */
/* Pixels are stored in these. */
unsigned long WhiteColor,
 BlackColor,
 BlueColor,
 RedColor,
 GreenColor;

/* Storage Area for the dummy data.
 int dummyOne[MAXITEMS] = { -10, -5, 0, 15, 20, 15, 0, -5, -10, -5 };
 int dummyTwo[MAXITEMS] = { 4, 13, 13, 10, 11, 9, 4, -11, -13, -11 };
*/
static int siCounter = MAXITEMS - 1;
int one[MAXITEMS];
int two[MAXITEMS];
/* Declare the functions and callbacks here. Note that I do not use any
** parameters into the callbacks. Please refer to the Motif manual for
** details on these parameters if you need to use them. */
void exitCB();
int toggleCB();
int advanceCB();
void createDummyData(int count, int *oneptr, int *twoptr);
unsigned long str2pix(char *spec, Widget w);

int main(int argc, char *argv[])
{
 Arg wars[24]; /* for argument passing */
 XGCValues gcv; /* for creation for GC */
 XtGCMask gcmask; /* for GC flags */
 char strName[24]; /* Dummy storage area */
 int FormHt = 88; /* On the main form */
 int margin = 10; /* Distance between buttons */
 int dw = 20; /* Width of each button */
 int dh = 10; /* Height of each button */
 int h1 = margin; /* temporary var initialize */
 int h2; /* temporary variables */
 int DispHt = ScrnHt, DispWd = ScrnWidth; /* chart extent; was uninitialized */
 int ivalue, n, i, imax, imin, imid;
 /* Initialize toolkit */
 mainShell = XtInitialize(argv[0], "Demo", NULL, 0, &argc, argv);
 n =0;
 XtSetArg(wars[n], XmNwidth, ScrnWidth); n++;
 XtSetArg(wars[n], XmNheight, ScrnHt); n++;
 XtSetArg(wars[n], XmNiconic, False ); n++;
 XtSetArg(wars[n], XmNiconName, "TestMe"); n++;
 XtSetValues(mainShell,wars,n);
 XtRealizeWidget(mainShell);
 /* Place the Form on the Shell. */
 n=0;
 XtSetArg(wars[n], XmNwidth, ScrnWidth); n++;
 XtSetArg(wars[n], XmNheight, ScrnHt); n++;
 mainForm = XmCreateForm(mainShell,"mainForm",wars,n);
 XtManageChild( mainForm );
 /* Use macros to place PushButtons on Form. Attach callbacks too. */
MAKE_BTN_ON_FORM(mainForm,otherPBtn,h1,FormHt,dw,dh,"Toggle");
 h2 = h1 + margin + dw;
MAKE_BTN_ON_FORM(mainForm,advancePBtn,h2,FormHt,dw,dh,"Advance");
 h2 += margin + dw;
MAKE_BTN_ON_FORM(mainForm,donePBtn,h2,FormHt,dw,dh,"Done");
 XtAddCallback(otherPBtn,XmNactivateCallback, toggleCB, NULL);
 XtAddCallback(advancePBtn,XmNactivateCallback, advanceCB, NULL);
 XtAddCallback(donePBtn,XmNactivateCallback, exitCB, NULL);
 createDummyData(MAXITEMS, one, two);
 /* Calculate extents of this dummy data. */
 imax = -100; imin = 100;
 for (i=0; i< MAXITEMS;i++)
 {
 ivalue = one[i];
 if (imax < ivalue) imax = ivalue;
 if (imin > ivalue) imin = ivalue;
 ivalue = two[i];
 if (imax < ivalue) imax = ivalue;
 if (imin > ivalue) imin = ivalue;
 }
 imid = imin + 1;
#ifdef DEBUG_LIMITS
 /* Tell the user where you are. */
 printf("\n Min = %d Mid = %d Max = %d", imin,imid,imax);
#endif
 /* Allocate the colors here. */
 WhiteColor = str2pix("White", mainForm);
 BlackColor = str2pix("Black", mainForm);
 BlueColor = str2pix("Blue", mainForm);
 RedColor = str2pix("Red", mainForm);

 GreenColor = str2pix("Green", mainForm);
 sprintf(strName,"%d,%d",imax,imin);
 /* Define and declare the Line Chart Widget. */
 n = 0;
 XtSetArg(wars[n], XmNheight, DispHt); n++;
 XtSetArg(wars[n], XmNwidth, DispWd); n++;
 XtSetArg(wars[n], XmNleftAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNrightAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNtopAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNbottomAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNleftPosition, 5 ); n++;
 XtSetArg(wars[n], XmNrightPosition, 95); n++;
 XtSetArg(wars[n], XmNtopPosition, 5 ); n++;
 XtSetArg(wars[n], XmNbottomPosition, 75 ); n++;
 XtSetArg(wars[n], XmNforeground, BlackColor); n++;
 XtSetArg(wars[n], XmNbackground, WhiteColor); n++;
 XtSetArg(wars[n], XtNitems, one); n++;
 XtSetArg(wars[n], XtNauxItems, two); n++;
 XtSetArg(wars[n], XtNitemCount, MAXITEMS); n++;
 XtSetArg(wars[n], XtNmaxItemCount, MAXITEMS); n++;
 XtSetArg(wars[n], XtNmaxValue, imax); n++;
 XtSetArg(wars[n], XtNmidValue, imid); n++;
 XtSetArg(wars[n], XtNminValue, imin); n++;
 XtSetArg(wars[n], XtNzeroColor, BlueColor); n++;
 XtSetArg(wars[n], XtNpositiveColor, BlueColor); n++;
 XtSetArg(wars[n], XtNnegativeColor, RedColor); n++;
 plotLChart = XtCreateWidget("LineChartType",
 XvlinechartWidgetClass, mainForm, wars, n);
 imid = (imax + imin) / 2;
#ifdef DEBUG_LIMITS
 printf("\n Min = %d Mid = %d Max = %d", imin,imid,imax);
#endif
 /* Define and declare the Bar Chart Widget. */
 n = 0;
 XtSetArg(wars[n], XmNheight, DispHt); n++;
 XtSetArg(wars[n], XmNwidth, DispWd); n++;
 XtSetArg(wars[n], XmNforeground, BlackColor); n++;
 XtSetArg(wars[n], XmNbackground, WhiteColor); n++;
 XtSetArg(wars[n], XmNleftAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNrightAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNtopAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNbottomAttachment, XmATTACH_POSITION ); n++;
 XtSetArg(wars[n], XmNleftPosition, 5 ); n++;
 XtSetArg(wars[n], XmNrightPosition, 95); n++;
 XtSetArg(wars[n], XmNtopPosition, 5 ); n++;
 XtSetArg(wars[n], XmNbottomPosition, 75 ); n++;
 XtSetArg(wars[n], XtNitems, one); n++;
 XtSetArg(wars[n], XtNauxItems, two); n++;
 XtSetArg(wars[n], XtNitemCount, MAXITEMS); n++;
 XtSetArg(wars[n], XtNmaxItemCount, MAXITEMS); n++;
 XtSetArg(wars[n], XtNmaxValue, imax); n++;
 XtSetArg(wars[n], XtNmidValue, imid); n++;
 XtSetArg(wars[n], XtNminValue, imin); n++;
 XtSetArg(wars[n], XtNmaxShowValue, imax); n++;
 XtSetArg(wars[n], XtNminShowValue, imin); n++;
 XtSetArg(wars[n], XtNzeroColor, BlueColor); n++;
 XtSetArg(wars[n], XtNpositiveColor, BlueColor); n++;
 XtSetArg(wars[n], XtNnegativeColor, RedColor); n++;
 plotBChart = XtCreateManagedWidget("BarChartDisplay",

 XvbarchartWidgetClass, mainForm, wars, n);
 /* Show the widgets on the form and go into a loop */
 XtManageChild(mainForm);
 XtRealizeWidget(mainShell);
 XtMainLoop();
}
/* Terminate the application. */
void exitCB()
{
exit(0);
}
/* It's easier to use a flag here instead of calling XtIsManaged() on every
** widget. This flag could index a circular list of Widgets and thus your
** ToggleCB() function would go something like:
** ...
** XtUnmanage(YourList[ToggleFlag]);
** ToggleFlag++;
** if (ToggleFlag >= LengthOfList) ToggleFlag =0;
** XtManage(YourList[ToggleFlag]);
** ...
** where "YourList" is an array of widgets of length "LengthOfList". */
static int ToggleFlag = 1;
/* Callback for function to toggle the type of display. */
int toggleCB()
{
if (ToggleFlag)
 {
#ifdef DEBUG_SWAPPING_SCREENS
printf("\n Debug statement goes here ");
#endif
 XtUnmanageChild(plotBChart);
 XtManageChild(plotLChart);
 ToggleFlag = 0;
 }
 else {
#ifdef DEBUG_SWAPPING_SCREENS
printf("\n Debug statement goes here ");
#endif
 XtManageChild(plotBChart);
 XtUnmanageChild(plotLChart);
 ToggleFlag = 1;
 }
}
/* Convenience to load in colors for the application. Returns pixel value
** given the following inputs: char *spec; color name
** Widget w; a widget pointer
*/
unsigned long str2pix(char *spec, Widget w)
{
 Colormap cmap;
 XColor color_struct; /* For this color */
 Display *dpy; /* Current display */
 static void *ht = NULL; /* hash table pointer */
 /* Get Display */
 dpy = XtDisplay(w);
 /* Get color map */
 cmap = XDefaultColormap(dpy, DefaultScreen(dpy));
 /* Parse the color specification */
 if (!XParseColor(dpy, cmap, spec, &color_struct))

 {
 printf("Invalid color name (%s)\n",spec);
 exit(1);
 }
 /* Try to allocate the color */
 if (!XAllocColor(dpy, cmap, &color_struct))
 {
 printf("Cannot allocate color in colormap\n");
 exit(1);
 }
 /* Return the allocated color */
 return(color_struct.pixel);
}
/* Cycle the data thru itself when button is pressed on "Advance" */
int advanceCB()
{
if (siCounter < 1) siCounter = MAXITEMS;
#ifdef DEBUG_SI_COUNTER
printf("\n siCounter = %d", siCounter);
#endif
siCounter--;
XvBarChartAddItem(plotLChart, one[siCounter]);
XvBarChartAddItem(plotBChart, two[siCounter]);
}
/* Create some data in an array. */
void createDummyData(int count, int *oneptr, int *twoptr)
{
int i;
for (i = 0; i < count/2; i++)
 {
 oneptr[i] = (i * 2) + 10;
 twoptr[i] = (i * 3) + 5;
 }
for (i = count/2; i < count; i++)
 {
 oneptr[i] = (i * 2) - 10;
 twoptr[i] = (i * 3) - 5;
 }
}























November, 1992
EXAMINING TURBO PASCAL FOR WINDOWS


Application frameworks ease Windows development




Michael Floyd


Mike is executive editor at DDJ. He can be reached at the DDJ offices, on
CompuServe at 76703,4057, or on MCI Mail at mfloyd.


There's an old saying among veteran programmers that real programmers don't
eat quiche, code only in assembly language, and eschew languages like Basic,
Fortran, and (especially) Cobol. Over the past few years, however, C (and more
recently C++) has replaced assembler as the "macho" language of choice.
Windows developers in particular have been led to believe that C and C++ are
their only options. But a recent encounter with Borland's Turbo Pascal for
Windows (TPW) leads me to believe that you can have your quiche and eat it
too--and do some real programming in the process.
To test TPW's suitability as a Windows programming tool, I wrote ExpertWin, a
general-purpose expert shell. This seemed appropriate because expert systems
are typically UI intensive. In addition to examining TPW, I use Borland's
ObjectWindows Library (OWL) and explore issues relating to common
user-interface design. ExpertWin makes no attempt to use every feature in OWL;
it does what it needs to with pull-down menus and dialog boxes.


The ExpertWin System


ExpertWin is a "backward chaining" system, meaning it starts with a conclusion
(or goal) and a set of rules leading to that conclusion. The system then
attempts to prove the goal by matching the rules with known facts from a
database. If a needed fact is not contained in the database, the system
queries the user for information.
Each conclusion is stored along with its associated attributes in a database
(called either a "rule" or a "knowledge base") in the form: conclusion[attr1,
attr2,...attrN]. The square brackets denote a list of attributes, and commas
are used to AND two attributes. The rule reads as "conclusion is true if attr1
and attr2 and everything up to attrN are true." Upon opening a file, ExpertWin
parses each rule and stores the resulting rule components into a nested
linked-list structure. Once the rule base is loaded, the user can insert
additional rules, search for either a conclusion or an attribute, list rules
in the knowledge base, implement a query, or clear the rule base from memory
so that a new database can be loaded.
Any given rule can inherit attributes from an ancestor rule, much as an object
inherits features from its parent. The user creates this relationship by
specifying the parent's rule name in the child's attribute list. During a
query, ExpertWin looks at each attribute and first determines whether it is
another rule. If so, ExpertWin goes to that rule and attempts to prove it
before continuing through the list. I won't spend much time on the code, but
the entire inference engine is contained in the TExpert.Inference method; see
Listing One, page 145.


Some Assembly Required


ExpertWin uses OWL, Borland's application framework; for more on this, see the
text box entitled "Application Frameworks." From the term framework, you might
expect a superstructure--a skeleton program from which to build your
application--that contains all common elements required for compliance with
user-interface guidelines. However, every application framework I've worked
with requires you to assemble various components to create a skeleton program.
Figure 1 (page 149) shows a template for a generic application that includes
the basic elements an ObjectWindows application requires. Although the
template has been abbreviated for clarity, I maintain an expanded version of
this template that includes all standard menus, such as File and Help, and all
common dialogs, such as FileOpen and About. It's a living, breathing Windows
application complete with all standard Windows interface objects.
The generic template in Figure 1 also details the basic structure of an OWL
application. All apps must define an application object (TGenericApp), which
is derived from the standard OWL type TApplication. TApplication initializes
the first instance of your application, creates and displays the main window,
and handles the processing of Windows messages. It also handles cleanup when
the app is closed.
These routines are invoked from the main body of the program, which first
creates an instance of TGenericApp, then calls Init, Run, and Done. Init
constructs the application object, then calls InitMainWindow to construct the
main window. You'll usually find it necessary to override the InitMainWindow
method defined in TApplication, so I've included a definition in the template.
Run invokes a message-processing loop which intercepts Windows messages and
dispatches them to methods in your program. These methods, called
"message-response methods," are contained in TGeneric and represent all
behavior within the main window. A message-response method is analogous to a
Windows callback function--a function called by Windows. In this case,
however, your message-response method is called via TApplication. By
convention, message-response methods are given a name corresponding to the
command message they respond to. For example, TGeneric.CMAbout is a
message-response method that invokes the About dialog box when it receives the
cm_About message (defined as Const in Figure 1).
The message-processing loop runs until the user terminates the application. At
this point, the destructor Done is called to free memory and terminate the
program.


OWL Hierarchy


There's a significant learning curve to both Windows and OWL programming.
Fortunately, with OWL you can take it in bites. Once you learn to create the
main window and you're comfortable with basic message-loop processing, you
begin to learn how to create window, menu, dialog, and control objects--the
interface objects. Be especially prepared to spend time with the plethora of
controls, which include edit controls, list boxes, radio buttons, combo boxes,
check boxes, and scroll bars.
Seemingly unrelated to user-interface objects is the support for simple
container objects called collections. Collections provide a powerful means to
implement data structures such as dynamic arrays and lists, as well as
collections of objects in a generic, yet consistent manner. The TCollection
object type includes over 20 collection-manipulation methods such as Load,
GetItem, PutItem, and Delete, and iterator methods that can apply a function
or procedure to each item in the collection. There are also subtypes that
allow you to work with sorted collections and collections of strings.
OWL also provides support for object-oriented I/O using streams. You can
create your own stream types, but you must override many of the basic I/O
methods. My guess is that most newcomers will spend time learning to create
the other interface objects before getting into TStream and its descendants,
TDosStream and TEmsStream.
With OWL, you can create an app without really getting the Windows dirt under
your fingernails. But there's more to a Windows app than windows, menus, and
dialogs. Although OWL supports MDI, you must make calls directly to the
Windows API to support GDI, DDE, or OLE. This isn't a complaint, just an
observation that OWL does not completely insulate you from the Windows API. It
does, however, reduce the number of API calls you must learn.


Reusable vs. Roll Your Own


Two singly linked lists are used to handle the rule base; see Listing Two,
page 148. Attr is a simple record that uses a character array to store a rule,
and ANext is a pointer to the next rule in the list. The second linked list,
Item, also uses a character array to store the conclusion and a pointer to the
next conclusion. But Item also contains a pointer to Attr, which links the
rule list to the conclusion. The net effect is that Attr is a nested linked
list.
SList is an object that handles list manipulation of conclusions. NestedList
is derived from SList and handles the rule list. Most of this is straight
linked-list manipulation with the twist that list traversals must often pass
through both lists.
I'd originally decided to use a stream to read the rules from a disk file into
a
collection. As mentioned, both collections and streams are provided for in
OWL. But there's an inherent problem in trying to manufacture something new
while you're still learning the framework. Recall that the inference engine
requires synchronized list traversal. At this point, I didn't know enough
about OWL internals to create a collection of collections, much less write a
routine to synchronously traverse two collections. I could have grabbed the
source, picked it apart, and modified the general collection routines to suit
my specific needs. Or I could write my own in half the time. So much for
reusability.
Once that decision was made, it became evident that using streams to read the
data into a record was overkill. Streams are great for dealing with the
complexities of storing objects to disk, but straightforward Turbo Pascal I/O
would serve better. In all fairness, you gain a great deal of power using
streams. For example, storing the current state of a query would best be
implemented using streams.


Strings vs. PChars



Because Windows requires null-terminated strings, TPW introduces the pointer
type PChar. Note that the string types in Listing Two are all null-terminated.
Though the changes are relatively easy, PChars require some adjustments in the
way you work with strings. Most problems you encounter will result from
confusing PChars with normal strings.
A PChar is a pointer to a null-terminated string and is equivalent to the
declaration PChar=^Char. You create a null-terminated string variable either
by declaring the variable as type PChar or by declaring it as a zero-based
array of characters. If the variable is declared as a PChar, you must allocate
memory for the string using the StrNew function. Further, you must dispose of
the string using StrDispose before assigning a different string to the
variable. Failure to do so can have disastrous consequences. On the other
hand, the compiler automatically allocates memory when the string is declared
as a zero-based array. Note that although such a variable is declared as an
array of Char, extended syntax lets you use it anywhere a PChar is expected:
the compiler treats the array name as a pointer to its first element.
PChars require that you use functions to manipulate strings instead of using
familiar TP syntax. For example, with normal Turbo Pascal strings, you assign
a string to a variable using the assignment operator: Str := 'foo'. To assign the
same string to a PChar, you must use the StrCopy function: StrCopy(Str,
'foo').
Also note that because PChars are pointers, you are passing by reference when
calling a procedure or function. Normal strings, however, are usually passed
by value unless you use the Var keyword in the function header. If you need to
work on a copy of the PChar string rather than the string itself, you must
create a local variable and copy the string to that variable.
Finally, make sure you set the $X+ compiler directive to enable extended
syntax. This is the default setting, but it may on occasion be turned off.


UI Design


IBM formalized its guidelines for OS/2 Presentation Manager interface design
in its common user access (CUA) guidelines, published in 1989. Windows
developers have used these guidelines as a road map in developing their
applications. (Note that IBM updated these guidelines in 1991, but the
revisions are specific to OS/2 2.0.) For the purposes of this article, I've
used The Windows Interface (Microsoft Press, 1992) to aid in the design of the
ExpertWin UI.
Conformance to user-interface guidelines is important for visual and
functional consistency. Guidelines also take much of the guesswork out of your
design and will probably keep your boss happy when the product ships.
In browsing this design guide, however, you quickly realize the sheer volume of
detail in such an undertaking. Fortunately, the components of TPW are a big
help in creating interface objects that conform to the Microsoft (and CUA '89)
guidelines. For example, I found the Resource Workshop extremely helpful in
implementing conforming menus. According to the guidelines, any menu item that
leads to a dialog box should end with an ellipsis (three trailing dots). Further, menu
items that fall into logical groups should use a separator line to distinguish
the groups. There are also specific recommendations for features within File,
Edit, View, Options, and Help menus. Returning to ExpertWin with guidelines in
hand, I was prepared to go through the process of updating the menus starting
with the File menu. To my surprise, I was able to automatically generate the
standard File menu complete with ellipses, separators, keyboard
accelerators--the works.
There are also recommendations for "common dialogs" like Open, Save As, Print,
and so on. As you might expect, TPW includes a Common Dialogs DLL (CommDlg)
that you include in a USES statement. Once included, you can display the
dialog via a single call rather than the traditional method of creating a
dialog resource and dialog-box procedure. I was able to implement the File
menu, all common dialogs, and processing code (such as parsing the rules after
opening a file) all in one day.
You also have access to the Borland Windows Custom Controls (BWCC), which
include various controls that have a chiseled-steel, 3-D look. Accessing the
custom controls is a simple matter of including BWCC in the USES clause and
recompiling. Now all standard controls will take on the Borland look. My
primary complaint is that they're non-standard UI components. But if you can
live with this nonstandard appearance, they can spice up an otherwise ordinary
interface.


Conclusion


I didn't spend much time talking about the typical tools like the
Windows-hosted development environment, which features integrated debugging, a
configurable desktop and toolbar, and syntax-directed highlighting. These are
features you should expect. And although I'd still like to see an integrated
class browser that lets me move to and edit objects, or a debugger that
doesn't have to switch to character mode, the feature list isn't what's
important. More important is getting a developer-class tool for creating
real-world Windows applications. Turbo Pascal for Windows fits the bill.


References


The Windows Interface: An Application Design Guide. Redmond, WA: Microsoft
Press, 1992.
SAA CUA Basic Interface Design Guide, Document SC26-4582. IBM, 1989.
SAA CUA Advanced Interface Design Guide, Document SC26-4583. IBM, 1989.


Products Mentioned


Turbo Pascal for Windows
Borland International
1800 Green Hills Road
Scotts Valley, CA 95066
408-438-8400
$149.95


Application Frameworks


There are a number of so-called application frameworks bouncing around, among
them Symantec's yet-to-be-released Bedrock, Borland's OWL, and Inmark's zApp.
Microsoft, too, has dabbled with the "application framework" moniker,
originally tagging its C++ class library "AFX" before releasing it as "MFC."
With all these diverse tools making the rounds, I began looking for a precise
definition of an application framework. What I discovered was that, while all
tool vendors could describe their application framework, none could come up
with a general, yet succinct definition of one. (This caused me to wonder if,
when it comes to application frameworks, marketing gurus were indeed engaging
in "fuzzy logic.")
The original application framework might have been the user-interface class
library for Smalltalk-80. In the days before GUIs were commonplace,
Smalltalk-80 was exceptional at constructing highly interactive, easy-to-use
user interfaces.
It's no accident that both Smalltalk-80 and much of the Macintosh's UI design
were pioneered at Xerox PARC. Nor is it surprising that an
object-oriented approach based on Smalltalk's Model-View-Controller (MVC) was
adopted in the Lisa Toolkit and later MacApp.
Eventually, Apple Computer coined the term "application framework" to describe
MacApp. But when DDJ asked Apple for the "official" definition of the term, we
were surprised to learn that Apple didn't have one either--even in the MacApp
documentation. After some discussion, an Apple spokesperson provided us with
the following: "An application framework is an extended collection of classes
that cooperate to support a complete application architecture or model,
providing more complete application development support than a simple set of
classes."
Borland's ObjectWindows Library (discussed in this article) not only generally
conforms to the Apple definition, but looks remarkably similar to MacApp. For
example, both rely on an event-based processing loop that is built into the
framework and use a naming convention that prefixes all framework objects with
a T and pointers to that object with a P. Also like MacApp, OWL provides
interface objects that make it difficult to create an application that does
not conform to user-interface guidelines. While the OWL class hierarchy
differs somewhat from MacApp, every object in the hierarchy is derived from
the base object TObject. This design also bears striking similarity to
Smalltalk.
-- M.F.

_EXAMINING TURBO PASCAL FOR WINDOWS_
by Michael Floyd


[LISTING ONE]

program ExpSys;

{$R WINXPERT.RES}


uses WinDOS, WObjects, WinTypes, Strings, WinProcs, StdDlgs, CommDlg, BWCC,
Lists;

const
 id_Menu = 200;
 id_About = 100;
 cm_FileOpen = 102;
 cm_FileSaveAs = 104;
 cm_Insert = 201;
 cm_Search = 202;
 cm_FindAttr = 203;
 cm_ForChain = 212;
 cm_BackChain = 204;
 cm_ClearFacts = 205;
 cm_About = 999;
 cm_Quit = 108;

 id_EC1 = 106;
 id_EC2 = 107;
 id_EC3 = 108;
 id_CB2 = 109;
 id_ST1 = 110;
 id_ST2 = 111;
 id_ST3 = 155;
 id_ST4 = 160;
 id_LB1 = 151;
 id_BN1 = 152;
 id_BN2 = 153;
 id_BN3 = 154;
 id_YesBtn = 161;
 id_NoBtn = 162;
 NotFound = 97;
 YesBtn = 98;
 NoBtn = 99;

Type
 TFilename = array [0..255] of Char;
 DataFile = file of Item;

{---------------------}
{ Application Objects }
{---------------------}
 type
 StatTxtRec = record
 StaticText : array [0..40] of Char;
 end;

 TExpertApp = object(TApplication)
 procedure InitMainWindow; virtual;
 end;

 PExpert = ^TExpert;
 TExpert = object(TWindow)
 DC : HDC;
 EC1, EC2, EC3 : PEdit;
 LB1 : PListBox;
 Head, Tail : PItem;
 AHead, ATail : Pattr;
 FileName : TFileName;
 IName, AName : array[0..40] of Char;


 constructor Init(AParent: PWindowsObject; ATitle: PChar);
 destructor Done; virtual;
 function Inference(Query : PChar; Rules : PItem) : Integer;
 procedure Show; virtual;
 procedure CmInsert(var Msg: TMessage); virtual cm_First + cm_Insert;
 procedure CMFileOpen(var Msg: TMessage); virtual cm_First + cm_FileOpen;
 procedure CMFileSaveAs(var Msg: TMessage); virtual cm_First + cm_FileSaveAs;
 procedure CMSearch(var Msg: TMessage); virtual cm_First + cm_Search;
 procedure CMFindAttr(var Msg: TMessage); virtual cm_First + cm_FindAttr;
 procedure CMForChain(var Msg: TMessage); virtual cm_First + cm_ForChain;
 procedure CMBackChain(var Msg: TMessage); virtual cm_First + cm_BackChain;
 procedure ClearFacts(var Msg : TMessage); virtual cm_First + cm_ClearFacts;
 procedure CMAbout(var Msg: TMessage); virtual cm_First + cm_About;
 procedure CMQuit(var Msg: TMessage); virtual cm_First + cm_Quit;
 end;

 PTestDialog = ^TTestDialog;
 TTestDialog = object(TDialog)
 constructor Init(AParent: PWindowsObject; ATitle: PChar);
 procedure IDBN1(var Msg: TMessage); virtual id_First + id_BN1;
 procedure IDLB1(var Msg: TMessage); virtual id_First + id_LB1;
 end;

 PQueryDlg = ^TQueryDlg;
 TQueryDlg = object(TTestDialog)
 procedure IDBN2(var Msg: TMessage); virtual id_First + id_BN2;
 procedure IDBN3(var Msg: TMessage); virtual id_First + id_BN3;
 end;

 PGetFact = ^TGetFact;
 TGetFact = object(TDialog)
 constructor Init(AParent: PWindowsObject; ATitle: PChar);
 procedure IDYesBtn(var Msg: TMessage); virtual id_First + id_YesBtn;
 procedure IDNoBtn(var Msg: TMessage); virtual id_First + id_NoBtn;
 end;

Var
 APtr : PAttr; {Global ptr to PAttr}
 KnowledgeBase : Text;
 InFile, OutFile : Text;
{---------------------}
{ TGetFact Methods }
{---------------------}
constructor TGetFact.Init(AParent: PWindowsObject; ATitle: PChar);
begin
 TDialog.Init(AParent, ATitle);
end;

procedure TGetFact.IDYesBtn(var Msg: TMessage);
begin
 EndDlg(YesBtn); {Return YesBtn to ExecDialog and end dialog}
end;

procedure TGetFact.IDNoBtn(var Msg: TMessage);
begin
 EndDlg(NoBtn); {Return NoBtn to ExecDialog and end dialog}
end;


{---------------------}
{ TTestDialog Methods }
{---------------------}
constructor TTestDialog.Init(AParent: PWindowsObject; ATitle: PChar);
begin
 TDialog.Init(AParent, ATitle);
end;

procedure TTestDialog.IDBN1(var Msg: TMessage);
var
 TextItem : PChar;
 TmpStr : array[0..40] of Char;
 IList : PItem;
begin
 IList := ListPtr;
 While IList <> nil do
 begin
 TextItem := StrNew(IList^.ItemName);
 SendDlgItemMsg(id_LB1, lb_AddString, 0, LongInt(TextItem));
 StrDispose(TextItem); { Don't forget to dispose TextItem }
 IList := IList^.Next;
 end;
end;

procedure TTestDialog.IDLB1(var Msg: TMessage);
var
 RDlg, Idx : Integer;
 SelectedText: array[0..20] of Char;
 ExpList : SList;
 AttrTxtRec : StatTxtRec;
 D: PDialog;
 S1: PStatic;
begin
 if Msg.LParamHi = lbn_SelChange then
 begin
 Idx := SendDlgItemMsg(id_LB1, lb_GetCurSel, 0, LongInt(0));
 SendDlgItemMsg(id_LB1, lb_GetText, Idx, LongInt(@SelectedText));
 APtr := ExpList.GetAttr(SelectedText);
 D := New(PQueryDlg, Init(@Self, 'DIAL2'));
 StrCopy(AttrTxtRec.StaticText, APtr^.Attribute);
 New(S1, InitResource(D, id_ST3, SizeOf(AttrTxtRec.StaticText)));
 D^.TransferBuffer := @AttrTxtRec;
 RDlg := Application^.ExecDialog(D);
 end;
end;

{---------------------}
{ TQueryDlg Methods }
{---------------------}
procedure TQueryDlg.IDBN2(var Msg: TMessage);
begin
 If APtr^.ANext <> nil then
 begin
 APtr := APtr^.ANext;
 SetWindowText(GetItemHandle(id_ST3), APtr^.Attribute);
 end
 else
 begin
 MessageBox(HWindow, 'Item is True', 'List Check completed', MB_OK);

 EndDlg(MB_OK);
 end;
end;

procedure TQueryDlg.IDBN3(var Msg: TMessage);
begin
 MessageBox(HWindow, 'Cannot prove item', 'Item not proved', MB_OK);
 EndDlg(0);
end;


{--------------------}
{ TExpertApp Methods }
{--------------------}
procedure TExpertApp.InitMainWindow;
begin
 MainWindow := New(PExpert, Init(nil, 'WinExpert 1.0'));
end;

{-------------------}
{ TExpert Methods }
{-------------------}
constructor TExpert.Init(AParent: PWindowsObject; ATitle: PChar);
var
 AStat : PStatic;
begin
 Head := nil;
 Tail := nil;
 AHead := nil;
 TWindow.Init(AParent, ATitle);
 With Attr do
 Begin
 Menu := LoadMenu(HInstance, PChar(100));
 Style := ws_SysMenu or ws_VScroll or ws_HScroll or ws_MaximizeBox
 or ws_MinimizeBox or ws_SizeBox;
 X := 0; Y := 0;
 W := 640; H := 450;
 end;
 EC1 := New(PEdit,Init(@Self, id_EC1, 'foo', 20, 50, 100, 30, 0, False));
 EC2 := New(PEdit, Init(@Self, id_EC2, '', 121, 50, 150, 30, 0, False));
 AStat := New(PStatic, Init(@Self, id_ST1, 'Classification:', 20, 30, 150, 20,
0));
 AStat := New(PStatic, Init(@Self, id_ST2, 'Attributes:', 121, 30, 150, 20,
0));
end;

destructor TExpert.Done;
begin
 TWindow.Done;
end;

function TExpert.Inference(Query : PChar; Rules : PItem) : Integer;
var
 Goal : PItem;
 Conditions : PAttr;
 MBoxText : array[0..40] of Char;
 RVal, InferFlag : Integer;
 D: PDialog;
 S1: PStatic;
 STxtRec : StatTxtRec;
Begin

 Inference := NotFound;
 Goal := Rules;

 { Pattern Matcher }
 While (Goal <> nil) and (StrIComp(Goal^.ItemName, Query) <> 0) do
 Goal := Goal^.Next;
 If Goal <> nil then { This is necessary because TPW's StrIComp() }
 begin { does no checking & crashes when Goal is nil }
 If StrIComp(Goal^.ItemName, Query) = 0 then
 begin { Goal Matches }
 Conditions := Goal^.Prop;
 While Conditions <> nil do
 begin
 InferFlag := Inference(Conditions^.Attribute, Rules);
 If InferFlag = YesBtn then
 Conditions := Conditions^.ANext
 Else If InferFlag = NoBtn then
 begin
 Inference := NoBtn;
 exit;
 end
 Else If InferFlag = NotFound then
 begin {prove attribute by asking; if true get next and prove }
 StrCopy(MBoxText, 'is ');
 StrCat(MBoxText, Goal^.ItemName);
 StrCat(MBoxText, ' ');
 StrCat(MBoxText, Conditions^.Attribute);
 StrCopy(STxtRec.StaticText, MBoxText);
 D := New(PGetFact, Init(@Self, 'DIAL3'));
 New(S1, InitResource(D, id_ST4, SizeOf(STxtRec.StaticText)));
 D^.TransferBuffer := @STxtRec;
 RVal := Application^.ExecDialog(D);
 If RVal = YesBtn then
 begin
 Conditions := Conditions^.ANext;
 end
 else {Condition Failed--Backtrack for other solutions}
 begin
 Inference := NoBtn;
 exit;
 end; { else }
 end; { Else If}
 end; { While }
 {if all True then Inference := True }
 If (RVal = YesBtn) or (Conditions = nil) then
 Inference := YesBtn
 else Inference := NotFound;
 end; {While}
 end; {If}
end; { Inference }

procedure TExpert.CMInsert;
var
 AttrList : NestedList;
 Attribute : array[0..40] of Char;
 StartPos, EndPos: Integer;
 TxtField1, TxtField2 : array[0..40] of Char;

begin

 EC1^.GetSelection(StartPos, EndPos);
 if StartPos = EndPos then
 EC1^.GetText(@TxtField1, 20)
 Else
 EC1^.GetSubText(@TxtField1, StartPos, EndPos);
 StrCopy(IName, TxtField1);
 EC2^.GetText(@TxtField2, 20);
 StrCopy(Attribute, TxtField2);
 If Length(Attribute) > 0 then
 AttrList.NewNode(AHead, ATail, Head, Tail, IName, Attribute);
 Show;
end;

procedure TExpert.Show;
var
 PStr : array[0..19] of Char;
 Y1 : Integer;
 Node : PItem;
begin
 Node := ListPtr;
 Y1 := 100;
 DC := GetDC(HWindow);
 TextOut(DC, 2,99, 'Items in list: ',15);
 While Node <> nil do
 begin
 Y1 := Y1 + 15;
 StrCopy(PStr,Node^.ItemName);
 TextOut(DC, 31,Y1, PStr, StrLen(PStr));
 Node := Node^.Next;
 end;
 ReleaseDC(HWindow, DC);
end;

procedure TExpert.CMFileOpen(var Msg: TMessage);
const
 DefExt = 'dat';
var
 OpenFN : TOpenFileName;
 Filter : array [0..100] of Char;
 FullFileName: TFilename;
 WinDir : array [0..145] of Char;
 Node : PItem;
 AttrList : NestedList;
 Attribute : array[0..40] of Char;
 Ch : Char;
 Str : array[0..40] of Char;
 I : Integer;
begin
 GetWindowsDirectory(WinDir, SizeOf(WinDir));
 SetCurDir(WinDir);
 StrCopy(FullFileName, '');

{ Set up a filter buffer to look for .dat files only. Recall that the filter
 buffer is a set of string pairs, with the last one terminated by a
 double null.
}
 FillChar(Filter, SizeOf(Filter), #0); { Set up for double null at end }
 StrCopy(Filter, 'Dat Files');
 StrCopy(@Filter[StrLen(Filter)+1], '*.dat');


 FillChar(OpenFN, SizeOf(TOpenFileName), #0);
 with OpenFN do
 begin
 hInstance := HInstance;
 hwndOwner := HWindow;
 lpstrDefExt := DefExt;
 lpstrFile := FullFileName;
 lpstrFilter := Filter;
 lpstrFileTitle:= FileName;
 flags := ofn_FileMustExist;
 lStructSize := sizeof(TOpenFileName);
 nFilterIndex := 1; {Index into Filter String in lpstrFilter}
 nMaxFile := SizeOf(FullFileName);
 end;
 If GetOpenFileName(OpenFN) then
 begin
 I := 0;
 FillChar(IName, sizeOf(IName), #0);
 FillChar(Attribute, sizeOf(Attribute), #0);
 Assign(InFile, FileName);
 Reset(InFile);
 While not eof(InFile) do
 begin
 Read(InFile, Ch);

 While Ch <> '[' do {construct class name from file}
 begin
 Move(Ch, IName[I], sizeOf(Ch));
 I := I + 1;
 Read(InFile, Ch);
 end; {While}

 I := 0;
 Read(InFile, Ch); {Now get Attributes}
 While Ch <> ']' do
 begin
 If Ch <> ',' then
 begin
 FillChar(Attribute[I], sizeOf(Ch), Ch);
 I := I + 1;
 end {If <> ','}
 else begin
 If Length(Attribute) > 0 then
 AttrList.NewNode(AHead, ATail, Head, Tail, IName, Attribute);
 FillChar(Attribute, sizeOf(Attribute), #0);
 I := 0;
 end; {else}
 Read(InFile, Ch);
 end; {While <> ']'}
 If Length(Attribute) > 0 then
 AttrList.NewNode(AHead, ATail, Head, Tail, IName, Attribute);
 Read(InFile, Ch);
 Read(InFile, Ch);
 I := 0;
 FillChar(IName, sizeOf(IName), #0);
 FillChar(Attribute, sizeOf(Attribute), #0);
 end; {While not eof}


 close(Infile);
 Show;
 end; {If}
end;

procedure TExpert.CMFileSaveAs(var Msg: TMessage);
const
 DefExt = 'dat';
var
 SaveFN : TOpenFileName;
 Filter : array [0..100] of Char;
 FullFileName: TFilename;
 WinDir : array [0..145] of Char;
 Goal : PItem;
 Conditions : PAttr;
begin
 GetWindowsDirectory(WinDir, SizeOf(WinDir));
 SetCurDir(WinDir);
 StrCopy(FullFileName, '');

{ Set up a filter buffer to look for .dat files only. Recall that the filter
 buffer is a set of string pairs, with the last one terminated by a
 double null.
}
 FillChar(Filter, SizeOf(Filter), #0); { Set up for double null at end }
 StrCopy(Filter, 'Dat Files');
 StrCopy(@Filter[StrLen(Filter)+1], '*.dat');

 FillChar(SaveFN, SizeOf(TOpenFileName), #0);
 with SaveFN do
 begin
 hInstance := HInstance;
 hwndOwner := HWindow;
 lpstrDefExt := DefExt;
 lpstrFile := FullFileName;
 lpstrFilter := Filter;
 lpstrFileTitle:= FileName;
 flags := ofn_FileMustExist;
 lStructSize := sizeof(TOpenFileName);
 nFilterIndex := 1; {Index into Filter String in lpstrFilter}
 nMaxFile := SizeOf(FullFileName);
 end;
 if GetSaveFileName(SaveFN) then
 begin
 Goal := ListPtr;
 Conditions := Goal^.Prop;
 Assign(OutFile, FileName);
 Rewrite(OutFile);
 while Goal <> nil do
 begin
 write(OutFile, Goal^.ItemName);
 write(OutFile,'[');
 while Conditions <> nil do
 begin
 write(OutFile, Conditions^.Attribute);
 Conditions := Conditions^.ANext;
 If Conditions <> nil Then
 write(OutFile, ',');
 end;

 writeln(OutFile, ']');
 Goal := Goal^.Next;
 If Goal <> nil then
 Conditions := Goal^.Prop;
 end;
 close(Outfile);
 end;
end;

procedure TExpert.CMSearch;
var
 ExpList : SList;
 SearchStr : array[0..40] of Char;
begin
 StrPCopy(SearchStr,'');
 Application^.ExecDialog(New(PInputDialog, Init(@Self,'Search Item',
 'Enter Item:', SearchStr, Sizeof(SearchStr))));

 If ExpList.Search(Head, SearchStr) <> nil Then
 MessageBox(HWindow, SearchStr, 'Item found: ',mb_OK)
 Else
 MessageBox(HWindow, SearchStr, 'Item NOT found: ',mb_OK);
 Show;
end;

procedure TExpert.CMFindAttr;
var
 TmpPStr, SearchStr : array[0..40] of Char;
 Classification : String;
begin
 StrPCopy(SearchStr,'');
 Application^.ExecDialog(New(PInputDialog, Init(@Self,'Search Item',
 'Enter Item:', SearchStr, Sizeof(SearchStr))));
 StrCopy(AName, SearchStr);
 If (Length(AName) <> 0) and (Head <> nil) then
 Begin
 Classification := SearchItemList(Head, AName);
 If Length(Classification) <> 0 Then
 Begin
 StrCat(SearchStr,' is an attribute of ');
 StrPCopy(TmpPStr, Classification);
 StrCat(SearchStr, TmpPStr);
 MessageBox(HWindow, SearchStr, 'Attribute found: ',mb_OK)
 end
 else
 MessageBox(HWindow, SearchStr, 'Attribute NOT found: ',mb_OK);
 end;
 Show;
end;

procedure TExpert.CMForChain;
begin
 Application^.ExecDialog(New(PTestDialog, Init(@Self, 'DIAL1')));
end;

procedure TExpert.CMBackChain(var Msg: TMessage);
var
 TmpPStr, SearchStr : array[0..40] of Char;
 Inferred : Integer;

begin
 StrPCopy(SearchStr,'');
 Application^.ExecDialog(New(PInputDialog, Init(@Self,'Search Item',
 'Enter Item:', SearchStr, Sizeof(SearchStr))));
 Inferred := Inference(SearchStr, ListPtr);
 If Inferred = YesBtn then
 MessageBox(HWindow, 'Goal proved', 'Message', MB_OK)
 else
 MessageBox(HWindow, 'Cannot prove Goal', 'Message', MB_OK);
 Show;
end;

procedure TExpert.ClearFacts(var Msg : TMessage);
var
 Expert : TExpertApp;
 ExpList : SList;
 AttrList : NestedList;
begin
 ExpList.FreeList;
 ListPtr := nil;
 NListPtr := nil;
 Head := nil; AHead := nil;
 Tail := nil; ATail := nil;
 MessageBox(HWindow, 'Knowledge Base Cleared!', '',mb_OK);
end;

procedure TExpert.CMQuit;
begin
 PostQuitMessage(0);
end;

{ Displays the program's About Box dialog.}

procedure TExpert.CMAbout(var Msg: TMessage);
begin
 Application^.ExecDialog(New(PDialog, Init(@Self, PChar('DIAL4'))));
end;

{ Main }
var
 Expert : TExpertApp;

Begin

 Expert.Init('WinExpert');
 Expert.Run;
 Expert.Done;

end.


[LISTING TWO]


Unit Lists;

Interface

Type


 PAttr = ^Attr;
 Attr = record
 Attribute : array[0..40] of Char;
 ANext : PAttr;
 end;

 PItem = ^Item;
 Item = record
 ItemName : array[0..40] of Char;
 Prop : PAttr;
 Next : PItem;
 end;

 PList = ^SList;
 SList = object
 Node : PItem;
 constructor Init;
 destructor Done; virtual;
 procedure FreeList;
 procedure AddNode(var Head, Tail : PItem; NewName : PChar);
 function Search(Head : PItem; Name : PChar) : PItem;
 function GetAttr(Key : PChar) : PAttr;
 end;

 PNestedList = ^NestedList;
 NestedList = object(SList)
 NNode : PAttr;
 constructor Init;
 procedure FreeList;
 procedure NewNode(var AHead, ATail : PAttr; var Head, Tail : PItem;
 IName, NewAttr : PChar);
 function Search(Head : PAttr; Attribute : PChar) : Boolean;
 end;

 function SearchItemList( Head : PItem; Attribute : PChar): String;

var
 ListPtr : PItem;
 NListPtr : PAttr;


Implementation

Uses WinDOS, WObjects, WinTypes, Strings, WinProcs;

{ ----------------------- }
{ NestedList methods }
{ ----------------------- }
constructor NestedList.Init;
begin
 NNode := nil;
end;

procedure NestedList.FreeList;
var
 Temp : PAttr;
begin
 NNode := NListPtr;
 while NNode <> nil do
 begin
 Temp := NNode^.ANext; { save the link before disposing the node }
 Dispose(NNode);
 NNode := Temp;
 end;
end;

procedure NestedList.NewNode (var AHead, ATail : PAttr; var Head, Tail :
PItem;
 IName, NewAttr : PChar);
var
 ANode : PAttr;
 LPtr : PItem;
begin
 LPtr := SList.Search(Head, IName);
 If LPtr = nil Then
 begin
 AddNode(Head, Tail, IName);
 New(ANode);
 AHead := ANode;
 ATail := ANode;
 ANode^.ANext := nil;
 StrCopy(ANode^.Attribute, NewAttr);
 LPtr := SList.Search(Head, IName);
 LPtr^.Prop := ANode;
 end
 Else {Item already exists-add ANode to existing}
 begin
 New(ANode);
 AHead := LPtr^.Prop;
 ATail^.ANext := ANode;
 ATail := ANode;
 ANode^.ANext := nil;
 StrCopy(ANode^.Attribute, NewAttr);
 end;
end;

function NestedList.Search ( Head : PAttr; Attribute : PChar) : Boolean;
var
 I : Integer;
begin
 Search := False;
 NNode := Head;
 While NNode <> nil do
 begin
 I := StrIComp(NNode^.Attribute, Attribute);
 If I = 0 then
 begin
 Search := True;
 Exit;
 end;
 NNode := NNode^.ANext;
 end;
end;

function SearchItemList( Head : PItem; Attribute : PChar): String;
var
 Node : PItem;
 ANode : PAttr;
 AttrList : NestedList;
begin
 Node := Head;

 ANode := Node^.Prop;
 SearchItemList := '';
 While Node <> nil do
 begin
 If not AttrList.Search(ANode, Attribute) then
 begin
 Node := Node^.Next;
 If Node <> nil Then
 ANode := Node^.Prop;
 end
 else
 begin
 SearchItemList := Node^.ItemName;
 Exit;
 end;
 end;
end;

{ ----------------------- }
{ List methods }
{ ----------------------- }

constructor SList.Init;
begin
 ListPtr := nil;
 Node := nil;
end;

Destructor SList.Done;
begin
 FreeList;
end;

procedure SList.FreeList;
var
 AttrList : NestedList;
 Temp : PItem;
begin
 Node := ListPtr;
 while Node <> nil do
 begin
 NListPtr := Node^.Prop;
 AttrList.FreeList; { free this node's nested attribute list }
 Temp := Node^.Next; { save the link before disposing the node }
 Dispose(Node);
 Node := Temp;
 end;
end;

{ Insert a New Item in the list }
procedure SList.AddNode (var Head, Tail : PItem; NewName : PChar);
var
 Added : PItem;
begin
 New(Added);
 If Head = nil then
 begin
 Head := Added;
 Tail := Added;
 ListPtr := Added;

 end
 Else begin
 Tail^.Next := Added;
 Tail := Added;
 end;
 Node := Head;
 Added^.Next := nil;
 StrCopy(Added^.ItemName, NewName);
end;

{ Search for a specified Item - return pointer if found }
function SList.Search ( Head : PItem; Name : PChar) : PItem;
var
 I : Integer;
begin
 Search := nil;
 Node := Head;
 While Node <> nil do
 begin
 I := StrIComp(Node^.ItemName, Name);
 If I = 0 then
 begin
 Search := Node;
 Exit;
 end;
 Node := Node^.Next;
 end;
end;

{Search for an Attribute and return pointer to its list}
function SList.GetAttr(Key : PChar) : PAttr;
var
 I : Integer;
Begin
 GetAttr := nil;
 Node := ListPtr;
 While Node <> nil do
 begin
 I := StrIComp(Node^.ItemName, Key);
 If I = 0 then
 begin
 GetAttr := Node^.Prop;
 Exit;
 end
 else
 Node := Node^.Next
 end;
end;
end.













November, 1992
DEVELOPING A PORTABLE C++ GUI CLASS LIBRARY


Finding an acceptable middle ground




Andreas Meyer


Andreas is director of software development at Star Division GmbH. He can be
reached at Sachsenfeld 4, 2000 Hamburg 1, Germany, or via the Internet at
am@starlab.uucp.


Over two years ago, we began developing a desktop publishing/word processing
application that was to run on Microsoft Windows, IBM OS/2 Presentation
Manager, Apple Macintosh, Open Look, and OSF/Motif.
We recognized there were two approaches to this project. The first was to
write the application by directly accessing platform-specific API functions.
The advantage of this approach is guaranteed fast execution without overhead.
However, you have to code different implementations of the same application
for each GUI.
The second approach was to use a single portable library between the
application and the GUI. This was the approach we took, although writing the
class library ended up taking a full year of development time.
The essential requirements of a library like this are that it provide GUI
functionality and multiple-document interface (MDI) support, a help system,
printing, combo controls, and a portable resource system. At the same time, we
wanted to be able to take advantage of system-dependent capabilities (like
Windows OLE or DDE) and not lose the basic look-and-feel of an individual GUI.
We chose to program in C++ because of its standard object-oriented
features--encapsulation, inheritance, polymorphism, extendibility, and class
reusability. No portable C++ toolkit was available at the time (at least none
that supported the abstraction level of the GUI functionality and came with
source code), so we built our own, naming it StarView.


An Overview of GUI Systems


Windows 3.x/NT, Presentation Manager (PM), Macintosh, Open Look, and OSF/
Motif have many similar features. For instance, they're all event driven, work
with resources, and support screen coordinates with variable metrics. Since
Windows and PM have roughly 100 different messages, Motif and Open Look about
40 Xlib events, and the Macintosh about ten standard events, we created common
events for mouse and keyboard input, windows, controls, menus, and timers. We
then mapped these events to about 40 virtual functions in our library's base
classes. We defined symbolic key codes (KEY_A) and mouse modifiers for all
platforms. We were then able to derive our own classes and overload the
virtual functions to access the events. In Windows and PM, most messages are
handled inside the library automatically; for the Mac, most had to be
emulated.
When you consider the resource systems on the different platforms, you notice
that each system supports different levels of functionality. Consequently, we
created our own resource syntax and wrote compilers for each system to
translate the system-independent resources into the target system's resources,
developing a portable resource system with the same level of functionality on
each platform. One disadvantage of this is that we couldn't use the resource
tools of the different platforms. We had to write resource files with a text
editor, compile them, test them, and rewrite them. Because it took a long time
to create the first resource files, we constructed our own interactive
resource editor called Design Editor which reads and writes our resource files
and creates corresponding class definitions and constructors. From that point
on, it was easy to develop portable resources and create corresponding classes
for them.
Another problem involved the different coordinate systems of each platform. On
PM, for example, the coordinate-system origin is the lower-left corner of the
screen; on the other systems, it's the upper-left corner. We decided to use
the upper left as the origin and developed a map mode for specifying the
metric, a scale, and an offset in the coordinate system. We realized that it
would be very difficult to use resource files with the same metrics on each
platform. If you specify a dialog in pixels, it will be very small on a
high-resolution screen. If you specify a dialog in inches, it will have the
same size on all platforms and resolutions, yet look ugly on smaller displays.
Normally, the size of the system font is a good measure by which to specify
the metrics of dialogs, so we created the map mode SystemFont to specify a
dialog in system- or application-font units. We specified all our dialogs in
the map-mode application font.
Memory allocation was yet another problem. In Windows and PM, early PC C++
compilers implemented new and delete with malloc() and free(), so each object
needed an entire segment. With only 8192 segments available, the application
often ran out of segments. To avoid this, we wrote our own memory manager which
allocated 4K segments from the system's heap and suballocated them according
to the object's size. This let us overload new and delete and call our own
memory manager to allocate the memory. We continued to use the standard C++
new and delete with the Mac, Open Look, and Motif.
As Table 1 shows, support for important features such as MDI,
context-sensitive help, printing, and controls like Windows combo boxes varied
from platform to platform. MDI support was very important for our word
processor, but it was available only on Windows. Consequently, our MDI for
OS/2, Open Look, and Motif looks very
similar to the Windows version. You can arrange the document windows in an
application window with an MDI menu in which the document windows are listed
automatically. On the Macintosh, however, there are no application windows.
Instead, the document windows are arranged on the desktop.
Table 1: Features supported by the various platforms.

                  Windows   OS/2   Mac   Motif   Open Look
  --------------------------------------------------------
  MDI               Yes       No     No    No      No
  Help System       Yes       Yes    No    No      Yes
  Printing          Yes       Yes    Yes   No      No
  Combo controls    Yes       Yes    No    No      Yes

Another important common feature was context-sensitive help like that found in
Windows and OS/2. Because we didn't want to maintain different help files, we
settled on the Windows help system with its RTF-formatted help files. To use
this format on OS/2, we wrote an RTF-to-OS/2 help-system format converter.
Unfortunately, it wasn't that easy for the Macintosh, Open Look, and Motif.
The Macintosh balloon help, for instance, is useful only for short messages
and doesn't support references to other topics. Therefore, we had to write our
own Windows-like help system which is used on Macintosh, Open Look, and Motif.
Fortunately, this is a StarView application, so we had to code it only once.
This system has an integrated help compiler that converts RTF help files into
our own help format.
For output, we designed the class OutputDevice. We then derived the classes
Window, Printer, and VirtualDevice, which are used by applications to create
output to a window, bitmap, or printer. This was straightforward on Windows,
OS/2, and Macintosh, but difficult on Open Look and Motif, where we used a
printer library that supported all Xlib output functions.
To emulate the Windows-OS/2 controls ComboBox, DropDownComboBox, and
DropDownListBox on the Macintosh, we combined edit menus, list boxes, and
pop-up menus. For Open Look and Motif, we combined text fields, arrow buttons,
and list boxes.
To manipulate the internals of an object, we created the class sysdepen, which
lets you, for example, access the Macintosh GrafPort directly or send a
Windows message to an object. However, this generates nonportable,
system-dependent code and should always be encapsulated in system-dependent
classes. Likewise, in the spirit of portability, we decided not to support
platform-specific features like Windows DDE or Macintosh QuickTime on all
platforms. We did, however, implement OLE and DDE support as nonportable
features especially for the Windows environment.


Developing the Class Library


During the year-long class-library development phase, we made three major
design and interface changes: We included an additional mechanism for event
handling, provided a mechanism for multiple referenced objects, and added
const.
Initially, we provided a virtual method for each event. To get an event, we
had to derive a class and overload the virtual function. This isn't a problem
with a window class, where you derive a class and add the functionality, but
it is problematic with controls, menus, and accelerators.
For example, the OK button in a dialog overloaded the virtual method Click()
in the button class, checked the status of the dialog, and terminated it. The
problem with this was that we had to derive a new button class for each OK
button in different dialogs, because they all called slightly different
functions. Although we could use multiple instances of the same button class
with the identical OK buttons, some individual buttons performed special
functions. To eliminate the large number of classes, we added a callback
mechanism to the controls, menus, and accelerators. This enabled us to use an
instance of the library's button class directly. We set a callback into a
button instance; when activated, the button executed the callback. Executing
the callback is the default implementation of the virtual method Click() in
the button class. If we don't need the callback, we derive a new button class
and overload the virtual method Click(). Because the callbacks are C++ member
functions rather than plain C functions, a simple C function pointer isn't
enough; a callback is always a pair of pointers. The first references the
object itself, and the second references the appropriate method of that
instance. For this pair of pointers, we created the class Link, which stores
the pointers and executes the callback. In Example 1, the OK button terminates
a dialog: GetParent() returns a pointer to the dialog, and the method
MyDialog::QuitDialog() terminates the dialog.
Example 1: A derived ModalDialog with a PushButton. In the constructor of
MyDialog, the PushButton is initialized to call MyDialog::QuitDialog when
clicked. There, the state of the dialog is checked and the dialog is
terminated. You don't need to derive your own Button class for this example.

 class MyDialog : public ModalDialog
 {
 protected:
     DefPushButton aOkButton;

 public:
     MyDialog( Window* pParent );
     void QuitDialog( DefPushButton* pButton );
 };

 MyDialog::MyDialog( Window* pParent ) :
     ModalDialog( pParent ),
     aOkButton( this )
 {
     aOkButton.ChangeClickHdl( LINK( this, MyDialog::QuitDialog ) );
 }

 void MyDialog::QuitDialog( DefPushButton* pButton )
 {
     if( ... ) // check dialog state
         ModalDialog::EndDialog();
 }

Our second redesign involved a mechanism for multiple referenced objects --
bitmaps, pens, fonts, and brushes -- used in different windows. A brush, for
example, consists of a foreground color, a background color, and a style --
components that can be queried and changed.
In our first design, these elements were selected with a pointer into multiple
dialogs. All dialogs used the same instance of the background brush, but it
was difficult to decide when this brush object had to be deleted. ("Is anyone
still using the brush?" "Can I delete it now?")
It became apparent that this approach would lead to programming errors. For
instance, all dialogs in an application should have the same background color.
However, we didn't want to give each dialog its own instance of a background
brush because of memory and speed considerations. The solution was to use a
dummy brush class that contained only a pointer to the brush data. Multiple
instances of brushes can then share the same data. The data has an instance
counter that is incremented in the copy constructor and decremented in the
destructor of the brush class. Thus, instances of brushes can be passed to
different dialogs because the instance has only the size of the pointer, and
the copy constructor works very fast. The brush data is deleted when the
instance count is decremented to 0.
We also split the methods of the brush class in two groups. The first group of
methods returns only values, while the second modifies the brush instance.
When we modify the brush, the instance data is copied, and only the copy is
altered.
We use this mechanism not only in the classes described above, but also with
bitmaps and in our String class. At first we worked with char*, allocating
memory, copying strings, and juggling different buffer sizes. Now we use the
String class, which is faster, requires less memory, and is easy to manipulate.
The last major change in the interface was the introduction of const. As
described earlier, we separated the methods of the brush class into modifying
and nonmodifying groups, allowing us to declare the nonmodifying methods
const.
However, this approach introduced some unexpected problems. For one thing, the
compiler returned an error when we modified an object in a const-declared
method. Secondly, if there was a const* or a const& to an instance of a class,
you could only use const methods of that class. Because of this, we had to
separate the entire library into const and non-const methods. From this we
learned that you should either use const everywhere or avoid its use
altogether.


Developing the Application


After a year's worth of work, we'd developed the class library and were ready
to finally begin work on our application -- the portable word processor.
Our development environment was a mirror of our platform requirements: PCs
running Windows, Windows NT, or OS/2 2.0; Sun SPARCstations with Open Look or
Motif; and Apple Macintoshes.
All computers were connected to an Ethernet-based network equipped with Novell
Netware 3.11, with NFS and Appleshare support available. We put the entire
source code for all platforms on a commonly accessible file server.
Our main development platform is Windows 3.1, so all modules are written for
the Windows environment first. Only tested and validated Windows versions are
then ported. Porting means not only copying the source code from the file
server to the workstations and compiling it there, but also handling the
ASCII-file format on the various systems, different makefiles, and special,
compiler-dependent language problems.
Although the source code was on a common file server, all system-specific
modifications were made on the target systems. Every modification was
registered using a source-code control system that modifies a single logfile
on the server; each module can be modified by only one programmer at a time.
Finally, a GUI specialist fine-tuned the resource files to ensure adherence to
system style guides for look-and-feel.


Compiler Experiences


On the PC, we initially used the Glockenspiel C++ compiler, but compilation
time was too slow, so we switched to Zortech for both Windows and OS/2
(although in retrospect we wish it had supplied more warnings). We used MPW
C++ on the Macintosh and Sun's C++ for Open Look and Motif development. (The
library itself supports Zortech C++ 3.0, Borland C++ 3.1, and Microsoft C/C++
7.0 for Windows developers; Zortech C++ and Borland C++ for PM; Sun C++ 2.1
for Motif; Sun C++ 3.0 for Open Look; and MPW C++ 3.1 for the Macintosh.)
When the Macintosh library was ready, we compiled the application sources
using the MPW C++ compiler, a cfront with powerful language checking. However,
expressions like those in Example 2(a) resulted in messages like "Sorry, not
implemented," because of return objects created on the stack. To avoid this,
use code like that in Example 2(b).
Example 2: (a) Code like this caused cfront-generated error messages; (b) code
like this helps you avoid errors.

 (a)
 class Object
 {
     Object GetNext();
 };
 if( bTest && ( aObj.GetNext() == aObj ) ) ...

 (b)
 Object aNext = aObj.GetNext();

 if( bTest && ( aNext == aObj ) ) ...

The Sun C++ 2.1 compiler had difficulties with the preprocessor. For example,
we defined the macro in Example 3(a) to concatenate two strings. The result
was a preprocessor error, so we resorted to the code in Example 3(b).
Example 3: (a) This macro, which concatenates two strings, generated
preprocessor errors; (b) code that avoids them.

 (a)

 #define CONCAT( a, b ) a##b
 CONCAT( A, B ) // this will throw an error

 (b)
 #define CONCAT( a, b ) a/**/b

A bug in all cfronts is that they don't accept a default object in the
constructor Class::Class( String aString = String( "Default" ) );. So, in
spite of having a list of the C++ expressions that cause problems when porting
source code, errors occurred again and again, causing us to repeatedly build
new versions of our project on the Macintosh and Sun.
After compiling and linking the word processor on the Macintosh, we tried to
run the application. It took us about a week to fix the thorniest bugs in the
library and start up the application. In the process, we also discovered
differences in the code generated by the various compilers. The most serious
problems were bugs in the Zortech and Borland C++ compilers.
For instance, the Zortech 3.0 compiler does not adjust the virtual-function
table in destructors of derived classes. If you destroy an instance of a
derived class and you use a virtual function in the destructor of the base
class, the compiler should call the implementation of the base class. Instead,
the Zortech compiler calls the implementation of the derived class, as shown
in Example 4. When a derived instance is destroyed, the destructor of Base
calls the virtual method foo(); while it should call Base::foo(), it calls
Derived::foo() instead.
Example 4: The Zortech compiler calls the implementation of the derived class
this way.

 class Base
 {
     ~Base() { foo(); }
     virtual void foo();
 };
 class Derived : public Base
 {
     virtual void foo();
 };

The second problem with the Zortech compiler occurs if you have a null pointer
to an object with a virtual destructor and you delete that pointer. In this
case no destructor should be called, but the Zortech compiler calls it anyway,
because it generates the comparison against null inside the destructor, not
before the call.
We ran across a problem in the Borland 3.0 compiler when we passed nameless
temporary objects in the constructor's initializer list. In Derived::Derived()
: Base( String( "abc" ) );, for instance, the constructor of the string is
called, but not the destructor. The same problem occurs in the new C/C++ 7.0
compiler from Microsoft. (In Borland C++ 3.1, however, this problem is
corrected.)
We collected all our compiler experiences in a test suite. Whenever we
received a new version of a compiler or we went to a new platform, we tested
the compiler against the suite. Although we'd written many test applications
for the class library, we found many more bugs in it. We learned that large
applications are much better test suites for a library than a set of small
tests.



























November, 1992
PROGRAMMING PARADIGMS


User-interface Worries




Michael Swaine


William F. Buckley always looks so relaxed when he's debating someone on
television. It's partly the slouch. And the smirk. Posture and posturing as
debating tools.
Buckley should really be relaxed these days, since he recently sloughed off a
long-time responsibility as editor of National Review. He claims to have been,
at the time of his retirement, the most senior editor in America, in terms of
the number of years on the job. I suspect that he is overlooking some other
pretty senior editors, such as William M. Gaines, long-time editor of Mad
magazine.
Gaines died this summer, but his magazine continues its policy of cheeky
irreverence, as shown in its recent flap with The American Rifleman, the
official magazine of the National Rifle Association.
Mad annoyed the NRA by publishing a satirical fake NRA ad in January. (There
was no question about it being a fake ad: Even those naive readers who might
be fooled by Mad's rather broad humor ought to have noticed that the magazine
has never carried paid advertising.) An editorial in The American Rifleman
fired back by encouraging its readers to write to Mad and express their
feelings. Hundreds did, some threatening to boycott (nonexistent) Mad
advertisers; Mad published some sample letters, and made plans to take its
next shot at the NRA.
I wasn't there in the Mad offices to see the reaction of co-editor John
Ficarra. I choose to imagine that he slumped in his chair and smirked
insufferably, just like William F. Buckley. I do know what he said. Expressing
the magazine's official response to the threats, he said "What, me worry?"


What, You Worry?


Would that we could all be as unworried as Alfred E. Neuman or John Ficarra.
But most of us find reasons to worry from time to time. And some of us should
worry more. I'm thinking of Xerox managers in the late '70s, and graphical
user-interface designers in the early '90s.
The official computer-industry history has it that the graphical user
interface was mainly invented at Xerox PARC back in the '70s; that it was the
result of a lot of research, including research on what users actually did
with computers; that Xerox missed their window, so to speak; that Xerox also
failed to protect the PARC research and development efforts; and that the
company as a result lost all but moral superiority in the user-interface
market.
Are there things to worry about in user-interface design in the '90s? For
instance, should you be worried about the fact that certain user-interface
techniques have been patented? For example, combining scrolling with multiple
subwindows, or using exclusive-OR to put a cursor on the screen, or saving the
contents of an obscured window into off-screen memory for later refresh? All
patented. All grounds for lawsuit against you if you use them without license.
Or should you be worried by the fact that user interfaces have increasingly
become an object of copyright litigation? Should you worry about protecting
your user-interface efforts, or protecting yourself from some litigious patent
or copyright holder?
Xerox actually brought its graphical user interface to market in a commercial
business machine before Apple introduced the Mac. But the Star wasn't the
right machine at the right time, apparently--in any case, it didn't win the
day, even though the technology that did, as the official computer-industry
history tells it, was Xerox's. There are many ways to miss your window.
The Xerox PARC GUI is '80s technology, and we're in the '90s. Should you worry
about the rumors that the desktop metaphor is on its way out? Is it due to be
replaced by the Next Big Thing? And if so, which Next Big Thing? Multimedia
computing? The Apple-IBM operating system? Enterprise Computing? Is it
possible that, by investing time and money in current-paradigm, user-interface
design, you will be missing the window into the Next Big Thing?
The argument has been mooted here a couple of months ago that multimedia
computing could require--and deliver--a new interface, obsoleting the
PARCface. Or even that it could bring about the demise of user interface
entirely in favor of "user experience." Taligent has been discussed often
enough here, too. Neither of these Next Big Things seems to represent an
immediate threat to the efforts of user-interface designers. Enterprise
computing does raise some questions about who will use computers and for what,
and who will buy the software, and what will be expected of applications, and
possibly even about user interface.
Should you worry about usability testing? The user research done at PARC was
groundbreaking, but it's old. User interfaces have evolved since the '70s.
Even the users have changed: Instead of the old naive users of that era, we
have new naive users. Is it worth worrying whether your user-interface design
is usable? Is anybody checking to see whether users appreciate what new user
interfaces do for them? Should you be checking, or at least worrying?


Rights and Wrongs


In A Connecticut Yankee in King Arthur's Court, Mark Twain wrote:
The very first official thing I did, in my administration--and it was on the
very first day of it, too--was to start a patent office; for I knew that a
country without a patent office and good patent laws was just a crab, and
couldn't travel any way but sideways or backways.
Much has been said, here and elsewhere, about the value of patents in
programming. One organization that might not agree with Mark Twain is The
League for Programming Freedom. The League opposes the granting of patents for
software and copyrights for user interfaces.
I'll try to summarize, fairly but succinctly, the League's position on patents
and copyrights. Although some of the recent lawsuits seem specious to me, I'm
not so much promoting the League's position as pointing out that there is a
clearly articulated position on the issue. Defenders of software patents and
user-interface copyright, I think, need to address the points raised by the
League.
The League's software-patent arguments run something like this:
1. Software patents are unnecessary. The protection afforded by trade secrets
and implementation copyright was always sufficient before. There was no
evidence, before the Patent Office began granting so many patent applications
in the early '80s, that innovation in software was being hampered by lack of
patent protection.
2. The system isn't working. The Patent Office began granting patents with no
software-savvy examiners on staff. Even now, it doesn't pay competitive
salaries to get competent examiners. Many of the decisions of the examiners
are questionable even if you believe software patents are a good thing.
3. Software isn't like hardware. The differences argue for different treatment
under the law.
4. Software patents are hampering innovation. Today, a developer of software
should run a series of costly patent searches before releasing a new piece of
commercial software, and even that won't guarantee that the developer hasn't
violated someone's patent. This is something to worry about.
Another thing to worry about if you decide to try to patent something these
days is the possibility of losing your right to even talk about your
invention. The Patent Office has been passing along patent applications to the
Pentagon for a national security check, and record numbers of applications
have been coming back stamped "SECRET." Seven hundred per year over the past
three years.
If your patent application gets stamped "SECRET," you are enjoined from even
talking about it, and you don't get the patent. The policy is a remnant of
cold war paranoia and has been challenged. There seems to be some reason to
believe that the policy will get changed, perhaps by letting the Commerce
Department fill the Pentagon's role in the process. Whether that will be an
improvement or not is something to worry about. The whole deal looks like one
more argument against applying for a patent.
The League's user-interface copyright arguments run something like this:
Copyright is a government-imposed monopoly whose sole purpose under law is to
"promote the progress of science and the useful arts." In the case of
user-interface design, it does not do that. Particularly, it does not protect
small, innovative companies from large ones, it discourages meaningful
competition, it is costly and inconvenient for users, it prevents much-needed
standardization, it promotes incompatibility, it discourages useful
incremental innovations, it permits extortionary threats of litigation by
large companies, it is not a necessary incentive for the development of new
interfaces, and interface developers themselves don't want it. We ought to
oppose the granting of copyrights that do not "promote the progress of science
and the useful arts."
If you find the League's positions interesting, or if you find my summary of
them inadequate, you may want to get in touch with the League for Programming
Freedom, One Kendall Square, #143, P.O. Box 9171, Cambridge, MA 02139.
Mark Twain really did believe in the Patent Office, and lost a bundle backing
the latest publishing technology. He spent the last years of his life earning
back the money to pay back those who had invested in his schemes. He was more
successful as a writer than as a technology analyst.


Which Next Big Thing?


Enterprise computing is being touted by some as the Next Big Thing, and the
touts claim that it will change the way applications are designed, the way
software is marketed and purchased, and yes, the way the user interacts with
the applications.
The notion seems to be that, generation #1 having been mainframes, generation
#2 minis, and generation #3 personal computers, the platform of choice in
generation #4 will be the network. Shades of Sun. How long ago was Sun Micro
advising us that "the network is the computer"?
In the world of enterprise computing, emphasis will shift back to
enterprise-wide tasks and away from the personal-productivity tasks that have
characterized the personal-computer universe. Away from word processing and
spreadsheets and back toward accounts receivable, payroll, inventory, order
processing, and manufacturing logistics. Away from single-user systems and
toward multiuser systems, distributed processing, relational-database storage,
and standard query language access.
Aside from Lotus, whose Notes product is an enterprisewide application,
personal computer software companies are not working on enterprise computing
and are not expected by savvy analysts to be big players in that market.

Does that mean that enterprise computing companies, perhaps raised in the
mainframe or minicomputer worlds, will be producing the user interfaces of the
next generation?
Well, let's reason it out. Let's assume, and it's a big assumption, that
enterprise computing really takes off and sucks investment money away from
personal-productivity software and becomes the center of the computing
universe. Even so, the developers of applications for enterprise systems will
have to deal with an installed base of personal-productivity applications and
operating systems. There does not seem to be any compelling reason for them to
make things difficult for the user by challenging the user-interface
assumptions of today.
The inertia of the installed base will carry a lot of weight.


Usability Testing


My sources tell me that over 34,000 of you are currently developing GUI
applications for Windows. You may or may not know what Microsoft is doing in
the way of user-interface usability testing.
It's impressive. Users sit in a lab and perform for the Microsoft testers.
They get videotaped. They are encouraged to think aloud as they work, and
their verbalized thoughts are captured on tape. They are observed through
one-way glass. They are monitored by software external to the applications
they are using, and also by instrumented versions of those applications.
The sheer mass of data collected and the variety of means for collecting it
aren't, by themselves, what's impressive. Exploratory data collection and
analysis is fine for generating hypotheses and research questions, but for
answering questions the data collection should be theory driven.
In fact, that's how it is at Microsoft. Specific questions are asked and
specific hypotheses are tested in the lab. The hypotheses and questions arise
from lab research and from more anecdotal sources: product support calls,
letters, responses to coupon ads, focus groups, and field studies. The
exploratory research leads to models of user interaction or activity
structure, and these models produce specific questions to be resolved in the
lab.
Microsoft cites several tangible results of its usability testing for
user-interface design. Replacing the cut-and-paste model for text manipulation
in Word for Windows 2.0 was a result of such testing. In a more mundane vein,
testers learned how more informative dialog box messages can cut down user
errors. The concept is obvious enough, sure, but by doing the testing they got
specific data on cases where it did and didn't work.
So do you need to worry about usability testing? Probably not, unless that's
your job or the product is all yours and you're not content to follow the lead
of a company like Microsoft that has done the worrying for you.


Education Market


And now for something completely different.
Let's say you've got a friend. This friend is an applications or systems
programmer for a medium-to-large-sized company. But that's just his day job.
In the evenings and on weekends, your friend programs. He can't help himself,
he has to do it. But he isn't just doing it for fun; he dreams of launching
his own software company one day. He doesn't need to make it big (his day
job's secure), but programming is what he does, and there's so much poor
software out there, and he keeps coming up with these ideas....
He has a kid, and some of the ideas are ideas for educational products. But
although his kid likes the stuff he writes, he's uncertain about educational
software as a market. Don't you need to be a teacher to write that stuff?
Don't you have to get accepted into the curriculum through some bureaucratic
process? And aren't American schools (at least) so poverty-stricken that they
can't buy paper, let alone software?
There is a market for educational products, but it's a good thing that your
friend doesn't need to make it big. He shouldn't try to sell to schools, of
course, but to parents like himself. Many distributors have a good track
record with educational software, and if your friend is thinking of products
that include some art or music or video, his chances will be better.
Much educational software is not high-investment. Your friend doesn't have to
include video; a good idea, well implemented, could sell well. And some
educational software is serial in nature: A well-received product can open the
door for other similar products. Content can be a differentiating factor, and
he should note that the price for pressing a CD-ROM has been dropping
precipitously over the past few years.
There are the well-known distributors: Broderbund, The Bureau of Electronic
Publishing, Educorp, Great Wave Software, The Learning Company. But have you
heard of Parsons Technology, Orange Cherry Software, Nordic Software? MECC is
a distributor that actually can get your friend's product into schools.
Information about these distributors and a lot more can be found in a fat book
titled Pride's Guide to Educational Software, by Bill and Mary Pride,
published in 1992 by Crossway Books, a Division of Good News Publishers, 1300
Crescent Street, Wheaton, IL 60187. Yes, Good News Publishers appears to be a
religious book publishing company, but your friend shouldn't let that bias
him. This is a good book for the irreligious, too.
The authors know their stuff, and this book is a good look at what's out
there. It includes many reviews of educational products in various categories
for various levels. Although it doesn't cover everything or every level or
category evenly, I was impressed by its scope and the apparent expertise of
the authors, who wrote all the reviews themselves.
If I, like your friend, were thinking about writing some educational software,
I'd study this book. It might help me evaluate the market, size up my chances,
price my product, and find a distributor. On the other hand, it might tell me
that the program I planned to write had been written already and could be had
for $19.95 through educational-software sources everywhere. Either way, it
would be helpful.






November, 1992
C PROGRAMMING


Tools USA and D-Flat++ Off and Running




Al Stevens


I'm writing this in August, having just returned from the Tools USA '92
conference, a small gathering at the University of California, Santa Barbara,
that addresses the technology of object-oriented languages and systems. The
conference is hosted by Bertrand Meyer, the architect and purveyor of the
Eiffel object-oriented development environment and the author of several books
on object-oriented design.
Mike Floyd of DDJ and I were on the program with a workshop titled, "User
Interface Design." It was our intention to discuss the issues raised by my
experiences with D-Flat++. Much of that experience should interest
object-oriented programmers and designers. The project involves porting a
medium-sized C library to C++. It implements the CUA interface in C++. It
reveals things about the use of a C++ class library as a user-interface API.
There were about 20 attendees in the workshop when we started. Some questions
with shows of hands told us several things about the group. They were all
interested in user-interface design, but most of them did not know what CUA
is, and few had any knowledge of C++. About two minutes into the technical
discussion, I began to feel like the proprietor of Phil's on Murphy Brown--I
wanted to install speed bumps to keep the customers from racing out.
Obviously, the goals of our workshop were not clearly advertised.
Not so obvious was that very few Tools USA '92 attendees cared a whit about
C++. That became evident later when I cruised the show and talked to people.
These were clearly Eiffel constituents, and there was little that they wanted
to learn from me. Many of them were MIS management types, and most of them had
never heard of Dr. Dobb's Journal. At least that part got corrected.
The keynote address followed a recent trend, in which speakers use the podium
to peddle their own products. In this case, the product was an object-oriented
database with shades of SQL. To begin with, the speaker used a practice common
among those who pitch object-oriented database-management systems. In
comparing the object-oriented data model to the traditional relational model,
they mislead their listeners about the abilities of relational databases. They
create artificial deficiencies to bolster arguments for their product. This is
unfortunate. The object-oriented database model--once we agree on just what
that is--has its place, and it solves some problems that other data models do
not support well. But no good purpose is served by negative campaigns to
promote it. Nonetheless, this particular speaker did that and then followed
the technical presentation with a lengthy, unabashed commercial for his
product. I expected to open the conference proceedings and find a reader
service number and a bingo card.
The conference provided lodging in the campus dorms. I had forgotten the
Spartan life. I haven't had a cold shower since my own long-ago school
days--nor felt the need for one.
On balance, Tools USA '92 was informative and professional. I took out
more than I contributed and recommend it to anyone who wants to improve their
knowledge and keep up with the latest techniques in object-oriented design.
The next edition is in Versailles, France, in March of '93. That would be a
nice one to attend. I wonder if they need anyone to park cars.
My seat-mates on the return trip to Orlando were two sisters, aged 10 and 8,
on their way to Disney World. Their parents were in the row behind us. The
sisters were nice, mannerly young ladies, but at every move their father would
yell at them. "Sit up straight!" "Eat your lunch!" "Don't do that!" The girls
would flinch, and I could sense that they were used to this harangue. Sometime
during the trip one of the girls asked where I'd been. When I said I'd been to
Santa Barbara, she asked if I worked for the state government. Her father
works for the state, she said, and goes to Santa Barbara sometimes. "What does
your father do?" I asked. She answered, "If you don't pay your taxes, he comes
and gets your car."
Probably repossesses pacemakers, too.


D-Flat++ Goals


The D-Flat++ class library is well underway. Its design is largely influenced
by my experiences with D-Flat. DF++ will support my original requirements
better than D-Flat does, because it will be less, not more, and it will be in
C++. The goals that launched D-Flat were for a DOS-based, text-mode CUA
library in C to support some applications that I was developing. The library
grew with features as I learned more about CUA and as readers responded. But
when I look at my D-Flat applications, I realize that I never used many of its
features. The example Memopad application uses more D-Flat features than any
of my real applications. I wondered why, and I concluded: A 25x80 text-mode
screen does not support full-blown CUA all that well. For example, why have a
multiple-document interface with minimized windows on a text-mode screen? What
good are simulated icons on such paltry screen real estate?
CUA is a moving target. IBM has modified their standard reference document at
least once. I get e-mail messages that complain that D-Flat doesn't work
exactly like this or that "CUA-compliant" program. Every program has its own
interpretation. IBM's standard reference document is unambiguous, but no one
seems to be reading it. Microsoft is redefining the standard with minor
changes--improvements, actually--to the Windows 3.1 interface. Their version
will, no doubt, prevail.
What are our requirements? If we need a full-blown C++ CUA package, we have
several avenues. Microsoft's and Borland's application-frameworks packages are
C++ classes that hook into the Windows API. Borland's Turbo Vision is a
full-featured C++ class library that implements CUA in DOS text-mode. What do
we need from D-Flat++ that we can't get elsewhere? We need a simple class
library that implements the minimum features necessary to launch a
single-user, single-document application. The CUA part will support menus,
dialog boxes, and the usual controls.


The Desktop


This month we will look at the design of the D-Flat++ desktop and its
input/output devices and discuss some DF++ concepts in general terms.
DF++ is an object-oriented design. An application will operate from within a
global object called the "desktop." The DeskTop class contains one
application-window object and the device objects--the mouse, keyboard, screen,
cursor, speaker, and clock objects. The application program does not declare
most of these objects. There is already a DeskTop object named desktop when
the program starts up, and it already has all the devices except for the mouse
if no mouse driver is installed. The application program does declare the
application window, which automatically associates itself with the global
desktop object.
A DF++ program will define its application by defining menus and dialog boxes
and deriving a custom application window class from the base Application
class. DF++ adopts the event-driven, message-based model of D-Flat, but uses
the features of C++ to manage the messages. The application program sets up
the application window and lets the desktop object collect events and dispatch
messages. The difference is that the messages are class member functions that
belong to the various window classes. The oblique SendMessage processing of
D-Flat is unnecessary in DF++.
Listing One, page 182, is desktop.h, which defines the DeskTop class. The
class includes pointers to the application window, the window that has the
focus, and the window, if any, that has captured the focus.
Having the focus is a characteristic shared by most window libraries that use
the desktop metaphor. A user has a screen with a number of windows on it. Some
of them display information that the user needs to read; others provide ways
for the user to enter information. There are menus, list boxes, edit boxes,
buttons, and so on, and the user's attention is focused on just one of them at
a time. That control window is said to "have the focus." When the user presses
a key or moves or clicks the mouse, that event should go to the window that
has the focus. The key might scroll a window's text, add the typed character
to the window's text, push a button, or perform any of a number of things that
the user can do. It might even cause another window to get the focus.
Sometimes a window object will capture the focus, which means that until the
window releases it, none of the user's actions will be sent to another window.
The desktop maintains a simple listhead of windows that have captured the
focus. When a window captures the focus, it joins the list. When the window
releases the focus, the desktop surrenders the focus to the window that had it
last.
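The capture bookkeeping can be sketched like this. This is a hypothetical illustration, not the actual DeskTop code: the FocusTracker class and its members are invented names, and the real DF++ desktop keeps its own pointers, as desktop.h in Listing One shows.

```cpp
#include <utility>
#include <vector>

// Stand-in for the real DFWindow class; only object identity matters here.
class DFWindow {};

// Each capture records the window that had the focus at the time, so
// releasing the capture surrenders the focus to the window that had it last.
class FocusTracker {
    DFWindow *infocus = nullptr;
    std::vector<std::pair<DFWindow*, DFWindow*>> captures; // (capturer, prior focus)
public:
    DFWindow *InFocus() const { return infocus; }
    void SetFocus(DFWindow *wnd) { infocus = wnd; }
    void Capture(DFWindow *wnd) {          // all input now routes to wnd
        captures.push_back({wnd, infocus});
        infocus = wnd;
    }
    void Release() {                       // restore focus to the prior holder
        if (!captures.empty()) {
            infocus = captures.back().second;
            captures.pop_back();
        }
    }
};
```

Nested captures behave as a stack: each release peels off the most recent capture and puts the focus back where it was before.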
The desktop declares all the device objects. Listings Two through Seven,
beginning on page 182, are screen.h, mouse.h, cursor.h, keyboard.h, speaker.h,
and clock.h, which define the classes for the device objects. The application
program will interrogate or modify these devices by sending messages to them
through the desktop.


Events and Messages


The application calls the desktop's DispatchEvents function to start event and
message processing. That function then calls the DispatchEvents functions of
the sysmouse, syskeyboard, and sysclock objects. Those functions poll their
respective devices and call the appropriate member function of the window that
should get the message. The device object determines which window that should
be. Keyboard messages go to the window that has captured the focus, if any.
Otherwise they go either to the window that has the focus or to the
application window. In all cases, the messages are sent as calls to the
window's member function through one of the pointers in the desktop object.
Mouse messages go to the window where the mouse is pointing unless a window
has captured the mouse, in which case, the capturing window gets the message,
regardless of where the mouse cursor points.
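The keyboard routing rule can be sketched as follows. The Router class and SendKey are invented names for illustration, not the actual DF++ code; only the precedence order (capture first, then focus, then the application window) comes from the design described above.

```cpp
// Base window class: a message is simply a virtual member function call.
class DFWindow {
public:
    virtual void Keyboard(int key) { (void)key; }
    virtual ~DFWindow() {}
};

struct Router {
    DFWindow *apwnd = nullptr;          // application window
    DFWindow *infocus = nullptr;        // window with the focus
    DFWindow *focuscapture = nullptr;   // window that captured the focus

    // Capture wins, then focus, then the application window.
    void SendKey(int key) {
        DFWindow *target = focuscapture ? focuscapture
                         : infocus      ? infocus
                         : apwnd;
        if (target)
            target->Keyboard(key);
    }
};
```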
When a window object gets a message, it can do one of four things. First, the
window object can process the message and return. Second, it can decide that
it has no interest in the message but that some base class somewhere up the
class hierarchy to which the window class belongs should process the message.
Third, the window object can do a combination of the first two. Finally, it
can intercept and reject the message, deciding that neither it nor its base
classes should process the message.
Suppose that you have an EditBox class derived from a TextBox class (which you
will, eventually), and that an object of that class has the focus. The user
presses a key, and the EditBox class gets a keyboard message. Example 1 shows
how the class would exercise the four options just discussed for message
processing.
Example 1: Message processing.

 void EditBox::Keyboard(int key)

 {
 switch (key) {
 case F1:
 // --- process the F1 key
 // ...
 break;

 case F2:
 // --- pass the F2 key up the hierarchy
 TextBox::Keyboard(key);
 break;
 case F3:
 // --- process the F3 key
 // ...
 // --- pass the F3 key up the hierarchy
 TextBox::Keyboard(key);
 break;
 case F4:
 // --- intercept and reject the F4 key
 break;
 default:
 break;
 }
 }

Window objects can send messages to other window objects or to themselves.
Menus pop down because of messages between windows, and an application's
processes get called by menu-selection messages. Later columns will describe
these processes in detail. The concept of sending messages is not new to C++
programmers nor to programmers in Windows-like, event-driven environments.
This project uses C++ messages to implement event-driven messages. So far in
my development, the DF++ programs are simpler and more expressive than the
D-Flat programs, no doubt because of the natural fit of C++ messages and
event-driven user-interface messages.


The Cigar-box User-interface Paradigm


Mike Hall is a gentleman bartender in my hometown, and we have spent a lot of
time together. Mike's current station is in a small lounge in a beachside
resort hotel owned by some folks in England. They installed a homegrown
point-of-sale system with terminals throughout the hotel. Its user interface
is remarkable, to say the least. Their own programmer contrived it and
steadfastly insists that the interface is so closely and tightly intertwined
with the rest of the program that it cannot be modified. It takes Mike about
12 keystrokes on a custom array of plastic-covered function keys to ring up
the sale of a beer. Every transaction returns to the top-level menu, and he
has to start from the top no matter what. Often the printer goes off into the
weeds and forgets what font it should use. A receipt can take a minute to
print because the THANK YOU is printed in dot-matrix graphics letters a foot
high. The program crashes a lot and forgets the cumulated total on a guest
check, so Mike has to post it all over again. They told Mike that the system
is their check-and-balance to match inventory with sales and control what the
industry calls "shrinkage"--another word for employee theft. Mike told them
that if they'd turn off the computer, give him a cigar box to put the cash in,
and let the employees steal $1000.00 a month, they'd be money ahead.

C PROGRAMMING COLUMN
by Al Stevens


[LISTING ONE]

// ---------- desktop.h
#ifndef DESKTOP_H
#define DESKTOP_H

#include "screen.h"
#include "cursor.h"
#include "keyboard.h"
#include "mouse.h"
#include "speaker.h"
#include "clock.h"

class DFWindow;

class DeskTop {
 DFWindow *apwnd; // application window
 DFWindow *infocus; // current window with the focus
 DFWindow *firstcapture; // first window to capture the focus
 DFWindow *focuscapture; // current window with captured focus
 // ------- the desktop devices
 Screen sysscreen; // the system screen
 Mouse sysmouse; // the system mouse
 Keyboard syskeyboard; // the system keyboard
 Cursor syscursor; // the system cursor
 Clock sysclock; // the system clock
 Speaker sysspeaker; // the system speaker
public:
 DeskTop();

 ~DeskTop();
 DFWindow *ApplWnd() { return apwnd; }
 void SetApplication(DFWindow *ApWnd) { apwnd = ApWnd; }
 Bool DispatchEvents();
 DFWindow *InFocus() { return infocus; }
 DFWindow *FocusCapture() { return focuscapture; }
 DFWindow *FirstCapture() { return firstcapture; }
 void SetFocus(DFWindow *wnd) { infocus = wnd; }
 void SetFocusCapture(DFWindow *wnd) { focuscapture = wnd; }
 void SetFirstCapture(DFWindow *wnd) { firstcapture = wnd; }
 // ------- the desktop devices
 Mouse& mouse() { return sysmouse; }
 Screen& screen() { return sysscreen; }
 Keyboard& keyboard() { return syskeyboard; }
 Cursor& cursor() { return syscursor; }
 Clock& clock() { return sysclock; }
 Speaker& speaker() { return sysspeaker; }
};
extern DeskTop desktop;

#endif




[LISTING TWO]

// ----------- screen.h
#ifndef SCREEN_H
#define SCREEN_H

#include <dos.h>
#include "dflatdef.h"
#include "rectangl.h"

class Screen {
 unsigned address;
 unsigned mode;
 unsigned page;
 unsigned height;
 unsigned width;
 union REGS regs;
 // ---- compute video offset address
 unsigned vad(int x, int y) { return y * (width*2) + x*2; }
public:
 Screen();
 unsigned Height() { return height; }
 unsigned Width() { return width; }
 unsigned Page() { return page; }
 Bool isEGA();
 Bool isVGA();
 Bool isMono() { return (Bool) (mode == 7); }
 Bool isText() { return (Bool) (mode < 4); }
 void Scroll(Rect &rc, int d, int fg, int bg);
 unsigned int GetVideoChar(int x, int y);
 void PutVideoChar(int x, int y, unsigned int c);
 void WriteVideoString(char *s,int x,int y,int fg,int bg);
 void SwapBuffer(Rect &rc, char *bf,
 Bool Hiding, Bool HasShadow, Bool isFrame);

 void GetBuffer(Rect &rc, char *bf);
 void PutBuffer(Rect &rc, char *bf);
};
const int VIDEO = 0x10;
inline int clr(int fg, int bg)
{
 return fg | (bg << 4);
}
#endif






[LISTING THREE]

// ---------- mouse.h
#ifndef MOUSE_H
#define MOUSE_H

#include <dos.h>
#include "dfwindow.h"
#include "timer.h"

class Mouse {
 Bool installed; // True = mouse is installed
 char *statebuffer; // mouse state buffer
 Timer doubletimer; // mouse double-click timer
 Timer delaytimer; // mouse typematic click timer
 int prevx; // previous mouse x coordinate
 int prevy; // previous mouse y coordinate
 int clickx; // click x position
 int clicky; // click y position
 int releasex; // release x position
 int releasey; // release y position
 union REGS regs;
 DFWindow *MouseWindow(int mx, int my);
 void CallMouse(int m1,int m2=0,int m3=0,int m4=0,unsigned es=0);
 void DispatchRelease();
 void DispatchMove();
 void DispatchLeftButton();
public:
 Mouse();
 ~Mouse();
 Bool Installed() { return installed; }
 void GetPosition(int &x, int &y); // get mouse position
 void SetPosition(int x, int y); // set mouse position
 Bool Moved(); // True if mouse has moved
 void Show(); // show the mouse cursor
 void Hide(); // hide the mouse cursor
 Bool LeftButton(); // True if left button is pressed
 Bool ButtonReleased(); // True if button was released
 void SetTravel(int minx, int maxx, int miny, int maxy);
 void DispatchEvent();
};
const int MOUSE = 0x33; // mouse interrupt vector
// -------- mouse commands
const int RESETMOUSE = 0;

const int SHOWMOUSE = 1;
const int HIDEMOUSE = 2;
const int READMOUSE = 3;
const int SETPOSITION = 4;
const int BUTTONRELEASED = 6;
const int XLIMIT = 7;
const int YLIMIT = 8;
const int BUFFSIZE = 21;
const int SAVESTATE = 22;
const int RESTORESTATE = 23;
// -------- timer delays for mouse repeat, double clicks
const int DELAYTICKS = 1;
const int FIRSTDELAY = 7;
const int DOUBLETICKS = 5;

#endif





[LISTING FOUR]

// ------------- cursor.h
#ifndef CURSOR_H
#define CURSOR_H

// ------- video BIOS (0x10) functions
const int SETCURSORTYPE = 1;
const int SETCURSOR = 2;
const int READCURSOR = 3;
const int HIDECURSOR = 0x20;

const int MAXSAVES = 50; // depth of cursor save/restore stack
class Cursor {
 // --- cursor save/restore stack
 int cursorpos[MAXSAVES];
 int cursorshape[MAXSAVES];
 int cs; // count of cursor saves in effect
 union REGS regs;
 void GetCursor();
public:
 Cursor();
 ~Cursor();
 void SetPosition(int x, int y);
 void GetPosition(int &x, int &y);
 void SetType(unsigned t);
 void normalcursor() { SetType(0x0607); }
 void Hide();
 void Show();
 void Save();
 void Restore();
 void SwapStack();
};
inline void swap(int &a, int &b)
{
 int x = a;
 a = b;
 b = x;
}
#endif





[LISTING FIVE]

// ------------ keyboard.h
#ifndef KEYBOARD_H
#define KEYBOARD_H

#include <dos.h>
#include "dflatdef.h"

class Keyboard {
 union REGS regs;
 int shift;
public:
 Keyboard() { shift = GetShift(); }
 Bool ShiftChanged();
 int ShiftState() { return shift = GetShift(); }
 int AltConvert(int);
 int GetKey();
 int GetShift();
 Bool KeyHit();
 void clearBIOSbuffer();
 void DispatchEvent();
};
const int KEYBOARDVECT = 9;
const int KEYBOARDPORT = 0x60;

inline void Keyboard::clearBIOSbuffer()
{
 *(int far *)(MK_FP(0x40,0x1a)) =
 *(int far *)(MK_FP(0x40,0x1c));
}
// ----- keyboard BIOS (0x16) functions
const int KEYBRD = 0x16;
const int READKB = 0;
const int KBSTAT = 1;

const int ZEROFLAG = 0x40;
const int OFFSET = 0x1000;

const int RUBOUT = 8;
const int BELL = 7;
const int ESC = 27;
const unsigned F1 = (187+OFFSET);
const unsigned F8 = (194+OFFSET);
const unsigned SHIFT_F8 = (219+OFFSET);
const unsigned F10 = (196+OFFSET);
const unsigned HOME = (199+OFFSET);
const unsigned UP = (200+OFFSET);
const unsigned PGUP = (201+OFFSET);
const unsigned BS = (203+OFFSET);
const unsigned FWD = (205+OFFSET);
const unsigned END = (207+OFFSET);

const unsigned DN = (208+OFFSET);
const unsigned PGDN = (209+OFFSET);
const unsigned INS = (210+OFFSET);
const unsigned DEL = (211+OFFSET);
const unsigned CTRL_HOME = (247+OFFSET);
const unsigned CTRL_PGUP = (132+OFFSET);
const unsigned CTRL_BS = (243+OFFSET);
const unsigned CTRL_FIVE = (143+OFFSET);
const unsigned CTRL_FWD = (244+OFFSET);
const unsigned CTRL_END = (245+OFFSET);
const unsigned CTRL_PGDN = (246+OFFSET);
const unsigned SHIFT_HT = (143+OFFSET);
const unsigned ALT_A = (158+OFFSET);
const unsigned ALT_B = (176+OFFSET);
const unsigned ALT_C = (174+OFFSET);
const unsigned ALT_D = (160+OFFSET);
const unsigned ALT_E = (146+OFFSET);
const unsigned ALT_F = (161+OFFSET);
const unsigned ALT_G = (162+OFFSET);
const unsigned ALT_H = (163+OFFSET);
const unsigned ALT_I = (151+OFFSET);
const unsigned ALT_J = (164+OFFSET);
const unsigned ALT_K = (165+OFFSET);
const unsigned ALT_L = (166+OFFSET);
const unsigned ALT_M = (178+OFFSET);
const unsigned ALT_N = (177+OFFSET);
const unsigned ALT_O = (152+OFFSET);
const unsigned ALT_P = (153+OFFSET);
const unsigned ALT_Q = (144+OFFSET);
const unsigned ALT_R = (147+OFFSET);
const unsigned ALT_S = (159+OFFSET);
const unsigned ALT_T = (148+OFFSET);
const unsigned ALT_U = (150+OFFSET);
const unsigned ALT_V = (175+OFFSET);
const unsigned ALT_W = (145+OFFSET);
const unsigned ALT_X = (173+OFFSET);
const unsigned ALT_Y = (149+OFFSET);
const unsigned ALT_Z = (172+OFFSET);
const unsigned ALT_1 = (0xf8+OFFSET);
const unsigned ALT_2 = (0xf9+OFFSET);
const unsigned ALT_3 = (0xfa+OFFSET);
const unsigned ALT_4 = (0xfb+OFFSET);
const unsigned ALT_5 = (0xfc+OFFSET);
const unsigned ALT_6 = (0xfd+OFFSET);
const unsigned ALT_7 = (0xfe+OFFSET);
const unsigned ALT_8 = (0xff+OFFSET);
const unsigned ALT_9 = (0x80+OFFSET);
const unsigned ALT_0 = (0x81+OFFSET);
const unsigned ALT_HYPHEN = (130+OFFSET);
const unsigned CTRL_F4 = (225+OFFSET);
const unsigned ALT_F4 = (235+OFFSET);
const unsigned ALT_F6 = (237+OFFSET);

enum {CTRL_A=1,CTRL_B,CTRL_C,CTRL_D,CTRL_E,CTRL_F,CTRL_G,CTRL_H,
 CTRL_I,CTRL_J,CTRL_K,CTRL_L,CTRL_M,CTRL_N,CTRL_O,CTRL_P,
 CTRL_Q,CTRL_R,CTRL_S,CTRL_T,CTRL_U,CTRL_V,CTRL_W,CTRL_X,
 CTRL_Y,CTRL_Z };
// ------- BIOS shift key mask values
const int RIGHTSHIFT = 0x01;

const int LEFTSHIFT = 0x02;
const int CTRLKEY = 0x04;
const int ALTKEY = 0x08;
const int SCROLLLOCK = 0x10;
const int NUMLOCK = 0x20;
const int CAPSLOCK = 0x40;
const int INSERTKEY = 0x80;

#endif




[LISTING SIX]

// --------- speaker.h
#ifndef SPEAKER_H
#define SPEAKER_H

class Speaker {
 void Wait(int n);
public:
 void Beep();
};
const int FREQUENCY = 100;
const long COUNT = 1193280L / FREQUENCY;

#endif





[LISTING SEVEN]

// --------- clock.h
#ifndef CLOCK_H
#define CLOCK_H

#include "timer.h"

class Clock {
 Timer clocktimer;
public:
 Clock();
 void DispatchEvent();
};
#endif






November, 1992
STRUCTURED PROGRAMMING


Implementation License




Jeff Duntemann, KG7JF


After reading my July column, a friend of mine suggested that I see Whoopi
Goldberg's new film, Sister Act. "If you like nuns, Jeff, you'll love it."
Yes, I like nuns. It's hard not to admire people who make you eat your words.
Thirty years ago, I heard entirely too often, "You'll all thank me
someday"--usually for forcing us to bust butt on fractions, phonics, or
spelling when we'd rather be out digging holes in empty lots or raiding the
neighbor's garbage cans for broken TV sets. We all swore we'd hate them all
our lives, but I think most of us grew up sufficiently over time to grin a
little, swallow the memory of all that preadolescent anguish, and toss a
little prayer in the Sisters' direction, wherever they are.
Me, I've gone farther than forgiveness. The very values I hold most
important--guts, literacy, discipline, hard work, an ability to harmonize, and
unswerving adherence to a set of Very High Principles--are the things nuns are
made of. Laugh if you want; you could do worse.
And Sister Act is funny indeed. In the film, a black Las Vegas lounge singer
witnesses a Mafia murder, and before testifying is hidden for her protection
in an old-style convent in San Francisco. Among other things, she teaches the
choir of mostly elderly nuns how to sing and redefines sacred music a little
in the process.
At the film's finale, "Sister Mary Clarence" (Whoopi) leads the choir in a
slow, very solemn hymn that took me several seconds to identify. It wasn't a
hymn--not exactly. In fact, it was an old girl-group #1 rock and roll song
recorded originally in 1963 by Little Peggy March: "I Will Follow Him." It was
the same melody and the same words, but sung in such a way that you know
they're talking about an entirely different Him.


Implementation License


Schlocky old pop music is a special contrarian passion of mine, and I know it
well enough to be familiar with a number of cases like this: The same song,
performed so differently by different artists, can sound like, well, almost a
different song.
Whether you're an aspiring musician or an aspiring programmer, sooner or later
you must confront the intriguing questions: Given a design (or a score), how
much room is there for innovation? How closely do you have to follow a design
to stay out of trouble?
At this point, music and hacking part company, mostly because down here in
Hackerville we haven't even decided yet on what constitutes a design. I've
been handed a five-page text description of a 50,000-line system and been told
to "Just Go Do It." That's one end of the spectrum; in cases like that, you
basically do the design yourself, implement it, and then tell the Big Guys
that it was just what they asked for, whether it is or not.
Some firms actually spend considerable time using formal methods to create a
design, usually in the form of Yourdon-style data-flow diagrams and minispecs,
or in some other formal notation system. My experience here is a little
sparse, but in conversations with other programmers I've heard tales of
getting so bogged down in an unimplementable design that the programmer had no
choice but to resign--by that time the design had undergone a sort of
apotheosis and was not challengeable by anyone beneath the vice-presidential
level.
Designs become unimplementable for a number of reasons, nearly all of which
have to do with designing beyond your constraints. Bad analysis can produce an
unimplementable design, but even the best analysis can create a design that
just can't happen. One ugly pitfall is expecting a design to be portable
across platforms without a significant workover. Hell, the design worked on
the Sigma 9 in 1977; why can't you put it up on a 486? Still, most
unimplementable designs come from arrogant designers who simply write off
brick-wall constraints as "implementation issues" and act as though they
aren't there.
This is why I continue to hold that the people who write the code are the
people who should draft and perfect the design. A design that runs afoul of
its constraints is dead meat, and nobody understands constraints like
programmers, sheesh.
I'm spending this run of columns discussing design issues. Applications come
in a number of broad species, and I'll try to address all of those with which
I'm familiar and the special design considerations connected to each of them.


Telecommunications Applications


Some types of applications lend themselves to certain types of designs. Some
classes of applications are in fact so shaped by the nature of the platform
that the design is almost a given. My best example of this is a communications
program. I wrote a number of these while I worked at Xerox, and a few more
since then on my own, and no matter how I approached the project at the
outset, once the code was done they all ended up shaped in much the same way.
I've spent a lot of time trying to do this a fundamentally better way, but no
such way has ever emerged from my experiences.
This may change once we move away from DOS to a multithreaded operating system
like OS/2 or Windows NT, but we won't know until we get there. (Non-C
development tools are conspicuously absent in the OS/2 arena, which won't help
the operating system's popularity, especially with this particular columnist.)
I have a sense that I could break down the canonical communications-program
polling loop into a number of highly independent threads, each implementing a
hard-shelled little state machine that performs some part of the work. Within
a couple of years, I'm sure I'll find out if this is the case.


The Algorithm that Wouldn't Die


But until then, the highest-level algorithm for a communications program will
probably look very much like Figure 1. The heaviest black arrows represent the
main polling loop. The pairs of smaller black arrows represent function calls
out of the loop into utility procedures. The lightest arrows represent the
movement of data within the system.
The shaded rings are interrupt-driven ring buffers. Ring buffers are circular
queues, FIFO structures of a fixed size that wrap around seamlessly using a
head pointer and a tail pointer. They can be implemented as objects as long as
you take care with the interrupt drivers.
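A minimal ring buffer of the kind described might look like this in C++. This is a sketch under the head/tail discipline just outlined, with invented names; a real interrupt driver would also need to guard the indices against being updated mid-operation, which this sketch omits.

```cpp
#include <cstddef>

// Fixed-size circular FIFO: head and tail indices wrap around seamlessly.
// In a communications program the ISR would call put() on the inbound
// buffer and the polling loop would call get(), and the roles reverse on
// the outbound side.
template <std::size_t N>
class RingBuffer {
    unsigned char buf[N];
    volatile std::size_t head = 0;   // next free slot (producer side)
    volatile std::size_t tail = 0;   // next character to remove (consumer side)
public:
    bool empty() const { return head == tail; }
    bool full() const { return (head + 1) % N == tail; }
    bool put(unsigned char c) {      // false if the buffer is full
        if (full()) return false;
        buf[head] = c;
        head = (head + 1) % N;
        return true;
    }
    bool get(unsigned char &c) {     // false if the buffer is empty
        if (empty()) return false;
        c = buf[tail];
        tail = (tail + 1) % N;
        return true;
    }
};
```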
Both modem input and output are buffered. (The lightning-bolt symbols indicate
an interrupt-driven data path.) When the modem receives a complete character,
it generates an interrupt, and the interrupt service routine (ISR) places the
inbound character into the next free position in the inbound ring buffer. The
polling loop (which may be temporarily busy doing something else) then picks
up the character for processing when it gets a chance. Similarly, there is an
outbound character queue where the application can place large chunks of data
at one time (say, from a fast disk read) and let the modem take the data as
fast as it can. Typically, the modem port can generate an interrupt each time
it successfully transmits a character and has room for the next one. The
outbound ISR takes the next character from the outbound queue and stuffs it
out to the modem for transmission. This continues (independent of the polling
loop) until the outbound queue is empty.
The algorithm itself is very simple. A polling loop checks a number of devices
to see if something needs to be done. The first check might be to the
keyboard. Is a keystroke waiting? If so, it is picked up and processed. The
character might be intended for transmission to the modem, in which case it is
placed in the outbound ring buffer. The character might also be a command to
the program itself, in which case the command is parsed and some utility
routine in the application is called.
Once the keyboard is checked, the loop moves on to the outbound ring buffer. A
file transmission may be in progress, so the ring buffer is checked to see if
there is room for another slug of data bytes from the file being transmitted.
If the modem is still chewing on the outbound buffer and hasn't cleared enough
space in the buffer for another lump of data (1K is a common value), the loop
lets the outbound buffer continue chewing and moves on.
Next, the inbound ring buffer is polled. Is a character from the remote
system waiting to be processed? If so, it is picked up and parsed. If it's
just data, it is grabbed and sent to the appropriate destination; typically a
disk or the CRT. If it's a command, the command is parsed and the appropriate
utility routine in the application is called.
So it goes, round and round, until the command to pack it all in comes from
the keyboard, or perhaps from a "disconnect" command from the remote system.
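As a rough illustration only, here is the skeleton of that loop in C, with the keyboard, modem, and CRT replaced by simulated queues so the flow of data can be followed; every name here is invented, and a real program would poll live hardware rather than strings:

```c
#define QUIT_KEY 0x1B    /* Esc stands in for the "pack it all in" command */

typedef struct {
    const char *keys;      /* pending keystrokes                  */
    const char *inbound;   /* characters "from the remote system" */
    char outbound[256];    /* characters queued for the modem     */
    char screen[256];      /* characters sent to the CRT          */
    int outlen, scrlen;
} Session;

/* One trip around the loop: check the keyboard first, then the inbound
   queue. Returns 0 when it's time to quit. */
static int PollOnce(Session *s)
{
    if (*s->keys == '\0' && *s->inbound == '\0')
        return 0;                        /* simulation only: queues drained */
    if (*s->keys) {                      /* keystroke waiting? */
        char c = *s->keys++;
        if (c == QUIT_KEY)
            return 0;                    /* a command to the program itself */
        s->outbound[s->outlen++] = c;    /* data: queue it for transmission */
    }
    if (*s->inbound)                     /* character from the remote side? */
        s->screen[s->scrlen++] = *s->inbound++;
    return 1;
}

void RunSession(Session *s)
{
    while (PollOnce(s))
        ;                                /* round and round, until quit */
}
```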


Spec vs. Design vs. Implementation


What I've just described is an algorithm driven very heavily by the
constraints of DOS and of our serial-port hardware. The algorithm can be
considered a specification, but it is so hemmed in by considerations outside
the control of the program (ISR protocols, single-threaded DOS, the PC serial
port) that it is very much a design as well.
There is some leeway in implementing the design. You can implement it
literally as a WHILE..DO loop (which works well for simple programs), or you
can get fancier and encapsulate the bulk of the algorithm into a
"telecommunications kernel" with strictly defined data paths to and from the
application using the kernel. Implementing it as a kernel makes it much more
reusable.
Figure 1 is, of course, not a detailed design. From this level of detail down,
the design will be dominated by issues related to telecommunications
protocols. You have to decide what protocols the program will support
(ETX/ACK, Xmodem, Ymodem, and so on), whether there will be any attempt at
terminal emulation, and if so for which terminal spec.



The Art of the State


Once you decide which protocols to support, you must rigorously define how
each one works. Diagrams really help here, and while there are a multitude of
ways to diagram a telecomm protocol, the best way I know is to consider the
protocol a state machine and draw it using state-transition diagrams.
A state machine is a logical abstraction that defines the operation of certain
kinds of "well-behaved" software. You set a program up such that at any moment
the software is in one of a limited number of well-defined "states," where any
subsequent possible circumstance that affects the software might transfer the
system to another defined state. A system, for example, may be set up to exist
in a processing state or a waiting state. It's in one or the other,
period--there's no third or fourth option. While it's in a waiting state,
there is a set of events that can move it to a processing state, and all other
events leave it in a waiting state. Similarly, while the system is in a
processing state, a well-defined set of events will move it into a waiting
state, while all others leave it in the processing state.
Telecomm protocols lend themselves well to implementation as state machines
because while the system is transferring information, it will be in one of
only a few different states: waiting for the initial "let's go!" character,
accepting characters, calculating a checksum, transmitting the checksum,
waiting for an ACK, and so on. It's easy to define how the system moves from
one state to the next, and the conditions that force a change of state aren't
excessively complex or varied. Furthermore, the sequence of state changes
remains the same from one file transfer to the next.
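To give a taste of what such a state-transition description turns into, here is a toy receiver in C that waits for the "let's go!" byte, accepts a fixed-length payload, and checks a one-byte additive checksum. The packet format is invented purely for illustration; it is not Xmodem or any other real protocol:

```c
#define SOH          0x01   /* the "let's go!" start-of-header byte */
#define PAYLOAD_LEN  4      /* fixed payload size for this toy format */

typedef enum { WAIT_START, ACCEPT_DATA, CHECK_SUM, DONE_OK, DONE_BAD } RxState;

typedef struct {
    RxState state;
    unsigned char payload[PAYLOAD_LEN];
    int count;
    unsigned char sum;
} Receiver;

/* Feed one inbound byte. Every possible byte moves the machine to a
   well-defined next state -- the essence of a state machine. */
void RxByte(Receiver *rx, unsigned char c)
{
    switch (rx->state) {
    case WAIT_START:
        if (c == SOH) {                     /* the start byte arrived */
            rx->count = 0;
            rx->sum = 0;
            rx->state = ACCEPT_DATA;
        }                                   /* anything else: keep waiting */
        break;
    case ACCEPT_DATA:
        rx->payload[rx->count++] = c;
        rx->sum = (unsigned char)(rx->sum + c);
        if (rx->count == PAYLOAD_LEN)
            rx->state = CHECK_SUM;
        break;
    case CHECK_SUM:
        rx->state = (c == rx->sum) ? DONE_OK : DONE_BAD;
        break;
    default:                                /* DONE states absorb everything */
        break;
    }
}
```

Notice that the conditions moving the machine between states are few and simple, which is exactly why protocols diagram so cleanly this way.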
Understanding state machines and defining them is worth a couple of columns
all by itself, and I hope to get to it within the next few months. It's too
large a subject to cover in detail this month.


Pushing the Platform


While I was struggling to learn Turbo Vision, I was a little dismayed to watch
my friend George Seaton (who already knew Turbo Vision) struggling to create a
professional telecommunications system with it.
Telecomm is one of those categories of applications that I consider "real
time"; that is, telecomm apps must be able to deal with protocols that happen
at speeds independent of the program's operation. 9600 baud is 9600 baud no
matter what your machine's clock speed, and if you have to run at 9600 baud,
you'd better be sure your machine won't roll its eyes at you and laugh. Most
of George's problems stemmed from a need to work at defined (and high) baud
rates on underpowered machines.
Turbo Vision got in the way here simply by being too busy in the background,
soaking up cycles, doing its own thing beneath the surface and out of sight.
Turbo Vision's TTerminal object looks very attractive, because it implements a
scroll-from-the-bottom TTY-style window. Feed it a character from the input
ring buffer, and TTerminal will take care of the rest.
At 1200 baud, a breeze; at 2400 baud, no big sweat; at 4800 baud, well, things
start to get tight. At 9600 baud, it's definitely white knuckles on the
steering wheel. George eventually had to buy the Turbo Pascal runtime source
and start tearing expendable features out of TTerminal to make it run fast
enough to meet his specs, which involved interfacing to an existing mainframe
system and were not negotiable.
The lesson here is that Turbo Vision brought with it a set of constraints that
didn't appear until George turned up the steam and started to push the
platform. He managed to make the system work effectively, but using Turbo
Vision took the hardware to its limits at 9600 baud, which isn't as
astonishingly fast as it was a few years ago.
To be fair, Turbo Vision made it unnecessary to reinvent a CUA-compliant user
interface, and George had the luxury of "not recommending" 8088-class machines
to his corporate users. He thinks the benefits are worth the costs. (Having
watched him suffer through the process, I'm very glad he forgives as easily as
he does.)


Turbo Vision Trinkets


Fairly late in the Turbo Vision learning game I caught on to something slick:
Borland tossed in a number of well-encapsulated little Turbo Vision gizmos
with Turbo Pascal 6.0, which, if you didn't look closely at all the stuff the
automated installer shoveled onto your hard disk, you probably missed.
Look again. In particular, load and compile the TVDEMO.PAS program and check
out the little calculator and calendar. These trinkets can be lifted and
dropped into your own Turbo Vision applications with almost no effort.
I added both the calculator and calendar to my HCALC.PAS mortgage calculator
in about 20 minutes from a dead stop. Here's all you have to do:
1. Copy the CALC.PAS and CALENDAR.PAS files into your project directory. This
step is necessary only if you intend to modify either file, as I did. More on
this below.
2. Define two new command constants at the front of your main application
file:
  cmCalculator = 191;
  cmCalendar   = 190;
The constant values are arbitrary; they were the next ones in line in HCALC
and can be anything as long as they remain unique values. Remember that Turbo
Vision reserves all command constants from 0 through 99 for its own use.
3. Add two items to your menu-bar definition. These allow the user to select
the calendar and calculator. I added them as main items on the menu bar itself
rather than as items within a pull-down menu, but that's up to you. Make sure
that the items you insert into your menu bar emit the command codes you
defined for the calculator and calendar.
4. This is key: Lift the two short procedures Calculator and Calendar from
TVDEMO.PAS and add them to your main application file as local procedures
within the main application object's HandleEvent method. They must be local
within the application object's HandleEvent, or the call both procedures make
to TProgram.ValidView will be out of scope and not understood by the compiler.
5. If your application does not have a help system, change the help-context
assignments inside both local procedures to hcNoContext. If it has a help
system, change the context assignments to whatever context is appropriate.
6. Add lines to your application object's HandleEvent method to respond to the
two new commands by calling the two local procedures you just added. I've
reprinted my own HCALC's modified HandleEvent method in Listing One (page
188). All I did was add two lines to the CASE statement that invoke the added
local procedures appropriately when the new commands are detected.
Rebuild the whole mess, and that's it. You now have a serviceable, if not
fancy, calendar and calculator with virtually no effort. When I reached this
point, I felt that the effort expended in learning Turbo Vision had begun to
pay off.


Clearing the Deck


If you remember, my HCALC program has a menu item that clears all opened
windows from the desktop. I did it as an experiment in broadcast events, but
it has proven to be a handy feature that I suspect I will add to all my future
Turbo Vision apps. The calculator and calendar objects from TVDEMO.PAS, of
course, don't understand the cmCloseBC broadcast event and hang sullenly
around when the desktop is cleared.
It didn't take much to add the ability to respond to cmCloseBC. I've
written up a short file (see Listing Two, page 188) to give you the new code
you need. Here's what you do for the TCalculator object:
1. Add the constant definition for cmCloseBC somewhere at the top of the
CALC.PAS file. Make sure the value you define matches the value from your main
program object! A better way to do this is to create a new unit that exports
all command definitions used anywhere in your application. As I expand HCALC,
I intend to create such a utility unit.
2. Replace the original object definition for TCalculator with the definition
shown in Listing Two. The original TCalculator does not have its own HandleEvent
method and
instead relies fully on the one in its parent class, TDialog. If you're going
to give TCalculator the ability to respond to a new event, you have to give it
a HandleEvent method to respond with.
3. Add the TCalculator.HandleEvent method shown in Listing Two.
That's all you have to do. The mods for TCalendar are similar and will make a
good exercise. One thing to watch out for: Both the calendar and the
calculator objects have two parts to them. The calculator is part TView and
part TDialog, and the calendar is part TView and part TWindow. It's not
entirely clear to me why Borland wrote them this way, but in my first
experiments I made the mistake of adding the response to cmCloseBC to the
TView part of the calculator. It was easy because the TView portion already
had a HandleEvent method. I was wrong--and closing down the TView aspect of
the calculator without closing down the TDialog aspect crashed my machine
hard.
Here's the rule: You must close down what you instantiate. In the TVDEMO.PAS
program, the TDialog aspect of the calculator is what is instantiated and
added to the desktop, so that's what you have to shut down with a call to
destructor Done, from within the HandleEvent method you added. Take a look at
the calendar code and the way that TVDEMO.PAS invokes it, and choose which
aspect of the calendar needs to respond to cmCloseBC.
And don't forget to call the ancestor class's HandleEvent before diving into
your own HandleEvent!


The Vision Continues


As hard as I've tried to be done with Turbo Vision, it's been harder to shake
than an Arizona staghorn cluster stuck halfway through your hiking boots. Next
month we'll take a look at a couple of new Turbo Vision tools in an effort to
generalize a little about the design of Turbo Vision applications.
_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann


[LISTING ONE]


PROCEDURE TMortgageApp.HandleEvent(VAR Event : TEvent);

{-----------------------------------------------------------------}
{ The following are procedures local to TMortgageApp.HandleEvent: }
{-----------------------------------------------------------------}

PROCEDURE Calculator;
VAR
  P : PCalculator;
BEGIN
  P := New(PCalculator, Init);
  P^.HelpCtx := hcNoContext;  { Used to be hcCalculator }
  IF ValidView(P) <> NIL THEN
    Desktop^.Insert(P);
END;

PROCEDURE Calendar;
VAR
  P : PCalendarWindow;
BEGIN
  P := New(PCalendarWindow, Init);
  P^.HelpCtx := hcNoContext;  { Used to be hcCalendar }
  Desktop^.Insert(ValidView(P));
END;

BEGIN
  TApplication.HandleEvent(Event);
  IF Event.What = evCommand THEN
  BEGIN
    CASE Event.Command OF
      cmNewMortgage  : NewMortgage;
      cmLoadMortgage : LoadMortgage;
      cmSaveMortgage : SaveMortgage;
      cmCloseAll     : CloseAll;
      cmPrint        : PrintMortgage;
      cmCalculator   : Calculator;  { Calculator is NOT a method! }
      cmCalendar     : Calendar;    { Calendar is NOT a method! }
    ELSE
      Exit;
    END; { CASE }
    ClearEvent(Event);
  END;
END;




[LISTING TWO]

{-----------------------------------------------------------}
{ This file describes mods needed to make the TV calculator }
{ respond to the broadcast command cmCloseBC used in Jeff }
{ Duntemann's HCALC mortgage calculator program. }
{ THIS IS NOT A COMPLETE, COMPILABLE FILE! This is just a }
{ collection of mods to make to Borland's TV Demo file }
{ CALC.PAS. }

{ By Jeff Duntemann 8/2/92 }
{-----------------------------------------------------------}

{ Add this constant definition up front somewhere in CALC.PAS: }
CONST
  cmCloseBC = 196;

{ Modify the object definition for TCalculator like this: }
TCalculator =
  OBJECT(TDialog)
    CONSTRUCTOR Init;
    PROCEDURE HandleEvent(VAR Event : TEvent); VIRTUAL;
  END;

{ Add this method definition to the CALC.PAS file: }
PROCEDURE TCalculator.HandleEvent(VAR Event : TEvent);
BEGIN
  TDialog.HandleEvent(Event);
  IF Event.What = evBroadcast THEN
    IF Event.Command = cmCloseBC THEN
      Done;
END;







November, 1992
GRAPHICS PROGRAMMING


The Good, the Bad, and the Run-sliced




Michael Abrash


Years ago, I worked at a company that asked me to write blazingly fast
line-drawing code for an AutoCAD driver. I implemented the basic Bresenham's
line-drawing algorithm; streamlined it as much as possible; special-cased
horizontal, diagonal, and vertical lines; broke out separate, optimized
routines for lines in each octant; and massively unrolled the loops. When I
was done, I had line drawing down to a mere five or six instructions per
pixel, and I handed the code over to the AutoCAD driver person, content in the
knowledge that I had pushed the theoretical limits of the Bresenham's
algorithm on the 80x86 architecture, and that this was as fast as line drawing
could get on a PC. That feeling lasted for about a week, until Dave Miller,
who these days is a Windows display-driver whiz at Engenious Solutions,
casually mentioned Bresenham's faster run-length slice line-drawing algorithm.
Remember Bill Murray's safety tip in Ghostbusters? It goes something like
this. Harold Ramis tells the Ghostbusters not to cross the beams of the
antighost guns. "Why?" Murray asks.
"It would be bad," Ramis says.
Murray says, "I'm fuzzy on the whole good/bad thing. What exactly do you mean
by 'bad'?" It turns out that what Ramis means by bad is basically the
destruction of the universe.
"Important safety tip," Murray comments dryly.
I learned two important safety tips from my line-drawing experience; neither
involves the possible destruction of the universe, so far as I know, but they
are nonetheless worth keeping in mind. First, never, never, never think you've
written the fastest possible code. Odds are, you haven't. Run your code past
another good programmer, and he or she will probably say, "But why don't you
do this?" and you'll realize that you could indeed do that, and your code
would then be faster. Or relax and come back to your code later, and you may
well see another, faster approach. There are a million ways to implement code
for any task, and you can almost always find a faster way if you need to.
Second, when performance matters, never have your code perform the same
calculation more than once. This sounds obvious, but it's astonishing how
often it's ignored. For example, consider the snippet of code in Example 1.
Here, the programmer knows which way the line is going before the main loop
begins -- but nonetheless performs that test every time through the loop, when
calculating the address of the next pixel. Far better to perform the test only
once, outside the loop, as shown in Example 2.
Example 1: Performing the test every time through the loop isn't the most
efficient way.

 for (i = 0; i < RunLength; i++)
 {
     *WorkingScreenPtr = Color;
     if (XDelta > 0)
     {
         WorkingScreenPtr++;
     }
     else
     {
         WorkingScreenPtr--;
     }
 }

Example 2: Performing the test only once, outside the loop, is more efficient
than Example 1.

 if (XDelta > 0)
 {
     for (i = 0; i < RunLength; i++)
     {
         *WorkingScreenPtr++ = Color;
     }
 }
 else
 {
     for (i = 0; i < RunLength; i++)
     {
         *WorkingScreenPtr-- = Color;
     }
 }

Think of it this way. A program is a state machine. It takes a set of inputs
and produces a corresponding set of outputs by passing through a set of
states. Your primary job as a programmer is to implement the desired state
machine. Your additional job as a performance programmer is to minimize the
lengths of the paths through the state machine. This means performing as many
tests and calculations as possible outside the loops, so that the loops
themselves can do as little work--pass through as few states--as possible.
Which brings us full circle to Bresenham's run-length slice line-drawing
algorithm, which just happens to be an excellent example of a minimized state
machine. In case you're fuzzy on the good/bad performance thing, that's
"good"--as in fast.


Run-length Slice Fundamentals



First off, I have a confession to make: I'm not sure that the algorithm I'll
discuss is actually, precisely Bresenham's run-length slice algorithm. It's
been a long time since I read about this algorithm; in the intervening years,
I've misplaced Bresenham's article, and I was unable to locate it in time for
this column. (Vermont libraries leave something to be desired in the high-tech
area.) As a result, I had to derive the algorithm from scratch, which was
admittedly more fun than reading about it, and also ensured that I understood
it inside and out. The upshot is that what I discuss may or may not be
Bresenham's run-length slice algorithm--but it surely is fast.
The place to begin understanding the run-length slice algorithm is the
standard Bresenham's line-drawing algorithm. (I discussed the standard
Bresenham's algorithm at length in the May 1989 issue of the now-defunct
Programmer's Journal.) The basis of the standard approach is stepping one
pixel at a time along the major axis (the longer dimension of the line), while
maintaining an integer error term that indicates at each major-axis step how
close the line is to advancing halfway to the next pixel along the minor axis.
Figure 1 illustrates standard Bresenham's line drawing. The key point here is
that a calculation and a test are performed once for each step along the major
axis.
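As a refresher, the per-pixel loop for one octant might be sketched like this in C. This is a simplified illustration, not my AutoCAD driver code; to keep it self-contained it records the Y coordinate chosen for each X step into an array instead of writing display memory:

```c
/* Standard Bresenham's for the first octant (XDelta >= YDelta >= 0):
   one error-term addition and one test per major-axis step. YOut must
   hold XDelta+1 entries, one per pixel along the major axis. */
void BresenhamOctant0(int Y0, int XDelta, int YDelta, int YOut[])
{
    int ErrorTerm = -(XDelta / 2);   /* start half a pixel from the boundary */
    int i;

    for (i = 0; i <= XDelta; i++) {
        YOut[i] = Y0;                /* would be a pixel write in real code */
        if ((ErrorTerm += YDelta) > 0) {
            Y0++;                    /* error turned over: step along Y too */
            ErrorTerm -= XDelta;
        }
    }
}
```

The point to notice is that the `ErrorTerm` work happens on every single pixel, even though for a shallow line most of those tests give the same answer run after run.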
The run-length slice algorithm rotates matters 90 degrees, with salubrious
results. The basis of the run-length slice algorithm is stepping one pixel at
a time along the minor axis (the shorter dimension), while maintaining an
integer error term indicating how close the line is to advancing an extra
pixel along the major axis, as illustrated by Figure 2.
Consider this: When you're called upon to draw a line with an X-dimension of
35 and a Y-dimension of 10, you have a great deal of information available,
some of which is ignored by standard Bresenham's. In particular, because the
slope is between 1/3 and 1/4, you know that every single run--a run being a
set of pixels at the same minor-axis coordinate--must be either three or four
pixels long. No other length is possible, as shown in Figure 3 (apart from the
first and last runs, which are special cases that I'll discuss shortly).
Therefore, for this line, there's no need to perform an error-term calculation
and test for each pixel. Instead, we can just perform one test per run, to see
whether the run is three or four pixels long, thereby eliminating about 70
percent of the calculations in drawing this line.
Take a moment to let the idea behind run-length slice drawing soak in.
Periodic decisions must be made to control pixel placement. The key to speed
is to make those decisions as infrequently and quickly as possible. Of course,
it will work to make a decision at each pixel--that's standard Bresenham's.
However, most of those per-pixel decisions are redundant, and in fact we have
enough information before we begin to know which are the redundant decisions.
Run-length slice drawing is exactly equivalent to standard Bresenham's, but it
pares the decision-making process down to a minimum. It's somewhat analogous
to the difference between finding the greatest common divisor of two numbers
using Euclid's algorithm and finding it by trying every possible divisor. Both
approaches produce the desired result, but that which takes maximum advantage
of the available information and minimizes redundant work is preferable.
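If the analogy is unfamiliar, the two approaches are easy to compare in C; both little functions below are illustrative sketches, and both return the same answer:

```c
/* Euclid's algorithm: each remainder step discards a large chunk of the
   search space, so the answer converges in a handful of iterations. */
int GcdEuclid(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Trial division: test every candidate divisor from min(a,b) downward.
   Correct, but it ignores everything we know about the two numbers. */
int GcdTrial(int a, int b)
{
    int d = (a < b) ? a : b;
    while (d > 1 && (a % d != 0 || b % d != 0))
        d--;
    return d;
}
```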


Run-length Slice Implementation


We know that for any line, a given run will always be one of two possible
lengths. How, though, do we know which length to select? Surprisingly, this is
easy to determine. For the following discussion, assume that we have a slope
of 1/3.5, so that X is the major axis; however, the discussion also applies to
Y-major lines, with X and Y reversed.
The minimum possible length for any run in an X-major line is
int(XDelta/YDelta), where XDelta is the X-dimension of the line and YDelta is
the Y-dimension. The maximum possible length is int(XDelta/YDelta) + 1. The
trick, then, is knowing which of these two lengths to select for each run. To
see how we can make this selection, refer to Figure 4. For each one-pixel step
along the minor axis (Y, in this case), we advance at least three pixels. The
full advance distance along X (the major axis) is actually three-plus pixels,
because there is also a fractional portion to the advance along X for a
single-pixel Y step. This fractional advance is the key to deciding when to
add an extra pixel to a run. The fraction indicates what portion of an extra
pixel we advance along X (the major axis) during each run. If we keep a
running sum of the fractional parts, we have a measure of how close we are to
needing an extra pixel; when the fractional sum reaches 1, it's time to add an
extra pixel to the current run. Then we can subtract 1 from the running sum
(because we just advanced one pixel), and continue on.
Practically speaking, however, we can't work with fractions because
floating-point arithmetic is slow and fixed-point arithmetic is imprecise.
Therefore, we take a cue from standard Bresenham's and scale all the
error-term calculations up so that we can work with integers. The fractional X
(major axis) advance per one-pixel Y (minor axis) advance is the fractional
portion of XDelta/YDelta. This value is exactly equivalent to (XDelta %
YDelta)/YDelta. We'll scale this up by multiplying it by YDelta*2, so that the
amount by which we adjust the error term up for each one-pixel minor-axis
advance is (XDelta % YDelta)*2.
We'll similarly scale up the one pixel by which we adjust the error term down
after it turns over, so our downward error-term adjustment is YDelta*2.
Therefore, before drawing each run, we'll add (XDelta % YDelta)*2 to the error
term. If the error term runs over (reaches one full pixel), then we'll
lengthen the run by 1, and subtract YDelta*2 from the error term. (All values
are multiplied by 2 so that the initial error term, which involves a 0.5 term,
can be scaled up to an integer, as discussed below.)
This is not a complicated process, involving only integer addition and
subtraction and a single test, and it lends itself to many and varied
optimizations. For example, you could break out hardwired optimizations for
drawing each possible pair of run lengths. For the aforementioned line with a
slope of 1/3.5, for example, you could have one routine hardwired to blast in
a run of three pixels as quickly as possible, and another hardwired to blast
in a run of four pixels. These routines would ideally have no looping, but
rather just a series of instructions customized to draw the desired number of
pixels at maximum speed. Each routine would know that the only possibilities
for the length of the next run would be three and four, so they could
increment the error term, then jump directly to the appropriate one of the two
routines depending on whether the error term turned over. Properly
implemented, it should be possible to reduce the average per-run overhead of
line drawing to less than one branch, with only two additions and two tests
(the number of runs must also be counted down), plus a subtraction half the
time. On a 486, this amounts to something on the order of 150 nanoseconds of
overhead per pixel, exclusive of the time required to actually write the pixel
to display memory.
That's good.


Run-length Slice Details


A couple of run-length slice implementation details yet remain. First is the
matter of how error-term turnover is detected. This is done in much the same
way as it is with standard Bresenham's: The error term is initialized to a
value equivalent to -1 pixel and incremented for each step; when the error
term reaches 0, we've advanced one full pixel along the major axis, and it's
time to add an extra pixel to the current run. This means that we only have to
test the sign of the error term after advancing it to determine whether or not
to add an extra pixel to each run.
The second and more difficult detail is balancing the runs so that they're
centered around the ideal line, and therefore draw the same pixels that
standard Bresenham's would draw. If we just drew full-length runs from the
start, we'd end up with an unbalanced line, as shown in Figure 5. Instead, we
have to split the initial pixel plus one full run as evenly as possible
between the first and last runs of the line, and adjust the initial error term
appropriately for the initial half-run.
The initial error term is simply one-half of the normal fractional advance
along the major axis, because the initial step is only one-half pixel along
the minor axis. This half-step gets us exactly halfway between the initial
pixel and the next pixel along the minor axis. All the error-term adjusts are
scaled up by two times precisely so that we can scale up this halved error
term for the initial run by two times, and thereby make it an integer.
The other trick here is that if an odd number of pixels are allocated between
the first and last partial runs, we'll end up with an odd pixel, since we are
unable to draw a half-pixel. This odd pixel is accounted for by adding half a
pixel to the error term.
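To see all of this bookkeeping in action without any display code, here's a small C sketch that computes just the run lengths for an X-major line, using the same error-term setup as Listing One; the helper name and the runs array are my own, not part of the listing:

```c
/* Compute the run lengths for an X-major line (XDelta >= YDelta > 0),
   per the run-length slice error-term logic described in the text.
   Fills Runs[] (which must hold YDelta+1 entries) and returns the
   number of runs. */
int ComputeRuns(int XDelta, int YDelta, int Runs[])
{
    int WholeStep = XDelta / YDelta;        /* minimum run length */
    int AdjUp = (XDelta % YDelta) * 2;      /* error adjust per minor step */
    int AdjDown = YDelta * 2;               /* error adjust on turnover */
    int ErrorTerm = (XDelta % YDelta) - (YDelta * 2);  /* 0.5-step start */
    int InitialPixelCount = (WholeStep / 2) + 1;
    int FinalPixelCount = InitialPixelCount;
    int NumRuns = 0, i;

    if ((AdjUp == 0) && ((WholeStep & 0x01) == 0))
        InitialPixelCount--;                /* tie: give pixel to last run */
    if ((WholeStep & 0x01) != 0)
        ErrorTerm += YDelta;                /* odd run length: add 0.5 pixel */

    Runs[NumRuns++] = InitialPixelCount;    /* first, partial run */
    for (i = 0; i < (YDelta - 1); i++) {
        int RunLength = WholeStep;          /* run is at least this long */
        if ((ErrorTerm += AdjUp) > 0) {     /* error turned over? */
            RunLength++;                    /* extra pixel in this run */
            ErrorTerm -= AdjDown;
        }
        Runs[NumRuns++] = RunLength;
    }
    Runs[NumRuns++] = FinalPixelCount;      /* final, partial run */
    return NumRuns;
}
```

For the 35-by-10 line discussed earlier, this produces eleven runs whose full runs are all three or four pixels long and whose lengths sum to exactly 36 pixels, one per column with both endpoints inclusive.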
That's all there is to run-length slice line drawing; the partial first and
last runs are the only tricky part. Listing One (page 190) is a run-length
slice implementation in C. This is not an optimized implementation, nor is it
meant to be; this listing is provided so that you can see how the run-length
slice algorithm works. Next month, I'll move on to an optimized version, but
this month's listing will make it much easier to grasp the principles of
run-length slice drawing, and to understand next month's code.
Notwithstanding that it's not optimized, Listing One is reasonably fast. If you
run Listing Two (page 191), a sample line-drawing program that you can use to
test-drive Listing One, you may be as surprised as I was at how quickly the
screen fills with vectors, considering that Listing One is entirely in C and
has some redundant divides. Or perhaps you won't be surprised, in which case I
suggest you check back next month.


Next Time


Next month, I'll switch to assembly language and speed up run-length slice
lines considerably. I'll also spend some time discussing the limitations of
run-length slice drawing, and I'll look at possible further optimizations.
After that, perhaps we'll have a look at seed fills, or more 3-D animation, or
some new 2-D animation topics--or maybe something completely different. Your
suggestions are, as always, welcome.


_GRAPHICS PROGRAMMING COLUMN_
by Michael Abrash

[LISTING ONE]

/* Run-length slice line drawing implementation for mode 0x13, the VGA's
320x200 256-color mode. Not optimized! Tested with Borland C++ 3.0 in
the small model. */

#include <dos.h>

#define SCREEN_WIDTH 320
#define SCREEN_SEGMENT 0xA000

void DrawHorizontalRun(char far **ScreenPtr, int XAdvance, int RunLength,
 int Color);
void DrawVerticalRun(char far **ScreenPtr, int XAdvance, int RunLength,
 int Color);
/* Draws a line between the specified endpoints in color Color. */
void LineDraw(int XStart, int YStart, int XEnd, int YEnd, int Color)
{
    int Temp, AdjUp, AdjDown, ErrorTerm, XAdvance, XDelta, YDelta;
    int WholeStep, InitialPixelCount, FinalPixelCount, i, RunLength;
    char far *ScreenPtr;

    /* We'll always draw top to bottom, to reduce the number of cases we have to
       handle, and to make lines between the same endpoints draw the same pixels */
    if (YStart > YEnd) {
        Temp = YStart;
        YStart = YEnd;
        YEnd = Temp;
        Temp = XStart;
        XStart = XEnd;
        XEnd = Temp;
    }
    /* Point to the bitmap address of the first pixel to draw */
    ScreenPtr = MK_FP(SCREEN_SEGMENT, YStart * SCREEN_WIDTH + XStart);

    /* Figure out whether we're going left or right, and how far we're
       going horizontally */
    if ((XDelta = XEnd - XStart) < 0)
    {
        XAdvance = -1;
        XDelta = -XDelta;
    }
    else
    {
        XAdvance = 1;
    }
    /* Figure out how far we're going vertically */
    YDelta = YEnd - YStart;

    /* Special-case horizontal, vertical, and diagonal lines, for speed
       and to avoid nasty boundary conditions and division by 0 */
    if (XDelta == 0)
    {
        /* Vertical line */
        for (i = 0; i <= YDelta; i++)
        {
            *ScreenPtr = Color;
            ScreenPtr += SCREEN_WIDTH;
        }
        return;
    }
    if (YDelta == 0)
    {
        /* Horizontal line */
        for (i = 0; i <= XDelta; i++)
        {
            *ScreenPtr = Color;
            ScreenPtr += XAdvance;
        }
        return;
    }
    if (XDelta == YDelta)
    {
        /* Diagonal line */
        for (i = 0; i <= XDelta; i++)
        {
            *ScreenPtr = Color;
            ScreenPtr += XAdvance + SCREEN_WIDTH;
        }
        return;
    }

 /* Determine whether the line is X or Y major, and handle accordingly */
 if (XDelta >= YDelta)
 {
 /* X major line */
 /* Minimum # of pixels in a run in this line */
 WholeStep = XDelta / YDelta;

 /* Error term adjust each time Y steps by 1; used to tell when one
 extra pixel should be drawn as part of a run, to account for
 fractional steps along the X axis per 1-pixel steps along Y */
 AdjUp = (XDelta % YDelta) * 2;

 /* Error term adjust when the error term turns over, used to factor
 out the X step made at that time */
 AdjDown = YDelta * 2;

 /* Initial error term; reflects an initial step of 0.5 along the Y
 axis */
 ErrorTerm = (XDelta % YDelta) - (YDelta * 2);

 /* The initial and last runs are partial, because Y advances only 0.5
 for these runs, rather than 1. Divide one full run, plus the
 initial pixel, between the initial and last runs */
 InitialPixelCount = (WholeStep / 2) + 1;
 FinalPixelCount = InitialPixelCount;

 /* If the basic run length is even and there's no fractional
 advance, we have one pixel that could go to either the initial
 or last partial run, which we'll arbitrarily allocate to the
 last run */
 if ((AdjUp == 0) && ((WholeStep & 0x01) == 0))
 {
 InitialPixelCount--;
 }
 /* If there are an odd number of pixels per run, we have 1 pixel that can't
 be allocated to either the initial or last partial run, so we'll add 0.5
 to error term so this pixel will be handled by the normal full-run loop */
 if ((WholeStep & 0x01) != 0)
 {
 ErrorTerm += YDelta;
 }
 /* Draw the first, partial run of pixels */
 DrawHorizontalRun(&ScreenPtr, XAdvance, InitialPixelCount, Color);
 /* Draw all full runs */
 for (i=0; i<(YDelta-1); i++)
 {
 RunLength = WholeStep; /* run is at least this long */
 /* Advance the error term and add an extra pixel if the error
 term so indicates */
 if ((ErrorTerm += AdjUp) > 0)
 {
 RunLength++;
 ErrorTerm -= AdjDown; /* reset the error term */
 }
 /* Draw this scan line's run */
 DrawHorizontalRun(&ScreenPtr, XAdvance, RunLength, Color);
 }

 /* Draw the final run of pixels */
 DrawHorizontalRun(&ScreenPtr, XAdvance, FinalPixelCount, Color);
 return;
 }
 else
 {
 /* Y major line */

 /* Minimum # of pixels in a run in this line */
 WholeStep = YDelta / XDelta;

 /* Error term adjust each time X steps by 1; used to tell when 1 extra
 pixel should be drawn as part of a run, to account for
 fractional steps along the Y axis per 1-pixel steps along X */
 AdjUp = (YDelta % XDelta) * 2;

 /* Error term adjust when the error term turns over, used to factor
 out the Y step made at that time */
 AdjDown = XDelta * 2;

 /* Initial error term; reflects initial step of 0.5 along the X axis */
 ErrorTerm = (YDelta % XDelta) - (XDelta * 2);

 /* The initial and last runs are partial, because X advances only 0.5
 for these runs, rather than 1. Divide one full run, plus the
 initial pixel, between the initial and last runs */
 InitialPixelCount = (WholeStep / 2) + 1;
 FinalPixelCount = InitialPixelCount;

 /* If the basic run length is even and there's no fractional advance, we
 have 1 pixel that could go to either the initial or last partial run,
 which we'll arbitrarily allocate to the last run */
 if ((AdjUp == 0) && ((WholeStep & 0x01) == 0))
 {
 InitialPixelCount--;
 }
 /* If there are an odd number of pixels per run, we have one pixel
 that can't be allocated to either the initial or last partial
 run, so we'll add 0.5 to the error term so this pixel will be
 handled by the normal full-run loop */
 if ((WholeStep & 0x01) != 0)
 {
 ErrorTerm += XDelta;
 }
 /* Draw the first, partial run of pixels */
 DrawVerticalRun(&ScreenPtr, XAdvance, InitialPixelCount, Color);

 /* Draw all full runs */
 for (i=0; i<(XDelta-1); i++)
 {
 RunLength = WholeStep; /* run is at least this long */
 /* Advance the error term and add an extra pixel if the error
 term so indicates */
 if ((ErrorTerm += AdjUp) > 0)
 {
 RunLength++;
 ErrorTerm -= AdjDown; /* reset the error term */
 }
 /* Draw this scan line's run */

 DrawVerticalRun(&ScreenPtr, XAdvance, RunLength, Color);
 }
 /* Draw the final run of pixels */
 DrawVerticalRun(&ScreenPtr, XAdvance, FinalPixelCount, Color);
 return;
 }
}
/* Draws a horizontal run of pixels, then advances the bitmap pointer to
 the first pixel of the next run. */
void DrawHorizontalRun(char far **ScreenPtr, int XAdvance,
 int RunLength, int Color)
{
 int i;
 char far *WorkingScreenPtr = *ScreenPtr;

 for (i=0; i<RunLength; i++)
 {
 *WorkingScreenPtr = Color;
 WorkingScreenPtr += XAdvance;
 }
 /* Advance to the next scan line */
 WorkingScreenPtr += SCREEN_WIDTH;
 *ScreenPtr = WorkingScreenPtr;
}
/* Draws a vertical run of pixels, then advances the bitmap pointer to
 the first pixel of the next run. */
void DrawVerticalRun(char far **ScreenPtr, int XAdvance,
 int RunLength, int Color)
{
 int i;
 char far *WorkingScreenPtr = *ScreenPtr;

 for (i=0; i<RunLength; i++)
 {
 *WorkingScreenPtr = Color;
 WorkingScreenPtr += SCREEN_WIDTH;
 }
 /* Advance to the next column */
 WorkingScreenPtr += XAdvance;
 *ScreenPtr = WorkingScreenPtr;
}


[LISTING TWO]

/* Sample line-drawing program. Adapted from code that appeared in the
 _On Graphics_ column in Programmer's Journal. Tested with
 Borland C++ 3.0 in the small model. */

#include <dos.h>
#include <conio.h>   /* for getch() */

#define GRAPHICS_MODE 0x13
#define TEXT_MODE 0x03
#define BIOS_VIDEO_INT 0x10
#define X_MAX 320 /* working screen width */
#define Y_MAX 200 /* working screen height */

extern void LineDraw(int XStart, int YStart, int XEnd, int YEnd, int Color);


/* Subroutine to draw a rectangle full of vectors, of the specified
 * length and color, around the specified rectangle center. */
void VectorsUp(XCenter, YCenter, XLength, YLength, Color)
int XCenter, YCenter; /* center of rectangle to fill */
int XLength, YLength; /* distance from center to edge of rectangle */
int Color; /* color to draw lines in */
{
 int WorkingX, WorkingY;

 /* Lines from center to top of rectangle */
 WorkingX = XCenter - XLength;
 WorkingY = YCenter - YLength;
 for ( ; WorkingX < ( XCenter + XLength ); WorkingX++ )
 {
 LineDraw(XCenter, YCenter, WorkingX, WorkingY, Color);
 }
 /* Lines from center to right of rectangle */
 WorkingX = XCenter + XLength - 1;
 WorkingY = YCenter - YLength;
 for ( ; WorkingY < ( YCenter + YLength ); WorkingY++ )
 {
 LineDraw(XCenter, YCenter, WorkingX, WorkingY, Color);
 }
 /* Lines from center to bottom of rectangle */
 WorkingX = XCenter + XLength - 1;
 WorkingY = YCenter + YLength - 1;
 for ( ; WorkingX >= ( XCenter - XLength ); WorkingX-- )
 {
 LineDraw(XCenter, YCenter, WorkingX, WorkingY, Color);
 }
 /* Lines from center to left of rectangle */
 WorkingX = XCenter - XLength;
 WorkingY = YCenter + YLength - 1;
 for ( ; WorkingY >= ( YCenter - YLength ); WorkingY-- )
 {
 LineDraw(XCenter, YCenter, WorkingX, WorkingY, Color);
 }
}
/* Sample program to draw four rectangles full of lines. */
int main()
{
 union REGS regs;

 /* Set graphics mode */
 regs.x.ax = GRAPHICS_MODE;
 int86(BIOS_VIDEO_INT, &regs, &regs);

 /* Draw each of four rectangles full of vectors */
 VectorsUp(X_MAX / 4, Y_MAX / 4, X_MAX / 4, Y_MAX / 4, 1);
 VectorsUp(X_MAX * 3 / 4, Y_MAX / 4, X_MAX / 4, Y_MAX / 4, 2);
 VectorsUp(X_MAX / 4, Y_MAX * 3 / 4, X_MAX / 4, Y_MAX / 4, 3);
 VectorsUp(X_MAX * 3 / 4, Y_MAX * 3 / 4, X_MAX / 4, Y_MAX / 4, 4);

 /* Wait for a key to be pressed */
 getch();

 /* Back to text mode */
 regs.x.ax = TEXT_MODE;
 int86(BIOS_VIDEO_INT, &regs, &regs);

 return 0;
}


Example 1:

for (i=0; i<RunLength; i++)
{
 *WorkingScreenPtr = Color;
 if (XDelta > 0)
 {
 WorkingScreenPtr++;
 }
 else
 {
 WorkingScreenPtr--;
 }
}


Example 2:


if (XDelta > 0)
{
 for (i=0; i<RunLength; i++)
 {
 *WorkingScreenPtr++ = Color;
 }
}
else
{
 for (i=0; i<RunLength; i++)
 {
 *WorkingScreenPtr-- = Color;
 }
}





November, 1992
PROGRAMMER'S BOOKSHELF


Undocumented Windows




Ray Duncan


Technical writing is a demanding, yet largely unappreciated vocation. When you
further restrict the focus to technical writing about operating systems,
programming interfaces, and development tools for trade magazines and book
publishers, you've got a vocation that is not only demanding and
unappreciated, but decidedly peculiar as well. There's probably no other field
where an author can reach so massive and sophisticated an audience and, at the
same time, be obligated to contend with such frequent technological advances,
short product life cycles, shoddy vendor documentation, hair-raising
nondisclosure agreements, fuzzy facts (and even more fuzzy release dates),
poorly understood market forces, and high socioeconomic stakes. It's a dirty
job, scorned by academia and "real" programmers, and the author who tries to
stay on the cutting edge often finds himself on the bleeding edge instead.
To be perfectly fair, one of the reasons that technical writing carries so
little prestige in the programming field is undoubtedly because it so
faithfully follows the 90/10 rule. At least 90 percent of the articles and
books on programming are (to put it in terms suitable for a family magazine)
mere dreck: mindless verbiage generated by hacks who simply rehash company
backgrounders and product manuals without adding any value whatsoever, or by
journeymen who grind out boilerplates according to editorial outlines. The 10
percent of the programming articles and books which can be said to have some
redeeming value come from two types of authors: "shooting stars" who make one
or two significant contributions and are never heard from again, and a
remarkably small core group of superstars who steadily produce excellent
articles and books year-in and year-out.
What characterizes the superstar technical writers? They all have an obvious
affection for programming coupled with years of experience in the front lines
of software development. They exhibit an attention to detail and respect for
accuracy that borders on fanatical. They write about things they've done, not
things they've heard. They are quick to acknowledge the ideas and
accomplishments of others, wasting little time on turf wars or battles for
precedence. They are open-minded, eclectic, widely read, and historically
savvy. They are especially facile at deducing a logical structure from
scattered fragments of information (or imposing a structure, if necessary) and
explaining this structure to others. They feel a deep compulsion to write and
are highly efficient at it. And finally, they each have a unique style and a
genuine gift for the beauty of the language (many have a wicked sense of humor
as well). In short, we're talking about people like Jeff Duntemann, Charles
Petzold, Michael Abrash, Jeff Prosise, and Andrew Schulman.
Andrew Schulman has a distinctive approach that is reminiscent of the Greek
classics: He doesn't just give you the facts, he grabs you by the collar and
shows you (in sometimes painful detail) where, why, and how he got the facts,
embedding them in a philosophical framework that explains why they are
important and how they may safely be applied. Over the last few years, Andrew
has contributed a series of brilliant essays to BYTE, DDJ, and PC Magazine,
coauthored and edited the fascinating book Undocumented DOS, and saved my
personal bacon on the book Extending DOS--all this in his "spare time" while
gainfully employed at Lotus and then at Phar Lap Software. More recently,
Andrew has gone into the writing racket full time, and his latest book (with
David Maxey and Matt Pietrek) is Undocumented Windows.
Before you API purists out there decide to turn the page, let me say that I,
too, have little patience with books obsessed with forbidden lore for its own
sake. (The childish and tiresome New Hacker's Dictionary is a perfect example.)
However, Undocumented Windows does not fall into this category. Dealing with
the Windows API as a pristine entity is not a practical strategy; many of the
functions overlap, interact, or have nonintuitive side effects. Additionally,
functions or behaviors that are undocumented today may well be documented
tomorrow, as Microsoft's agenda vacillates or goes through one of its periodic
startling metamorphoses. The first-generation Windows books--Petzold, Yao,
Richter, and Heller--supplemented the Microsoft Windows SDK, but made little
attempt to go beyond it. I believe that Undocumented Windows is the first true
example of a second-generation Windows programming book, because it takes you
behind the scenes to show how the various Windows modules are put together,
how they depend on each other, and how they have evolved from the
real-mode-only Windows version 1.03 to the protected-mode-only Windows version
3.1.
Of course, Andrew being a fellow cursed with insatiable curiosity, a liberal
helping of forbidden lore is bundled in as well--orphaned or senseless code
fragments from the Windows kernel, details of the data structures behind the
various types of Windows handles, and documentation for hundreds of previously
undocumented functions with evocative names like Death, Resurrection,
PrestoChangoSelector, TabTheTextOutForWimps, WinOldAppHackOMatic,
UserSeeUserDo, Bunny_351, Brute, and FixUpBogusPublisherMetaFile. The number
of hours the authors must have spent disassembling object code, tracing
program execution, poring over memory dumps, combing through Windows SDK and
DDK source code and header files, and attempting to correlate vaguely related
interfaces such as OS/2's DosPTrace to create this book simply boggles the
mind. Chapter 1 of Undocumented Windows, entitled "This Was Not Supposed to
Happen," includes an apologia that clarifies the authors' motivation:
A key goal of Microsoft Windows is to be more orderly than MS-DOS. DOS is a
"house of cards," with memory-resident (TSR) programs, device drivers, disk
caches, memory managers, DOS extenders, networks, and multitasking
environments (such as Windows itself) all competing for control of your
machine. From the software developer's perspective, Windows often looks a lot
saner. It provides a wide-ranging and seemingly all-inclusive collection of
services--such as protected mode, multitasking, dynamic linking, window
management, and graphics--that plain-vanilla DOS doesn't offer. Often, Windows
lets developers concentrate on making a program do what it is supposed to do
rather than on the underhanded shenanigans--including the use of undocumented
system functions--that are necessary to create a great DOS application.
The idea of "undocumented Windows," then, is really somewhat alarming. Using
undocumented functions is exactly the sort of problem Windows was supposed to
solve! Making use of functions that Microsoft has implemented but not
documented fits in perfectly with the free-wheeling style of DOS, but it seems
to contradict the entire spirit and purpose of Windows. By providing an API
much more extensive and capable than DOS's, Windows is supposed to make such
tricks unnecessary.
But how likely a scenario is this? How many commercial Windows applications
can really "play by the rules" and still be marketable, with decent
performance and with the features users expect? The idea that the Windows API
can totally replace low-level coding seems, unfortunately, no more reasonable
than the idea that C++ can totally replace assembly language--in other words,
not very reasonable at all.
What we will see in this chapter is that key commercial Windows applications,
including Microsoft's own, use undocumented API calls. In some cases, these
calls have since been documented by Microsoft, though only after developers
went ahead and used them anyway, without Microsoft's blessing. In other words,
real-world use of the Windows API has driven the documentation, rather than
the other way around. Writing only to the documented Windows API sounds great,
but has failed in the real world.
What went wrong with the lovely notion of Windows programming without tricks,
without low-level, nonportable code, without undocumented shenanigans? What
went wrong, mostly, is that Windows succeeded. By winning the operating system
wars, Windows is now paying the price of success: large numbers of programmers
are banging on the system, and they need to make it do all sorts of things for
which it was probably never intended. The use of undocumented features, in
other words, is the inevitable price of success. MS-DOS paid this price, and
now Windows will. Interestingly, Windows too is now being called a "house of
cards."
The remainder of Chapter 1 rambles through a variety of topics: dynamic
linking, protected mode, the Microsoft "Open Tools" strategy, portrayals of
the use of undocumented API functions by Norton Desktop and certain Microsoft
products, the FTC investigation, Microsoft's celebrated (and in my judgment
totally mythical) "Chinese Wall," and the first-ever in-print account of
Microsoft's shameful bullying of famous Windows guru Michael Geary. (How a
gentleman like Mike Geary, who has devoted untold thousands of hours to
helping other Windows programmers on Genie and CompuServe, could ever find
himself in the gunsights of Microsoft's corporate lawyers is an unfathomable
mystery and a chilling manifestation of the Dark Side of the Force at
Microsoft.)
Chapters 2, 3, and 4 of Undocumented Windows are methodological; they explain
the static analysis of Windows executables and how to use various debuggers,
"spy" programs, and other utilities to poke around in Windows' innards.
Chapter 3 is particularly interesting because it constitutes, in essence, a
crash course in how to disassemble Windows DLLs and applications back to their
source code. Each subtask--from identifying callback procedures to dumping out
the file's binary resources--is carefully explained, using TASKMAN.EXE
(Microsoft Windows Task Manager) as a practical example. In fact, you could
pretty quickly write your own customized Task Manager from the annotated
TASKMAN source code included in this book.
Chapters 5 through 8 are devoted to in-depth examinations of Windows
KERNEL.EXE, USER.EXE, undocumented messages, and GDI.EXE, respectively. All of
the crucial hidden data structures are dissected, and each exported API
function not found in the official Microsoft manuals is documented here with
its parameters, flags, return values, and bugs. Most function entries are
accompanied by source code for a short program that exercises the function.
Much "legal" information that is inexplicably missing, incompletely
documented, or incorrectly documented in the Microsoft manuals is also found
here. For example, the enigmatic "free system resources," the structure of
local heaps and atom tables, transformations between task, module, instance,
and window handles, and a method for determining whether your application is
running in an OS/2 2.0 Windows-compatibility session. In the course of all
this, Schulman and Maxey occasionally take Microsoft to task for its "do as we
say, not as we do" attitude:
SetSelectorLimit() should be equivalent in functionality to the DPMI Set
Segment Limit (Int 31H AX=08H) function. However, KERNEL seems almost never to
rely on one documented DPMI function, when multiple DPMI functions, or
avoiding DPMI altogether, will do. Go figure.
The last two chapters of Undocumented Windows cover SYSTEM.DRV and
TOOLHELP.DLL, and they are followed by two appendices. The first appendix is a
WINIO library reference; I'll say more about WINIO in a moment. The second
appendix is an annotated bibliography of articles, books, samizdat documents,
software, and other primary sources invaluable to any Windows reverse-engineer
wanna-be. The bibliography is by turns amusing, informative, and ironic; near
the end, Schulman tips his hat to Hofstadter by including a review of the very
book he is writing:
A jumble of material on undocumented functions and internal data structures in
KERNEL, USER, and GDI. Apparently a second book is planned to cover Windows
DLLs such as SHELL, 16-bit device drivers, 32-bit VxDs, DPMI, interrupts, and
other lower-level aspects of Windows. Contains an extensive bibliography, with
only one recursive self-reference, which one of the coauthors uses just to
talk about different books and articles he likes: some of them don't even have
anything to do with Windows!
I should mention that the example programs in Undocumented Windows resemble no
other Windows application source code you've ever seen or are likely to ever
see again. Not only do they exploit a bewildering variety of undocumented API
functions that you've never heard of outside this book, but they don't use the
documented functions that you've always heard of. This is because the programs
are written with the aid of an interface library-cum-application framework,
called WINIO, that was first described by Schulman and Maxey in Microsoft
Systems Journal (July 1991). WINIO conceals the Windows API and the
message-oriented, event-driven nature of Windows applications beneath
replacements for the more familiar C runtime-library functions such as gets()
and printf(). The authors defend their approach thusly:
That a Windows application does not have to directly use the Windows API, that
you can put a layer on top of this API, surprises so many Windows programmers
that we could almost claim that this fact is "undocumented." Certainly
Microsoft's SDK manuals never suggest that you could write a Windows
application in any way other than peppering your code with direct calls to
TextOut(), BeginPaint(), and so on. The idea that a Windows program must
contain direct, explicit Windows API calls--that it's not a "true" Windows
application if it isn't descended from the original GENERIC.C--is part of the
same reverence for the Windows API that we seek to undermine by disassembling
this API and looking at the code. It may seem odd to introduce a way of hiding
the Windows API, in a book otherwise devoted to exposing even lower level
portions of it. However, revealing undocumented Windows API calls and then
covering up the existing documented ones are really just two sides of the same
coin: questioning the Windows API, instead of taking it on face value. The API
is just code; we can do with it what we will.
Although I share Andrew's antipathy for the Hungarian gibberish and the
multipage switch() statements found in Microsoft SDK sample programs, I
sometimes feel that WINIO verges on throwing the baby out with the bath water.
It's nice to make things easier for traditional C programmers, but they're
going to lose out by not learning and using the new concepts inherent in the
Windows programming model. Still, I must admit that the authors' use of WINIO
substantially simplifies their example source code, and therefore, on balance,
must be considered an asset to the book. A companion disk bound into the book
contains source code for the example programs, the WINIO library, and
miscellaneous Windows spelunking aids. Some of the more interesting utilities
on the disk include:
EXEUTIL--displays the names of undocumented Windows functions called by a
program or DLL. EXEHDR's evil twin.
RESDUMP--decompiles and displays the menu, dialog, and string-table resources
in a program or DLL.
CALLFUNC--a simple interpreter that lets you run Windows API functions by
typing in their names and parameters.
SNOOP--the counterpart of Microsoft's SPY.EXE, but displays undocumented
messages.
WISPY--monitors user-designated interrupts called by Windows programs.
Comparable to the ISPY utility in Undocumented DOS.
Undocumented Windows is not completely flawless. The presentation is
resolutely centered on the C programming language and relies extensively on C
pointer idioms even within the narrative text, which makes the book difficult
for Visual Basic, Turbo Pascal/Windows, or Smalltalk programmers to digest.
The commenting and formatting of the C source code is uneven. There are few
diagrams, and the production values are disappointing; the book would have
benefited enormously from the attention of a skilled book designer, a
strong-willed manuscript editor, and a few diligent copy editors. (It will be
interesting to see whether Addison-Wesley awakens to Schulman's importance
before he decides to take his projects to a publisher that will give them more
professional treatment.) But these quibbles do not change the fact that the
release of Undocumented Windows is one of the most important events of 1992 for
our little corner of the universe, and every serious Windows programmer should
own a copy of this book. The days when Microsoft could get away with saying,
"Pay no attention to that man behind the curtain" are over.




November, 1992
EXTENDING TURBO VISION


Replacing the Idle method




Scott Nichol


Scott works as a consultant in the Philadelphia area and specializes in
networks, communications, and RDBMS. You can reach him on CompuServe at
72611,2511.


Borland's application framework, Turbo Vision, is based upon the event-driven
paradigm and has a particularly elegant application in the object-oriented
model. Objects derived from the TView base class receive notification of
external events as messages. But because of its object-oriented and
event-driven nature, programmers moving from traditional environments to Turbo
Vision must rethink some of the most fundamental techniques they apply.
This article presents a basic programming scenario that must be reworked in
Turbo Vision and explores Turbo Vision's event-generation and event-handling
methodologies. I'll also show how to extend the framework by adding an event
based on the BIOS timer-tick counter that replaces the TApplication.Idle
method.


Traditional Programming


Figure 1 provides an example of a traditional programming method that must be
retooled in an event-driven environment. I've used the GetKey function in
Figure 1 in almost every program I've written for DOS. Its purpose is to get
user input from the keyboard while using the idle time, during which there is
no keypress, to perform other work. In this case, the function updates the
time-of-day display on the screen. GetKey contains a loop which polls for the
occurrence of one or more events. As written, such a loop should not be used
in an event-driven environment, because it is the environment's job to do the
polling for exterior events and notify the program when they occur. (I
emphasize should, because Turbo Vision allows the recalcitrant programmer to
call GetEvent directly, thus circumventing the event-driven framework.)
Figure 1: GetKey is a typical example of traditional programming methods that
must be retooled in an event-driven environment

 function GetKey: Integer;
 var
 Ch: Char;
 begin
 while not KeyPressed do
 UpdateScreenTime;
 Ch := ReadKey;
 if Ch = #0 then
 GetKey := ord (ReadKey) or $80
 else
 GetKey := ord (Ch);
 end;



The Event-driven Approach


Given that I cannot use the code in Figure 1 in an event-driven environment,
how do I achieve the same result; that is, how do I maintain a current time
display? The Windows environment provides applications with a set of timers
that can be used to generate a periodic message for either the application
itself or a particular window within it. The procedure to update the time can
be invoked whenever a timer message is received from Windows. Turbo Vision
provides a natural, but slightly less elegant means by which the application
can handle such periodic tasks.
The Turbo Vision processes of event generation and handling are encapsulated
in the TGroup.Execute method. TApplication.Run, for example, simply invokes
the Execute method. Other TGroup descendants such as TDialog will run Execute
whenever ExecView is called for them. The "current modal view" is the TView
whose Execute method was last entered. Figure 2 is a simplified representation
of how the Execute method operates. The exact functionality of the GetEvent
and HandleEvent methods will vary, depending on the nature of the TGroup
descendant that is executing. The HandleEvent method is where most of the
program functionality is actually hooked into the framework. The GetEvent
method, on the other hand, is more of a black box, as it is seldom overridden.
TView.GetEvent simply calls Owner^.GetEvent. Since virtually no descendant of
TView overrides this, most, if not all GetEvent calls eventually chain back to
the TApplication view. In other words, it is almost always safe to think of
any GetEvent method as synonymous with Application^.GetEvent.
Figure 2: Representation of the Execute method operation.

 EndState := 0;
 repeat
 GetEvent (Event);
 HandleEvent (Event);
 until EndState <> 0;



Why not Idle?



Figure 3 is a simplified representation of TApplication.GetEvent and refers to
the TApplication.Idle method, which gives the programmer a hook into the idle
time between external events. The TVDEMO program provided by Borland, for
example, overrides the Idle method to update displays of the current
time-of-day and heap remaining.
Figure 3: Representation of TApplication.GetEvent method operation.

 GetMouseEvent (Event);
 if Event.What = evNothing then begin
 GetKeyboardEvent (Event);
 if Event.What = evNothing then
 Idle;
 end;

There are cases in which you might not wish to use the TApplication.Idle
method and therefore must create a new methodology for performing periodic
tasks. One reason for not using the Idle method is that Windows has no analog,
so in order to port Turbo Vision code to Windows, a different methodology must
be used to perform idle tasks. The second reason may be more compelling,
especially to object-oriented purists. In order to exploit the Idle method,
TApplication must have knowledge of all the views that want to process during
the idle time, whether those views are subviews of TApplication or of a TGroup
several groups removed from it in the view chain. This violates the broad
sense of the concept of encapsulation. A far more elegant approach is to allow
views to hook into idle processing via their own HandleEvent methods.


Extending Turbo Vision


It's important to realize that with Turbo Vision, it is code within the
application itself that supports the event-driven paradigm, rather than an
operating environment external to the program. Because of this, it is possible
to extend the process in ways that Windows, for example, cannot support at the
application level. In order to abate both of my Idle method objections, I have
extended TApplication.GetEvent to add a new timer event analogous to, but not
completely compatible with, that of Windows.
Listing One (page 198) is a small unit to support the new event. The
GetBiosTickEvent procedure will generate an event whenever it finds that the
BIOS timer-tick counter has changed since the last call to the procedure.
While this will nominally produce an event every 55 milliseconds, this
frequency need not be very accurate; entire ticks may be lost if the procedure
is not called often enough. For this reason, the BIOS tick event uses the
Event.InfoLong field to carry the BIOS tick count at the moment the event was
generated. The unit also provides a GetBiosTicks function so that the current
tick count can be obtained at any point in the program.
I've included a demo program with this article called TVTIME.PAS; see
"Availability" on page 5. To actually generate the new events, I override the
application's GetEvent method. In TVTIME, the new GetEvent method calls
TApplication.GetEvent to check for mouse and keyboard events. If no such
events are found, Event.What is evNothing and I check for a BIOS tick event.
If there is still no event, GetEvent generates a new Idle event. While the
application described shortly does not actually use this, I have included the
Idle event for those programs that may have good use for the idle time or for
which the 55-millisecond clock resolution is too coarse for good performance.
Since timer ticks are polled after the mouse and keyboard, they have a lower
priority. This priority can be reversed simply by reversing the order of the
calls; I chose lower priority to parallel the Windows scheme.


Event Handling and Modality


The GetEvent method in TVTIME is not quite as simple as described in the
preceding paragraph, due to the effect modality has on event handling. You
will recall from the description of the TGroup.Execute method that the event
loop for the current modal view is in its Execute method. The call to GetEvent
from this method
eventually propagates to our new application GetEvent so that our new events
can be generated even when the current modal view is not TApplication.
HandleEvent, on the other hand, is not as cooperative. For the typical view,
this method does some specialized handling, such as TDialog handling a
cmCancel command, then calls the event handlers of each of its subviews. When
TApplication is the current modal view, all views in the application will have
a chance to handle the event. However, any other current modal view will hog
events for itself and its subviews.
I want the timer event, however, to be available to all views at all times.
Otherwise, my display clock, for example, will stop ticking every time I open
a modal dialog box. For this reason, I defined a new event class
evMetaBroadcast that will be sent to all views, regardless of the current
modal view. To support the metabroadcast concept, GetEvent checks whether the
application is the current modal view, a pointer to which is available from
the TView.Top-View method. If the GetEvent call is from another modal view,
TApplication.GetEvent handles metabroadcast events itself by directly calling
TApplication.HandleEvent. Doing so allows all views in the application to
receive the event.
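Distilled from TVTIME.PAS, the check amounts to a few lines at the end of the application's GetEvent override (MetaSupport is a Boolean flag the demo uses to let you switch the behavior off):

```pascal
{ ...after TApplication.GetEvent and GetBiosTickEvent have run: }
if MetaSupport and (Event.What = evMetaBroadcast) then
  if TopView <> @Self then begin  { a view other than the application is modal }
    HandleEvent(Event);           { broadcast the event to every view ourselves }
    ClearEvent(Event);            { ...so it is not delivered a second time }
  end;
```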


The Example Application


The TVTIME demo is trivial by Turbo Vision standards. It has one menu, an
About dialog box, and a few hot spots on the status bar. Its main purpose is
to demonstrate two of the objects in Listing Two (page 198). These are
variants of the clock and heap display objects from the GADGETS unit used by
TVDEMO. Rather than rely on the application to tell them to update during idle
processing, these update themselves when triggered by the timer event.
Both TClockView and THeapView are descendants of TTickView, which provides the
skeleton for a view that is updated at each timer-tick event. An essential
feature of the TTickView is that its constructor sets the evMetaBroadcast bit
in its EventMask flag. This allows the parent TGroup to pass this event to the
view.
Note that it is unlikely you would ever want to do a ClearEvent after handling
an evMetaBroadcast. The idea is that every view gets notified of an event of
this class. Also note that I have made TTickView a true abstract object by
calling the Turbo Pascal Abstract procedure from methods that must be
overridden. If they are not, a runtime error is generated.
The program allows you to disable either or both of the clock and heap
displays. It also allows you to disable support for metabroadcasting. By doing
this, you can see the effect of modality. When TApplication is not the current
modal view, the clock stops. I was interested to discover in this way that the
menu bar is itself a modal view.


Conclusion


I have used the timer tick and idle events in other programs to support
serial-input polling. They could also be used to perform lengthy calculations
in the background while still allowing the user to abort at any time. You can
also extend what I've presented here to add still more events to your Turbo
Vision environment. For example, you can use the alarm capabilities of the
146818A real-time clock chip, directly or via BIOS interrupt 1AH, to generate
alarm events. Or you could add a GetSerialEvent procedure to directly generate
serial-data input events rather than polling for input during timer tick or
idle events.
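As a rough sketch of the alarm idea (the cmAlarm command and the AlarmTarget variable are my inventions, not part of the listings, and the midnight rollover of the BIOS counter is ignored), a one-shot alarm can be generated by polling the tick count against a target, in the same style as GetBiosTickEvent:

```pascal
unit AlarmTick;

interface

uses
  Drivers;

const
  cmAlarm = 1002;        { hypothetical command, in the spirit of CMDS.PAS }

var
  AlarmTarget: LongInt;  { tick count at which to fire; 0 = no alarm set }

procedure GetAlarmEvent(var Event: TEvent);

implementation

uses
  BiosTick, Cmds;

procedure GetAlarmEvent(var Event: TEvent);
begin
  if (AlarmTarget <> 0) and (GetBiosTicks >= AlarmTarget) then begin
    Event.What := evMetaBroadcast;
    Event.Command := cmAlarm;
    Event.InfoLong := GetBiosTicks;
    AlarmTarget := 0;    { one-shot: the caller must rearm explicitly }
  end else
    Event.What := evNothing;
end;

end.
```

A GetEvent override would call GetAlarmEvent in the same fall-through chain as GetBiosTickEvent. Using the 146818A's own alarm instead of polling would mean hooking interrupt 4AH, which the BIOS invokes when the alarm set through interrupt 1AH expires.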


Bibliography


Duntemann, Jeff. "Stuck Windows." DDJ (February, 1992).
Duntemann, Jeff. "Chewing the Wrapper." DDJ (January, 1992).
Duntemann, Jeff. "The Tragedy of the Black Box." DDJ (December, 1991).
Duntemann, Jeff. "Waves in What?" DDJ (November, 1991).
Roach, Kenneth. "Using the Real-Time Clock." DDJ (June, 1991).
Frid-Nielsen, Lars and Alex Lane. "Celestial Programming With Turbo Pascal."
DDJ (June, 1991).


_EXTENDING TURBO VISION_
by Scott Nichol


[LISTING ONE]



{***********************************************************************}
{ BIOSTICK.PAS }
{ }
{ Support for BIOS tick counter. The new BIOS tick event is of class }
{ evMetaBroadcast, command cmBiosTick. The Event.InfoLong field }
{ contains the tick counter value at the time of the event. The }
{ current value can be obtained using the GetBiosTicks function. }
{ Because this event is generated on a cooperative rather than }
{ preemptive basis, there may not be an event generated for every }
{ tick of the counter. Nor should any assumptions be made about the }
{ accuracy of the periodicity of the event: the nominal periodicity }
{ of 55 milliseconds will only be obtained when no other events are }
{ generated and cmBiosTick handling takes under 55 milliseconds. }
{***********************************************************************}

{$R-,S-}

unit BiosTick;

interface

uses
  Drivers;

procedure GetBiosTickEvent(var Event: TEvent);
function GetBiosTicks: LongInt;

implementation

uses
  Cmds;

var
  BiosTicks: LongInt absolute $40:$6c;

procedure GetBiosTickEvent(var Event: TEvent);
const
  OldTicks: LongInt = 0;
begin
  if BiosTicks <> OldTicks then begin
    OldTicks := BiosTicks;
    with Event do begin
      What := evMetaBroadcast;
      Command := cmBiosTick;
      InfoLong := OldTicks;
    end;
  end else
    Event.What := evNothing;
end;

function GetBiosTicks: LongInt;
begin
  GetBiosTicks := BiosTicks;
end;

end.




[LISTING TWO]

{***********************************************************************}
{ TICKVIEW.PAS }
{ }
{ Views to be driven by cmBiosTick. The heap and clock views were }
{ inspired by the Gadgets unit provided by Borland in the TVDEMOS }
{ subdirectory of Turbo Pascal 6.0. }
{***********************************************************************}

unit TickView;

{$R-,S-,V-}

interface

uses
 Drivers, Objects, Views, App;

type
  PTickView = ^TTickView;
  TTickView = object(TView)
    Display: Boolean;
    constructor Init(var Bounds: TRect);
    procedure Draw; virtual;
    procedure HandleEvent(var Event: TEvent); virtual;
    function DoDraw: Boolean; virtual;
    procedure DrawInfo(var S: String); virtual;
    procedure ToggleDisplay; virtual;
  end;

  PHeapView = ^THeapView;
  THeapView = object(TTickView)
    OldMem: LongInt;
    constructor Init(var Bounds: TRect);
    function DoDraw: Boolean; virtual;
    procedure DrawInfo(var S: String); virtual;
  end;

  PClockView = ^TClockView;
  TClockView = object(TTickView)
    OldTime: LongInt;
    TimeStr: String[8];
    constructor Init(var Bounds: TRect);
    function DoDraw: Boolean; virtual;
    procedure DrawInfo(var S: String); virtual;
  end;

implementation

uses
 Dos,
 BiosTick, Cmds;

{------ TTickView (abstract) ------}


constructor TTickView.Init(var Bounds: TRect);
begin
  TView.Init(Bounds);
  EventMask := EventMask or evMetaBroadcast;
  Display := True;
end;

procedure TTickView.Draw;
var
  S: String;
  B: TDrawBuffer;
  C: Byte;
begin
  C := GetColor(2);
  MoveChar(B, ' ', C, Size.X);
  DrawInfo(S);
  if Display then
    MoveStr(B, S, C);
  WriteLine(0, 0, Size.X, 1, B);
end;

procedure TTickView.HandleEvent(var Event: TEvent);
begin
  TView.HandleEvent(Event);
  if Event.What = evMetaBroadcast then
    case Event.Command of
      cmBiosTick:
        if DoDraw then DrawView;
    end;
end;

function TTickView.DoDraw: Boolean;
begin
  Abstract;
end;

procedure TTickView.DrawInfo(var S: String);
begin
  Abstract;
end;

procedure TTickView.ToggleDisplay;
begin
  Display := not Display;
  DrawView;
end;

{----------- THeapView ------------}

constructor THeapView.Init(var Bounds: TRect);
begin
  TTickView.Init(Bounds);
  OldMem := 0;
end;

function THeapView.DoDraw: Boolean;
begin
  DoDraw := OldMem <> MemAvail;
end;

procedure THeapView.DrawInfo(var S: String);
begin
  OldMem := MemAvail;
  Str(OldMem: Size.X, S);
end;

{---------- TClockView ------------}

constructor TClockView.Init(var Bounds: TRect);
begin
  TTickView.Init(Bounds);
  OldTime := 0;
end;

function TClockView.DoDraw: Boolean;
begin
  DoDraw := (GetBiosTicks - OldTime) >= 18;
end;

procedure TClockView.DrawInfo(var S: String);
var
  Hour, Minute, Second, Sec100: Word;
  Param: record
    Hr, Min, Sec: LongInt;
  end;
begin
  OldTime := GetBiosTicks;
  GetTime(Hour, Minute, Second, Sec100);
  with Param do begin
    Hr := Hour;
    Min := Minute;
    Sec := Second;
  end;
  FormatStr(S, '%02d:%02d:%02d', Param);
end;

end.





[LISTING ONE - EXTRA]

{***********************************************************************}
{ TVTIME.PAS }
{ }
{ A short program to demonstrate the addition of a new TV event class }
{ that can be broadcast outside of the event chain focus. It uses a }
{ specific command based on the BIOS timer tick counter. }
{ }
{ Copyright (c) 1992 Charles Scott Nichol. All rights reserved. }
{***********************************************************************}

{$R-,S-,X+}

program TVTime;

uses
  App, Dialogs, Drivers, Menus, MsgBox, Objects, Views,
  BiosTick, Cmds, TickView;

type
  TTimeApp = object(TApplication)
    MetaSupport: Boolean;
    Clock: PClockView;
    Heap: PHeapView;
    constructor Init;
    procedure GetEvent(var Event: TEvent); virtual;
    procedure HandleEvent(var Event: TEvent); virtual;
    procedure InitDeskTop; virtual;
    procedure InitMenuBar; virtual;
    procedure InitStatusLine; virtual;
    procedure OutOfMemory; virtual;
  end;

const
  cmAbout       = 100;
  cmToggleClock = 101;
  cmToggleHeap  = 102;
  cmToggleMeta  = 103;

{----------- TTimeApp ------------}

constructor TTimeApp.Init;
var
  R: TRect;
begin
  TApplication.Init;

  MetaSupport := True;

  GetExtent(R);
  R.A.X := R.B.X - 8; R.B.Y := R.A.Y + 1;   {End of top line}
  Clock := New(PClockView, Init(R));
  if ValidView(Clock) = nil then
    Fail;
  Insert(Clock);

  GetExtent(R);
  R.A.X := R.B.X - 8; R.A.Y := R.B.Y - 1;   {End of bottom line}
  Heap := New(PHeapView, Init(R));
  if ValidView(Heap) = nil then begin
    Dispose(Clock);
    Fail;
  end;
  Insert(Heap);
end;

procedure TTimeApp.GetEvent(var Event: TEvent);
begin
  TApplication.GetEvent(Event);
  if Event.What = evNothing then begin
    GetBiosTickEvent(Event);              {Hook to add the BIOS tick event}
    if Event.What = evNothing then begin
      Event.What := evMetaBroadcast;
      Event.Command := cmIdle;            {Alternative to .Idle method}
    end;
    if MetaSupport and (Event.What = evMetaBroadcast) then begin
      if TopView <> @Self then begin      {We are not the current modal view}
        HandleEvent(Event);               {Force meta broadcast of event}
        ClearEvent(Event);                {Prevent redundant processing}
      end;
    end;
  end;
end;

procedure TTimeApp.HandleEvent(var Event: TEvent);

  procedure About;
  const
    S1 = #3'Bios Tick Time/Heap Display Demo';
    S2 = #13#3'Copyright (c) 1992 Charles Scott Nichol';
    S3 = #13#3'All rights reserved';
    S4 = #13#3'Meta support is ';
  var
    D: PDialog;
    R: TRect;
    S5: String[15];
  begin
    R.Assign(0, 0, 49, 10);
    D := New(PDialog, Init(R, 'About'));
    if MetaSupport then
      S5 := 'enabled'
    else
      S5 := 'disabled';
    with D^ do begin
      Options := Options or ofCentered;
      R.Assign(3, 2, Size.X - 2, Size.Y - 4);
      Insert(New(PStaticText, Init(R, S1+S2+S3+S4+S5)));
      R.Assign(19, 7, 29, 9);
      Insert(New(PButton, Init(R, 'O~k~', cmOK, bfDefault)));
      SelectNext(False);
    end;
    if ValidView(D) <> nil then begin
      DeskTop^.ExecView(D);
      Dispose(D, Done);
    end;
  end;

  procedure ToggleMeta;
  begin
    MetaSupport := not MetaSupport;
  end;

begin
  TApplication.HandleEvent(Event);
  if Event.What = evCommand then begin
    case Event.Command of
      cmAbout:
        About;
      cmToggleClock:
        Clock^.ToggleDisplay;
      cmToggleHeap:
        Heap^.ToggleDisplay;
      cmToggleMeta:
        ToggleMeta;
    end;
    ClearEvent(Event);
  end;
end;

procedure TTimeApp.InitDeskTop;
var
  R: TRect;
begin
  GetExtent(R);
  R.Grow(0, -1);   {Leave room for menu bar and status line}
  DeskTop := New(PDeskTop, Init(R));
end;

procedure TTimeApp.InitMenuBar;
var
  R: TRect;
begin
  GetExtent(R);
  R.B.Y := R.A.Y + 1;   {Top line only}
  MenuBar := New(PMenuBar, Init(R, NewMenu(
    NewSubMenu('~'#240'~', hcNoContext, NewMenu(
      NewItem('~A~bout', '', kbNoKey, cmAbout, hcNoContext,
      NewItem('Toggle ~C~lock Display', '', kbNoKey, cmToggleClock, hcNoContext,
      NewItem('Toggle ~H~eap Display', '', kbNoKey, cmToggleHeap, hcNoContext,
      NewLine(
      NewItem('E~x~it', '', kbNoKey, cmQuit, hcNoContext, nil)))))),
    nil))));
end;

procedure TTimeApp.InitStatusLine;
var
  R: TRect;
begin
  GetExtent(R);
  R.A.Y := R.B.Y - 1;   {Bottom line only}
  StatusLine := New(PStatusLine, Init(R,
    NewStatusDef(0, $FFFF,
      NewStatusKey('~Alt-X~ Exit', kbAltX, cmQuit,
      NewStatusKey('~Alt-M~ Toggle Meta Support', kbAltM, cmToggleMeta,
      NewStatusKey('~F10~ Menu', kbF10, cmMenu, nil))),
    nil)));
end;

procedure TTimeApp.OutOfMemory;
begin
  MessageBox(#3'Insufficient memory to complete operation', nil,
    mfError + mfOkButton);
end;

{----------- Program ------------}

var
  TimeApp: TTimeApp;
begin
  if TimeApp.Init then begin
    TimeApp.Run;
    TimeApp.Done;
  end;
end.


[LISTING TWO - EXTRA]

{***********************************************************************}
{ CMDS.PAS }
{ }
{ Constants for event and commands added. }
{ }
{ Copyright (c) 1992 Charles Scott Nichol. All rights reserved. }
{***********************************************************************}

unit Cmds;

interface

const
  evMetaBroadcast = $400;   {Use an unallocated bit from Event.What}

const
  cmBiosTick = 1000;        {These commands are for evMetaBroadcast}
  cmIdle     = 1001;

implementation

end.




November, 1992
OF INTEREST





Now shipping from Softway is HSWIN, a Windows add-on module for HI-SCREEN Pro
II, Softway's language-independent user-interface system. Using HSWIN together
with HI-SCREEN Pro, you can develop user interfaces directly for Windows and
port DOS applications to Windows without modifying the source code.
To port HI-SCREEN Pro-generated DOS apps to Windows, HSWIN automatically
converts interface objects such as screens, menus, and icons into Windows
resources using a built-in library generator. You then simply recompile the
existing code with a Windows-compatible compiler.
HSWIN's suggested retail price is $250.00. Reader service no. 20.
Softway Inc. 185 Berry Street, Suite 5411 San Francisco, CA 94107 415-896-0708
Symantec has announced version 2.0 of its MultiScope Debuggers, which now
support Borland C++ and Microsoft C 6.0 and C/C++ 7.0 for Windows and DOS. In
the new, Windows-hosted interface, you can organize different types of
information in multiple windows on the same screen. New features include a
point-and-shoot collapse-and-expand C++ class hierarchy browser, C++ object
browsing, automatic C++ object mapping, alternative C++ class-information
member-scope display, breakpoints directly on object methods, direct browsing
of member pointers, complete C++ expression evaluation, and the capability to
update the source window to the actual function by selecting the method while
browsing.
The debuggers' suggested retail price is $379.00 for Windows and $179.00 for
DOS; upgrades are free for registered users. Reader service no. 21.
Symantec 10201 Torre Avenue Cupertino, CA 95014-2132 408-253-9600
AGE Logic has released XoftWare, a family of X-server software that
incorporates Novell's TCP/IP networking software and lets you access
applications on UNIX and VMS host systems from your PC.
Installing the X server is simple: Novell's TCP/IP drivers are integrated into
XoftWare, so you can install them both in one step from the XoftWare screen.
Besides the TCP/IP files, the Open Datalink Interface driver set and its IPX
file are also included, allowing XoftWare to support Token Ring and ARCnet as
well as Ethernet.
XoftWare is available for Windows, DOS, and TIGA/DOS. For Windows, the cost is
$595.00; for DOS, $495.00; for TIGA/DOS, $595.00; without TCP/IP, subtract
$100.00. Reader service no. 22.
AGE Logic 9985 Pacific Heights Boulevard San Diego, CA 92121 619-455-8600
INTRCPT is Hackensack's memory-resident interrupt trapper and debugger for DOS
and Windows. It includes an integrated memory map, vector map, and
user-maintained function database. You can monitor, trap, call, and log
interrupts at the interrupt, function, or sub-function level, providing over
16 million breakpoints. Logged interrupts are written to a disk file with all
register and buffer values shown at the call and return. Function calls are
identified by searching the function database, and there is a trap option that
puts INTRCPT in debugger mode, allowing manipulation of the function's
register and memory values.
The debugger has a load and execute option that allows real-time disassembly
of the executing target application, while the call option can invoke any
interrupt and display the registers and memory buffer on return.
INTRCPT works with Btrieve, NetWare, and NetBIOS calls and logs Windows
enhanced-mode interrupts. The cost is $99.00. Reader service no. 23.
Hackensack 6905 Silber Road, Suite 114 Arlington, TX 76017 800-325-4225
Microsoft has released Visual Basic for DOS. It provides the same collection
of objects as VB for Windows, including forms, menus, 15 standard controls,
and a compatible programming language, making it possible to develop
applications for DOS and Windows simultaneously. You can compile your finished
applications into native 80x86 standalone executables without using a runtime
library.
VB for DOS runs code written for the Microsoft QuickBasic and Basic
development systems and the QBasic interpreter, so new features can be added
to QuickBasic-based applications without restructuring existing code.
The standard edition of VB includes a form designer, a multiwindow,
syntax-checking code editor, debugging tools, and a toolkit of commonly used
dialog boxes. The professional edition adds the following: 386/486 code
optimization; an integrated, high-speed ISAM engine; Microsoft Overlay
Environment technology; an alternate math library; charting, online help,
setup, and financial-function toolkits; a custom-control development kit; and
a source-code profiler.
Also available from Microsoft is the Windows Device Driver Kit (DDK), which
lets you develop, test, and debug drivers for a variety of peripherals. The
DDK enables development of virtual devices and supports the Windows Universal
Printer Driver.
The Visual Basic standard edition costs $199.00, the professional edition
costs $495.00, and the DDK is $500.00. Reader service no. 24.
Microsoft Corp. One Microsoft Way Redmond, WA 98052-6399 206-882-8080
Three reverse-engineering tools are now available from +1 Software
Engineering: Tree4c, Tree4Fortran, and Tree4Pascal. The tools parse existing
source code, displaying a program's structure chart to facilitate better
understanding and maintenance of existing code.
After you have reverse-engineered source code, +1's TreeSoft, a UNIX-based
software-engineering environment, makes that code available through "selective
reuse." Selective reuse lets you reuse project data by selecting any subtree
from any reverse-engineered program and then appending it to your current
program.
TreeSoft supports project modeling, default editing, makefile generation,
graphical viewpaths, configuration management, testing, user-defined and
filtered reuse libraries, problem-report management, profiling, project
communications, and report generation.
Tree4c, Tree4Fortran, and Tree4Pascal each cost $1500.00 and run on Sun
workstations. Reader service no. 25.
+1 Software Engineering 2510-G Las Posas Road, Suite 438 Camarillo, CA 93011
805-389-1778
Cabot Software has been granted a worldwide development and distribution
agreement for the University of California at San Diego (UCSD) Pascal compiler
and Open Systems development tools. Also included are the UCSD p-operating
systems and Modula-2, Fortran 77, and assembly language compilers.
UCSD development languages compile into p-code, affording portability of
compact, efficient code to multiple hardware platforms and operating systems.
The software is available for DOS, UNIX, DRMDOS, Macintosh, VAX, CP/M, OS/2,
and Novell networks. Reader service no. 26.
Cabot Software The Vicarage, Stoke View Road Fishponds, Bristol BS16 3AE
England +44-272-586644
NETS 3.0 is a tool from the NASA Johnson Space Center (distributed by COSMIC)
for developing and evaluating neural networks. It provides a simulation of
neural-network algorithms and an environment for developing them. NETS uses
the back-propagation learning method for all the networks it creates and
allows better control of the learning process by providing features for saving
the weight values of a neural network during that process.
NETS is written in Standard C and runs on DOS, VMS, SunOS, and Cray's UNICOS.
Two executables for the PC are included: one compiled for floating-point
operations, and one for integer arithmetic. The price is $150.00. Reader
service no. 27.
COSMIC The University of Georgia 382 East Broad Street Athens, GA 30602-4272
706-542-3265
SoftPolish is a quality-assurance tool from Language Systems designed to
detect errors in Macintosh applications. SoftPolish performs four functions:
It tests an application's dialogs, windows, and controls for compliance with
Apple's Human Interface guidelines; it runs several hundred resource-validity
tests and identifies potential incompatibilities with other software; it spell
checks all string objects in application resources; and it automates file
cleaning for master disks. SoftPolish also automatically logs problems as they
are found and fixed.
SoftPolish costs $295.00 and includes an English dictionary; international
dictionaries are available for $49.00 each. Reader service no. 28.
Language Systems Corp. 441 Carlisle Drive Herndon, VA 22070-4802 703-478-0181
"A Comparison of Object-Oriented Development Methodologies," by Ronald
Schultz, is a report from Berard Software Engineering that compares the
methodologies of Booch, Martin and Odell, Coad and Yourdon, Rumbaugh, Shlaer
and Mellor, Wirfs-Brock, IBM, Berard, Colbert, and Kurtz.
The report compares and contrasts the various definitions of object-oriented
terms and discusses symbols and graphical notation. Each method's development
process is detailed and evaluated, and guidelines are provided for selecting a
methodology for an organization. Information about reusability, traceability,
testability, verification, validation, tools, training, and documentation is
also included.
The price for the report is $75.00. Reader service no. 29.
Berard Software Engineering Inc. 101 Lakeforest Boulevard, Suite 360
Gaithersburg, MD 20877 301-417-9884
PowerCode is an extensible application generator for Windows now shipping from
J Systems. Using PowerCode, you can design, generate, edit, compile, link, and
execute a Windows application. The design is prototyped via a point-and-click
interface, and the interface can be tested interactively without compiling.
After you complete the design, PowerCode generates all source files necessary
for the target environment.
PowerCode is completely extensible: A high-level scripting language controls
all source-code generation, so the developer can modify script files to adapt
source code to meet individual needs. And because PowerCode is object
oriented, you can extend the definition of objects used in the design process.
Thus, source code that supports custom object classes can be generated.
PowerCode supports Borland C++ and OWL, Borland Turbo Pascal for Windows,
Microsoft C/C++ 7.0 and MFC, and ANSI C; it can be extended to support any
language or class library. Windows programming features such as tool bars,
custom controls, and MDI are also supported. The price is $395.00. Reader
service no. 30.
J Systems Inc. 4826 McAlpine Lane Charlotte, NC 28212 704-535-0079
The New 8051 Product Directory from Market Works lists hundreds of 8051
products such as chips, boards, emulators, compilers, debuggers, and real-time
kernels. A data sheet is provided for each product that includes information
about the company, the product's performance and price, and how to order. Also
listed are cross-reference guides, sources of information, and distribution
locations. The directory costs $24.00. Reader service no. 31.
Market Works 4040 Moorpark Avenue, Suite 203 San Jose, CA 95117 408-261-3333
Integrated Systems has announced the pSOSystem/386 real-time operating system,
a complete development environment for custom or PC 386/486 embedded designs.
Unlike traditional real-time operating systems, which use the BIOS to access
PC devices, pSOSystem/386 includes a set of protected-mode drivers for
standard PC devices such as keyboard, parallel and serial ports, monitor,
floppy and IDE hard-disk drives, and Ethernet cards.
pSOSystem/386 modules include the pSOS+ real-time kernel, TCP/IP networking, a
file-system manager, and a reentrant ANSI C runtime library. Support for
TCP/IP and standard C I/O facilitates quick integration of pSOSystem modules.
A compiler, assembler, linker, and the SoftProbe+ cross-debugger are also
provided. SoftProbe+ provides both source-level debug and system-level
multitasking debug capabilities, allowing monitoring of task-control blocks,
message queues, memory buffers, and so on. System-level breakpoints, task
scoping, and interactive system calls can be performed directly from the debug
command line.
Prices start at $7650.00. Reader service no. 32.
Integrated Systems Inc. 3260 Jay Street Santa Clara, CA 95054-3309
408-980-1500
A white paper entitled "How to Get Optimum I/O Performance" is available from
Arnet. The paper details the history of the multiuser connectivity industry
and includes an overview of serial communications. Among the topics covered
are implementations of connectivity solutions, the functions and features of
multiuser-board classes, and a comparison of multiuser-board families,
including relative performance levels and selection criteria.

Arnet is distributing the document free of charge. Reader service no. 33.
Arnet Corp. 618 Grassmere Park Drive, #6 Nashville, TN 37211 615-834-8000




November, 1992
SWAINE'S FLAMES


Jobs and Stuff




Michael Swaine


It's all about jobs, isn't it?
I mean, the recession is affecting everyone. Roseanne and Dan, the Bundys, the
Simpsons. Noticing the placement and size of a cover line on the September 7
issue of New York magazine, a Magazine Week writer said that it looked like
the magazine's name was "Getting Fired New York." Not a bad idea, the writer
thought: a line of regional magazines for the unemployed. Black humor.
Still, some people's job woes are a little special.
Take Gene Wang, who recently left Borland after four years to become
Symantec's vice president of development tools and productivity applications.
Now Borland is claiming that Wang sent proprietary Borland information to
Symantec CEO Gordon Eubanks via MCI Mail. Kinda casts a pall over the new job,
especially with the police raiding your house and all.
Or take Hillary Clinton, bright, talented, ambitious, and one of the 100 most
influential lawyers in the country, according to The National Law Journal. If
her husband gets elected President, she'll find herself in an unpaid job that
Martha Washington called "more like a state prisoner than anything else."
Maybe her husband will appoint her Attorney General.
One job woe that is becoming all too common is that of the mainframe
information-systems specialist who gets laid off and can't parlay 20 years of
experience into a job. ComputerWorld ran a series on the problem in August.
One laid-off $95,000-a-year veteran, after spending 18 months looking for
work, had to settle for a $20,000 job.
Hillary's husband has a lot of faith in retraining, but retraining will at
best put a worker on a par with recent graduates (college, graduate school,
business school, or trade school). Not many $95,000-a-year jobs at that level,
unless you're a lawyer.
Interestingly, all the evidence I can gather tells me that you don't care
beans about this. Like the rest of the news media, I won't let that stop me
from beating the subject to death, but if I'm right that you aren't worried
about your job security, why aren't you?
Maybe you think that you could launch a software company and be supporting
yourself in a few months if you had to. You know this lawyer or this person
who once wrote a business plan, so all you need is one brilliant idea. Well,
I'm not going to tell you you're wrong.
I had a brilliant idea once. It came from a science fiction story by Fritz
Leiber about a cat that would only drink from the toilet and kept tipping over
its water dish and batting at the water as it ran across the uneven kitchen
floor. I think it was drawn from Leiber's life.
It turned out that the batted water was an ephemeral artform. The cat couldn't
tip over the toilet, so it drank that water. This is a story fraught with
meaning, but what I picked up on was the idea of ephemeral art. Sand castles
on the beach. Performances where recording equipment is prohibited. Now,
computer art is nonephemeral by nature. My notion was to subvert that,
creating a document that would destroy itself after one reading.
Imagine my chagrin when I learned that science fiction writer William Gibson
has done it.
Gibson and artist Dennis Ashbaugh have created something called "Agrippa," a
multimedia thing that disappears after you look at it. It costs a lot of money
and is published by Kevin Begos, Jr., New York. Penn Jillette wrote about it
in the September PC Computing.
Moral: Don't let those brilliant ideas sit around. Or: Publish, copyright, and
sue.
Lotus won its suit with Borland (what goes around keeps going around) over the
1-2-3 interface, and has taken out self-congratulatory ads saying that
"Borland's copying is no different from someone plagiarizing The Grapes of
Wrath, changing the ending, and calling it a new novel. It's really that
simple."
Excuse me, but I've read John Steinbeck, and Lotus: You're no John Steinbeck.



December, 1992
EDITORIAL


Winners and Losers




Jonathan Erickson


Join me in congratulating Kathleen Demain of Oceanside, California, and Karl
Gunderson of West Fargo, North Dakota, recipients of this year's Kent Porter
Scholarships. Both Kathleen and Karl, who are full-time students, full-time
parents, and longtime Dr. Dobb's readers, are managing to maintain a nearly
straight-A grade average while pursuing degrees in computer science. We're
happy to lend a helping hand and wish them the best of luck in attaining their
goals.


Hold that Thought


When we announced the DDJ Handprinting Recognition contest in July, we planned
on presenting the results and announcing the winner in this issue. What we
didn't plan on was the time it would take to evaluate all the entries, as well
as attending to other special projects, conferences, and the like. Look for
the results in our January 1993 issue.


Radio Days Revisited


In September 1991, I discussed the need to reallocate the radio spectrum if
the promise of wireless communication is to be realized. At issue was the need
to make room for services like digital personal communications.
This position was backed by the FCC and companies such as Apple, NCR, IBM,
Tandy, and Grid who want to set aside the 2-GHz RF band for emerging
technologies. The sticking point is that these frequencies are currently
allocated to others--railroads, utilities, and the like--who'll have to move
off that part of the spectrum before others can move on.
After intense lobbying by the railroads and utilities, Sen. Ernest Hollings
(D-S.C.) tacked an amendment onto a spending bill to force the FCC to maintain
the 2-GHz band for existing users. Personal-communication services would have
to sit and wait, causing U.S. developers to lose billions of dollars as
companies in other countries continue to move rapidly ahead. Luckily, the FCC
avoided indeterminate delays by proposing a three- to ten-year transition
period for current 2-GHz users to voluntarily move to other frequencies.
While the railroads and utilities were whimpering that reallocation would
seriously affect the safety of their operations, the real hue-and-cry involves
the monetary value of the radio spectrum--estimated upward of $200 billion.
Even though existing spectrum users don't "own" their frequencies, they have
first right to use them. You can bet that we'll pay millions of dollars to
entice them to move elsewhere on the spectrum.


Copyrights and Digital Royalties


Digital-recording manufacturers got a much-delayed shot in the arm while
consumers got a shot to the chops with the recently passed Congressional bill
covering home digital-recording equipment.
For the past five years, home digital recording has been on hold while lawyers
and legislators wrangled over copyright and royalty issues, stifling the
proliferation of digital-recording equipment until compensation for
songwriters and music publishers was guaranteed.
Under the terms of the recently passed legislation, royalties will not be paid
to individual artists in the music industry, but to an industry "association"
which will get a 3 percent royalty on all blank digital tapes and discs sold,
and a 2 percent royalty on all digital recorders. (That's an extra $14.00
tacked on to the price tag of a typical digital recorder.) The rationale for
this is that it's impossible to collect royalties from everyone making copies,
so
Congress assumed all digital-recording equipment users will be making copies
and is taxing us accordingly.
Additionally, the bill states that digital-recording machines must include
technology that would allow copying original prerecorded music, but not the
copying of copies.
Frankly, I'd like to know what's so special about the music industry that it
deserves special protection. If users of digital audio-recording equipment
have to pay taxes for blank tapes, why don't users of photocopiers or digital
scanners have to pay royalties for blank paper? Or, for that matter, why don't
PC users have to pay copy-protection taxes when they buy blank floppies? (Is
there a chance that what's really special about the $25 billion music industry
is a strong and well-financed lobbying effort in Congress that helps out with
re-election coffers?)
Perhaps what's most galling is the presumption of guilt inherent in this
upcoming law. Everyone, or so Congress seems to think, is guilty of copyright
infringement, so everyone should pay their fine up front.
Furthermore, digital data being digital data, a precedent is being set in
terms of software copying. I don't know what it will be (maybe you have some
thoughts--let me know), but I do know that it will likely mean someone else
dipping their hand into our pocketbooks.




















December, 1992
LETTERS







OO OSs


Dear DDJ,
Mike Matchett's letter in the August 1992 issue describing a "futuristic" OOP
system was a refreshing eye opener. I've spent the last five years developing
a system very similar to the one Mike describes as a commercial venture, and
am close to delivering a beta version.
My system is wholly objectized in that there is nothing but objects. The
programmer makes no distinction between an object in RAM and one on disk; the
system takes care of virtual memory details. I should point out for the record
that the original version of Smalltalk as implemented at Xerox PARC was wholly
objectized.
I refer to collection objects as "class records." Each class record is an
object which is an instance of the class "class record." Variable typing is
subordinate to classes; e.g., we have a byte class. Programs are composed of
module objects; the executable portion (object) of a module is called an
"executable record." Further, we can create and destroy objects and instances
thereof during run time. The system is capable of redefining itself during run
time. I actually did this when going from the prototype to the beta version.
A class record is, in part, a collection of member descriptors describing the
members of the class, including member functions or methods. (The system tends
to leave anthropocentric concerns like terminology to humans.) I haven't
considered a "qualification-for-membership" function, but it's a good idea.
The system does expect each class to provide 13 system service "actors" (more
of this strange need for labels); e.g., a printer which provides hardcopies of
class instances. Each member is implicitly--hopefully explicitly--an instance
of some class. The byte class is composed of eight instances of the bit class.
Object C having members A and B can invoke methods x(A) and y(B).
I deal with the "cast in stone" side effect of compilation by getting rid of
compilation. Code is reverse assembled in real time, with almost no noticeable
delay. Programming languages are implemented as groups of objects called
"macro descriptors," one macro descriptor per instruction. Modules are linked
dynamically at run time and need not be written using the same language. For
still more flexibility, a module may be written using more than one language.
A Lisp cdr instruction could be enclosed by a C for(;;) instruction. The
language purists among us will have more cause for complaint. Mike raises some
excellent possibilities to which I had not given serious thought. Foremost
among these is the notion of an interobject language which handles
translations among different dialects. I also like the idea of a printer
installing its own driver object. Sometimes when you go way out into science
fiction, you bump into reality.
Eric Young
Kalamazoo, Michigan


It's in the Numbers


Dear DDJ,
Reader Mike Matchett is quite right to wish for "an operating-system
environment wherein everything [is] an object." The full potential of
object-oriented technology won't be seen, let alone reached, so long as our OO
environments support only static classes defined at compile time. However, Mr.
Matchett is mistaken in claiming that to envision such a system entails "going
way out into science fiction." It's not science fiction, it's history! This
chapter of the Silicon Valley saga deserves to be better known.
Back in the late 1970s, while a couple of guys named Steve were showing off
their nifty gadgets at the Home Brew Computer Club, Tymshare was a thriving
high-tech venture headquartered in Cupertino, California. Tymshare had a
concern: The operating system it used to provide time-sharing on IBM
mainframes via its Tymnet network was VM/370. At that time there was serious
doubt as to whether IBM would go on supporting VM, since it competed with Big
Blue's MVS flagship. So Tymshare put a small group of their best systems gurus
to work developing a VM replacement. These people dubbed their creation the
Great New Operating System In the Sky, or GNOSIS for short. It was not only
object oriented, but it had a microkernel (long before Mach), was capability
based (and hence much more secure than any commercial OS), and featured
single-level storage with mirrored disks and built-in journaling. In
benchmarks, it processed transactions faster than CICS (IBM's standard TP
monitor) on the same 370-architecture hardware.
No special programming languages were needed to develop OO applications under
GNOSIS. Tymshare used unmodified IBM program-product compilers for 370
assembly language and PL/I, mainly because both had macro facilities. Just a
few macros extended PL/I to support what we would nowadays call the GNOSIS
API. In the summer of 1984, Tymshare hired a few people, myself included, to
test the commercial viability of GNOSIS. Management wanted to know whether
experienced procedural programmers unacquainted with OO concepts could be
productive after three or four weeks of training on GNOSIS. As the saying
goes, "the operation was a success but the patient died." The training went
well, several applications were built and tested in record time, and....
Shortly thereafter, McDonnell Douglas bought Tymshare, primarily to acquire
the thriving Tymnet business, and sent teams from St. Louis to Cupertino to
find out what else came with the package. After considerable benchmarking,
pondering and negotiating, MDC management determined that GNOSIS was "not
strategic" for their view of the future. The GNOSIS developers arranged to get
laid off, took their severance pay as earnest money, got venture capital
backing and started up Key Logic. Key Logic sold a system called KeyKos:
GNOSIS by another name.
Since then, McDonnell Douglas has pulled out of information services and gone
back to making airplanes. Key Logic stayed in business until late last year,
when it finally folded. The mainframe world had no interest in a better (but
different) OS, and Key Logic's resources were insufficient either to support
the missionary work needed to arouse such interest or to provide credibility
for long-term vendor support. This is the Catch-22 that keeps small startups
out of mature markets.
One of the last feats Key Logic performed before shutting its doors was to
recast KeyKos as a "nanokernel" running on RISC hardware. Above the nanokernel
ran an implementation of UNIX. Among other things, this allowed the computer
to be powered off while UNIX was chugging along with users updating files,
editing documents, etc. in full confidence that, once power was restored and
the machine was rebooted, UNIX would wake up and carry on from where it left
off, apart from a moment or two of amnesia. In April 1992, Alan Bomberger of
Key Logic (now at Amdahl Corp.) presented a paper on this at a USENIX workshop
on "Microkernel and Other Kernel Architectures" in Seattle. Norman Hardy, a
senior architect of GNOSIS (and of Tymnet itself), published a paper on the
KeyKos architecture in the September 1985 issue of the ACM's Operating Systems
Review.
There have been other proposals and designs for OO operating systems, but I
don't know of any that have gone as far as GNOSIS. Even leading academic
authorities seem to be unaware that such a thing has been done, not on the
scale of a laboratory proof-of-concept but as an industrial-strength
implementation on commercial hardware. As a case in point, let me quote
computer pioneer Maurice V. Wilkes on "Computer Security in the Business
World" in the "Computing Perspectives" column of Communications of the ACM,
April 1990 (Volume 33, Number 4). Dr. Wilkes wrote, "Much hope [for improved
computer security] was later based on the use of capabilities, or tickets, the
mere possession of which gives the right to make use of some resource.... Some
experimental systems were demonstrated in which the capabilities were
implemented in software, although it should have been clear from the beginning
that such systems could not, for performance reasons, be of more than
theoretical interest.... The final conclusion must be that...the capability
model...is of no use to us since efficient implementation is not possible."
How could an ACM Turing Award winner reach this totally mistaken conclusion
six years after benchmarks showed that the capability model, properly
implemented, outperformed standard IBM software on the same platform, and five
years after Hardy published his description of the architecture which enabled
this performance? One answer is that the benchmarks were confidential, and the
Operating Systems Review isn't as widely read as DDJ. Another is that
capabilities need to be implemented in a small, trusted kernel in conjunction
with interobject communication in order not to impose a performance penalty.
The studies on which Dr. Wilkes based his statements show only that
capabilities don't integrate well into conventional OS architectures.
Edward Syrett
Menlo Park, California
Dear DDJ,
I'm writing regarding "Numerical Extensions to C" by Robert Jervis in the
August 1992 DDJ, but find the need to ramble on about some other things as
well.
I began reading Dr. Dobb's around 1978, just after the IMSAI 8080 was in
production (I still have one in my closet), but before the Cromemco-Z8 had
been announced. My company started buying them with a little 4x4 inch
monochrome screen that at the time was readable without a magnifying glass. No
longer did 4000 people have to wait minutes for a response from the two Univac
1108s after pressing the Return key. DDJ was the hottest magazine around. It
contained the latest and greatest technical tips available.
Then came IBM. All of a sudden, IBM PCs proliferated on every desktop at the
laboratory where I work. They began to replace the Texas Instruments terminals
(TI Silent 700s) and Daisy Writer KSRs (keyboard send/receive units). In spite
of its Small-C compiler articles, Dr. Dobb's seemed to be lagging behind, and
I let my subscription lapse.
Several years later, a friend at work mentioned an article in Dr. Dobb's
Journal in response to a question I asked him. To be truthful, I didn't
realize that DDJ still existed. Happily, and much to my surprise, it did, and I
found that the articles in it were still current and as pertinent as they
always were. BYTE magazine has turned to trash, and PC Magazine is becoming
questionable, but DDJ still addresses specific issues as well as broader,
almost philosophic concerns (and I don't have to contend with as much garbage
advertising and "blown in" trash mail as in most magazines). Embedded Systems
Programming is the only other computer magazine I read regularly.
Now back to the reason I started to write this letter: I've been exposed to
most computer languages and have written code in many of them, including
various assembly languages. Around 1980, I discovered the C programming
language. It was love at first sight. Not only was there a high-level language
that made programming easier, but it also made debugging easier. Whenever I
wrote a statement in C code, I could visualize the machine code that would be
generated. It is an elegant shorthand for writing programs that machines can
execute. There was a direct correspondence between C operators and the
instruction set of most machines that executed them, and the operators were
easily accessible, generally with one or two keystrokes. One of my pet peeves
with Pascal, in addition to its wordiness (Ada is even worse) was not being
able to shift left or right. Computers are much faster at shifting than they
are at multiplying, yet whenever I had to code something like x:=x*2, the
compiler would invariably generate a multiply instruction rather than the
shift that I wanted, as in x<<=1.
The relevance to this and the proposed extensions to the C language is that
the spirit of the original language should be preserved. I believe C was
intended to be not only portable across machines, but upwardly compatible with
new machines, not downwardly compatible with older ones. This is probably why
the size of an integer wasn't made part of the language by K&R. Instead, the
reader was cautioned that "int will normally reflect the most 'natural' size
for a particular machine," and that "all you should count on is that short is
no longer than long." A program that abided by these rules in the '80s runs on
any machine today. I've found myself declaring index variables as unsigned
char or short just because I know they wouldn't exceed 255 or 32,767, when in
fact it makes no difference to the computer. On a machine with a 32-bit data
bus (or 64, 128, etc.), it takes just as much time to add 1 to a byte as it
does to add 1 to an integer.
Probably one major concern of those considering extensions to the C language
is to minimize the addition of new reserved words because of the fear that
someone, somewhere, may have written a program that used that word as part of
their private program. Terseness is certainly one of C's desirable features;
however, if the language is to be extended to an entirely new class of
machines (massively parallel, with multiprecision complex floating-point
arithmetic and extended character sets), a few extra reserved words (and
perhaps operators, if any are left) will have to be added. I would rather see
this happen than to be forced to use some ill thought-out reincarnation of
Cobol (i.e., Ada). Even APL would be preferable. One last thought: At present,
I'm forced to program a 69R000 CPU (UTMC) in its minimalist RISC assembly
language (load, operate, store) because it's the most radiation-hardened CPU
there is. The original Fortran (yuck) algorithm used double-precision complex
floating-point arithmetic on multidimensional matrices (8 x 3 x 3). I would
give my left foot to have any kind of C compiler for this machine. While
everyone is debating the direction of programming languages for the next
century, try to keep in mind that there are those of us who are less
privileged.
Ron Dotson
La Crescenta, California















December, 1992
SPATIAL DATA AND THE VORONOI TESSELLATION
 This article contains the following executables: SPATIAL.ZIP COASTS.ZIP


Hrvoje Lukatela and John Russell


Hrvoje Lukatela holds a Masters Degree in Geodetic Engineering from the
University of Zagreb and has practiced as a survey engineer, software
engineer, and database designer for more than 20 years. He is the author of
the Hipparchus Geopositioning Model. John Russell began work as a programmer
for IBM in Toronto in 1957 and was IBM Canada's first senior systems engineer.
More recently, John was vice president, technology for the 1988 Calgary
Olympic Winter Games. They can be contacted at Geodyssey Ltd., 300 815 8th
Ave. SW, Calgary, Alberta, Canada T2P 3P2, Fax: 403-266-7117.


In the early days of computing, the data we worked with consisted of integers,
real numbers, and characters. Later, we moved on to time and money data.
Today, as we increasingly deal with environmental and other geographic
information, we need new ways of looking at spatial data.
For millennia, cartographers have attempted to represent the round Earth on
flat maps. The first four decades of geographic information systems (GIS) have
attempted to automate this process, typically using a "flat Earth" paradigm of
map sheets and two-dimensional coordinates. The result has been an unwieldy
collection of complex math, preset views, and location-dependent precision.
An alternative is to model the Earth using a "round Earth" paradigm. In this
way, we can roam freely with our geographic applications, modeling surface
features without restriction, and calculating spatial relationships with
uniform high precision.
In this article we'll demonstrate an approach to representing the location,
storage, retrieval, and manipulation of data in terms of its spatial
relationships. We'll use elementary trigonometry and three-dimensional vector
algebra to develop programs that demonstrate the key ideas. Then we'll build
on these concepts to show how you can develop a complete GIS that has
unprecedented speed and precision, without the use of a conventional GIS
solution.


A Simple Application


To illustrate these concepts, let's build a simple geographical atlas that
lets you roam anywhere on the globe, viewing surface features at varying
scales. In the general case, we would model our geographic features of
interest as points, lines, areas, or volumes.
Points might represent cities or survey monuments.
Lines might represent roads or flight paths.
Areas might represent islands or properties.
Volumes might represent geological formations or controlled airspaces.
For simplicity, this application will deal only with line objects. The
geographic location of a line object can be given by an ordered set of vertex
coordinates. Figure 1 illustrates some sample application objects. Listing One
(page 96) provides their numeric specification in the familiar terms of
latitude and longitude--the angles that give the location of geographic
features relative to the equator and a prime meridian. The frame of reference
is geocentric, meaning that the angles are measured from the center of the
Earth; see Figure 2. Latitude is labeled phi and longitude is labeled lambda.
While early scientists thought of the planet as a perfect sphere, we now know
it is somewhat flattened at the poles, an "ellipsoid of rotation." However,
since the eccentricity of the Earth is not great (less than a third of one
percent), we'll assume for the moment that the Earth is indeed a perfect
sphere.


Vector Algebra


Since latitudes and longitudes are angles, when we work with them we must be
prepared to calculate sines, cosines, tangents, arc tangents, and the like.
Even with today's math coprocessors, this can get messy. For instance, have
you ever tried to find the tangent of 90 degrees? You will if your application
deals with objects in the polar regions. Generally, such calculations lack a
geographically uniform distribution of precision. Luckily, a point's location
on the Earth's surface can be represented in other ways.
Consider a 3-D geocentric space having three orthogonal axes projecting
through the equator and the poles. Call these axes X, Y, and Z. Now we can
locate a point on the surface with the three coordinates x,y,z; see Figure 3.
The X axis projects through the Atlantic Ocean just off West Africa, the Y
axis projects through the Indian Ocean just west of Sumatra, and the Z axis
projects through the North Pole. The pictured surface point P(x,y,z) might be
somewhere in northern Afghanistan.
Given the 3-D space just described, there's another way to describe the
location of a surface point. Instead of referring to its coordinates directly,
we
could describe the vector perpendicular to the surface at that point. For a
perfectly spherical Earth, this normal would pass through the center. It has
unit length, and its direction is defined by the angles formed between it and
the X, Y, and Z axes. These angles are called direction angles.
We'll be working with the cosines of the direction angles--direction
cosines--labeled di, dj, and dk, respectively. The point in Afghanistan can
now be referred to as P(di,dj,dk); the point off West Africa has the
coordinates (1,0,0); the point in the Indian Ocean is at (0,1,0); the North
Pole point is at (0,0,1); and the South Pole point is at (0,0,-1); see Figure
4.
Recording direction cosines as double types in C typically provides
sub-millimetric precision. This usually surpasses the precision of your very
best field data. Listing Two (page 96) shows some geometric vector-algebra
functions and their supporting structures and constants.


Converting Latitudes and Longitudes


Most developers are familiar with latitude and longitude. In addition, there
are "flat-Earth" coordinate systems such as UTMs and State Plane that are used
by surveyors and map makers. Few, however, are familiar with direction
cosines. Consequently, if our new system is to be of any use, we'll need an
efficient method of converting between direction cosines and these other
coordinates. For simplicity, we'll restrict input to latitudes and longitudes.
We begin by converting a file of geographic data to direction cosines.
At first glance, using direction cosines to locate a point on the Earth's
surface seems inefficient, since we're trading two items (latitude and
longitude) for three. But in modeling geographic objects, we often have
multiple locations associated with specific objects. For example, a line
object such as a coastline, river, or road is usually modeled as an ordered
sequence of connected vertices. In such instances, we might select a single,
"central" location and relate all the associated vertices to it. But will this
"differential" position encoding be effective?
In developing planar projections, map makers look for the recognizability of
shapes (conformality) and the uniformity of scale in all directions (isometry).
One of their best efforts is the stereographic projection which, over moderate
distances, produces a view of the Earth that's both conformal and isometric.
(Despite its name, this projection of 3-D onto 2-D provides no depth
perception.)
If an object is restricted in size, it can be represented in the plane of a
stereographic projection without significant distortion. This means we can use
a specific scheme of differential location recording in which each vertex of
the line is encoded as a stereographic planar displacement from some central
position. As such, this differential value will have just two components, say
dx and dy.
Using only short int types for dx and dy, resolution of better than a meter
can be maintained for surface objects as large as ten kilometers in extent.
For better resolution, we can use float or long int types; for poorer, we can
use signed char.
So, typically, we'll have traded in three doubles for two short ints, a
significant reduction in storage requirements. We refer to these
differentially encoded coordinates as local coordinates. The full,
three-element direction cosine global coordinates can easily be reconstructed
at any time, using only elementary vector algebra.


Building an Object-oriented Database


Since we're creating an application to select and display terrestrial
"objects," it makes sense to store the data externally under some kind of
object-oriented scheme. But how should the objects be indexed?
Conventional wisdom suggests that we index our data on the basis of
decomposition (or hashing) of the object's coordinates. For this application,
however, let's try something different.
First, let's establish a file as the general repository for the local
coordinates of all the vertices of all of the objects modeled. We'll provide
access to the individual parts of this file using file pointers.
Next, let's set up an index file of object headers. Each header will hold the
object's identifier, the global coordinates of its center, a file pointer to
the local coordinates of its vertices, and the vertices count. The object's
identifier can serve as a link to its other attributes (if any). The header
will also contain an estimate of the object's geographic extent, described
shortly.
When we load the database with an object, we can determine its "center" by
calculating the "vector mean" of the vertices' direction cosines. We can then
use this center to differentially encode coordinates for the vertices.

Because this application will let you zoom in and out through a wide range of
scales, we're providing two classes of line objects: those required for
close-ups (dense) and those needed only for wide-area presentations (sparse).
Since the application is to be interactive, we'll want to reduce unnecessary
data retrieval and processing time (especially if we don't have floating-point
hardware).


Calculating Surface Distances


To determine if objects are "onscreen" or not, the application will need to
know their geographic extents. Using vector algebra, we can calculate these as
surface distances, for which we'll need arc (or great circle) distances. While
we're loading objects into the database, it will prove useful to calculate and
store the maximum great circle distance that any vertex is displaced from the
object's center; see Figure 5.
Listing Three (page 98) and the called functions in Listing Two provide code
to read and convert location data to direction cosines, differentially encode
them, calculate distances, and build a location-dependent database.


Selecting Objects for Presentation


Our application provides a window on the world, so to speak, by displaying
objects that come within a field of view you select. A view is defined in
terms of location and scale. The location of the display's center can be
expressed as a latitude and longitude. Scale can be expressed as the ratio
between a distance on the screen and a distance on the ground. Figure 6
illustrates such a window, while Listing Four (page 99) and the called
functions in Listing Two show the code needed to establish an initial view and
scale.
Now that the field of view is defined, we can locate objects that might come
into that field using distance calculations. If you think of the display as
that field using distance calculations. If you think of the display as
circular rather than rectangular, then you can calculate a maximum radius for
the display. You can go to the database and select those objects that might be
displayed. (The graphics-library clipping function will fine-tune the
selection later.)
The header for each object contains the maximum distance of any vertex from
the object's center. This was calculated and stored when we loaded the
database. So, to determine if the object might be in the field of view, simply:
1. Find the distance between its center point and that of the display.
2. Subtract the maximum radius for the object.
3. Subtract the maximum radius for the display.
If the result is negative, you'll want to retrieve the object from the
database for further processing; otherwise, ignore it. Figure 7 illustrates
both of these conditions. Listings Four and Two provide code to make the
selection and bring the selected objects into memory.


Drawing Objects


Next we need to project each object's vertices into the plane of the display
(the projection plane), which is generally not the same as the plane used to
encode the object's local coordinates.
For simplicity, we'll go back to the sphere and reproject the object's
vertices using, for this example, the stereographic projection. Other
projections--gnomonic, orthographic, Mercator, and the like--might also be
used. Gnomonic is the easiest, but stereographic looks better and is worth the
effort. Listings Four and Two give the code to perform the projections and
draw the objects. (For more about map projections, see "Map Projections Used
By The U.S. Geological Survey, Sec. Ed.," Geological Survey Bulletin 1532,
Department of the Interior, U.S. Government Printing Office, Washington, DC,
1984.)


Panning and Zooming


Suppose you want to change the scale or view of the display. Simply modify
these items and repeat the previous operations. A simple outside loop that
changes the scale or map center point will work. Listing Four shows code to
accept changes via the sign and arrow keys.
That completes our simple atlas application. Even with the slowest PC, you can
now inspect the world's coastlines without preselection of view or scale. To
more fully exercise the system, raw world-coastline data (in ASCII form) is
available electronically, as is a prebuilt world-coastline database (in
binary); an executable View program in DOS real mode, compiled for VGA with
math coprocessor emulation; and ASCII source code for the programs; see
"Availability" on page 5.


Ellipsoidal Vector Algebra and the Voronoi Tessellation


Since the Earth is closer to an ellipsoid of rotation than a sphere, we'll
need to extend our vector algebra. The required quadratic vector algebra has
been fully implemented in the Hipparchus Geopositioning Model with significant
improvements in speed and precision over conventional geodesy methods. (See
Geodesy, by Henry D. Bomford, Oxford University Press, 1973.)
For this sample application, we calculated a local center point for each
object and then used this to select objects from the database. We also used
these center points to encode the large number of vertex coordinates
associated with our objects.
Suppose we could precalculate a set of center points that would serve the same
purposes for all the objects in the database. Ideally, such a set of center
points would provide both fast spatial indexing and a flexible association
with objects. In such a spatial index, each indexed database "bucket" would
hold some prescribed maximum number of object-defining coordinates. Then we
could have geographically large cells for surface regions where we've little
or no data and geographically small cells where we have a lot of data.
The Hipparchus Geopositioning Model implements just such a scheme using a
flexible partitioning system called a "Voronoi cell structure." Figure 8 shows
one such tessellation of the Earth. The structure illustrated would be
suitable for indexing population-related data objects. Voronoi cell structures
are always global, even if the application is localized. A cell structure is
defined by its cell center points. For each cell, the structure includes a
unique cell identifier, the global coordinates of the cell's center point, the
cell's maximum radius, and an ordered list of neighbor-cell identifiers. The
boundaries between cells exist only mathematically.
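A minimal C sketch of one such cell record might look like the following. This is our own illustrative layout and naming, not the Hipparchus Library's actual data structures; it simply mirrors the description above (unique identifier, center point, maximum radius, ordered neighbor list).

```c
#include <assert.h>

#define VOR_MAX_NEIGHBORS 16

struct cellVct3 { double di, dj, dk; };  /* unit vector: cell center point */

/* One Voronoi cell, as described in the text. Note that no boundary
   geometry is stored anywhere; boundaries exist only mathematically. */
struct voronoiCell {
    long id;                            /* unique cell identifier */
    struct cellVct3 center;             /* center point, direction cosines */
    double maxRadius;                   /* radian arc measure */
    int neighborCount;
    long neighbors[VOR_MAX_NEIGHBORS];  /* ordered neighbor-cell identifiers */
};

/* Return nonzero if cell <id> appears in c's ordered neighbor list. */
int vorIsNeighbor(const struct voronoiCell *c, long id) {
    int i;
    for (i = 0; i < c->neighborCount; i++)
        if (c->neighbors[i] == id) return 1;
    return 0;
}
```

Because cells carry their own neighbor lists, the structure can be traversed locally without any global boundary bookkeeping.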
The special property of the Voronoi cell structure is that any surface point
can be classified unambiguously as belonging to one cell or another on the
basis of surface distance. A point is always closer to the center point of its
"owner" cell than to the center point of any other cell. For a discussion of
the Voronoi tessellation of the plane, see Algorithms, Second Edition, by
Robert Sedgewick (Addison-Wesley, 1988). For a description of its adaptation
to the surface of the ellipsoid, see "Hipparchus Geopositioning Model: An
Overview," by H. Lukatela in Proceedings of Auto Carto 8 (American Society for
Photogrammetry and Remote Sensing, 1987), and the Hipparchus Tutorial, by Ron
V. Gilmore (Geodyssey, 1992).
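The owner-cell test follows directly from this property. The fragment below is an illustration only (not Hipparchus code): it classifies a point by brute-force comparison against every cell center, using the squared chord distance between unit position vectors, which on a sphere orders points the same way as surface distance.

```c
#include <assert.h>

struct cvec { double di, dj, dk; };  /* unit position vector */

/* Squared chord distance between two unit vectors; monotone with
   spherical surface distance, so sufficient for nearest-center tests. */
static double sqChord(const struct cvec *a, const struct cvec *b) {
    double dx = b->di - a->di, dy = b->dj - a->dj, dz = b->dk - a->dk;
    return dx * dx + dy * dy + dz * dz;
}

/* Classify point <p>: return the index of the closest center point.
   Brute force over all cells; the library itself can do much better
   by exploiting the neighbor lists (see "Voronoi Navigation"). */
int ownerCell(const struct cvec *p, const struct cvec *centers, int n) {
    int i, best = 0;
    double d, dBest = sqChord(p, &centers[0]);
    for (i = 1; i < n; i++) {
        d = sqChord(p, &centers[i]);
        if (d < dBest) { dBest = d; best = i; }
    }
    return best;
}
```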


Objects in the Voronoi Context


In the context of a Voronoi cell structure, an object's vertices are
associated with their closest-cell center points as well as an object header.
Objects can then be defined without geographic size restriction of any kind.
Objects can consist of sets of points, lines, or regions spanning any number
of cells. Regions can be nonsimply connected: An island group can be modeled
as a single object, islands can have interior lakes with islands, and so on.
Volumes can be modeled as regions having elevation or depth attributes. Figure
9 shows the intersection of two overlapping region objects in the Voronoi
context.
Cell center points rather than object centers are used for the differential
encoding of coordinates. Lists are maintained for each case:
For point set objects, lists of cells occupied by their points.
For line set objects, lists of cells traversed by their line segments.
For region objects, lists of their interior cells as well as the cells
traversed by their boundary rings.
For more about these data structures, refer again to "Hipparchus
Geopositioning Model: An Overview" and the Hipparchus Tutorial.


Voronoi Navigation


When used as an index to objects stored externally, the Voronoi cell structure
proves remarkably effective in reducing unnecessary disk accesses. Not only
are all the cells containing object data known to the application program, but
cells associated with open windows are known as well. As you pan and zoom the
window, precise retrieval instructions can be fed to the database.
References to random locations are traced to their owner cells by a
geographically direct search route. Ownership of a point by a particular cell
is confirmed when the point proves closer to that cell's center than to the
center points of any of the cell's immediate neighbors.
In this application, we had to search our entire index to determine which data
was to be selected. This was because we knew of no way to map directly from
the 3-D ordered domain of our real-world objects into the linearly ordered
domain of the computer. But when we associate these objects with a Voronoi
cell structure, the situation changes.

The unambiguous classification of object vertices into a specific, linearly
ordered structure of cells makes possible the use of hierarchical searches for
the data, resulting in significant efficiencies.
Since the order of cell identifiers in a cell structure is irrelevant to its
algorithmic operation, cells can be arranged in any order. Therefore,
data-access bias can be arbitrarily imposed without affecting the logic of the
application.


Summary


The demand for efficient handling of crushing volumes of spatial data has
arrived. Round-Earth vector algebra and the Voronoi tessellation can be
combined to provide unrestricted modeling and efficient manipulation of
terrestrial objects. Precise spatial indexing can be provided on the basis of
distance calculations rather than coordinate decomposition. Monolithic
geographic information systems may soon be history.
_SPATIAL DATA AND THE VORONOI TESSELLATION_
by Hrvoje Lukatela and John Russell


[LISTING ONE]

* Townsite
 50.42 -100.13
 50.41 -100.15
 50.40 -100.16
 50.39 -100.18
 50.38 -100.20
 50.37 -100.20
 50.37 -100.28
 50.20 -100.28
 50.20 -100.16
 50.20 -100.00
 50.42 -100.00
 50.42 -100.13
* Highway
 50.45 -99.39
 50.31 -100.04
 50.17 -100.20
 49.56 -100.48
 49.42 -101.26
* Highway
 50.31 -100.04
 50.31 -101.28
* Flight Path
 50.00 -101.28
 49.85 -99.40
* River
 49.37 -99.81
 49.38 -99.83
 49.42 -99.86
 49.43 -99.88
 49.44 -99.90
 49.46 -99.92
 49.47 -99.94
 49.47 -99.96
 49.48 -99.98
 49.49 -100.00
 49.49 -100.02
 49.49 -100.04
 49.48 -100.06
 49.48 -100.08
 49.47 -100.11
 49.47 -100.13
 49.47 -100.16
 49.46 -100.18
 49.46 -100.20

 49.45 -100.22
 49.44 -100.24
 49.43 -100.24
 49.42 -100.21
 49.41 -100.19
 49.41 -100.18
 49.41 -100.16
 49.41 -100.14
 49.41 -100.12
 49.41 -100.10
 49.40 -100.08
 49.38 -100.08
 49.37 -100.10
 49.36 -100.12
 49.35 -100.15
 49.35 -100.17
 49.36 -100.19
 49.37 -100.21
 49.38 -100.23
 49.38 -100.26
 49.39 -100.28
 49.39 -100.30
 49.39 -100.33
 49.38 -100.35
 49.38 -100.38
 49.38 -100.41
 49.37 -100.43
 49.36 -100.45
 49.35 -100.47
 49.34 -100.49
 49.33 -100.51
 49.32 -100.52
 49.31 -100.53
 49.29 -100.53
*






[LISTING TWO]

/* ---------------------------------------------------------------- *
 * ALGEBRA functions: A sampling of geometronical vector algebra *
 * functions and their supporting manifest constants and structure *
 * declarations. *
 * The following code is derived from similar functions which are *
 * a small part of the Hipparchus Library. For simplicity, it lacks *
 * the "fuzz control" and other programming elements of practical *
 * numerical significance. *
 * Programmer: Hrvoje Lukatela, September 1992. *
 * Geodyssey Limited, Calgary - (403) 234 9848, fax: (403) 266 7117 *
 ------------------------------------------------------------------ */

#include <math.h>

#define PI 3.14159265358979324
#define DEG2RAD (PI / 180.0) /* degrees to radians... */

#define RAD2DEG (180.0 / PI) /* ... and vice versa */
#define LC_SCALE 32000.0 /* local coordinate scale factor */

struct plpt { /* point in a Cartesian projection plane */
 double est;
 double nrt;
 };
struct lclpt { /* local (object) coordinates */
 short est;
 short nrt;
 };
struct dpxl { /* display screen coordinates */
 short x;
 short y;
 };
struct ltln { /* point latitude-longitude, radians */
 double lat;
 double lng;
 };
struct vct3 { /* 3-D vector; x,y,z direction cosines */
 double di;
 double dj;
 double dk;
 };
struct vct2 { /* as above, in plane, internal use */
 double di;
 double dj;
 };
struct indexRec { /* line segment database index record */
 struct vct3 center; /* nominal object center point */
 double radius; /* in radian arc measure */
 long fileOffset; /* offset in the coordinate data file */
 short vertexCount; /* count of coordinate vertices */
 short segmentId; /* for possible application use? */
 };

/* ----- Transform Latitude and Longitude Angles to Direction Cosines ----- */
void LatLongToDcos3(const struct ltln *pa, struct vct3 *pe) {
 double cosphi;

 cosphi = cos(pa->lat);
 pe->di = cosphi * cos(pa->lng);
 pe->dj = cosphi * sin(pa->lng);
 pe->dk = sin(pa->lat);
 return;
 }

/* ---- Transform Direction Cosines to Latitude and Longitude Angles ----- */
void Dcos3ToLatLong(const struct vct3 *pe, struct ltln *pa) {
 pa->lat = atan2(pe->dk, sqrt(pe->di * pe->di + pe->dj * pe->dj));
 pa->lng = atan2(pe->dj, pe->di);
 return;
 }

/* ---- Normalize a 3-D Direction Cosine Vector ---- */
void NormalizeDcos3(struct vct3 *vcc) {
 double d;

 d = 1.0 / sqrt(vcc->di * vcc->di +
 vcc->dj * vcc->dj +
 vcc->dk * vcc->dk);
 vcc->di *= d;
 vcc->dj *= d;
 vcc->dk *= d;
 return;
 }

/* ----- Normalize a 2-D Direction Cosine Vector ---- */
void NormalizeDcos2(struct vct2 *vcc) {
 double d;

 d = 1.0 / sqrt(vcc->di * vcc->di + vcc->dj * vcc->dj);
 vcc->di *= d;
 vcc->dj *= d;
 return;
 }

/* ----- Spherical Arc (Great Circle Distance) - First Approximation ----- */
double ArcDist(const struct vct3 *pea, const struct vct3 *peb) {
 double chord, sqChord;

 sqChord = (peb->di - pea->di) * (peb->di - pea->di) +
 (peb->dj - pea->dj) * (peb->dj - pea->dj) +
 (peb->dk - pea->dk) * (peb->dk - pea->dk);
 chord = sqrt(sqChord);
 return(chord + ((sqChord * chord) / 24));
 }

/* ----- Direct Stereographic Projection (Map, Sphere to Plane) ----- */
void MapStereo(const struct vct3 *p0,
 const struct vct3 *pe, struct plpt *pw) {
 struct vct3 prln;
 double t, am, bm, ap, bp, cp, xi, yi, zi;
/* ---------------------------------------------------------------- */
/* Find tangency point relative values. */

 cp = sqrt(p0->di * p0->di + p0->dj * p0->dj);
 am = -(p0->dj / cp);
 bm = p0->di / cp;
 ap = -(p0->dk * bm);
 bp = p0->dk * am;

/* Intersection of the projection line and the intersection plane. */
 prln.di = -(p0->di + pe->di);
 prln.dj = -(p0->dj + pe->dj);
 prln.dk = -(p0->dk + pe->dk);

 NormalizeDcos3(&prln);

 t = -((p0->di * pe->di + p0->dj * pe->dj + p0->dk * pe->dk - 1.0) /
 (p0->di * prln.di + p0->dj * prln.dj + p0->dk * prln.dk));
 xi = pe->di + prln.di * t;
 yi = pe->dj + prln.dj * t;
 zi = pe->dk + prln.dk * t;

/* Stereographic plane coordinates are the oriented distances from
 the intersection point to the meridian and prime vertical plane. */
 pw->est = am * xi + bm * yi;
 pw->nrt = ap * xi + bp * yi + cp * zi;
 return;
 }

/* ----- Inverse Stereographic Projection (Un-Map, Plane to Sphere) ---- */
void UnMapStereo(const struct vct3 *p0,
 const struct plpt *pw, struct vct3 *pe) {
 struct vct3 prln;
 double gcx, am, bm, ap, bp, cp, cpsq;
 double xe, ye, ze, xc, yc, zc, lymx, lxmy, root, t;
/* ---------------------------------------------------------------- */

/* Find the sphere/plane tangency point values: ap, bp, cp are
 components of the "North" vector, and am, bm of "East" vector
 in this point. The "East" vector has no "Z" axis component. */
 gcx = sqrt(pw->est * pw->est + pw->nrt * pw->nrt);

 cpsq = p0->di * p0->di + p0->dj * p0->dj;
 cp = sqrt(cpsq);
 am = -(p0->dj / cp);
 bm = p0->di / cp;

 ap = -(p0->dk * bm);
 bp = p0->dk * am;

/* Find Cartesian coordinates of the projection center (xc,yc,zc)
 and the projected point in the plane of projection (xe,ye,ze). */
 xc = -p0->di;
 yc = -p0->dj;
 zc = -p0->dk;

 xe = -xc + ap * pw->nrt + am * pw->est;
 ye = -yc + bp * pw->nrt + bm * pw->est;
 ze = -zc + cp * pw->nrt;

/* Find the intersection of ptc-pte line and the sphere.
 Solution requires solving a quadratic in t, the line parameter. */
 prln.di = -gcx;
 prln.dj = -2.0;
 NormalizeDcos2((struct vct2 *)&prln);
 lymx = prln.dj * gcx - prln.di;
 lxmy = -(prln.di * gcx + prln.dj);
 root = sqrt(1.0 - (lymx * lymx));

 t = lxmy - root; /* Find the closer of the two quadratic roots. */

/* Substitute the parameter in the parametric line equations. */
 prln.di = xc - xe;
 prln.dj = yc - ye;
 prln.dk = zc - ze;
 NormalizeDcos3(&prln);
 pe->di = xe + t * prln.di;
 pe->dj = ye + t * prln.dj;
 pe->dk = ze + t * prln.dk;
 NormalizeDcos3(pe);
 return;
 }

/* ---- Initialize Projection Plane / Display Translation and Scaling ---- */

void SetPlaneDisplay(double *xfmArray,
 const struct plpt *w1, const struct plpt *w2,
 const struct dpxl *d1, const struct dpxl *d2) {
 double dx, dy, du, dv;

 dx = (w2->est) - (w1->est);
 dy = (w2->nrt) - (w1->nrt);
 du = (double)((d2->x) - (d1->x));
 dv = (double)((d2->y) - (d1->y));
 xfmArray[0] = dx / du;
 xfmArray[1] = dy / dv;
 xfmArray[2] = du / dx;
 xfmArray[3] = dv / dy;
 xfmArray[4] = w1->est - xfmArray[0] * ((double)d1->x + 0.5);
 xfmArray[5] = w1->nrt - xfmArray[1] * ((double)d1->y + 0.5);
 xfmArray[6] = ((double)d1->x + 0.5) - xfmArray[2] * w1->est;
 xfmArray[7] = ((double)d1->y + 0.5) - xfmArray[3] * w1->nrt;
 return;
 }

/* ----- Translate/Scale Point from a Projection Plane to the Display ----- */
void PlaneToDisplay(const double *xfmArray,
 const struct plpt *w, struct dpxl *d) {
 d->x = (short)(xfmArray[6] + xfmArray[2] * w->est);
 d->y = (short)(xfmArray[7] + xfmArray[3] * w->nrt);
 return;
 }

/* ----- Translate/Scale Point from the Display to a Projection Plane ----- */
void DisplayToPlane(const double *xfmArray,
 const struct dpxl *d, struct plpt *w) {
 w->est = xfmArray[4] + xfmArray[0] * ((double)d->x + 0.5);
 w->nrt = xfmArray[5] + xfmArray[1] * ((double)d->y + 0.5);
 return;
 }






[LISTING THREE]

/* ---------------------------------------------------------------- *
 * BUILD Program: Construct a seamless global database of line *
 * segments from text files defining the location of line vertices. *
 * Two levels of detail are provided: dense and sparse. The input *
 * consists of two flat ASCII text files; one each for dense and *
 * sparse coastline segment vertices. The segment is begun with a *
 * line containing an asterisk marker (*) as its first character. *
 * The remainder of such line is ignored, and may be used to name *
 * the segment. End-of-file is signalled by a similar marker line. *
 * Line segment vertex coordinates follow each marker line, one *
 * latitude/longitude pair per line. Coordinates are given in *
 * degrees, as free-format, white-space or comma delimited decimal *
 * fraction character strings. Numbers are signed according to *
 * the international geographic coordinates sign convention: *
 * westerly longitudes and southerly latitudes are negative. *
 * ---------------------------------------------------------------- */


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "algebra.c"

#define LINE_LNGTH 128
#define MAX_VRTX 1024

void buildFiles(FILE *, FILE *, FILE *);

void main(void) {
 FILE *fpTextLines;
 FILE *fpIndex;
 FILE *fpCoords;
 char *fnTextLines0 = "coast0.lns"; /* dense input segments */
 char *fnTextLines1 = "coast1.lns"; /* sparse input segments */
 char *fnIndex0 = "coast0.idx"; /* dense database index */
 char *fnIndex1 = "coast1.idx"; /* sparse database index */
 char *fnCoords = "coast.dat"; /* composite coordinate file */
/* ---------------------------------------------------------------- */

 if ((fpTextLines = fopen(fnTextLines1, "rt")) == NULL) {
 fprintf(stderr,"Input file (%s) open failed.\n", fnTextLines1);
 exit(1);
 }
 if ((fpIndex = fopen(fnIndex1, "wb")) == NULL) {
 fprintf(stderr,"Index file (%s) open failed.\n", fnIndex1);
 exit(1);
 }
 if ((fpCoords = fopen(fnCoords, "wb")) == NULL) {
 fprintf(stderr,"Data file (%s) open failed.\n", fnCoords);
 exit(1);
 }
 fprintf(stderr, "\nSparse line segment input file...\n");
 buildFiles(fpTextLines, fpIndex, fpCoords);

 fclose(fpTextLines);
 fclose(fpIndex); /* Note that the coordinate file stays open! */

 if ((fpTextLines = fopen(fnTextLines0, "rt")) == NULL) {
 fprintf(stderr,"Input file (%s) open failed.\n", fnTextLines0);
 exit(1);
 }
 if ((fpIndex = fopen(fnIndex0, "wb")) == NULL) {
 fprintf(stderr,"Index file (%s) open failed.\n", fnIndex0);
 exit(1);
 }

 fprintf(stderr, "\nDense line segment input file...\n");
 buildFiles(fpTextLines, fpIndex, fpCoords);

 fclose(fpTextLines);
 fclose(fpIndex);
 fclose(fpCoords);

 fprintf(stderr, "\nTwo-level coastline database created:\n");
 fprintf(stderr, "Index files: %s, %s\n", fnIndex0, fnIndex1);
 fprintf(stderr, "Coordinate data file: %s\n", fnCoords);

 }

/* ---- Read Line Segment File, Write Index File and Vertex Coordinates ---- */
void buildFiles(FILE *fpTextLines, FILE *fpIndex, FILE *fpCoords) {
 static struct vct3 vertex[MAX_VRTX];
 struct indexRec indexRec;
 struct plpt stereoPlaneVertex;
 struct ltln inVertex;
 struct lclpt shortVertex;
 double s, d;
 int i, fileLine, lineCount;
 long totalVertexCount;
 char inLine[LINE_LNGTH + 1];
/* ---------------------------------------------------------------- */

 fileLine = lineCount = indexRec.vertexCount = 0;
 indexRec.segmentId = 0;
 totalVertexCount = 0L;

 while (fgets(inLine, LINE_LNGTH, fpTextLines)) {
 fileLine++;
 if (inLine[0] == '*') { /* line segment header, end-of-file? */
 if (indexRec.vertexCount) { /* process accumulated segment */
 fprintf(stderr,"line:%d vertices:%d \r",
 lineCount, indexRec.vertexCount);

 indexRec.center.di = 0.0; /* find object "center" */
 indexRec.center.dj = 0.0;
 indexRec.center.dk = 0.0;
 for (i = 0; i < indexRec.vertexCount; i++) {
 indexRec.center.di += vertex[i].di;
 indexRec.center.dj += vertex[i].dj;
 indexRec.center.dk += vertex[i].dk;
 }
 NormalizeDcos3(&indexRec.center);

 indexRec.radius = 0.0; /* center-far-vertex distance */
 for (i = 0; i < indexRec.vertexCount; i++) {
 d = ArcDist(&indexRec.center, vertex + i);
 if (d > indexRec.radius) indexRec.radius = d;
 }
 if (indexRec.radius < 1.0e-10) {
 indexRec.radius = 0.0;
 s = 0.0;
 }
 else s = LC_SCALE / indexRec.radius;

 indexRec.fileOffset = ftell(fpCoords);
 fwrite(&indexRec, sizeof(struct indexRec), 1, fpIndex);

 for (i = 0; i < indexRec.vertexCount; i++) {
 MapStereo(&indexRec.center, vertex + i,
 &stereoPlaneVertex);
 shortVertex.est = (short)(s * stereoPlaneVertex.est);
 shortVertex.nrt = (short)(s * stereoPlaneVertex.nrt);
 fwrite(&shortVertex, sizeof(struct lclpt),1, fpCoords);
 }
 totalVertexCount += indexRec.vertexCount;
 indexRec.vertexCount = 0;

 lineCount++;
 }
 }
 else { /* next in a series of line segment vertices */
 inVertex.lat = DEG2RAD * atof(strtok(inLine, " ,\t\n"));
 inVertex.lng = DEG2RAD * atof(strtok(NULL, " ,\t\n"));
 if (((fabs(inVertex.lat) < 1.0e-10)
 && (fabs(inVertex.lng) < 1.0e-10)) /* 0.0 lat, 0.0 long? */
 || (fabs(inVertex.lat) > 0.5 * PI)
 || (fabs(inVertex.lng) > PI)) { /* lat/long out of range? */
 fprintf(stderr,"\nBad data, file line %d", fileLine);
 exit(1);
 }
 if (indexRec.vertexCount == MAX_VRTX) {
 fprintf(stderr,"\nSegment vertex buffer overflow...");
 exit(1);
 }
 LatLongToDcos3(&inVertex, vertex + indexRec.vertexCount++);
 }
 }
 fprintf(stderr,"...processed, line segments:%d, vertices:%ld\n",
 lineCount, totalVertexCount);
 }






[LISTING FOUR]

/* --------------------------------------------------------------- *
 * VIEW Program: View the line segments from a global database. *
 * Display areas of the Earth in stereographic projection at dif- *
 * ferent scales and with different map center (tangency) points. *
 * The user interface is simple: it provides for change of scale *
 * and shift of map center point using the sign and arrow keys. *
 * For each map scene, calculate its radius of view. Then retrieve *
 * from the coordinate file only those coastline segments that *
 * might come into view. Draw the line segments, relying on the *
 * display-graphics to clip the parts still outside the window. *
 * --------------------------------------------------------------- */

#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <string.h>
#include <graph.h> /* Using MS C 6.0 graphics... */

#include "algebra.c"

#define DISPL_WIDE 0.24 /* screen width, meters */
#define DISPL_HIGH 0.18 /* screen height, meters */
#define EARTH_RAD 6.371e6 /* approximate Earth's radius, meters */

#define MERIDIAN 1 /* grid drawing selectors */
#define PARALLEL 0

#define COLOR_FRAME 7

#define COLOR_GRID 1
#define COLOR_COAST 3
#define COLOR_SCALE 7
#define COLOR_PROMPT_TEXT 5
#define COLOR_PROMPT_KEYS 7

void drawDataLines(FILE *, FILE *);
void drawGrid(int, int);
void drawGridSegment(const struct ltln *, double, double, int);

static double xfmArray[8]; /* Plane/display transfomation */
static double maxDispDist; /* Radius of view */
static struct vct3 displayCenter; /* Map center, spherical */

void main(void) {

/* Initial map scale and center (projection plane tangency) point: */
 double worldScale = 100.0e6;
 struct ltln llStart = {DEG2RAD * 50.0, DEG2RAD * -100.0};

 struct plpt pUpperLeft, pLowerRight, pNewCntr;
 struct dpxl dUpperLeft, dUpperMid, dUpperRight,
 dLowerLeft, dLowerMid, dLowerRight,
 dLeftMid, dRightMid, dNewCntr, dCntr;
 double worldWide, worldHigh;
 struct videoconfig vcnfg;
 struct vct3 sphVx;
 int ich;
 char outStr[32];

 char *fnIndex0 = "coast0.idx";
 char *fnIndex1 = "coast1.idx";
 char *fnCoordinates = "coast.dat";

 FILE *fpIndex0;
 FILE *fpIndex1;
 FILE *fpCoordinates;
/* ---------------------------------------------------------------- */

 LatLongToDcos3(&llStart, &displayCenter); /* initial map center */

 if ((fpIndex0 = fopen(fnIndex0, "rb")) == NULL) {
 fprintf(stderr,"Index file (%s) open failed.\n", fnIndex0);
 exit(1);
 }
 if ((fpIndex1 = fopen(fnIndex1, "rb")) == NULL) {
 fprintf(stderr,"Index file (%s) open failed.\n", fnIndex1);
 exit(1);
 }
 if ((fpCoordinates = fopen(fnCoordinates, "rb")) == NULL) {
 fprintf(stderr,"Data file (%s) open failed.\n", fnCoordinates);
 exit(1);
 }
 if (_setvideomode(_VRES16COLOR) == 0) { /* assume VGA graphics */
 fprintf(stderr, "Graphics mode set failed.\n");
 exit(1);
 }
 _getvideoconfig(&vcnfg);


 dUpperLeft.x = dUpperLeft.y = 0;
 dLowerRight.x = vcnfg.numxpixels - 1;
 dLowerRight.y = vcnfg.numypixels - 1 - 20;

 _setcliprgn(dUpperLeft.x, dUpperLeft.y,
 dLowerRight.x, dLowerRight.y);

 dCntr.x = (dUpperLeft.x + dLowerRight.x)/2;
 dCntr.y = (dUpperLeft.y + dLowerRight.y)/2;

 dLowerLeft.x = dLeftMid.x = dUpperLeft.x;
 dUpperRight.x = dRightMid.x = dLowerRight.x;
 dUpperMid.x = dLowerMid.x = dCntr.x;

 dUpperRight.y = dUpperMid.y = dUpperLeft.y;
 dLowerLeft.y = dLowerMid.y = dLowerRight.y;
 dLeftMid.y = dRightMid.y = dCntr.y;

 for (;;) {
 worldWide = (worldScale * DISPL_WIDE) / EARTH_RAD;
 worldHigh = (worldScale * DISPL_HIGH) / EARTH_RAD;

 pUpperLeft.est = -worldWide * 0.5;
 pUpperLeft.nrt = worldHigh * 0.5;
 pLowerRight.est = worldWide * 0.5;
 pLowerRight.nrt = -worldHigh * 0.5;

 SetPlaneDisplay(xfmArray,
 &pUpperLeft, &pLowerRight, &dUpperLeft, &dLowerRight);

 UnMapStereo(&displayCenter, &pUpperLeft, &sphVx);
 maxDispDist = ArcDist(&displayCenter, &sphVx);

 _clearscreen(_GCLEARSCREEN);
 _setcolor(COLOR_GRID);
 if (worldScale > 10.0e6) drawGrid( 1, 1);
 else if (worldScale > 3.0e6) drawGrid( 2, 3);
 else drawGrid(10, 15);

 _settextcolor(COLOR_PROMPT_TEXT);
 _settextposition(vcnfg.numtextrows, 20);
 _outtext("Press space bar to interrupt this scene...");

 _setcolor(COLOR_COAST);
 if (worldScale > 20.0e6) drawDataLines(fpIndex1, fpCoordinates);
 else drawDataLines(fpIndex0, fpCoordinates);

 sprintf(outStr, worldScale > 20.0e6 ?
 "1 : %.0lfM" : "1 : %.1lfM", worldScale / 1.0e6);
 _settextposition(vcnfg.numtextrows - 2, 37);
 _settextcolor(COLOR_SCALE);
 _outtext(outStr);

 _setcolor(COLOR_FRAME);
 _rectangle(_GBORDER, dUpperLeft.x, dUpperLeft.y,
 dLowerRight.x, dLowerRight.y);

 _settextcolor(COLOR_PROMPT_TEXT);
 _settextposition(vcnfg.numtextrows, 1);

 _outtext(" Press: (+)(-) to change scale, (\x1b)(\x18)"
 "(\x19)(\x1a) to move center, (Esc) to quit.");

 _settextcolor(COLOR_PROMPT_KEYS); /* highlight key characters */
 _settextposition(vcnfg.numtextrows, 10); _outtext("+");
 _settextposition(vcnfg.numtextrows, 14); _outtext("-");
 _settextposition(vcnfg.numtextrows, 35); _outtext("\x1b");
 _settextposition(vcnfg.numtextrows, 39); _outtext("\x18");
 _settextposition(vcnfg.numtextrows, 43); _outtext("\x19");
 _settextposition(vcnfg.numtextrows, 47); _outtext("\x1a");
 _settextposition(vcnfg.numtextrows, 67); _outtext("Esc");

 do {
 if (ich = getch()) { /* non-0 scan code, ASCII character */
 switch (ich) {
 case 45: worldScale *= 2.0; break; /* - */
 case 43: worldScale /= 2.0; break; /* + */
 case 27: _setvideomode(_DEFAULTMODE); exit(0);/* Esc */
 default: putch('\a'); ich = 0; /* invalid key */
 }
 if (ich) { /* OK, scale changed, enforce limits */
 if (worldScale < 1.5625e6) worldScale = 1.5625e6;
 if (worldScale > 200.0e6) worldScale = 200.0e6;
 }
 }
 else { /* arrow or diagonal key, move map tangency point */
 switch (ich = getch()) { /* get "extended scan code" */
 case 73: dNewCntr = dUpperRight; break; /* up/right */
 case 72: dNewCntr = dUpperMid; break; /* up */
 case 71: dNewCntr = dUpperLeft; break; /* up/left */
 case 75: dNewCntr = dLeftMid; break; /* left */
 case 79: dNewCntr = dLowerLeft; break; /* down/left */
 case 76: dNewCntr = dCntr; break; /* 5, center */
 case 80: dNewCntr = dLowerMid; break; /* down */
 case 81: dNewCntr = dLowerRight; break;/* down/right */
 case 77: dNewCntr = dRightMid; break; /* right */
 default: putch('\a'); ich = 0; /* invalid key */
 }
 if (ich) { /* OK, center was re-positioned */
 DisplayToPlane(xfmArray, &dNewCntr, &pNewCntr);
 UnMapStereo(&displayCenter, &pNewCntr, &displayCenter);
 }
 }
 } while (ich == 0); /* i.e. until valid input was obtained */
 }
 }

/* ---- Traverse the Line Index and Display Close Line Segments ---- */
void drawDataLines(FILE *fpIndex, FILE *fpCoordinates) {
 struct indexRec indexRec;
 struct lclpt shortVx;
 struct vct3 sphereVx;
 struct plpt stereoPlaneVx;
 struct dpxl displayVx;
 double s;
 int i;
/* ---------------------------------------------------------------- */

 rewind(fpIndex); /* whole index will be searched sequentially */

 while (fread(&indexRec, sizeof(struct indexRec), 1, fpIndex)) {

 if (kbhit()) if (getch() == 32) break; /* space bar, break */

/* Skip line segments which are so far away from the display
 that they can't have any vertices in it... */

 if (ArcDist(&displayCenter, &indexRec.center)
 > maxDispDist + indexRec.radius) continue;

 s = indexRec.radius / LC_SCALE;
 fseek(fpCoordinates, indexRec.fileOffset, SEEK_SET);

 for (i = 0; i < indexRec.vertexCount; i++) {
 fread(&shortVx, sizeof(struct lclpt), 1, fpCoordinates);
 stereoPlaneVx.est = s * (double)shortVx.est;
 stereoPlaneVx.nrt = s * (double)shortVx.nrt;
 UnMapStereo(&indexRec.center, &stereoPlaneVx, &sphereVx);

 MapStereo(&displayCenter, &sphereVx, &stereoPlaneVx);
 PlaneToDisplay(xfmArray, &stereoPlaneVx, &displayVx);
 if (i == 0) _moveto(displayVx.x, displayVx.y);
 else _lineto(displayVx.x, displayVx.y);
 }
 }
 }

/* ---- Display Latitude/Longitude "Rectangles" in the Display ---- */
#define RCT_LAT_DEG 10 /* Rectangle extent in latitude... */
#define RCT_LNG_DEG 15 /* ... and longitude, in degrees */
#define RCT_HALFDIAG_DEG 9 /* Rect. center-to-corner distance */
#define RCT_LAT_RAD (DEG2RAD * RCT_LAT_DEG) /* same in radians */
#define RCT_LNG_RAD (DEG2RAD * RCT_LNG_DEG)

void drawGrid(int densityLat, int densityLng) {
 struct ltln llVx;
 struct vct3 sphereVx;
 double sLat, sLng;
 int i, j, k;
/* ---------------------------------------------------------------- */

 for (i = -80; i <= 80; i += RCT_LAT_DEG) {
 for (j = -180; j < 180; j += RCT_LNG_DEG) {
 llVx.lat = i * DEG2RAD + 0.5 * RCT_LAT_RAD;
 llVx.lng = j * DEG2RAD + 0.5 * RCT_LNG_RAD;
 LatLongToDcos3(&llVx, &sphereVx); /* grid rectangle center */

 if (ArcDist(&displayCenter, &sphereVx)
 > maxDispDist + DEG2RAD * RCT_HALFDIAG_DEG) continue;

 sLat = (RCT_LAT_RAD / densityLat);
 sLng = (RCT_LNG_RAD / densityLng);
 for (k = 0; k < densityLat; k++) {
 llVx.lat = i * DEG2RAD + k * sLat;
 llVx.lng = j * DEG2RAD;
 drawGridSegment(&llVx, sLng / 4, RCT_LNG_RAD, PARALLEL);
 }
 if (i == 80) continue;
 for (k = 0; k < densityLng; k++) {
 llVx.lat = i * DEG2RAD;
 llVx.lng = j * DEG2RAD + k * sLng;
 drawGridSegment(&llVx, sLat / 4, RCT_LAT_RAD, MERIDIAN);
 }
 }
 }
 }

/* ---- Display a Segment of a Meridian or a Parallel in Short Steps ---- */
void drawGridSegment(const struct ltln *llVxStart,
 double step, double maxDist, int m) {
 struct ltln llVx;
 struct vct3 sphereVx;
 struct plpt stereoPlaneVx;
 struct dpxl displayVx;
 double d = 0.0;
/* ---------------------------------------------------------------- */

 LatLongToDcos3(llVxStart, &sphereVx);
 MapStereo(&displayCenter, &sphereVx, &stereoPlaneVx);
 PlaneToDisplay(xfmArray, &stereoPlaneVx, &displayVx);
 _moveto(displayVx.x, displayVx.y);
 do {
 d += step;
 if (d + 1.0e-10 > maxDist) d = maxDist;
 llVx.lat = llVxStart->lat + (m * d); /* <m> is 0 or 1 */
 llVx.lng = llVxStart->lng + ((1 - m) * d);
 LatLongToDcos3(&llVx, &sphereVx);
 MapStereo(&displayCenter, &sphereVx, &stereoPlaneVx);
 PlaneToDisplay(xfmArray, &stereoPlaneVx, &displayVx);
 _lineto(displayVx.x, displayVx.y);
 } while (d != maxDist);
 }







December, 1992
SOUND AS A DATA TYPE


QuickTime makes it easy




Aaron E. Walsh


Aaron is cofounder and CEO of Fore-Runner Development and Consulting Corp., a
Boston-based software firm. He can be reached at 1357 Washington St., West
Newton, MA 02135.


Programming for sound has never been simple. In the Macintosh world, for
instance, programmers used to face the onerous task of programming the Sound
Driver which was, to be kind, a challenge. Nevertheless, until Apple released
the Sound Manager in System 6.0.2, there were no alternatives.
Although difficult to master, the Sound Manager was a dramatic improvement
over previous methods of sound programming. You still needed an intimate
knowledge of complex sound-data structures for anything other than simply
playing a sound at preset volumes. Playing a sound asynchronously, altering
the playback volume, or even playing two different sounds at the same time
often required more effort than it was worth.
QuickTime, Apple's system-wide architecture for handling dynamic
data--animation, video, and audio--goes beyond the Sound Manager by defining a
data type called a movie for dealing with dynamic data. A movie is simply a
structure containing zero or more data streams, known as tracks. Each track
references (via a pointer) a single media, which in turn references the raw
data samples comprising the movie. Figure 1 illustrates the basic QuickTime
movie
structure. (For more information on QuickTime movies, see "Programming
QuickTime," DDJ, July 1992.)
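The containment just described might be pictured with the following structs. These are purely illustrative: QuickTime's actual Movie, Track, and Media types are opaque handles managed by the Movie Toolbox, and the hypothetical declarations below only mirror the movie/track/media/sample hierarchy.

```c
#include <assert.h>

struct media {          /* references the raw data samples */
    const void *samples;
    long sampleCount;
};

struct track {          /* one data stream of a movie */
    struct media *media; /* each track references a single media */
};

struct movie {          /* zero or more tracks */
    int trackCount;
    struct track *tracks;
};
```

In the real API you never touch such fields directly; you go through the Movie Toolbox and the media handler components described below.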
QuickTime 1.0 is incapable of directly playing the popular sound resource
(snd) data type developers rely on as the Macintosh sound format standard. To
do so, snd resource data structures must be manually extracted into the
QuickTime sound format. Documentation for converting snd data types to the
QuickTime sound format is nonexistent as of this article. The code and article
presented here demonstrate a technique for converting traditional and sound
resources to QuickTime sound data samples.
QuickTime itself does not provide direct support for specific media types.
This is the responsibility of media handler components, to which QuickTime
provides a number of interface routines. When accessing and manipulating movie
data via media handlers, you are not required to have detailed knowledge about
the data itself. In this respect, graphic and audio elements are treated as
data types for which a number of standard routines are provided.
Compared to the original Sound Driver and Sound Manager routines, QuickTime
makes sound management on the Macintosh amazingly simple:
To retrieve the current volume of a given track, use GetTrackVolume().
To change the volume of a sound track, call SetTrackVolume().
To play multiple sound tracks at one time, simply activate the desired tracks.
Once activated, the track's media handlers take over: The data samples are
loaded from disk, sound buffers are automatically created and maintained, and
the sounds are played asynchronously.
When playing QuickTime movies, you don't need to know about the data
structures in a movie, only the standard mechanics of playing it. Recording a
QuickTime movie is another story, however, because you need to know about
those data structures (like sound) you wish to record.


Sound Description Record


To add sound to a QuickTime movie, the sound-media handlers must be given a
description of the sound data for which they will be responsible.
A Sound Description record contains information such as sample size, sample
rate, compression details, and so on. Figure 2 details the sound-description
structure. When you provide raw data samples and a corresponding description
of that data, QuickTime can add an audio track to any movie.
Figure 2: QuickTime Sound Description record.

 struct SoundDescription
 {
 long descSize; /* total size in bytes of this SoundDescription
 structure */
 long dataFormat; /* describes sample encoding technique (e.g.,
 offset-binary, twos-complement) */
 long resvd1; /* reserved by Apple. Set to 0 in your sound
 descriptions */
 short resvd2; /* reserved by Apple. Set to 0 in your sound
 descriptions */
 short dataRefIndex; /* reserved. Set to 1 in your sound descriptions
 */
 short version; /* reserved by Apple. Set to 0 in your sound
 descriptions */
 short revlevel; /* reserved by Apple. Set to 0 in your sound
 descriptions */
 long vendor; /* reserved by Apple. Set to 0 in your sound
 descriptions */
 short numChannels; /* number of channels of sound. Set to 1 for
 monaural, set to 2 for stereo */
 short sampleSize; /* number of bits per sample. Set to 8 for 8-bit
 sound, 16 for 16-bit sound */
 short compressionID; /* reserved. Set to 0 in your sound descriptions
 */
 short packetSize; /* reserved. Set to 0 in your sound descriptions
 */
 Fixed sampleRate; /* rate at which samples were obtained. Field
 contains an unsigned, fixed point number. */
 };

You're responsible for filling in the sound-description structure: the fields
in Table 1 must be set to match the original audio sample, while the remaining
fields are reserved for use by Apple. Set each reserved field to 0, with the
exception of dataRefIndex, which should be set to 1; see Figure 2.
Table 1: Fields of the sound-description structures.

 Function Description
 ----------------------------------------------------------------------

 descSize Total size, in bytes, of the SoundDescription structure.
 dataFormat Describes sample encoding technique (offset-binary or
 twos-complement, for example).
 numChannels Indicates the number of sound channels used by the sound
 sample.
 sampleSize Number of bits in each sound sample.
 sampleRate Rate at which sound samples were captured.
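As a sketch of how those fields fit together, the following portable stand-in
(hypothetical names; not the actual QuickTime header) fills the Table 1 fields
for an uncompressed mono sample and zeroes the reserved ones:

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the QuickTime SoundDescription of Figure 2.
// Fixed is a 16.16 fixed-point value, as in the Macintosh headers.
typedef uint32_t Fixed;

struct SoundDescription {
    int32_t descSize;       // total size of this structure, in bytes
    int32_t dataFormat;     // encoding, 'raw ' for offset-binary samples
    int32_t resvd1;         // reserved: 0
    int16_t resvd2;         // reserved: 0
    int16_t dataRefIndex;   // reserved: 1
    int16_t version;        // reserved: 0
    int16_t revlevel;       // reserved: 0
    int32_t vendor;         // reserved: 0
    int16_t numChannels;    // 1 = mono, 2 = stereo
    int16_t sampleSize;     // bits per sample: 8 or 16
    int16_t compressionID;  // reserved: 0
    int16_t packetSize;     // reserved: 0
    Fixed   sampleRate;     // unsigned 16.16 fixed point
};

// Fill the five caller-supplied fields of Table 1 and zero the rest.
SoundDescription MakeDescription(int16_t channels, int16_t bits,
                                 uint32_t rateHz) {
    SoundDescription d = {};         // reserved fields start at 0
    d.descSize     = sizeof(SoundDescription);
    d.dataFormat   = 0x72617720;     // 'raw ': offset-binary samples
    d.dataRefIndex = 1;              // the one field that must be 1
    d.numChannels  = channels;
    d.sampleSize   = bits;
    d.sampleRate   = rateHz << 16;   // whole Hz -> 16.16 fixed point
    return d;
}
```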

Listing One (page 102) details how to create a sound-description record based
on the data structures of a sound (snd) resource. This technique is an
effective means of bridging the sound-format incompatibility between QuickTime
and the Sound Manager.
On a computer, sound data is stored as a series of digital samples, each
specifying the amplitude of the sound at a given time. This format is commonly
known as pulse-code modulation (PCM).
The amplitude values in a sample are typically encoded in one of two ways:
offset-binary or twos-complement:
Offset-binary. Amplitude values are represented by an unsigned number, with
the midpoint representing silence. For example, sample values stored in
offset-binary for an 8-bit sample would range from 0 to 255 with the midpoint
(128) specifying silence (no amplitude). Samples stored as Macintosh sound
resources (snd resource types) are stored using this technique.
Twos-complement. Amplitude values are represented using a signed integer, with
0 representing silence. An 8-bit sample stored in twos-complement format would
have sample values ranging from -128 to 127, with 0 specifying silence (no
amplitude). Audio Interchange File Format (AIFF) sounds used by the Sound
Manager are stored using this technique.
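The two encodings differ only in where they place silence, so converting
between them is a matter of shifting the midpoint. A minimal sketch:

```cpp
#include <cassert>
#include <cstdint>

// Convert one 8-bit sample between the two encodings described above.
// Offset-binary: unsigned 0..255, 128 = silence (snd resources).
// Twos-complement: signed -128..127, 0 = silence (AIFF).
int8_t OffsetBinaryToTwos(uint8_t s) {
    return (int8_t)(s - 128);   // shift the midpoint down to zero
}
uint8_t TwosToOffsetBinary(int8_t s) {
    return (uint8_t)(s + 128);  // shift zero back up to the midpoint
}
```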
QuickTime also supports the Macintosh Audio Compression and Expansion (MACE)
encoding techniques, yielding compression ratios of up to 6:1. These
compression techniques save a considerable amount of disk space and RAM
consumption, though not without a penalty in playback speed and quality.
The number of bits used to encode the amplitude value for each sample is known
as the sample size. A larger sample size provides more bits to encode the
amplitude value. Hence, the size of a sample corresponds directly to the
quality of sound. The standard Macintosh hardware is limited to 8-bit samples,
whereas QuickTime supports up to 32-bit samples. When playing larger samples on
a standard Macintosh, QuickTime converts the samples to 8-bit format for
compatibility.
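One straightforward way to reduce a 16-bit twos-complement sample to the 8-bit
offset-binary form the standard hardware expects is to keep the high-order byte
and re-center it around the midpoint. This is an illustration of the idea, not
necessarily QuickTime's exact conversion:

```cpp
#include <cassert>
#include <cstdint>

// Reduce a signed 16-bit sample to unsigned 8-bit offset-binary:
// keep the 8 most significant bits, then move silence from 0 to 128.
uint8_t SixteenToEight(int16_t s) {
    return (uint8_t)((s >> 8) + 128);
}
```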
Also influencing the quality of sound is the sample rate. The sample rate
represents the number of samples captured in one second. A higher sample rate
can more accurately describe the original sound waveform. The standard
Macintosh hardware is capable of an output sampling rate of 22.2545 kHz,
whereas QuickTime supports up to 65.535 kHz. QuickTime will convert higher
sample rates to accommodate the playback hardware.
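The sampleRate field of Figure 2 is an unsigned 16.16 fixed-point number, which
is why 65.535 kHz is the ceiling: 65,535 is the largest value its integer part
can hold. The conversion to and from whole Hertz (the `>> 16` that appears
later in Listing One) can be sketched as:

```cpp
#include <cassert>
#include <cstdint>

// sampleRate is an unsigned 16.16 fixed-point number: the high 16 bits
// hold whole Hertz, the low 16 bits the fractional part.
typedef uint32_t Fixed;

Fixed    HzToFixed(uint32_t hz)  { return hz << 16; }
uint32_t FixedToHz(Fixed rate)   { return rate >> 16; }  // as in Listing One
```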
Sound can be added to a QuickTime movie in real time, in which case an audio
digitizing device is required to convert the analog sound wave into its
digital representation. Alternately, sound can be added directly from a disk
file.


Creating a Sound Track


The code in Listing One demonstrates the technique of taking an existing sound
resource (snd resource type) from a disk file and incorporating it into a
QuickTime movie as a new sound track. The user is prompted to select a file
containing the sound resource from which data samples will be extracted, and
prompted again to select the movie file into which the sound will be added.
ConvertSndToMovie() opens the selected movie file and adds a new track for the
sound data. Using the sound-description structure created by
CreateSndDescriptor(), ConvertSndToMovie() then adds sound samples to the
track media.
CreateSndDescriptor() creates a sound-description structure based on data in
the original sound (snd) resource. This routine handles only type 1 snd
resources containing uncompressed raw data stored in offset-binary format.
While the majority of sound resources fall into this category, you may wish to
enhance the code to handle MACE compressed resources.
Once the sound track has been added to the movie file, execution of the
application is terminated. The original sound resource and its associated file
remain unchanged, while the QuickTime movie now has a new sound track. The
sound samples are saved directly into the movie file itself, although
QuickTime allows data samples to be saved as an external file as well.


Manipulating QuickTime Sound


Volume and balance are the two main attributes of sound in QuickTime.
Essentially, QuickTime treats sound as a data type. A variety of routines are
provided which simplify the interaction between the programmer and the data.
Among these are:
GetMovieVolume(), SetMovieVolume(). The current volume setting determines the
overall loudness of a movie's sound tracks. The current volume setting is lost
when the movie is closed and saved to disk. Use these routines to retrieve and
alter the current volume setting.
GetMoviePreferredVolume(), SetMoviePreferredVolume(). When a movie is opened
from disk, its current volume is set to the preferred volume setting, which is
saved with the movie file. Use these routines to retrieve and alter a movie's
preferred volume setting.
GetTrackVolume(), SetTrackVolume(). Each track in a movie has an independent
volume setting. Use these routines to retrieve and alter the volume setting
for a particular track.
GetSoundMediaBalance(), SetSoundMediaBalance(). The balance setting determines
the mix of audio between a computer's two speakers. Use these routines to
retrieve and alter the audio balance of a particular media. This setting is
stored in the movie file when the movie file is saved to disk.


Conclusion


QuickTime sound support is simple, once the sound data has been successfully
saved into the movie format. The current version of QuickTime, Version 1.0,
does not provide an API for converting snd sound resources or AIFF sound files
to the movie sound format. Ideally, adding sounds to a QuickTime movie should
require no more effort than simply specifying the sound file to play. Until
this is possible, the developer must devise techniques (such as the one
presented here) for converting the non-QuickTime sound formats into the
QuickTime format.
With QuickTime, Apple has come a long way towards making the sound-data
structure a standard data type on the Macintosh. However, the issue of
interformat sound-data exchange remains, though I find it difficult to
complain when much of the headache of sound manipulation has been resolved by
QuickTime.
_SOUND AS A DATA TYPE_
by Aaron E. Walsh


[LISTING ONE]


/************************************************************************/
/* AddSoundTrackToMovie.c -- QuickTime program demonstrating how to add */
/* a sound track to a movie using data found in an snd sound resource. */
/* by Aaron E. Walsh Developed using Symantec Think C 5.0 and */
/* Apple QuickTime headers */
/************************************************************************/

#include <Types.h>
#include <Dialogs.h>
#include <Sound.h>
#include <Movies.h>
#include <MoviesFormat.h>

#define defaultChunkSize 32768

void main(void);
OSErr InitMacManagers(void);
void doSoundToMovie(void);
OSErr OpenSndFile(short vRef, Str255 fileName, short *rRef,
 Handle *soundHandle);
void ConvertSndToMovie(short destVolume, Str255 destName, Handle sndH,
 long chunkSize, Boolean CreateFile);
OSErr CreateSndDescriptor(Handle sndHandle, SoundDescriptionPtr sndDescriptP,
 unsigned long *dataOffset,unsigned long *frameCount);

/***** main() -- simple main event loop ******/
void main()
{
 OSErr error;
 error = InitMacManagers();
 if (!error)
 {
 doSoundToMovie(); /* focus of this article */
 ExitMovies(); /* exit Movie Toolbox when done */
 }
} /*main*/

/***** InitMacManagers() -- initialize Toolbox Managers used by this app *****/
OSErr InitMacManagers()
{
 OSErr error;
 InitGraf(&qd.thePort);
 FlushEvents(everyEvent,0);
 InitWindows();
 InitDialogs(nil);
 InitCursor();

 MaxApplZone();

 /* attempt to initialize the Movie Toolbox: a better approach */
 /* would be to use the Gestalt Manager, as detailed in my */
 /* July 1992 DDJ article "Programming QuickTime" */

 error = EnterMovies();
 return (error);
} /*InitMacManagers*/

/**** doSoundToMovie() -- simple interface: select sound file & add it as */
/* a QT sound track ****/
void doSoundToMovie()
{
 Point where;
 SFReply userReply;
 Handle soundResourceHandle = nil;
 short resourceRefNum = 0;
 OSErr soundFileSelectErr, movieFileSelectErr;
 SFTypeList movieFileTypes;


 soundFileSelectErr=movieFileSelectErr = 1; /* initialize error flags */

 /* select sound file: */
 where.h = where.v = 100; /* position the dialog on screen */
 SFGetFile(where, "\p", nil, -1, nil, nil, &userReply);
 if (userReply.good)
 soundFileSelectErr = OpenSndFile (userReply.vRefNum,
 userReply.fName,
 &resourceRefNum,
 &soundResourceHandle
 );
 /* select movie file: */
 movieFileTypes[0] = 'MooV';
 SFGetFile(where, "\p", nil, 1, movieFileTypes, nil, &userReply);
 movieFileSelectErr = !userReply.good;

 if (!soundFileSelectErr && !movieFileSelectErr)
 ConvertSndToMovie(userReply.vRefNum, userReply.fName,
 soundResourceHandle, defaultChunkSize, false);
 else
 SysBeep(1); /* beep to indicate conversion wasn't done */
} /*doSoundToMovie()*/

/** OpenSndFile() -- open given file and retrieve first snd resource found **/
OSErr OpenSndFile(short volRefNum, Str255 fileName, short *refNum,
 Handle *soundHand)
{
 FSSpec fileSpec;
 OSErr error;
 short saveResFile;

 /* create file specification record: */
 error = FSMakeFSSpec(volRefNum, 0, fileName, &fileSpec);
 if (error) DebugStr("\pFSMakeFSSpec Failed");

 if (*refNum)
 {
 CloseResFile(*refNum); /* close file once spec is created */
 if (error = ResError()) DebugStr("\pCloseResFile Failed");
 *refNum = 0;
 }
 *refNum = FSpOpenResFile(&fileSpec, fsRdPerm); /* open using spec */
 if (*refNum < 0) DebugStr("\pFSpOpenResFile Failed");

 saveResFile = CurResFile(); /* save current resource refnum */
 UseResFile(*refNum); /* switch to our newly opened file */
 if (error = ResError()) DebugStr("\pUseResFile Failed");

 *soundHand = Get1IndResource('snd ', 1); /* get first snd resource */

 error = ResError();
 UseResFile(saveResFile);

 if (!*soundHand)
 error = -1; /* return error if sound handle is nil */
 return (error);
} /*OpenSndFile*/

/* ConvertSndToMovie()--create new movie sound track from snd resource data */
void ConvertSndToMovie(short destVolume, Str255 destName, Handle soundHand,
 long chunkSize, Boolean CreateFile)
{
 FSSpec destSpec;
 short resID = 1;
 short resRefNum;
 OSErr theErr;
 Track sndTrack = 0;
 Media sndMedia = 0;
 Movie theMovie;
 unsigned long sndSize, bytes, dataOffset, numSampleFrames;
 unsigned long totalSamples, chunkSamples, samples;
 unsigned long bytesPerFrame, samplesPerFrame;
 SoundDescription **descH;
 SoundDescription *sndDescP;

 descH = (SoundDescription **) NewHandleClear( sizeof(SoundDescription));
 if (!descH) DebugStr("\pCould not get description handle");

 sndDescP = *descH;

 CreateSndDescriptor(soundHand, sndDescP, &dataOffset, &numSampleFrames);
 bytesPerFrame = 1;
 samplesPerFrame = 1;

 theErr = FSMakeFSSpec(destVolume, 0, destName, &destSpec);
 if (theErr == fnfErr) theErr = 0;
 if (theErr) DebugStr("\pFSMakeFSSpec Failed");

 theErr = OpenMovieFile(&destSpec, &resRefNum, fsRdWrPerm); // ALTERED!
 if (theErr) DebugStr("\pOpenMovieFile Failed");

 resID = 0;
 theErr = NewMovieFromFile(&theMovie,resRefNum,&resID,destName,0,0L );
 if (theErr) DebugStr("\pNewMovieFromFile Failed");

 /* Now put it into a track */
 sndTrack = NewMovieTrack(theMovie, 0, 0, 255);
 if (theErr = GetMoviesError()) DebugStr("\pNewMovieTrack Failed");

 sndMedia = NewTrackMedia(sndTrack, SoundMediaType,
 ((unsigned long)(*descH)->sampleRate) >> 16, nil, 0);
 sndSize = GetHandleSize(soundHand) - dataOffset;

 bytes = numSampleFrames * bytesPerFrame; /* number of bytes
 we expect in file */
 if (bytes > sndSize) /* sample too large */
 {
 SysBeep(1); /* give a beep to indicate an error occurred */
 numSampleFrames = sndSize / bytesPerFrame;

 }
 totalSamples = numSampleFrames * samplesPerFrame; /* samples in file */

 if (!chunkSize)
 chunkSamples = totalSamples; /* all in one chunk */
 else
 chunkSamples = (chunkSize * samplesPerFrame) / bytesPerFrame;
 /* get size of chunk in samples */
 theErr = BeginMediaEdits( sndMedia );
 if (theErr) DebugStr("\pBeginMediaEdits Failed");
 while (totalSamples)
 {
 samples = totalSamples; /* samples in chunk */
 if (samples > chunkSamples)
 samples = chunkSamples;
 bytes = (samples * bytesPerFrame) / samplesPerFrame;
 /* bytes in chunk */
 theErr = AddMediaSample( sndMedia, soundHand, dataOffset,
 bytes, (TimeValue) 1, (SampleDescriptionHandle)
 descH, samples, 0, 0);
 if (theErr) DebugStr("\pAddMediaSample Failed");
 dataOffset += bytes;
 totalSamples -= samples;
 }
 theErr = EndMediaEdits( sndMedia );
 if (theErr) DebugStr("\pEndMediaEdits Failed");

 theErr = InsertMediaIntoTrack(sndTrack,0L,0L,
 GetMediaDuration(sndMedia), 0x10000);
 if (theErr) DebugStr("\pInsertMediaIntoTrack Failed");

 /* Write out the movie */
 DisposHandle((Handle) descH);
 if (CreateFile)
 {
 theErr = AddMovieResource( theMovie, resRefNum, &resID, destName );
 if (theErr) DebugStr("\pAddMovieResource Failed");
 }
 else
 {
 theErr = UpdateMovieResource( theMovie, resRefNum, resID, destName );
 if (theErr) DebugStr("\pUpdateMovieResource Failed");
 }
 theErr = CloseMovieFile( resRefNum);
 if (theErr) DebugStr("\pCloseMovieFile Failed");
 DisposeMovie(theMovie);
}/*ConvertSndToMovie*/

/**** CreateSndDescriptor() -- create a Sound Descriptor record from */
/* snd resource data ****/
OSErr CreateSndDescriptor(Handle sndHandle, SoundDescriptionPtr sndDescriptP,
 unsigned long *dataOffset,unsigned long *frameCount)
{
 short *i, synthCount, sndCommandCount;
 SndCommand *SoundCommand;
 SoundHeaderPtr sndHeadPtr;

 HLock (sndHandle);
 i = (short *) *sndHandle; /* get first word of snd resource */


 if (*i != 1) /* deal only with format 1 snd resource */
 {
 HUnlock(sndHandle); /* unlock before the early return */
 return (-1); /* return error (-1) for other snd types */
 }
 i++;
 synthCount = *i; /* count of modifiers/synthesizers in resource */
 i++;
 i += (synthCount * (1 + 2)); /* jump over modifiers/synthesizers, */
 /* so we can get at the sound commands */
 sndCommandCount = *i; /* count of sound commands in resource */
 i++;
 SoundCommand = (SndCommand*) i; /* get reference to 1st sound command */

 sndDescriptP->descSize = sizeof(SoundDescription);
 /*size of SoundDescription*/
 sndDescriptP->resvd1 = 0; /* reserved by Apple */
 sndDescriptP->resvd2 = 0; /* reserved by Apple */
 sndDescriptP->version = 0; /* data version */
 sndDescriptP->revlevel = 0; /* codec version */
 sndDescriptP->vendor = 0; /* codec vendor */
 sndDescriptP->compressionID = 0; /* sound compression 0=none*/
 sndDescriptP->packetSize = 0; /* compression packet size,*/
 /* 0 if not compressed */
 sndDescriptP->numChannels = 1; /* number of channels of sound*/
 sndDescriptP->sampleSize = 8; /* number of bits per sample */
 sndDescriptP->dataFormat = 'raw '; /* uncompressed */

 /* coerce in order to access sample rate, frame count, & data offsets*/
 sndHeadPtr = (SoundHeaderPtr) (*sndHandle + SoundCommand->param2);
 sndDescriptP->sampleRate = sndHeadPtr->sampleRate;
 /*sample rate of data*/
 *frameCount = sndHeadPtr->length; /* number of frames */
 *dataOffset = sndHeadPtr->sampleArea - *sndHandle; /* offset to data */

 HUnlock(sndHandle);
 return (noErr); /* function is declared OSErr; report success */
} /*CreateSndDescriptor*/




December, 1992
PERSISTENT OBJECTS IN C++


Cooperation among classes using a persistent-object database


 This article contains the following executables: PARODY.ZIP


Al Stevens


Al is a contributing editor to DDJ and can be contacted through the DDJ
offices or on CompuServe at 71101,1262.


This article describes a method for adding persistent objects to C++ programs
by deriving application classes from a persistent base class. Persistence is
not part of the C++ language, so every application must deal with it in one
way or another. The techniques described here represent a persistent-object
database manager I've implemented as a class library. Because of space
constraints, that library is only available electronically; see
"Availability," page 5.
To understand persistent objects we must first agree on what an object is. In
C++, an object is a declared instance of a data type. The object can be an
instance of one of the C++ primitive data types, such as int or long, or it
can be an instance of an abstract data type, defined as a C++ class. When you
declare an object, it comes into scope, bare except for the memory it occupies
and the constructor effects of any data-member initializers you provide. Its
contents might change significantly while it is in scope because your program
might do things to change it. When the object goes out of scope, its memory is
reclaimed, and the object ceases to exist. The object is
not persistent because the changes are not in evidence the next time your
program, or any other program, declares the same object.
A persistent object would retain its content from instance to instance. You
could declare a persistent object, change it, and let it go out of scope. The
next time you declared another instance of the same object, the object would
reflect the changes you made. This concept implies that the object system uses
an object database to store and retrieve objects and that such a database
knows how to retrieve and save specific objects when they are constructed and
destroyed.
Just as not every variable in the traditional program is in the database, not
every object in an object-oriented program will be persistent. You would not
want every class to include the persistent attribute. If every object were
persistent, the object database would grow unnecessarily large as it gathered
copies of dates, strings, complex numbers, and so on, that programs declared.
Instead, the persistent attribute should apply only to those objects whose
values the application needs to retain.


Nonpersistent C++


The C++ language does not support intrinsically persistent objects. In fact,
it does not support the notion that an object has its own identity--that a
subsequent declaration of a previously declared and destroyed object is, in
fact, the same object at all. It is unlikely that C++ will ever have
persistent objects as a part of the language. Type identification is, however,
being considered by the ANSI X3J16 committee. Although type identification
does not necessarily distinguish individual objects of a class, it can
distinguish classes from one another, which is necessary to an object-database
manager that must store and retrieve objects of different classes.


The Persistent-object Database


A number of commercial products already implement persistent C++ objects in
one form or another. I've seen several available for MS-DOS computers and
heard of others on other platforms. The objective of the approach described in
this article is not to cover all the terrain with a package that competes with
such commercial products, but to present a solution that supports most of a
programmer's requirements without an overwhelming and complex interface.
You will observe that I do not identify the technique proposed in this article
as an object-oriented database. No consensus exists on the definition of this
term. I suspect that the formal definition will accrue by default to the
format of whatever turns out to be the most successfully marketed
object-database management system. Until then, it is prudent to avoid using
the expression unless you are marketing such a system and hope to define the
technology by your success. I will call this technique simply a
persistent-object database.


Objectives of the Persistent-object Database


The persistent-object database must consist of a simple and intuitive
class-library interface. There should be a minimum number of class member
functions in the interface, and the interface should use the features of C++
in an intuitive manner. The objectives of the persistent object interface are
as follows:
Associative access. Objects must be identified by key data member values. When
the persistent-object database saves an object, it maintains an index that
associates key values with their object records in the database. When a
subsequent instance of an object of the same class specifies the same key
value, the persistent-object database will retrieve the associated object. The
database will also maintain indexes of other data members so that the program
can retrieve objects on the basis of data values other than the primary key.
Associative access is preferred over navigational access, which assigns object
identifications--identifications usually related to the object's address in
the database. The application must remember these addresses to retrieve the
objects later. Objects related to other objects of other classes would
similarly use addresses to point to their relatives. Some existing systems
work this way. Navigational access is one of the characteristics that the
relational database model strives to eliminate because of the inherent
instability in its use.
Maintenance of objects. The persistent-object database will provide methods
for the application to change and delete objects that exist in the database.
Object integrity. The persistent-object database will indicate when a declared
object does not already exist in the database. It will also refuse to add an
object to the database if another object exists with the same key value.
Class relationships. The persistent-object database will maintain
relationships between classes on the basis of key values, and it will maintain
the integrity of those relationships. If a class includes another class's
primary key as a secondary key, the classes are related. The persistent-object
database will not delete an object if other objects that are related to it
remain, and it will not attempt to relate an object to a nonexistent object.
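A minimal sketch of those two integrity rules, with hypothetical names and a
std::map standing in for the database:

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch: an object may not be deleted while related objects still hold
// its key, and an object may not relate to a nonexistent object.
struct Database {
    std::map<std::string, std::string> departments;     // primary key -> data
    std::multimap<std::string, std::string> employees;  // dept key -> employee

    bool AddEmployee(const std::string& dept, const std::string& name) {
        if (departments.find(dept) == departments.end())
            return false;          // never relate to a nonexistent object
        employees.insert({dept, name});
        return true;
    }
    bool DeleteDepartment(const std::string& key) {
        if (employees.count(key) != 0)
            return false;          // related objects remain: refuse
        return departments.erase(key) != 0;
    }
};
```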


The DBMS Connection


The aforementioned objectives sound like the same ones that have been around
for years and are supported by traditional database management systems
(DBMSs). We're breaking no new ground here, it seems. The requirements to
store, maintain, and retrieve entities of data have not been overturned simply
because we have newer ways to express programs. The object-oriented paradigm
does not invalidate years of experience with database-management technology.
To understand how an object database can leverage that experience and still
exploit the expressiveness of object-oriented design, we should examine the
problems of implementing persistent objects and some of the solutions.


Persistent C++


Inasmuch as C++ does not support intrinsic persistent objects, what features
of the language might contribute to the solution? First, we have stated that
we need to specify which classes are persistent. The C++ inheritance mechanism
takes care of that. If you derive a class from a persistent base class, the
derived class is persistent. Second, the persistent object needs to retrieve a
copy of itself when a program declares it. The C++ constructor mechanism is
the best place to handle that. Finally, the persistent object needs to write
itself back to the database when the object goes out of scope. The C++
destructor mechanism will do that. So, although C++ does not directly support
persistent objects from within the language itself, it does provide the
primitive language constructs with which we can add the feature.
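A minimal sketch of that division of labor, with hypothetical names and a
std::map standing in for the on-disk object database: the constructor retrieves
the object's saved state, and the destructor writes it back.

```cpp
#include <cassert>
#include <map>
#include <string>

// In-memory stand-in for the persistent-object database.
static std::map<std::string, int> objectStore;

class Persistent {
    std::string key;
protected:
    int value = 0;   // the one "data member" this toy base class manages
public:
    explicit Persistent(const std::string& k) : key(k) {
        auto it = objectStore.find(key);   // constructor: load prior state
        if (it != objectStore.end())
            value = it->second;
    }
    ~Persistent() { objectStore[key] = value; }  // destructor: save state
};

// Deriving from Persistent is all it takes to make the class persistent.
class Counter : public Persistent {
public:
    explicit Counter(const std::string& k) : Persistent(k) {}
    void Bump() { ++value; }
    int  Get() const { return value; }
};
```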



Persistent C


Object persistence in C was a relatively simple matter. You defined a
structure, put some data in it, and wrote it to a disk file. A generic file
input/output function would read and write the structure by using the sizeof
operator. This method is self-adjusting when you modify the program to change
the contents of the structure. If you needed more features, you used a DBMS,
but the solution still centered around the basic flat record structure, and so
the C solution looks like Example 1.
Example 1: A C persistent object.

 struct Employee {
 /* ... */
 };
 struct Employee empl;
 fread(&empl, sizeof(struct Employee), 1, dp);

The C solution is not perfect, however. If the structure has a pointer in it,
the generic file input/output function becomes less generic. It has to know
what the pointer points to, the length of that object, and, if the pointer
points to the first member of an array, the size and number of the members.
But, except for these restrictions, persistence in C is a straightforward
procedure.


Persistent Wrinkles in C++


The C++ solution introduces a new set of problems, primarily because of the
many new kinds of things that you can put into a class. Suppose you wanted to
design a Persistent base class that would automatically manage all aspects of
persistence for any derived class. Now consider the derived class in Example
2.
Example 2: A persistent-object class.

 class Employee : public Persistent {
 EmployeeNo emplno;
 String *name;
 Department& dept;
 int promotion_ctr;
 Date *promotions;
 public:
 virtual int
 SalaryReviewPeriod();
 };

The Employee class in Example 2 has several data members:
An instance of another class, known to the application.
A pointer to an object of a class taken from a class library.
A reference to an object of one of the application's classes.
A count of the number of members in an array.
A pointer to the first member of an array.
To complicate matters, the class has a virtual function, which means that a
vtbl (virtual function table) pointer is somewhere among the other data
members. Furthermore, you do not know if the embedded objects have vtbls.
How would you design a Persistent base class that would know how to find all
the pertinent data members and write them to disk? How would it know how to
construct those data members properly to read them back in? How would it know
the size and location of the data members? How could it possibly understand
the dependent relationship between the promotion_ctr member and the promotions
member? How would it know how to get around the one or more vtbl pointers in
the class?
The answer is, of course, that you couldn't design such a Persistent base
class.


Some Solutions


How can you solve these problems? There are a number of ways, the first and
most frequently used of which is the least desirable.
Custom file input/output methods. You can forget the idea of a Persistent base
class and write custom file storage and retrieval methods for every class in
your design. This approach betrays everything we have learned about database
management in the last thirty years. The apparent contrary nature of the C++
class definition is the result of its power. It is a strength of C++ that you
can design a class that includes all things difficult to pin down in a
database manager. But that doesn't mean we should abandon the effort--only
that it is going to take some thought and work.
Extend the C++ language. You can propose to the X3J16 committee that they add
the persistent attribute to the language and that they figure out how to make
it work. You can. I won't.
Write a preprocessor. This is one way to extend C++ without getting involved
with X3J16. You can write a preprocessor program that translates your extended
C++ language into C++ code. The preprocessor would need to search all the
header files to ferret out the formats of the embedded classes. This approach
will not resolve problems such as the relationship between the counter and the
array pointer mentioned above, but it is a step closer. One commercial
product, POET from BKS Software (Cambridge, Massachusetts), uses a
preprocessor to translate class-definition extensions into C++.
Limit the scope of a persistent class. You can get around all the
aforementioned problems by laying down some rules. You can specify that to
derive from the Persistent class, an object may not use any of the C++
features that cause those problems. If your application can get by without the
features, then such an approach might work. Using sizeof in the base class
would not work, though. The sizeof operator is not polymorphic. Nonetheless,
if it suits you to limit a persistent-object class to that which is easy to
implement, you might as well use one of the relational DBMSs already
available. Read on.
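A short demonstration of why sizeof cannot serve the base class: it is resolved
at compile time from the static type, so through a base reference it always
reports the base-class size.

```cpp
#include <cassert>
#include <cstddef>

struct Persistent { virtual ~Persistent() {} };
struct Employee : Persistent { char name[32]; long salary; };

// sizeof is not polymorphic: this always yields sizeof(Persistent),
// no matter which derived object p actually refers to.
size_t SizeSeenByBase(const Persistent& p) {
    return sizeof(p);
}
```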
Use a relational DBMS. You can build an object-database manager by putting a
C++ wrapper around an existing relational (or other) DBMS. That is a viable,
and sometimes appropriate approach. At least one commercial object-database
manager does just that. The CodeBase product from Sequiter Software (Edmonton,
Alberta) is a C function library that uses the dBase database formats. Their
C++ product consists of C++ wrapper classes around their C function library.
When should you use a relational database and when should you use an object
database? The answer lies in an analysis of the mutually exclusive strengths
and weaknesses of each and the requirements of your application. The
relational data model has some advantages that the object data model does not
support, namely:
The relational schema is stored in the database catalog.
General-purpose query programs can use the catalog.
The SELECT, PROJECT, and JOIN operators can build new database views at run
time.
The database is cross-compatible with other applications that use the same
DBMS.
Conversely, the object data model supports data representations that the rules
for the relational data model absolutely forbid:
Variable-length data members to support such applications as imagery,
multimedia, geographic data, and weather.

Abstract data types. The typical relational DBMS supports a small set of
primitive data types.
Arrays.
Encapsulation of data formats with the methods that define behavior.
Polymorphism.
Clearly, the strength of the object data model is that it supports an
object-oriented design. A designer must decide which way to go by weighing the
benefits of both approaches. If you decide that you need those relational
features not available in the object model, then read no further. You will
find, however, that the object model can emulate much of the behavior of the
relational model.
The derived class cooperates in its own persistence. This is the approach
presented in this article. Everything that the base class cannot know about
the data is known to the derived class. If the derived class provides a few
required functions that the base class calls, and if the derived class calls
specified base-class functions at specified times, the problems associated
with building a Persistent base class evaporate. The only problem that remains
is to design an interface to the persistent-object database that is simple
enough not to clutter up the user's program.


The Persistent-object Database: Cooperation Among Classes


To cooperate with the Persistent base class, a derived persistent-object class
will provide some of the methods. To be sure that it does, the Persistent
class names them as pure virtual functions. You cannot declare a persistent
object unless those functions exist in the derived-class definition. The
Persistent class will call the functions when it needs them.
One such function in the derived class provides the class-type identifier.
Some day this function will be unnecessary. Until then, we'll have to provide
it.
The derived class's constructor calls LoadObject in the base class after all
the construction is done. That tells the base class to use the key data member
value to position the database at the object's record. Then the LoadObject
function calls the derived class's Read function, which must be provided. A
derived class will know which data members need to be read. The derived class
does not do the physical reading and writing; the Persistent base class does
that. The derived class calls the base class's ReadObject function, passing
the address of each data member and its size.
The derived class's destructor calls SaveObject in the base class before
destruction of the object begins. The base class positions the database at the
object and calls the derived class's Write function, which writes the data
members back to the database through the WriteObject function in the base
class.
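The division of labor just described can be sketched in a minimal, self-contained form. The base class below buffers bytes in memory rather than positioning a database file, and the Employee members are invented for illustration; only the split between the derived class's Read/Write and the base class's ReadObject/WriteObject follows the article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// In-memory stand-in for the Persistent base class. ReadObject and
// WriteObject move raw bytes; a real implementation would position the
// database at the object's record instead of using a vector.
class Persistent {
    std::vector<char> record;   // simulated database record
    std::size_t pos = 0;
protected:
    void WriteObject(const void *buf, int len) {
        const char *p = static_cast<const char*>(buf);
        record.insert(record.end(), p, p + len);
    }
    void ReadObject(void *buf, int len) {
        std::memcpy(buf, record.data() + pos, len);
        pos += len;
    }
    // Supplied by the derived class; called by the base class.
    virtual void Read() = 0;
    virtual void Write() = 0;
public:
    void SaveObject() { record.clear(); Write(); }
    void LoadObject() { pos = 0; Read(); }
    virtual ~Persistent() {}
};

// The derived class knows which members to transfer but never touches
// the storage directly.
class Employee : public Persistent {
public:
    int emplno = 0;
    double salary = 0.0;
    void Read() override {
        ReadObject(&emplno, sizeof emplno);
        ReadObject(&salary, sizeof salary);
    }
    void Write() override {
        WriteObject(&emplno, sizeof emplno);
        WriteObject(&salary, sizeof salary);
    }
};
```

The point of the sketch is that the base class owns the transport and the derived class owns the knowledge of its own members; neither needs the other's details.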


The Persistent-object Class Interface


The Persistent class provides an interface to the derived class's user, the
program that declares the persistent object. The ObjectExists function tells
if the specified object was found in the database. If not, the AddObject
function tells the base class to add the object to the database when the
destructor calls the SaveObject function. The DeleteObject function tells the
base class that the user wants to delete the object from the database. The
ChangeObject function tells the base class that the user changed the object in
some way and that it should rewrite the object when the destructor calls the
SaveObject function.
The application can navigate the object database by using the FirstObject,
LastObject, NextObject, and PreviousObject methods. Calling these functions
retrieves the relative object based on the key sequence of the key that the
program used to construct the object.


Class Definition = Schema


Traditional DBMS languages include a data-definition language (DDL) that
defines the format of records in the database files and the relationships of
files to one another. The DDL is said to define the database schema. The
persistent-object database's DDL is C++ itself. You design a database by
designing classes that define the file formats and their key data members.
Example 3 is an illustration of such a design.
Example 3: A database schema.

 // -------- key department number
 class DeptNo : public Key {
 int deptno;
 // ...
 };
 // -------- department class
 class Department : public Persistent {
 DeptNo deptno; // primary key
 String *name;
 // ...
 };
 // -------- key employee number
 class EmployeeNo : public Key {
 int emplno;
 // ...
 };
 // -------- Employee class
 class Employee : public Persistent {
 EmployeeNo emplno; // primary key
 DeptNo deptno; // secondary key
 String *name;
 Date date_hired;
 Currency salary;
 public:
 // ...
 };

Two of the classes in Example 3 are derived from the Key class. This is how
you specify a key data member. Observe that the Employee class has two key
members, the EmployeeNo object and the DeptNo object. The first derived Key
object is the class's primary key. All subsequent Key objects are secondary
keys. There can be only one object of a particular class in the object
database with a given primary key value because it's the primary key value
that identifies the object. Multiple objects of the same class can share
secondary key values, however. The Department class in Example 3 has a DeptNo
member as its primary key, so there may be only one Department object with a
department number of 123, for example. The Employee class has a DeptNo member
as a secondary key. Several employees can be assigned to department number
123. This relationship is an implied one, based on the presence of those Key
data members. Because the design implies the relationship, the
persistent-object database maintains it. You will not be allowed to write an
Employee object with a nonnull DeptNo key value unless there is a
corresponding Department object with the same key value. You will not be
allowed to delete a Department object if any Employee objects include the
matching DeptNo key value.
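A toy version of those two integrity rules can be expressed directly. The class below is a standalone illustration (the names and the use of 0 as a null key are assumptions, not the library's internals): it tracks Department primary keys and Employee secondary keys and refuses exactly the two operations the text forbids.

```cpp
#include <cassert>
#include <set>

// Illustrative referential-integrity check for the implied
// Department/Employee relationship; 0 stands for a null DeptNo.
struct IntegrityChecker {
    std::set<int> departments;          // primary DeptNo values on file
    std::multiset<int> employee_depts;  // secondary DeptNo values on Employees

    // Writing an Employee with a nonnull DeptNo requires a matching
    // Department object.
    bool writeEmployee(int deptno) {
        if (deptno != 0 && departments.count(deptno) == 0)
            return false;               // no such Department: refuse
        if (deptno != 0)
            employee_depts.insert(deptno);
        return true;
    }
    bool writeDepartment(int deptno) {
        departments.insert(deptno);
        return true;
    }
    // Deleting a Department fails while Employees still reference it.
    bool deleteDepartment(int deptno) {
        if (employee_depts.count(deptno) > 0)
            return false;
        departments.erase(deptno);
        return true;
    }
};
```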



The Persistent Class


Example 4 is a listing of a simplified Persistent class; Example 5 is a
simplified Key class. The classes in Example 3 derive from these two classes.
The actual implementation is more complex than these examples, which are pared
down to facilitate this discussion and illustrate the concept.
Example 4: The Persistent class.

 class Persistent {
 // --- object state flags
 Bool changed, deleted, newobject, exists;
 // --- allows key constructors to associate with object
 static Persistent *thispers;
 // --- key indexes
 LinkedListHead Keys;
 // --- interface for derived class
 Bool LoadObject ();
 Bool SaveObject ();
 // --- provided by class
 virtual int ClassID () = 0;
 virtual void Write () = 0;
 virtual void Read () = 0;
 void ReadObject (void *buffer, int length);
 void WriteObject (void *buffer, int length);
 public:
 Persistent(); // constructor

 virtual ~Persistent(); // destructor
 // --- interface for user of derived class
 Bool AddObject();
 Bool DeleteObject();
 Bool ChangeObject();
 Bool ObjectExists() { return exists; }
 void AddKey (Key *key);
 void FirstObject();
 void LastObject();
 void NextObject();
 void PreviousObject();
 };

Example 5: The Key class.

 class Key : public LinkedList {
 int classid;
 int keyno;
 public:
 Key ();
 virtual ~Key ();
 virtual Bool operator> (Key& key) = 0;
 virtual Bool operator==(Key& key) = 0;
 virtual void Write (fstream& bfile) = 0;
 virtual void Read (fstream& bfile) = 0;
 };

The order in which constructors and destructors run is important to the way
the persistent-object database works. When a program declares an object of
type Employee, the constructors execute in this order:
Persistent::Persistent
emplno.Key::Key
emplno.EmployeeNo::EmployeeNo
deptno.Key::Key
deptno.DeptNo::DeptNo

Employee::Employee
This sequence supports the persistent-object database. The Persistent
constructor stores the object's address (this) in the thispers pointer, which
is a static global variable that the constructors of the objects of the Key
class use to associate themselves with the Persistent object. They call the
AddKey function through that pointer, which adds the key object to a linked
list within the Persistent object. The constructor for the Employee object
completes its construction, which presumably includes putting the initialized
employee number into the EmployeeNo key member. Then the constructor calls the
Persistent class's LoadObject function to find the object in the database.
When the object goes out of scope, the destructors execute in the reverse
order. The Employee destructor, which executes first, calls the Persistent
class's SaveObject function before it does any destruction. This action either
saves the object, if it is to be changed or added, or deletes it, if deletion
was requested.


The Key Class


Primary and secondary key classes derived from the Key class include functions
to support the index mechanism. Besides the index-key values themselves, the
derived classes supply overloaded relational operators so the index process
can compare Key objects to one another and Read and Write functions to read
and write the object's data values in the index file. Different keys may have
different lengths, but the length of any particular key for a Persistent class
must be fixed.
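To make that contract concrete, here is a hedged sketch of one derived key class. The Key interface is reduced to the members shown in Example 5, the LinkedList base and index plumbing are omitted, and the rest is invented for illustration; it is not the library's implementation.

```cpp
#include <cassert>
#include <fstream>

typedef bool Bool;

// Reduced Key interface, per Example 5 (index plumbing omitted).
class Key {
public:
    virtual ~Key() {}
    virtual Bool operator> (Key& key) = 0;
    virtual Bool operator==(Key& key) = 0;
    virtual void Write(std::fstream& bfile) = 0;
    virtual void Read (std::fstream& bfile) = 0;
};

class DeptNo : public Key {
    int deptno;
public:
    DeptNo(int n = 0) : deptno(n) {}
    // The index compares whole Key objects; each derived key casts its
    // counterpart down to its own type.
    Bool operator> (Key& key) { return deptno >  ((DeptNo&)key).deptno; }
    Bool operator==(Key& key) { return deptno == ((DeptNo&)key).deptno; }
    // Fixed length for this class: always sizeof(int) bytes in the index.
    void Write(std::fstream& bfile) { bfile.write((char*)&deptno, sizeof deptno); }
    void Read (std::fstream& bfile) { bfile.read ((char*)&deptno, sizeof deptno); }
};
```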


Physical Database Format


The Persistent class manages the persistent-object database. Besides the
functional objectives of the database described earlier, the persistent-object
database
must support:
Variable-length objects, even within the same class.
Fixed-length (per class) key indexes.
Automatic garbage collection when the program deletes an object or changes its
size.
Two physical files per database: object file and indexes.
The object and index files use a common file mechanism that stores data in a
linked list of fixed-length nodes. An object uses as many nodes as it needs to
hold all its data. An index fills a node with keys. When the application
deletes an object or when changes to an index release an index node, the node
or nodes are added to a linked list of deleted nodes. The next process that
needs a node takes it from this list.
The persistent-object database uses the B-tree algorithm to implement the
object indexes. Each entry in the B-tree includes the class identification,
the relative key number within the class, the key value, and the node address
where the object is stored.
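One plausible layout for such an entry follows; the field names and the 16-byte key size are assumptions for illustration, since the article does not give the physical record format. Ordering by class, then key number, then key bytes keeps all entries for one class/key pair in a contiguous range of the tree.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical B-tree index entry, per the description above.
struct BtreeEntry {
    int classid;      // class identification
    int keyno;        // relative key number within the class (0 = primary)
    char key[16];     // key value, fixed length per class/key
    long nodeaddr;    // node address where the object is stored
};

// Comparison used to keep the tree sorted.
bool before(const BtreeEntry& a, const BtreeEntry& b) {
    if (a.classid != b.classid) return a.classid < b.classid;
    if (a.keyno   != b.keyno)   return a.keyno   < b.keyno;
    return std::memcmp(a.key, b.key, sizeof a.key) < 0;
}
```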


Copies of Objects


The Persistent class in Example 4 has no copy constructor or overloaded
assignment operator. Neither should the classes you derive from it. Making
copies of a persistent object has its perils. The system would not know which
copy to save. Therefore, it must assure that only one copy of any particular
persistent object is in memory at a time. A C++ program copies objects in
several ways, two of which are the copy constructor and assignment. In a third
way, the program can simply declare another instance of the same object.
Rather than allow these processes to make copies of an object, the Persistent
class must take measures to prevent it.
Although Example 4 does not show it, the Persistent class uses the
handle/copy idiom to implement reference counting. The using program
instantiates the object by declaring an instance of the handle class. The
handle class contains a pointer to a copy class, which is the body of the
object and which contains all the members. Only one instance of the copy class
exists for any given object, regardless of the number of copies. The copy
class also contains a count of the current number of copies. When the first
copy of the class is constructed, the constructor uses the new operator to
create a new copy object. Each new copy object gets the address of the copy
class in its pointer, and the counter in the copy class gets incremented. When
a handle class's destructor runs, it decrements the counter. When the counter
is 0, the handle class deletes the copy object. The handle class provides the
copy constructor and overloaded assignment operator. It also monitors for the
presence of other instances of the object. Thanks to Jim Coplien for
documenting this and many other good C++ techniques in his book, Advanced C++
Programming Styles and Idioms (Addison-Wesley, 1992). As mentioned earlier, a
persistent-object database manager I've implemented is available
electronically. For a complete discussion of the class library, refer to my
book, C++ Database Development (MIS Press, 1992).
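A minimal, self-contained sketch of that handle/body reference-counting idiom follows; the names here are invented, and the Body holds a single stand-in member rather than a persistent object, but the counting mechanics are as described above.

```cpp
#include <cassert>

// Body: the single shared copy of the object's members.
class Body {
public:
    int refs = 0;
    int value = 0;      // stand-in for the persistent object's data
};

// Handle: what the using program declares. Copies share one Body.
class Handle {
    Body *body;
public:
    Handle() : body(new Body) { body->refs = 1; }
    Handle(const Handle& h) : body(h.body) { ++body->refs; }
    Handle& operator=(const Handle& h) {
        ++h.body->refs;                  // increment first: safe on self-assignment
        if (--body->refs == 0) delete body;
        body = h.body;
        return *this;
    }
    ~Handle() { if (--body->refs == 0) delete body; }
    Body* operator->() { return body; }
    int count() const { return body->refs; }
};
```

Because every copy of the Handle points at the same Body, the system never has to decide which copy to save: there is only one.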
_PERSISTENT OBJECTS IN C++_
by Al Stevens


Example 1: A C persistent object.

struct Employee {
 /* ... */
};
struct Employee empl;
fread(&empl, sizeof(struct Employee), 1, dp);



Example 2: A persistent object class.


class Employee : public Persistent {
 EmployeeNo emplno;
 String *name;
 Department& dept;
 int promotion_ctr;
 Date *promotions;
public:
 virtual int SalaryReviewPeriod();
};



December, 1992
PERSISTENCE IN A PROGRAMMING ENVIRONMENT


An object-oriented database can make the difference




Richard P. Gabriel


Dick is a Fellow at Lucid where he can be contacted at rpg@lucid.com.


Suppose you had a dog and were trying to teach it to roll over and play dead.
And further suppose the dog forgot everything you taught it whenever you left
it alone. As long as you were in front of the dog, it learned and remembered
everything you taught it--even if this lasted for months and months--but the
moment you left, it forgot it all. You'd think there was something wrong with
the dog because it couldn't remember things, and maybe you'd try to return it
for a refund or give it to some friends out in the country. If a person
suffered from this, we'd take him to a doctor or put him in a hospital.
But a lot of programs are like this: When you're running them, they build up
information about the task you're doing; when you kill the program and log
out, that information is gone. Fortunately, most programs can be this stupid.
However, in real-world applications (payroll systems, for instance), this
behavior is unacceptable because the data stored represents people or objects
that exist over time, and representations of them must persist as well.
In object-oriented programming, we come across this problem more often than in
other traditions because objects created and maintained in object-oriented
programs are more like people than like data structures--objects have state
(instance variables, slots) and behavior (methods, member functions), and a
typical program creates objects and manipulates them.
The solution is to put persistent data (data that exists longer than one
incarnation of the program that manipulates it) in an object-oriented database
that contains objects that a program creates; whenever the program is started
up, those objects are available. Unlike in a relational database, however, a
program does not need to perform queries to access those objects; instead, the
process of ordinary object access--through pointers, array access, global
variables, and the like--serves to access these persistent objects. In this
sense, persistence is a characteristic of an object rather than a process for
accessing objects.
When my group first started working on a programming environment for C and
C++, there was no commercially available way to add persistence to objects.
This article tells what we were trying to do, why we needed persistence, our
first attempt at adding persistence, and finally, our solution. We'll show the
situations in which persistence makes sense, how to implement a simple
roll-your-own persistence, and the advantages of a commercial solution.


The Context: A Programming Environment


We initially set out to build a C and C++ programming environment based on an
architecture of a central object repository connected to a variety of
programming tools. We call the repository the environment server (server, for
short), and the objects are kept persistently in an object-oriented database.
The server contains objects that represent things in the programming domain:
source code, object files, cross references, (hypertext-like) links to
documentation, user-written notes, and semantic analyses of the source. A
special kind of object called an annotation links semantically related parts
of the source and serves as hypertext. Unlike other environments that
represent parts of programs as objects, our environment operates on source
code generated by an ordinary text editor rather than a structure editor.
In the environment, a piece of source text is represented by an object that
describes the semantic relations of that source to other parts of the source.
In other words, the environment tries to mimic the actions of a programmer
taking a hardcopy listing of the program, circling parts of the source in red
pen, and drawing labeled lines to other circled parts of the source or to
documentation.
For example, a C++ function might be annotated as using the external interface
of certain classes and calling certain functions. It's as though a function
were circled in red, with a line connecting it to the definition of, say, a
public member function.
In addition, that C++ function might be annotated with the portion of the
specification to which it corresponds and an informal note about the state of
debugging. Each of these things is an object, even the links representing
relations.
Among the tools connected to the server is a compiler that sends messages to
the server about the semantic contents of the source code. The compiler
partitions the original source into meaningful sections and labels them. An
object in the server is created for each section of partitioned source text,
and for each labeled line. Objects that represent labels are called language
elements; those that represent portions of the source code are called source
elements.
The annotations referred to are also objects with attached methods. These
annotations will figure into the user interface, but they must be persistent.
The user-interface part of the environment comprises a group of presentation
tools that display information in textual, graphical, or mixed representations
(text editors, graphers, and browsers, for example). In our environment, each
presentation tool can: receive objects from the server for display; display
objects; display the appropriate associated annotations; and engage the user
in a dialog mediated by the server. This dialog shows the user the operations
possible on an object and its annotations, notes the user's selection, and
sends the result back to the server for action.
For example, a text editor is sent the text representing a C function along
with annotations that represent its associated information. The text editor
must display the source along with the "red circles and labeled lines." In our
case, the text editor just makes the annotated text mouse sensitive and puts a
special icon next to it. A region of text is mouse sensitive if moving the
mouse over it highlights the region and a mouse-click brings up a menu.
When the user clicks the mouse on annotated text, the text editor opens a
dialog with the user about the object represented by the text. From the user's
perspective, the editor pops up a menu for the object and the annotations. The
menu offers such choices as compiling the function (if that's what the section
represents) or following the annotation to its destination. Suppose the user
chose to look at the call graph starting at the function in the section. The
environment would bring up a grapher with the selected function at the root
along with the called functions.
Behind the scenes, the environment does the following: When the user indicates
that dialog is required, the editor sends a message to the server which
computes the list of operations available and sends it back to the editor. The
editor then offers the choices, the user selects one, and the editor sends the
choice to the server, which invokes the operation. In this case, the operation
selected is invoking the call grapher. It is executed by the environment
server, which sends nodes and arcs to the grapher tool, along with the
annotations to display.
When the grapher displays a node that represents a function, it is displaying
the same object as the text editor. In both tools, all the annotations are
displayed and the same operations can be performed on the displayed object,
because each tool simply communicates user-initiated requests to the server,
which computes the response independently of the tool. The only difference is
in the presentation.
If the user alters the information displayed for the object by using a tool
(for example, by editing the text with the text editor), those changes must
percolate back to the object, causing a side effect. Thus, the text editor
seems to operate knowledgeably on source code, although in fact it can only
display active regions and engage in what is to it a meaningless dialog.
Therefore, the program is strongly modularized, with only narrow communication
between the parts.


The Problem


The problem we faced is this: The compiler produces a large number of language
elements, each representing some part of the user's program. Furthermore,
annotations, which are objects, are created to represent the network of
relationships between the language elements and between the language and
source elements. This information is used to drive incremental compilation
(the relationships stored about dependencies are used to determine what needs
to be recompiled when a change to the source is made) and the browsers. Users
want to access this information without going through a lengthy importation or
startup process, and the environment's knowledge of a program is viewed
through objects whose identities are shared by various presentation tools.
Therefore, it would make the most sense for the user to simply never exit the
environment and never log out.
Of course, people must log out and exit their programs, so each of these
objects--language elements, source elements, and annotations--must be made
persistent. This requires each object to reside in a file external to the
program. Because the number of objects could be large (we estimated that the
size of the file containing the persistent objects would be about as large as
the a.out file for the entire program), we did not think it reasonable to read
in the entire file at program startup. For instance, we wanted the environment
to be able to work on itself, which implied that the running server would be a
dozen or so megabytes. Since most of these objects would rarely be touched, it
made more sense to think in terms of paging the objects in and out, much like
a virtual-memory system.
We wanted to impose a further requirement: that the existence of persistent
objects be invisible to a certain level of client code. Here's what I mean:
Suppose I am writing a program to search through a network of objects that
have pointers to other objects. Then I do not want anything in my code to be
contingent on persistent objects. In fact, I want no evidence at all that
objects are persistent. In particular, I don't want to call special functions
to follow pointers or access parts of the object.
When we decided to work on this problem no commercial object-oriented database
existed, so we rolled our own persistent-object system using a commercial ISAM
(indexed sequential access mechanism).


The Solution


First, the class hierarchy for persistent objects must be determined, and
within the environment, objects that "know" their class must be created. That
is, class_of(obj) should return at run time something that represents the
class of which obj is an instance. To achieve this, we defined a class called
classed_object that represents all such objects. The class we're interested
in--pos_object--is a subclass of classed_object. The class res_pos_object is a
subclass of pos_object that represents objects allocated out of a resource
pool of such objects. (This allocation technique can also be good for solving
certain performance problems.) Figure 1 illustrates this hierarchy.
Figure 1: The persistence hierarchy.

 classed_object-->pos_object-->res_pos_object

In some situations, you have a type of object or data structure you both
create and destroy frequently. General-purpose memory-allocation
routines--malloc and free, garbage collection, and the like--are often tuned
for infrequent allocations. For high-frequency allocation and deallocation, it
is usually better to keep a resource--a free list--of storage that can be used
for allocating objects of a particular kind. When allocating, the client
program takes a free piece of storage off of the free list; when deallocating,
it returns that storage to the free list. Suppose you were going to frequently
allocate and deallocate vectors of length 3. If you had a list of length-3
vectors, allocating one would just take one or two instructions rather than a
system call; returning the vector would involve a similar number of
instructions to append it onto the free list.
Writing methods for new and delete implements the special behavior. Of course,
the free list is initially empty, so if new needs to make a new item and none
are available to reuse, it just mallocs up the storage as usual. In this
sense, the resource is self-adjusting. The sample C++ code in Example 1
implements this behavior. Example 1(a) shows that the resources are linked
together by a linked list threaded through the resources themselves. This
structure is used to cast the resource to a form where we can uniformly refer
to the link cell as "next." The function in Example 1(b) is used to allocate
segment_size new resources when the free list runs out. The resources are
allocated in a big block (result = new char[...), then threaded into a list
from back to front (for(i = 0, p = res ...). Notice the use of casting in this
function. free_list is a data member of res_pos_class; each class object
contains the free list for new instances of itself. The function in Example
1(c) takes an object and returns it to the free list. This function uses
casts, too, and is inlined because it is called from exactly one place and
eliminates a cast to void*. Example 1(d) is the overload of new. It first checks
whether there are any free resources; if not, it creates some; if so, it just
takes the first of them. The class of the instance is passed in as an
argument, so a call to this new would look like Example 1(e). Example 1(f) is
the overload for delete; it just puts the object back in the resource.

Example 1: Resource allocation.

 (a)

 struct resource_cast {
 resource_cast* next;
 };

 (b)

 void
 res_pos_class::allocate_resource (int object_size){
 /* free_list is NULL on entry */
 /* On exit, a new segment of objects is allocated and free_list is
 * updated
 */
 char* result = NULL;
 result = new char [segment_size * object_size];
 char* res = result;
 int i;
 char *p;
 for (i=0, p=res; i<segment_size; i++, p+=object_size) {
 ((resource_cast*)p)->next = (resource_cast*)free_list;
 free_list= (res_pos_object*)p;
 }
 }

 (c)

 inline void
 put_back_in_resource (res_pos_object* obj){
 res_pos_class* cl = (res_pos_class *)class_of(obj);
 ((resource_cast*)obj)->next = (resource_cast*)cl->free_list;

 cl->free_list = obj;
 }

 (d)

 void*
 res_pos_object::operator new (new_type s, res_pos_class& cl){
 res_pos_object* newinst;

 if (!cl.free_list)
 cl.allocate_resource(s);

 newinst = cl.free_list;
 cl.free_list = ((resource_cast*)newinst)->next;
 return newinst;
 }

 (e)

 my_instance = new (cl) a_class;

 (f)

 void
 res_pos_object::operator delete (void* obj){

 put_back_in_resource (obj);
 }

Because C++ lets us write methods for new and delete, the behavior of the
resource is transparent--you never have to write code with the knowledge that
resources are used: The methods handle that. In fact a person writing code for
the server only needs to create a subclass of res_pos_object.


The First Implementation (ISAM)


The first implementation of providing persistence was based on an ISAM, not an
object-oriented database.
The persistent-object system (POS) is a layered program, each layer having
different responsibilities. The layers, from highest to lowest level, are as
follows:
Application-programming layer.
Persistent-object class definition layer.
Database-interface layer.
Database-implementation layer.
At the application-programming layer, programmers must merely use object types
and corresponding pointer types defined at lower levels. They need never be
concerned about whether or not objects are in memory, or about explicitly
accessing the database.
At the persistent-object class definition layer, new object classes are
defined to the POS, and methods are defined for loading and storing objects in
their corresponding database file or files. The database-interface layer
provides the next-higher level with a portable functional interface to the
underlying database system. Finally, the database-implementation layer is the
underlying database system.
By defining new classes of objects that inherit from persistent classes
defined in the POS, the programmer can create smart objects and associated
classes of smart pointers, which have properties of persistence,
resource-allocation, reference-counting garbage collection, and a
least-recently used (LRU) object-swapping capability. The fundamental
implementation technique is to distribute the work between the smart-pointer
classes and the persistent-object classes to which such pointers point.
The C++ operators -> and * are defined on smart pointers to transparently
reference objects that might only be in the external database prior to the
reference. The operator = and constructors on the pointer classes are used to
manage reference-count and LRU bookkeeping information. The operators new and
delete are specialized on the persistent-object classes to implement resource
allocation of objects.
To implement persistent objects, there must be a uniform way of referring to
objects in the database. This is done by introducing a new data type for
objects: IDs. An ID can be translated into a pointer after the referent object
is brought into memory. Currently an ID is a 32-bit integer subdivided into
some bits of class information, from which the identity of the database file
can be determined, and some bits of object ID within that file. Objects stored
in the database can refer to each other by means of such object IDs. Other ad
hoc cross-referencing mechanisms can also be used in an application-specific
manner, but the object ID provides a unique handle on each object in the
database, as well as a convenient key that other objects and clients can use
to refer to that object.
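One way to realize such an ID is sketched below. The 8/24 split between class bits and object bits is an assumption for illustration; the article says only that the 32-bit integer is subdivided.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit object ID: high bits name the class (and hence
// the database file), low bits number the object within that file.
typedef std::uint32_t object_id;

const int CLASS_BITS = 8;                  // assumed width
const int OBJ_BITS   = 32 - CLASS_BITS;

object_id make_id(unsigned cls, std::uint32_t obj) {
    return (static_cast<object_id>(cls) << OBJ_BITS) |
           (obj & ((1u << OBJ_BITS) - 1));
}
unsigned class_of_id(object_id id)      { return id >> OBJ_BITS; }
std::uint32_t obj_of_id(object_id id)   { return id & ((1u << OBJ_BITS) - 1); }
```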
All pointers to persistent objects, whether in memory or in the database, are
actually pointers to surrogates--objects used to reference other objects.
When an object is first accessed, it is read in from the database. The
object's surrogate is modified to point to the actual object, and the
surrogate is flagged to indicate that its object is in memory. A hash table
maps from object IDs to the address of the surrogate; references can be
resolved by using this hash table. When an object is brought in from the
database, its references to other persistent objects are replaced by
references to their surrogates--if the object has already been referenced, the
surrogate exists; if not, the surrogate is created.
The operators -> and * are coded as inline member functions on the smart
pointer classes to use surrogate objects and to call methods for swapping in
the nonresident objects. Example 2 shows sample C++ code that implements this
behavior, but the code is suggestive only. The class surrogate_res_pos_ptr
contains one data member, which is the union of an ID and a pointer. When p is
a pointer, it points to an object that is half-word aligned; when it is an ID,
it's an odd number. Thus, by using in_memory_p to check the low-order bit, you
can tell at run time which sort of object in the union it is. Furthermore,
since reading in an object from the ISAM is method driven, an elaborate table
lookup (buried in fault_in_object and not included here) is used to determine
the class of the object not yet in memory.
Example 2: Smart objects and pointers.

 res_pos_object*
 ensure_in_memory (small_surrogate* ss) {
 if (ss) {
 if (!in_memory_p(ss->pos_object_id))
 ss->fault_in_object();
 return ss->pos_object_id.pos_object_ptr;
 } else
 return NULL;
 }
 res_pos_object*
 surrogate_res_pos_ptr::_operator_arrow () CONST_MEMBER{
 if (!p) return NULL;
 return ensure_in_memory(p);
 }
 surrogate_res_pos_ptr::surrogate_res_pos_ptr () {
 p = NULL;
 }
 surrogate_res_pos_ptr::surrogate_res_pos_ptr (res_pos_object* obj){
 p = get_surrogate (obj);
 }
 surrogate_res_pos_ptr::surrogate_res_pos_ptr
 (surrogate_res_pos_ptr CONST_REF ptrref){
 p = ptrref.p;
 }
 surrogate_res_pos_ptr::operator res_pos_object* () CONST_MEMBER{
 if (!p) return NULL;
 return ensure_in_memory (p);
 }
 res_pos_object&
 surrogate_res_pos_ptr::operator* () CONST_MEMBER{
 return *ensure_in_memory(p);
 }
 res_pos_object*
 surrogate_res_pos_ptr::operator-> () CONST_MEMBER{
 return ensure_in_memory (p);
 }

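The low-order-bit test that in_memory_p performs can be sketched in isolation. The union layout and names here are illustrative, not the article's actual declarations:

```c
#include <stdint.h>

/* A surrogate holds one word that is either a real pointer (always
 * even, since objects are at least half-word aligned) or an object ID
 * stored as an odd number. The low bit discriminates at run time. */
typedef union {
    void     *ptr;  /* valid when the low bit is 0 */
    uintptr_t id;   /* valid when the low bit is 1 */
} ptr_or_id;

static int in_memory_p(ptr_or_id u) {
    return (u.id & 1u) == 0;  /* even word: a real pointer */
}

static ptr_or_id id_value(uint32_t raw) {
    ptr_or_id u;
    u.id = ((uintptr_t)raw << 1) | 1u;  /* force the tag bit on */
    return u;
}
```

The trick costs one mask-and-test on the fast path, which is why the access operators can be inlined profitably.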
Surrogates require an extra indirection compared to ordinary object
references. However, this overhead is acceptable because access methods can be
coded inline for the fast case (the object is already in memory), and because
the persistent-object class writer can determine the granularity of objects to
be stored in the database. In particular, a persistent object may turn into a
complex data structure in memory with only one persistent handle. In this case
the programmer must take care with nonsmart pointers to the internal
components of the structure.
Describing the manner in which data structures are flattened for database
storage and retrieval is difficult, so we adopted a method-driven approach.
Thus, when a new persistent-object class is defined, methods for storing and
retrieving it must be defined as well. Though more complex to use, this
approach lets the application designer tune the representation of objects in
the database, possibly distributing them over multiple files or relations or
condensing a large, in-memory data structure into one or a few database
records. The designer can also put more keys into the database representation
of objects, allowing associative access.
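A minimal sketch of the idea, with invented names and a flat-C rendering of what the real system does with C++ methods: each persistent class supplies its own store and retrieve functions, and the class designer decides how an instance flattens into records (a plain FILE stands in for the database here):

```c
#include <stdio.h>
#include <stdlib.h>

/* Each persistent class registers how its instances are flattened to
 * and rebuilt from the database. */
typedef struct {
    const char *name;
    int   (*store)(const void *obj, FILE *db);
    void *(*retrieve)(FILE *db);
} pos_class_methods;

typedef struct { int x, y; } point;

/* Flatten a point into one fixed-size record. */
static int point_store(const void *obj, FILE *db) {
    return fwrite(obj, sizeof(point), 1, db) == 1;
}

/* Rebuild a point in memory from the next record. */
static void *point_retrieve(FILE *db) {
    point *p = malloc(sizeof *p);
    if (p && fread(p, sizeof *p, 1, db) != 1) { free(p); return NULL; }
    return p;
}

static const pos_class_methods point_class = {
    "point", point_store, point_retrieve
};
```

A richer class could spread one in-memory structure over several records or add extra keys for associative access, which is exactly the tuning freedom the method-driven approach buys.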
This layered approach initially proved effective, and the first POS system was
implemented on top of a simple sequential-access database (NetISAM and
C-ISAM). However, we soon discovered three essential problems:
Performance was not good enough when searching and compiling. Basically, the
smart-object scheme sets granularity too small; whenever the code tried to
access a particular part of the network of objects for searching, for example,
each object had to be individually read in. Although reading was method driven
and methods could be written to read in multiple objects, we never hit upon a
general method that would read in the "right" number of nearby objects--enough
when searching, not too many when accessing--with the performance we needed.
We needed to checkpoint the objects occasionally to find the "dirty" objects
(those altered since the last checkpoint), and this proved too costly at the
granularity at which we were working.
The server's working set size--the memory size required to handle all the
frequently handled objects--was too large when we tuned it for performance.
So we turned to an object-oriented database.


The Second Implementation: ObjectStore


We chose the ObjectStore database from Object Design, primarily because it
uses a page-based scheme rather than a smart-object scheme. In the page-based
scheme, new persistent objects are allocated on particular pages of memory,
and dirty pages are written out at checkpoints. When the program using
persistent objects is started up, these pages are associated with the process
but not read in. When the program accesses an object not in memory, the object
is paged in from the database. This causes all the other objects on that page
to come in simultaneously, so the granularity is better.
To effect the page-in, the operating system must support some level of
user-controllable paging. In SunOS, the page-fault mechanism has a hook in
which the user process (the client program with ODI's libraries loaded) is
notified of a paging request. The hook controls reading the data, which comes
from the database. The process of paging in is not trivial, since it involves
relocating the objects and knowing the class of each persistent object at run
time, just as with our simple initial scheme.
As far as client code is concerned, the use of persistent objects is once
again transparent to the programmer. The C++ code in Example 3 implements new
and delete for ObjectStore's persistent objects. Figure 2 shows the full class
hierarchy for persistent objects.
Example 3: new and delete for ObjectStore. These are the two definitions. All
other operators on persistent objects are as usual. delete just invokes the
version in the ObjectStore library.

 void*
 pos_object::operator new (size_t l, basic_pos_class* cl){
 /* get the ObjectStore type identifier for the type being allocated */
 /* We cache it in the class object for efficiency */
 os_typespec* ts = ((pos_object*)(cl->prototype))->get_typespec();
 /* allocate one instance in cadillac_database */
 return ::operator new (l, cadillac_database, 1, ts);
 }

 void
 pos_object::operator delete (void* ob){
 /* just call the normal operator delete provided by ObjectStore */
 ::operator delete (ob);
 }

Commercial databases accrue additional benefits:
Database locking: Databases are kept safe from several users writing to them
simultaneously.
Queries: We don't use this facility yet.
Distributed servers: Performance is improved by putting all the code for
relocation and other activities for several database clients on one dedicated
computer.
Database integrity: The database makes sure checkpoints are done correctly.


Conclusion


Consider carefully whether your application needs persistence, and, if so,
whether it requires the machinery presented here. Remember that our goals
included very fast performance for databases with hundreds of thousands of
objects, along with transparent programming for the developers--so it was
worth it for us to develop the right machinery. But sometimes you can get away
with just reading and writing data to a file, so don't let the cachet of sexy
new techniques or concepts sway you to use them unnecessarily.
_PERSISTENCE IN A PROGRAMMING ENVIRONMENT_
by Richard P. Gabriel


Example 1

(a)

struct resource_cast {
resource_cast* next;
};




(b)

void
res_pos_class::allocate_resource (int object_size){
 /* free_list is NULL on entry */
 /* On exit, a new segment of object is allocated and free_list is
 * updated
 */
 char* result = NULL;
 result = new char[segment_size * object_size];
 char* res = result;
 int i;
 char *p;
 for(i=0, p=res; i<segment_size; i++, p+=object_size) {
 ((resource_cast*)p)->next = (resource_cast*)free_list;
 free_list= (res_pos_object*)p;
 }
}

(c)


inline void
put_back_in_resource (res_pos_object* obj){
 res_pos_class* cl = (res_pos_class *)class_of(obj);
 ((resource_cast*)obj)->next = (resource_cast*)cl->free_list;

 cl->free_list = obj;
}


(d)

void*
res_pos_object::operator new (size_t s, res_pos_class& cl){
 res_pos_object* newinst;

 if (!cl.free_list)
 cl.allocate_resource(s);

 newinst = cl.free_list;
 cl.free_list = ((resource_cast*)newinst)->next;
 return newinst;
}



(e)

my_instance = new (cl) a_class;

void
res_pos_object::operator delete (void* obj){
 put_back_in_resource ((res_pos_object*)obj);
}





Example 2:

res_pos_object*
ensure_in_memory(small_surrogate* ss) {
 if (ss) {
 if (!in_memory_p(ss->pos_object_id))
 ss->fault_in_object();
 return ss->pos_object_id.pos_object_ptr;
 } else
 return NULL;
}
res_pos_object*
surrogate_res_pos_ptr::_operator_arrow () CONST_MEMBER{
 if (!p) return NULL;
 return ensure_in_memory(p);
}
surrogate_res_pos_ptr::surrogate_res_pos_ptr (){
 p = NULL;
}
surrogate_res_pos_ptr::surrogate_res_pos_ptr (res_pos_object* obj){
 p = get_surrogate(obj);
}
surrogate_res_pos_ptr::surrogate_res_pos_ptr
 (surrogate_res_pos_ptr CONST_REF ptrref){
 p = ptrref.p;
}
surrogate_res_pos_ptr::operator res_pos_object* () CONST_MEMBER{
 if (!p) return NULL;
 return ensure_in_memory(p);
}
res_pos_object&
surrogate_res_pos_ptr::operator* () CONST_MEMBER{
 return *ensure_in_memory(p);
}
res_pos_object*
surrogate_res_pos_ptr::operator-> () CONST_MEMBER{
 return ensure_in_memory(p);
}


Example 3:

void*
pos_object::operator new (size_t l, basic_pos_class* cl){
 /* get the Objectstore type identifier for the type being allocated */
 /* We cache it in the class object for efficiency */
 os_typespec* ts = ((pos_object*)(cl->prototype))->get_typespec();
 /* allocate one instance in cadillac_database */
 return ::operator new (l, cadillac_database, 1, ts);
}

void
pos_object::operator delete (void* ob){
 /* just call the normal operator delete provided by Objectstore */
 ::operator delete (ob);
}


December, 1992
SPLAY TREES


For fast access of frequently accessed data




Dean Clark


Dean is a programmer at Logicon R&D Associates in Albuquerque, New Mexico,
specializing in computer graphics. He can be reached on CompuServe at 71160,2426.


We tend to view software programs as composed of two distinct parts: code and
data. We think of code as the animated, active component, while data just sort
of lies around wherever we happen to put it. As programmers, one of our duties
is to arrange for data to be stored in such a way that our programs can access
it efficiently. Usually we assume that once we've put the data in place, it
stays put.
The very term "data structure" implies something static, like a bridge or a
skyscraper. Most data structures do not provide a mechanism to rearrange data
to account for changing conditions as a program executes. However, in many
situations we could improve performance if the data itself could respond to
the way the program uses it. What we'd like to have, in fact, are
self-adjusting data structures.
Coincidentally, such beasts do exist. They not only modify their shape after
obvious operations like insert and delete, they also adjust themselves for
innocuous operations like find.
Self-adjusting data structures have been around for quite a while. A type of
self-adjusting linear list is used to implement the least-recently-used paging
scheme in virtual-memory systems. More recently we have the introduction of
skew heaps, Fibonacci heaps, red-black trees, and (drum roll ... ) splay
trees.
A splay tree is a normal binary search tree (BST) in all outward respects. The
difference is that the standard tree operations--insert, delete, and find--are
implemented in terms of an operation called SPLAY.
The easiest way to describe the SPLAY operation is by example. Suppose you
want to find a particular key K in a tree (assume K is in the tree). The first
step is to search for K in the tree. SPLAY locates the node by performing a
normal tree traversal.
Now comes the weird part. Instead of just returning K, SPLAY promotes it to
the root of the tree via a series of operations called "rotations." Find then
returns the root of the tree. These rotations reorganize the tree such that
skewed trees tend to become bushier and therefore more balanced. The node
containing the target of the search is now at the root of the tree, so the
next time you go looking for it, it'll be waiting right near the top.
That's the real reason for splay trees; frequently accessed data stays near
the root of the tree, so most accesses are very fast. Suppose a bank kept its
customer database in a splay tree. Frequent customers would tend to be near
the root of the tree, while inactive customers would be located near the
leaves. Since records for frequent customers are accessed more often, overall
performance is improved.
At the same time, SPLAY tends to balance skewed trees. A splay tree is not a
balanced tree in the sense that height balancing is one of its invariants; it
can have any shape (even all left or all right children, just like a BST).
Nevertheless, for any sequence of k insert, delete, and find operations on a
tree of size N, splay trees guarantee that we'll do O(k lg N) work.
Hold it. If a splay tree can be just as skewed as a BST, why isn't its worst
case O(n), like the BST? Because the splay tree adjusts itself. In fact, the
O(k lg N) bound for the splay tree is shown not through traditional worst-case
analysis but by using amortized analysis. Amortized analysis deals with
sequences of operations instead of focusing on a single horrible situation
like traditional techniques.
Rather than showing a formal proof, let's see if we can get a feel for the
splay tree's behavior by comparing it to a plain vanilla binary search tree.
The worst thing that can happen in a BST is for all nodes to end up on either
the left or right side. If we access the very bottom node, we pay O(n). We
could do that forever, paying the same price each time. We also pay O(n)
(worst case) to build our unfortunate tree in the first place. Overall cost:
O(n).
What if we try the same thing with the splay tree? First of all, in order to
get a linear tree, all the nodes must have gone to the root of the tree, so we
only pay O(1) for each insert, not O(n), as is the worst case for the ordinary
BST. Now, finding the very bottom node of a linear splay tree is about the
same amount of work as finding it in a BST. But since all the inserts were so
cheap, on the whole we've done much less than O(n) work--in fact, just about
O(lg n).
Furthermore, since finding the bottom node means we "splayed" the tree, the
whole tree is no longer linear. In fact, it's about half as deep as it was, so
if we go get the deepest node again, it's only half as far to go, which again
reduces the tree depth, and so on. Eventually, the tree is just about
balanced, and all finds take about O(lg n) work.
The gist of the splay tree analysis is that some operations are worse than lg
n, and some are better. For any possible combination of operations, the good
ones always balance the bad ones to result in O(lg n) overall behavior.
In a splay tree, the find operation is implemented as a SPLAY operation to
promote the node to the root. The find operation then simply returns the root
of the tree (assuming the node is in the tree). The other common BST
operations, delete and insert, are implemented similarly. For insert, the tree
is traversed to find the right place for the new node, a normal insert is
performed, and then SPLAY is called to promote the new node to the root.
Figure 1 illustrates the insert operation.
Deleting is slightly more complex. First, perform a SPLAY on the node to be
deleted; this promotes the node to the root. Delete the node; this leaves you
with two orphaned subtrees. Pick one of the subtrees (say the left one) and
again SPLAY on the node just deleted. Of course the node won't be found;
you'll get the in-order successor or predecessor to the node, which becomes
the new root of the tree. Because everything in the right orphaned subtree
must be greater than everything in the left orphaned subtree, the right
subtree is simply attached to the new root. Delete therefore requires two
splayings on the tree. Figure 2 illustrates a delete operation.


Rotations


The basis of splaying is the rotations that bring the target node to the root.
Three kinds of rotations correspond to three patterns of subtrees consisting
of the key node, its parent, and its grandparent. The rotations are called
zig, zig-zig, and zig-zag.
The zig rotation is easiest and can only occur when the key node's parent is
the root of the tree. This is illustrated in Figure 3, where K is the key node
and P is the parent. For the zig-zig and zig-zag rotations, the key node has
both a parent and grandparent. If the key node K is the left child of a left
child, then a zig-zig rotation is called for, as shown in Figure 4. Finally,
if the key is the right child of a left child, then we do a zig-zag rotation;
see Figure 5.
Each rotation has its mirror twin: That is, we also zig when the key is the
right child of the root, zig-zig for a right child of a right child, and
zig-zag for a left child of a right child.
It's easy to see that a zig-zag rotation tends to make the tree more balanced.
It's not as readily apparent that the other rotations help the balancing as
well. Let's try an example.
Assume we have a BST with all left children and we want to find the left-most
child. Figure 6 shows the tree before SPLAY gets ahold of it. Performing a
SPLAY on node 1 results in the intermediate steps shown in Figure 7.
It's not readily apparent from an example with only nine nodes, but the
resulting tree is roughly half as deep as the original. Notice also that only
zig-zig rotations were necessary.
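The halving effect can be checked directly with a small self-contained program. The code below is a simplified sketch, not the article's listings: it handles only the left-spine case of Figure 6, and it uses a NULL parent pointer to mark the root:

```c
#include <stdlib.h>

typedef struct node { int key; struct node *left, *right, *parent; } node;

/* zig-zig: t is the left child of a left child */
static void zig_zig_left(node *t) {
    node *p = t->parent, *g = p->parent, *gp = g->parent;
    p->left = t->right; if (t->right) t->right->parent = p;
    g->left = p->right; if (p->right) p->right->parent = g;
    t->right = p; p->parent = t;
    p->right = g; g->parent = p;
    t->parent = gp;
    if (gp) { if (gp->left == g) gp->left = t; else gp->right = t; }
}

/* zig: t's parent is the root */
static void zig_left(node *t) {
    node *p = t->parent;
    p->left = t->right; if (t->right) t->right->parent = p;
    t->right = p; p->parent = t;
    t->parent = NULL;
}

static int height(const node *t) {
    int hl, hr;
    if (!t) return 0;
    hl = height(t->left); hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Build the all-left chain 9, 8, ..., 1 of Figure 6; return the root. */
static node *build_chain(void) {
    node *root = NULL, *prev = NULL;
    int k;
    for (k = 9; k >= 1; k--) {
        node *n = calloc(1, sizeof *n);
        n->key = k;
        if (prev) { prev->left = n; n->parent = prev; } else root = n;
        prev = n;
    }
    return root;
}

/* Splay the deepest left descendant to the top; return the new root. */
static node *splay_bottom(node *root) {
    node *t = root;
    while (t->left) t = t->left;
    while (t->parent) {
        if (t->parent->parent) zig_zig_left(t);
        else zig_left(t);
    }
    return t;
}
```

Splaying the bottom of the nine-node chain takes the height from 9 to 6, and only zig-zig steps fire, matching the figures; for longer chains the post-splay height approaches half the original.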


The Code


The subroutines for the six SPLAY rotations are in Listing One (page 106). All
rotations require that we have some way to get to the parents of a node. For
bottom-up splaying (it's also possible to splay from the top down), we can
either push the traversal path onto a stack or store parent pointers in the
nodes themselves. I've chosen the latter approach.
Listing Two (page 106) contains the SPLAY routine itself. Listing Three (page
107) contains generic find, delete, and insert routines. You must supply
functions for comparing keys and recovering memory used by a deleted node,
because these are data dependent.
Note that the overhead for the rotations is pretty low, just a few pointer
assignments and a couple of tests. This, combined with the fact that the
number of rotations required is roughly half the height of the tree, means
that splay trees have good overall performance compared to standard BSTs.


Conclusions


Splay trees are one example of a self-adjusting data structure, a kind of data
structure that rearranges itself in response to changing program operations.
They're easy to code and maintain, have low overhead, and can really improve
performance for data-access situations that are heavily skewed but not
predictably so.



References


Moret, Bernard M.E. and Henry D. Shapiro. Algorithms from P to NP. vol. 1.
Redwood City, CA: Benjamin Cummings, 1991.
Sleator, Daniel D. and Robert E. Tarjan. "Self-Adjusting Binary Search Trees."
Journal of the ACM (July, 1985).
Weiss, Mark A. Data Structures and Algorithm Analysis. Redwood City, CA:
Benjamin Cummings, 1992.
_SPLAY TREES_
by Dean Clark


[LISTING ONE]

void zig_left(TREENODE *t)
{
 TREENODE *p, *r;
 p = t->parent;
 r = t->right;
 t->right = p;
 if (r) {
 r->parent = p;
 }
 p->left = r;
 p->parent = t;
 t->parent = NULL; /* t is now the root */
}
void zig_right(TREENODE *t)
{
 TREENODE *p, *l;
 l = t->left;
 p = t->parent;
 t->left = p;
 if (l) {
 l->parent = p;
 }
 p->right = l;
 p->parent = t;
 t->parent = NULL; /* t is now the root */
}
void zig_zig_left(TREENODE *t)
{
 TREENODE *p, *pr, *g, *r, *gp;
 p = t->parent;
 g = p->parent;
 gp = g->parent;
 r = t->right;
 pr = p->right;
 p->right = g;
 g->parent = p;
 g->left = pr;
 t->parent = gp;
 t->right = p;
 p->parent = t;
 p->left = r;
 if (r) {
 r->parent = p;
 }
 if (pr) {
 pr->parent = g;
 }
 if (gp) {
 if (gp->left == g) {
 gp->left = t;
 }
 else {
 gp->right = t;
 }
 }
}
void zig_zig_right(TREENODE *t)
{
 TREENODE *p, *pl, *g, *l, *gp;
 p = t->parent;
 g = p->parent;
 gp = g->parent;
 l = t->left;
 pl = p->left;
 p->left = g;
 g->parent = p;
 g->right = pl;
 t->parent = gp;
 t->left = p;
 p->parent = t;
 p->right = l;
 if (l) {
 l->parent = p;
 }
 if (pl) {
 pl->parent = g;
 }
 if (gp) {
 if (gp->left == g) {
 gp->left = t;
 }
 else {
 gp->right = t;
 }
 }
}
void zig_zag_left(TREENODE *t)
{
 TREENODE *p, *gp, *ggp, *l, *r;
 p = t->parent;
 gp = p->parent;
 ggp = gp->parent;
 l = t->left;
 r = t->right;
 t->parent = ggp;
 t->left = p;
 t->right = gp;
 gp->parent = t;
 p->parent = t;
 p->right = l;
 gp->left = r;
 if (l) {
 l->parent = p;
 }
 if (r) {
 r->parent = gp;
 }
 if (ggp) {
 if (ggp->left == gp) {
 ggp->left = t;
 }
 else {

 ggp->right = t;
 }
 }
}
void zig_zag_right(TREENODE *t)
{
 TREENODE *p, *gp, *ggp, *l, *r;
 p = t->parent;
 gp = p->parent;
 ggp = gp->parent;
 l = t->left;
 r = t->right;
 t->left = gp;
 t->right = p;
 t->parent = ggp;
 p->parent = t;
 gp->parent = t;
 gp->right = l;
 p->left = r;
 if (l) {
 l->parent = gp;
 }
 if (r) {
 r->parent = p;
 }
 if (ggp) {
 if (ggp->left == gp) {
 ggp->left = t;
 }
 else {
 ggp->right = t;
 }
 }
}





[LISTING TWO]

/* Splay functions
** Assumptions: Tree is pointed to by global variable T,
** type KEYTYPE
** External function Compare_Key(KEYTYPE k1, KEYTYPE k2) returns
** 0 if the two keys are equal, -1 if first key is less than
** second, 1 if first is greater than second
** External rotation functions from listing 1
*/

void Splay(KEYTYPE k)
{
 int gle;
 /* Assume global tree T. Traverse T looking for key. Compare_Key()
 ** returns 0 if the two keys are equal, -1 if the first is less than
 ** the second, 1 if first is greater than the second */
 while ((gle = Compare_Key(T->key,k)) != 0) {
 if (gle > 0) {
 if (T->left) {

 T = T->left;
 }
 else break;
 }
 else {
 if (T->right) {
 T = T->right;
 }
 else break;
 }
 }
 /* T now points to the node containing the key k, or to the inorder
 ** successor or predecessor to k. We don't really care which at
 ** this point. T is the root when its parent pointer is NULL */
 while (T->parent != NULL) {
 if (T->parent->parent) {
 /* zig-zig or zig-zag*/
 if (T->parent->parent->left == T->parent) {
 if (T->parent->left == T) {
 zig_zig_left(T);
 }
 else {
 zig_zag_left(T);
 }
 }
 else {
 if (T->parent->right == T) {
 zig_zig_right(T);
 }
 else {
 zig_zag_right(T);
 }
 }
 }
 else {
 /* zig */
 if (T->parent->left == T) {
 zig_left(T);
 }
 else {
 zig_right(T);
 }
 }
 }
}







[LISTING THREE]

/* Splay Tree Standard Operations -- Find, Delete and Insert functions for
** splay trees. Assumptions:
** Tree is pointed to by global variable T, type KEYTYPE
** External function Compare_Key(KEYTYPE k1, KEYTYPE k2) returns
** 0 if the two keys are equal, -1 if first key is less than

** second, 1 if first is greater than second
** External functions Destroy_Node(TREENODE *t), which recovers memory
** used by a deleted node, and Attach_Node(KEYTYPE k), which does a
** normal BST insert
** Special KEYTYPE variable ERROR_KEY used as an error sentinel
*/

KEYTYPE Find(KEYTYPE k)
{
 Splay(k);
 /* If k was in the tree it's now the root node */
 if (Compare_Key(T->key, k) == 0) {
 return (T->key);
 }
 else {
 return (ERROR_KEY);
 }
}
void Delete(KEYTYPE k)
{
 TREENODE *l, *r;
 /* Bring target node to the root */
 Splay(k);
 /* Make sure key was in the tree... */
 if (Compare_Key(T->key, k) == 0) {
 /* Detach the node and dispose of it */
 l = T->left;
 r = T->right;
 Destroy_Node(T);
 if (l == NULL) {
 /* No left subtree; the right subtree becomes the whole tree */
 T = r;
 if (r) r->parent = NULL;
 return;
 }
 /* Splay left subtree to find the new root of tree (see text) */
 l->parent = NULL;
 T = l;
 Splay(k);
 /* Root and left subtree are fine now, just attach right subtree */
 T->right = r;
 if (r) r->parent = T;
 }
}
void Insert(KEYTYPE k)
{
 TREENODE *x;
 /* Insert the new node into the tree */
 Attach_Node(k);
 /* Now Splay to bring the new node to root. This is somewhat inefficient
 ** because Splay searches the tree all over again. */
 Splay(k);
}



December, 1992
SIMULATION AND TESTBOARD FOR EMBEDDED SYSTEM DESIGN


Simultaneous software and hardware development




Michael Kutter


Michael is a software engineer for Advanced NMR Systems Inc., 46 Jonspin Road,
Wilmington, MA 01887.


Writing software for an embedded-controller project often puts software
engineers in the precarious position of being the last members of the
development team to finish their work. Since most embedded-controller systems
have a unique configuration, programmers must often wait for hardware to be
completed or partially available before beginning real work on debugging and
integration. Consequently, the software engineer can become the most visible
target for the pressure of getting the product out the door. Also, since
software debug occurs on a new target system, there's the additional burden of
determining if software failure is due to the code or to the new hardware
environment.
Because of these conditions, it is advantageous to take software development
as far as possible while still waiting for hardware. This yields the
additional benefit of cutting significant time from the development schedule.
Through the use of software simulation and hardware testboards, software
debugging and testing can occur in parallel with hardware development. This
article explains how software simulation and hardware-testboard techniques
were applied to a software-development cycle for an embedded controller at
Advanced NMR Systems, the company where I work.
Advanced NMR Systems produces the Instascan system, a retrofit device for GE
Signa MRI systems. The Instascan system allows the user to create magnetic
resonance (MR) images in as little as 1/25 of a second. In conventional
scanning, this time can range from several seconds to a minute. With decreased
imaging time, image-blurring problems due to patient movement, respiration,
and coronary motion are virtually eliminated. It is also possible to create
real-time "movies" of the heart with the Instascan system.
One component of the Instascan system is the Resonant Power Supply Controller
(RPSC), which controls power delivery to the retrofit device, performs
waveform generation and analysis, sets numerous scanning parameters, and
monitors correct system functioning and patient safety. To accomplish all
this, the RPSC communicates with a number of peripheral and logic devices,
including a Sun workstation and Data General mainframe. At the heart of the
RPSC is a Motorola 68040 microprocessor.


Design and Development


Those of us on the software development team for the RPSC were faced with the
task of developing, from the ground up, a sophisticated software-control
system for in-house use in six months, with customer delivery shortly
thereafter.
Because of its cost and the functionality of its Quickfix debugger (which
allows C source-level debugging with the 680x0), we chose to develop with the
C Cross-Development tools from Sierra Systems (Oakland, California). While
Sierra doesn't provide a compiler specifically for the 68040, minor changes
during the installation of the Quickfix debugger let us make full use of the
68020 compiler. The required RPSC software control is complex enough to
require a real-time operating system, so we chose the pSOS operating system
from Integrated Systems (Santa Clara, California). pSOS met all requirements
of the RPSC software specification and provided an interface for the Sierra C
compiler.
Our basic approach was to design software that was both modular and layered.
Typically, top-level control modules call subroutines that interface with
device drivers. Besides generating reusable code, this approach enables
software engineers to thoroughly test portions of code before they are
combined to create larger software systems. Bugs are easier to locate when
software is tested in this manner. This method also facilitates the use of
software simulation as an initial test for system software. If top-level
control modules, mid-level subroutines, and low-level hardware device drivers
are separate entities, introducing software simulators to a system is easy.


The Simulation Process


Software simulation tests are accomplished by adding routines that act in
place of actual hardware devices. Hardware drivers interface with the
simulation routines exactly as (or as closely as possible to) the device they
are designed to drive. Individual software modules are compiled in conjunction
with their simulation counterparts, allowing initial verification of correct
design and coding of software. The Microsoft C compiler and Codeview debugger
provide the environment to evaluate the RPSC software modules on a PC.
A basic design feature in the RPSC software is that all hardware operations
make use of the two macros defined in Example 1. In inbyte, the parameter is
the address of the hardware register being read--the value is "returned" by
the macro. In outbyte, parameter a is the address of the hardware register and
parameter b is the value written.
Example 1: Basic RPSC macros.

 #define inbyte(b) (*((unsigned char *)b))

 #define outbyte(a,b) (*((unsigned char *)a)=(unsigned char)b)

To create a simulation for routines that use these macros, the macros are
replaced by subroutines like those in Example 2 that read from and write to
text files. When a software module is compiled with these routines, the module
will look to text files for hardware inputs, and any hardware write operations
will be recorded in a separate text file.
Example 2: You can replace macros with subroutines like these that read from
and write to text files.

 FILE *sim_out; /* opened at beginning of simulation */
 /* closed at end. */

 FILE *sim_in [ONE_FOR_EACH_REGISTER];

 unsigned char inbyte (unsigned char address)
 {
 int invalue;

 fscanf(sim_in [address], "%x", &invalue);
 return(0xFF & invalue);
 }
 void outbyte (unsigned char dest, unsigned char value)
 {
 fprintf(sim_out, "%x <-- %x\n", dest, value);
 }

Many of the RPSC software modules were tested using this approach, including
the fault-detection routines. The Fault Monitor task is responsible for
monitoring hardware signals for fault events and responding in various ways,
depending on the type and severity of the fault. Rigorous initial testing is
achieved by creating simulation text files that are read by the Fault Monitor
through the inbyte routine just described. The actions of the Fault Monitor
are recorded by the outbyte routine for verification.
Another area tested through simulation is operation of a serial device used to
send commands to and read status from another board in the system. Commands
are transmitted, one bit at a time, by toggling data and control bits in a
serial device register. Status is read, one bit at a time, by toggling control
and reading data bits in the same serial control register. In this case, the
outbyte and inbyte routines require more intelligence, as they must behave as
if they are reading from and writing to a serial device. The serial device
drivers repeatedly call inbyte and outbyte until a complete command has been
sent or a complete status word has been received. By using these simulation
routines, both the device drivers and the routines that use them are verified.
The pseudocode for the outbyte and inbyte serial simulations is shown in
Listing One (page 68).
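The flavor of such a stateful outbyte can be sketched like this; the bit positions and names are invented for illustration and are not the RPSC's actual register layout:

```c
/* Simulated serial register: on each rising edge of the (assumed)
 * clock bit, shift the data bit into the command being assembled. */
#define CLK_BIT  0x02u
#define DATA_BIT 0x01u

static unsigned int sim_command;  /* command bits received so far */
static int sim_nbits;             /* how many bits have arrived */
static int sim_last_clk;          /* previous clock level */

static void outbyte_sim(unsigned char value) {
    int clk = (value & CLK_BIT) != 0;
    if (clk && !sim_last_clk) {   /* rising clock edge */
        sim_command = (sim_command << 1) | (value & DATA_BIT);
        sim_nbits++;
    }
    sim_last_clk = clk;
}
```

A driver under test clocks out its command bit by bit exactly as it would to the hardware, and the simulation is left holding the assembled word for verification.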
The RPSC communicates with a Sun workstation through an RS-232 device.
Software simulators can test the control routines and device drivers for this
interface. Both messages and tables are transmitted and received by the RPSC.
The top-level transmit routine calls a subroutine that translates the message
or table into a transmittable string. A subroutine then passes the string to
the device driver for sending. Similarly, the receive device driver passes the
received strings to a subroutine for encoding, and the subroutine passes the
encoded message to a top-level receive routine.
The simulation for testing this area involves altering the transmit and
receive subroutines to read from and write to text files. For testing the
receive software, the receive subroutine reads messages in the form of strings
from a file, encodes them, and passes the encoded message to the top-level
receive routine. The correct response of the top-level routine is verified.
For testing transmission, the top-level send routine is called with a variety
of messages and tables. The send subroutine is altered to write the
transmittable strings to a file. Further checks are performed by passing the
output files created from the transmit simulation to the receive simulation
and verifying that the tables are correctly reconstructed in memory. Also,
errors are introduced into messages received to verify that incorrect messages
are reported and correctly processed.
Using simulations, bugs are easily identified at the module/routine level.
After they are fixed and the simulations run correctly, chances are that the
software syntax, logic, and routine interfaces are correct. Seventy percent of
the modules in the RPSC software were verified using these simulation
techniques.


Hardware Testing Techniques


The second phase of RPSC software testing involves a 68000-based hardware
testboard (again from Sierra) usually provided for evaluating the use of the
compiler/debug environment. Since the Sierra toolset is able to create and
debug code for the 68000, and a 68000 version of pSOS is available, the
testboard is an excellent substitute for the actual 68040 target board.
Quickfix features a fast, parallel download for transferring software from a
PC to the target system. Once the RPSC software and pSOS are downloaded to the
testboard, full debug of the RPSC software is realized. The above simulations
are run again, but this time hardware operations are recorded in RAM instead
of in files.
Quickfix is very versatile and provides all the functions that you would
expect of a C-level debugger. It allows you to switch between C and assembler
and to maintain normal debug operation when in assembler mode. This allows the
user to actually step through the pSOS code to observe OS operation. The
ability to monitor OS operation was imperative in locating several problems.
The first software problem encountered in using the 68000 testboard was that
control never returned from the OS initialization call. The problem lay in
setting up the pSOS configuration table. Several entries in the configuration
table are pointers to routines that can be executed upon pSOS start up. pSOS
documentation states that by filling these locations with 0, no calls are made
to startup routines. There is also an entry for a pointer to a jump table used
for initializing I/O devices. The documentation does not explicitly state what
to do if you are not using I/O devices, so this table location was filled with
0. By tracing the initialization process to the point where the pointer to the
jump table was accessed, we saw that the jump was made even though the value
was 0. The pSOS initialization process completed correctly after the address
of a return instruction was substituted for the 0.
Virtually all the OS calls in the RPSC software can be verified using the
testboard. We verified the OS wait function by creating a timer task that used
TRAP vectors to run a simulated OS clock. The testboard features a switch for
generating a level-1 autovector interrupt. Activation and operation of all
tasks driven by external interrupts is tested by modifying the 68000 vector
table through the debugger. Using the autovector switch and a combination of
TRAPs to simulate external events integrates all software modules. All that
remain to be tested are the hardware drivers in a real-time environment.
An additional feature of the testboard is a fully functional 68681 dual
asynchronous receiver/transmitter (DUART) chip used by the RPSC control board
for communicating with the Sun workstation. The presence of the 68681 on the
testboard allows complete verification of DUART initialization and operation
software. Upload and download functions are verified by connecting a terminal
emulator to the 68681 to act as the Sun workstation.


Conclusion


By the time the actual 68040 board became available, we had gained plenty of
experience with the Sierra/pSOS development environment. Thus, we could locate
problems more readily than if the 68000 simulation had not been available. One
example was a problem caused by an incompatibility between pSOS and
the RPSC board design. The software locked up on the first call to the OS wait
routine. Again, using the assembler step-through feature, the problem was
traced to the idle task executed when no other tasks are active. The pSOS idle
task contains an undocumented STOP instruction. Normally the next autovector
interrupt will bring the 68040 out of the stopped state. However, due to the
interaction between the 68040 and the RPSC interrupt controller, the 68040
remained halted. By inserting another idle task with an endless FOR loop, the
STOP instruction can be averted.
Using the techniques described here, most of the RPSC software was debugged
before hardware became available. This assured a smooth, rapid integration of
hardware and software upon hardware completion. By using the software
simulation and hardware testboard, over one month of debug work was completed
before integration was started, allowing the first system to ship 11 months
after the RPSC software design was initiated.
_SIMULATION AND TESTBOARD FOR EMBEDDED SYSTEM DESIGN_
by Michael Kutter


[LISTING ONE]

FILE *serial_in,*serial_out;

void outbyte(unsigned char dest, unsigned char value)
 {
 static long command;
 /* check to make sure dest indicates SERIAL_DEVICE. */
 if (dest != SERIAL_DEVICE)
 {
 printf("Error in Hardware destination address\n");
 return;
 }
 /* reconstruct the command by shifting bits into the */
 /* command variable. */
 if (control bit in value indicates a new command bit
 has been sent)
 {
 command <<= 1;
 command += new input bit in "value";
 }
 /* record the command after it is complete */
 if (latch bit in value indicates the command is complete)
 {
 fprintf(serial_out,"serial command: %lx",command);

 command = 0;
 }
 }
unsigned char inbyte(unsigned char source)

 {
 static long status;
 unsigned char return_byte = 0;

 /* check to make sure source indicates SERIAL_DEVICE. */
 if (source != SERIAL_DEVICE)
 {
 printf("Error in Hardware source address\n");
 return(0);
 }
 /* get a new status word */
 if (latch bit in value indicates a new status read)

 fscanf(serial_in,"%lx",&status);
 /* send the status one bit at time, shifting bits */
 /* out of the status variable */
 if (control bit in status indicates a new status bit
 has been requested)
 {
 return_byte = status & 0x01;
 status >>= 1;
 }
 return(return_byte);
 }



Example 1: Basic RPSC macros

#define inbyte(b) (*((unsigned char *)b))
#define outbyte(a,b) (*((unsigned char *)a)=(unsigned char)b)




Example 2: You can replace macros with subroutines like these that read from
and write to text files

FILE *sim_out; /* opened at beginning of simulation */
 /* closed at end. */
FILE *sim_in[ONE_FOR_EACH_REGISTER];

unsigned char inbyte(unsigned char address)
 {
 int invalue;

 fscanf(sim_in[address],"%x",&invalue);
 return(0xFF & invalue);
 }

void outbyte(unsigned char dest, unsigned char value)
 {
 fprintf(sim_out,"%x <-- %x\n",dest,value);

 }


December, 1992
SIMULATING HYPERCUBES IN UNIX PART I


Parallel processing for UNIX systems




Jeffrey W. Hamilton and Eileen M. Ormsby


Jeff was lead programmer for IBM's W4 Multiprocessing Adapter. He can be
contacted at jeffh@vnet.ibm.com. Eileen is a staff programmer for IBM's FSD,
working with W4 application development. She can be reached at
eileen@vnet.ibm.com.


Parallel computers are rapidly coming online, and a good number of these are
hypercubes. Unfortunately, few of us can afford to have our own personal
hypercube, so in this two-part article, we'll describe how you can simulate
the execution of a hypercube program on a standard UNIX system. We'll also
show you how to use a network of systems to execute your programs in parallel.
Our simulator, SIMCUBE, is designed to simulate an Intel iPSC/2 hypercube.
While other hypercubes have different application interfaces, the basic ideas
remain the same. A little time with your system's reference manual and good
understanding of the SIMCUBE code should allow you to re-create any hypercube
environment. This month, we'll focus on partitions, the basic building blocks
of a hypercube system. Next month, we'll present the source code for the
simulator and discuss how to use the system.
SIMCUBE was created so we could move one of several hypercube applications
onto another system that ran UNIX. We wanted a reasonable simulation of the
hypercube's environment with little or no modification of the application. If
we could move one application with no changes, then moving the other 125
applications would be a snap. Another important goal was reasonable simulation
speed. What is the use of simulating another computing environment if the
simulation runs 100 times slower? Finally, we restricted ourselves to standard
UNIX system calls. This permits the simulator and application to be moved to
any UNIX environment. It is possible to combine different UNIX platforms to
create one simulated hypercube.


What is a Hypercube Anyway?


A hypercube is a collection of computing elements (or nodes) joined to work
cooperatively. Each node consists of a processor and memory. Some of the nodes
may also have connections to I/O devices. Each node is joined to a certain
number of nearby nodes. The number of connections is referred to as the order
of the hypercube. Most hypercubes are order-4 machines. How the elements are
joined defines the topology of the hypercube. The topology varies from
manufacturer to manufacturer. The combination of order and topology affects
the distance messages must travel between nodes. Ideally, you would like every
node to be immediately connected to every other node. However, the number of
wires between the nodes would rapidly get out of hand as the number of nodes
in a machine increased. Therefore, compromises must be made. Some applications
spend a fair amount of time organizing their work so that the communication
path between any two subtasks is as short as possible, but for most
applications it is easier to imagine each node as a computer system on a local
area network.
Figure 1 shows a possible topology for an eight-node hypercube of order 4.
Each computing element communicates with its neighbors by passing messages.
Node 0 can send messages directly to node 1. To send a message to node 2, node
0 must first send the message to an intermediate node, which will forward the
message. In the example given, each node can reach half the nodes in one step
and the remaining nodes in two steps. In most hypercubes, the message routing
is totally handled by hardware, so the user does not have to be concerned
about how messages flow through a hypercube. However, if you want to get the
most performance from a hypercube, you must minimize the number of steps a
message takes to reach its destination node.
Node 0 has a special role: All communications to the outside world flow
through this node. On Intel hypercubes, node 0 is connected to the
system-resource manager (SRM)--a PC that controls the communications with the
hypercube and allocates portions of the hypercube to applications.


Programming for a Hypercube


Programs are loaded into the hypercube from the command line of the SRM or by
invoking a "host" program, which in turn loads the nodes of the hypercube. Most
hypercube systems have a barebones executive running on each node to work with
an application program. Each node can only execute one program at a time.
There are no complex operating-system functions such as scheduling or
virtual-memory support on the hypercube. It is like having a personal computer
DOS on each node.
The SRM can load each node with a program directly or load node 0 with a
program which in turn loads the remaining nodes. You do not have to load each
node with the same program, but most programmers do--it makes life much
simpler.
In SIMCUBE, we only consider the case of a host program loading the same
program on all the nodes. Extending the program to handle other cases will be
left as an exercise for the reader.


Hypercube Partitions


Few applications need all the nodes in a hypercube, and it is expensive to
dedicate a whole machine to running one job at a time. Therefore, the
hypercube is typically broken down into partitions--virtual hypercubes. The
typical number of nodes in a partition is a power of two (1, 2, 4, 8, and so
on).
SIMCUBE can simulate partitions of any size, including odd values like 3 or 7.
In a hypercube, a "partition" refers to all the nodes within a virtual
hypercube. In SIMCUBE, a "partition" will refer to the number of nodes
simulated on one computer system. A collection of SIMCUBE partitions will
simulate a single hypercube partition.
In a hypercube, a "node" is a semi-independent computer. In SIMCUBE, a "node"
is a process. Each SIMCUBE partition will have NUMBER_IN_PART nodes, defined
in the file cube.h (Listing One, page 108). You can set it to any value that
makes sense for your particular system. Since an application requests a
certain number of nodes, the value of NUMBER_IN_PART determines the number of
SIMCUBE partitions needed.


Partition-manager Overview


The partition manager (PM), see Listing Two (page 108), is responsible for
handling the communications between partitions. PM is also responsible for
starting and terminating the node processes. Since it is responsible for
starting the processes, PM can ensure that the communication paths are in
place before the application process begins execution. For an application that
does not need multiple partitions, the PM still executes and starts the
application processes.
The file .pmrc contains a list of the computer systems (host names) which can
execute a SIMCUBE partition. The host names are read sequentially from the
file, until the number of partitions needed have been allocated to host
systems. The first name in the .pmrc file must be the local-system name. The
remaining names are the remote systems on the network. How the PMs are started
is discussed in the description of the load command.


Partition Functions


Managing a partition is accomplished with the cubeinfo, getcube, setpid, load,
relcube, and killcube commands. The cubeinfo function returns information
about how the total system is subdivided and which users own which partitions.
Since we will be simulating one partition in a hypercube, and it will always
be available to use, this function is just a stub for future improvements.
The getcube function makes the request to the SRM for a partition. Only one of
the parameters, cubetype, is interesting to our simulator. The cubetype
parameter is a string that describes the desired size of the partition and the
type of computing element needed to run the application. The getcube function
only looks for the size information. The size can be expressed as either the
total number of nodes or the dimension of the cube. Dimensions begin with the
letter d.
The size of the requested hypercube partition determines how many
node-processes are created by SIMCUBE. Since we are only simulating a single
hypercube partition, we assume that the getcube command is called once per
program execution.
In a hypercube, the host program running on the SRM can control and
communicate with multiple partitions. To distinguish which group of processors
is being addressed, the program uses setpid to assign a partition identifier
to the partition immediately after obtaining the partition with the getcube
call.

For SIMCUBE, the partition identifier will be referred to internally as the
"group number," since PID already has special meaning in UNIX applications.
(PID is the process identifier and is used to track specific instances of
programs running in the system.)
The load function is used to load a program into one or more nodes within a
partition. There are actually two forms of this command: One loads a specific
node with a program; and the other loads all nodes within the partition with a
copy of the same program. For our simulator, we have only implemented the
latter case. The rest of the discussion on load is referring to our simulated
version of load.
The load function is the core of SIMCUBE for the host application. Before
loading the application, the simulator is initialized. Space is allocated in
memory for the various control structures of the simulator. The load function
reads the .pmrc file for a list of available systems. It determines how many
partitions to start and which systems to use. Once the argument list for PM
has been set, the load function starts the local PM with an exec call and the
remaining systems are started indirectly by execing the rsh command. PM is
responsible for starting the application processes.
Interrupt handlers are set up so that if we or the program we load dies
unexpectedly, there will be a chance to clean up. This is important under
UNIX, because we are asking the system to set aside global resources for our
exclusive use. If we do not notify the system that we are done with the
resources, we could run out of resources on a subsequent execution of SIMCUBE.
The relcube function is called at the end of program execution to release the
partition gained with the getcube function. For SIMCUBE, this command is just
a placeholder.
The killcube function is called to force the termination of a program that will
not quit. The killcube function can be used to terminate a single node within
a partition or to terminate the entire partition. For SIMCUBE, we only needed
to implement the latter function.
We simulate the behavior of killcube by sending a SIGTERM signal to each PM,
which in turn sends the same signal to each application-program node that it
manages. Cleanup is done by all parties to release the shared system
resources.


Partition-manager Functions


The PM sets up communications for the nodes, starts the nodes (forks the
application processes), and performs any communications for the nodes between
the partitions, as required. Each PM performs these functions for the nodes in
its partition.
The PM is started by the load function. This function does a significant
amount of preparation to start each PM.
When the PM process is started, it is passed the following: the name of the
application program to execute; its partition number; the access key, so the
first PM can communicate with the host process; the partition (group) number
assigned by setpid; the number of nodes being simulated; and a list of all
systems running the simulator.
Once PM starts, it determines which partition it is, how many nodes are
running on it, and how many other partitions exist. For SIMCUBE, sockets are
used to communicate between the partitions, and shared memory is used for
nodes within a partition. PM sets up a socket to communicate with each remote
partition.
As in the load function, the interrupt handling is set up so that appropriate
cleanup can be performed upon application termination.
PM is responsible for forking the appropriate number of application processes
(nodes). Each application process must know its node number and the total
number of nodes working, in addition to other simulator-specific information.
PM must pass this information to each application process.
In line with our design goal of minimizing alterations to the hypercube
application, we wanted to find the least obtrusive way to pass this
information to the application. Our choices were to pass it via either the
argument list, a file, or an environment variable. Passing the information in
the application program's argument list would cause too many alterations to
the application program, while a file would be too slow and would hinder
executing multiple SIMCUBE partitions on one system. We therefore chose the
environment variable. Each application process needs the following
information: the access key, so that a node can communicate with other nodes;
the node number assigned to this particular node (process); the partition
number assigned by the setpid function; and the total number of nodes being
simulated. PM forks each application node, sets the environment variable
appropriately, and then execs the application.
PM then waits for the application program to communicate with other nodes on
remote partitions. PM is actually running as two processes. The server PM is
always waiting to receive data on the socket coming from remote partitions.
The client PM is always waiting for a node to send data to a remote partition.
If a node needs to communicate with a node local to its partition, shared
memory is used instead.


Next Month


In next month's installment, we'll discuss SIMCUBE's application environment
and present the source code to the simulate.c program.
_SIMULATING HYPERCUBES IN UNIX_
by Jeffrey W. Hamilton and Eileen M. Ormsby


[LISTING ONE]

/***** cube.h *****/

/* Hypercube Simulation definitions */
#define NUMBER_IN_PART 4 /* number of nodes in partition */
#define PM_PORT 6000

/* Maximum message sent between nodes */
#define MAX_MESSAGE_SIZE (1024 * 16)

typedef struct {
 char *name; /* network name of the computer hosting partition */
 int socket; /* file descriptor for the socket */
 int errfdp; /* file descriptor for sending "kill" values */
 struct sockaddr_in addr;
} subpart;

typedef struct {
 int type; /* message type sent with the message -1 or greater */
 int spid; /* sender's group number (pid) */

 int snode; /* sender's node number */
 int dnode; /* node this message is destined for */
 int t_length; /* total length of message */
 int length; /* length of the message */
 char valid[NUMBER_IN_PART+2]; /* 0= no message */
 char msg[MAX_MESSAGE_SIZE]; /* Actual message contents */
} message;





[LISTING TWO]

/***** pm.c *****/
/* PARTITION MANAGER -- This program will run on all partitions used for an
** application. It is started via a remote execution call from "load".
** The main program gets the input arguments, sets a few variables and calls
** the Partition Manager subroutine which performs the following functions:
** determines local partition information; allocates necessary partition
** structures; sets up interrupt handling to free system resources when the
** application is terminated; sets up server portion of socket communications;
** forks a client PM that sets up client portion of the socket communications,
** waits for a node to request data to be sent to a partition, and sends data
** over the sockets; forks and execs application children; performs server PM
** functions that waits to receive data from the sockets and notifies
** appropriate nodes when data has arrived.
** The PM server only receives data for its nodes, and the PM client sends
** data to a remote partition.
** BASIC SOFTWARE ARCHITECTURE: The load module in the simcube library will:
** 1) Read the .pmrc file; 2) Determine how many partitions will be used
** for this application; 3) Fork and exec a local PM and the appropriate
** number of remote PMs. PM is passed the name of the application program,
** its partition number, the key value for PM to communicate to the host
** process with the group number, the total number of application nodes,
** the names of the other partitions running this application.
** The initialization portion (init_simulator) which is called by the
** application processes (nodes) will set up interrupt handling and create
** the shared memory and semaphores necessary for communications
** between local nodes and the Partition Manager.
*/

/* These functions allow a UNIX system to simulate a hypercube environment. */
#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/ipc.h>
#include <sys/wait.h>
#include <signal.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <sys/time.h>
#include <sys/utsname.h>
#include <sys/param.h>
#include "cube.h"


#define NUM_TRIES 60
#define min(x,y) (((x) < (y)) ? (x) : (y))

/* Function Prototypes */

void *malloc(int size);
void *shmat(int, void*, int);

int pm (char *filename, char *pmsites[]);
void setup_server_sockets(void);
void pm_server(void);
void setup_client_sockets(void);
void pm_getmsg(void);
void abort_prog(void);
void sig_terminate(int sig, int code, struct sigcontext *scp);
void unexpected_death(int sig, int code, struct sigcontext *scp);
int _killcube(int node, int pid);
int init_shared_mem(void **pointer, int size, int key);
int init_semaphore(int *semid, int size, int value, int key);
int semcall_all(int semid, int size, int operation);
int semcall_one(int semid, int num, int operation);
int numnodes(void);
int numparts(void);
int mypart(void);
int partof(int node);
int pm_partof(int node);
int numbuffers(void);
int mybuffer(void);
int bufferof(int node);
int mynode(void);
int myhost(void);
void pm_client(void);

/* Local, Private Information */
fd_set node_part_set, temp_set;
 /* node_part_set is the variable that FD_XXX commands are */
 /* applied to. Definitions of fd_set structure, and FD_ZERO, */
 /* FD_SET, FD_CLR, and FD_ISSET macros are in <sys/types.h> */
 /* node_part_set will have socket file descriptors.*/
static int num_parts; /* number of partitions */
static int my_part; /* partition this process is in */
static int nodes_in_part; /* number of nodes in this partition */
static subpart *partition; /* list of partition information */
static int base; /* base key value for allocating shared data */
static int my_node; /* node number for this process */
static int my_group; /* group id for this process */

 /* There are two groups, host communications */
 /* and inter-node communications */
static int num_nodes; /* total number of nodes in all partitions */

static int msgavail = -1; /* semaphores indicating message is available */
static int msgfree = -1; /* semaphores indicating buffer is free */
static int next_message = -1; /* which message is to be received next */
static int shmid_m = -1; /* id of shared area for messages */
static message *buffer = NULL;/* communication areas */
static int *children = NULL; /* process ids of all child processes */
static int child_index = 0; /* number of children created */
static int pmserver_pid = 0; /* pid of pmserver */

/* Main: reads arguments from command line, places them in local variables
** and calls pm. (Local variables are not necessary, but enhance readability)
** NOTE: ONLY TEN NODES (PM SITES) ARE READ FROM THE COMMAND LINE */
int main(int argc, char *argv[])

{
 char *filename;
 char *pmsites[16];
 int i;
 if (argc < 7 ) {
 fprintf (stderr, "PM main: error not enough arguments\n");
 fflush(stderr);
 exit(-1);
 }
 filename = argv[1];
 my_part = atoi(argv[2]);
 base = atoi(argv[3]);
 my_group = atoi(argv[4]);
 num_nodes = atoi(argv[5]);
 for (i = 0; i < argc - 6 && i < 16; i++) {
 pmsites[i] = argv[i + 6];
 }
 pm (filename, pmsites);
}

/* pm -- Determines partition information, sets up signal handling, sets up
** server sockets, forks client pm, forks application children. PM splits the
** application into NUMBER_IN_PART processes. The partition number is passed
** as an input parameter. The starting node number is the partition number
** times NUMBER_IN_PART, and the remaining processes are numbered consecutively.
** Shared memory will be allocated to serve as a communications vehicle within
** a partition. Sockets are used between partitions to allow multiple UNIX systems
** to be combined to create a larger set of CPUs to be applied to a problem.
*/
int pm (char *filename, char *pmsites[])
{
 register int i, pid;
 char temp[128]; /* used to set up environment variables */
 char part_names[64];
 int start_node;
 int dest_node;
 /* Determine how many other partitions exist */
 num_parts = (num_nodes + NUMBER_IN_PART - 1) / NUMBER_IN_PART;
 /* Determine which node is the first for this partition */
 start_node = mypart() * NUMBER_IN_PART;
 /* Determine how many nodes are in this partition (1-4) */
 nodes_in_part = numnodes() - (mypart() * NUMBER_IN_PART);
 nodes_in_part = min(NUMBER_IN_PART, nodes_in_part);
 /* Set PM's node to be the last node on this partition */
 /* (The children will be start_node through start_node + nodes_in_part-1) */
 my_node = nodes_in_part;
 /* Create the structure to hold the partition names and socket fds */
 if ((partition = malloc(num_parts * sizeof(subpart))) == NULL) {
 fprintf(stderr,"PM %d SERVER: insufficient memory\n", mypart());
 fflush(stderr);
 return -1;
 }
 memset(partition, 0, num_parts * sizeof(subpart));
 /* Catch these signals so PM can notify children to clean up */
 signal(SIGINT,sig_terminate);
 signal(SIGTERM,sig_terminate);
 signal(SIGQUIT,sig_terminate);
 /* Watch for unexpected deaths */
 signal(SIGCHLD, unexpected_death);
 /* Create, bind, and listen on sockets */

 setup_server_sockets();
 if (mypart() != 0) {
 /* Only change the base on partitions that are not the one that includes
 ** host. That partition requires same base that host session is using. */

 base = getpid();
 }
 /* Allocate shared memory */
 shmid_m = init_shared_mem(&buffer, sizeof(message) * numbuffers(), base);
 if (mypart() != 0) {
 memset(buffer, 0, sizeof(message) * numbuffers());
 }
 /* Allocate communications semaphores */
 init_semaphore(&msgavail, numbuffers(), 0, base+10000);
 init_semaphore(&msgfree, numbuffers(), 0, base+20000);
 /* Flush stdout and stderr before doing a fork, so child doesn't inherit */
 fflush(stdout);
 fflush(stderr);
 /* Fork PM CLIENT here */
 if ((pmserver_pid = fork()) < 0) {
 /* Can't create the PM CLIENT */
 _killcube(0, 0);
 fprintf(stderr, "PM %d SERVER: unable to create PM CLIENT process\n",
 mypart());
 fflush(stderr);
 return -1;
 } else if (pmserver_pid == 0) {
 /* Fill in the names of the other sites in the partition structure and
 ** close the socket file descriptors that this process just inherited. */
 for (i = 0; i < num_parts; i++) {
 if (mypart() != i) {
 partition[i].name = pmsites[i];
 close(partition[i].socket);
 }
 }
 /* CALL CLIENT SUBROUTINES */
 setup_client_sockets();
 pm_client();
 } else {
 /* SERVER: forks application children then calls pm_server subroutine */
 /* Read from pmsites array, create a comma delimited string for env */
 part_names[0] = '\0';
 for (i = 0; i < num_parts; i ++) {
 strcat(part_names, pmsites[i]);
 strcat(part_names, ",");
 }
 /* Allocate space for child pids */
 if ((children = malloc(nodes_in_part * sizeof(int))) == NULL) {
 fprintf(stderr,"PM %d SERVER: insufficient memory\n", mypart());
 fflush(stderr);
 return -1;
 }
 /* Load all nodes within this partition */
 for (i = start_node; (i < start_node + nodes_in_part); i++) {
 if ((pid = fork()) < 0) {
 /* Can't create all the children! */
 _killcube(0, 0);
 fprintf(stderr, "PM %d SERVER: unable to create node process %d\n",
 mypart(), i);

 fflush(stderr);
 return -1;
 } else if (pid == 0) {
 /* I'm the child process */
 /* Start the node program */
 my_node = i;
 sprintf(temp, "SIM_INFO=%d,%d,%d,%d,%s",
 base,my_node,my_group,num_nodes,part_names);
 if (putenv(temp) != 0) {
 fprintf(stderr,
 "PM %d SERVER: Insufficient room to add env variable\n",
 my_node);
 fflush(stderr);
 return -1;
 }
 execlp(filename,filename,NULL);
 /* If we get here, we had a problem */
 perror("execlp");
 fprintf(stderr,"PM %d SERVER: error execing node=%d file=%s errno=%d\n",
 mypart(), my_node, filename, errno);
 fflush(stderr);
 return -1;
 } else {
 /* I'm the parent process */
 children[child_index++] = pid;
 }
 }
 /* CALL SERVER SUBROUTINE */
 pm_server();
 } /* end if PM SERVER */
}
/* setup_server_sockets -- SERVER SOCKETS- for all partitions except ourself:
** Create a socket. Bind the socket to a unique PORT id. (If the socket was
** in use in a prior iteration, it may not have been reset yet - therefore we
** loop a fixed number of times retrying.) Put a listen on socket. Put new
** socket file descriptor into our set */
static void setup_server_sockets(void)
{
 int i, j;
 struct sockaddr_in part_sock;
 /* Zero out the set of partition sockets */
 FD_ZERO(&node_part_set);
 FD_ZERO(&temp_set);
 for (i = 0; i < num_parts; i++) {
 /* Skip ourself */
 if (i == mypart () )
 continue;
 for (j = 0; j < NUM_TRIES; j++) {
 /* Create a SERVER socket to receive data */
 if ((partition[i].socket = socket(AF_INET, SOCK_STREAM, 0))
 < 0) {
 fprintf(stderr, "PM %d SERVER: can't open stream socket, errno=%d\n",
 mypart(), errno);
 fflush(stderr);
 exit (100);
 }
 /* Bind SERVER socket to local addr so partitions can send to it */
 bzero((char*)&part_sock, sizeof(part_sock));

 part_sock.sin_family = AF_INET;
 part_sock.sin_addr.s_addr = htonl (INADDR_ANY);
 /* Create unique SERVER socket port address, up to 16 per computer */
 part_sock.sin_port = htons (PM_PORT + (mypart() << 4) + i);
 /* If socket is still in use from prev iter, keep trying to bind */
 if ((bind(partition[i].socket, (struct sockaddr *)&part_sock,
 sizeof(part_sock))) < 0) {
 if ((errno == EADDRINUSE) || (errno == EINTR)) {
 /* Previous load hasn't shutdown yet, or we were interrupted. */
 close(partition[i].socket);
 sleep(2);
 } else {
 fprintf(stderr,"PM %d SERVER: can't bind local addr, errno=%d\n",
 mypart(), errno);
 fflush(stderr);
 exit(100);
 }
 } else {
 /* It worked, exit the loop */
 break;
 }
 }
 if (j == NUM_TRIES) {
 /* Exceeded retry limit */
 fprintf(stderr,"PM %d SERVER: can't bind local addr, errno=%d\n",
 mypart(), errno);
 fflush(stderr);
 exit(100);
 }

 /* Issue a listen for the server sockets */
 if (listen(partition[i].socket, 1) < 0) {
 fprintf(stderr,"PM %d SERVER: can't listen on %d, errno = %d\n",
 mypart(), partition[i].socket, errno);
 fflush(stderr);
 exit(100);
 }
 /* Set the bit for the socket file descriptor */
 FD_SET(partition[i].socket, &node_part_set);
 } /* end for setting up SERVER sockets */
}
/* pm_server -- SERVER- go into a receiving loop: Copy file descriptors to a
** temporary set. Determine how many sockets are ready to be accepted. For
** each file descriptor that is ready: Find file descriptor that is ready.
** If it is found in a partition's array of fd's then it is a base socket and
** it is "accept"ed and added to the fd set. Else it is an fd that has data to
** be received. Receive the size of the message. Loop until entire message is
** received. Clear the valid indicator bits. Inform nodes that a message has
** arrived. If a broadcast message, set everyone's valid bit, and wait until
** everyone receives it. Else verify that message belongs to a node on this
** part and set that node's valid bit, wait until it is recvd. */
static void pm_server(void)
{
 int i, j;
 int accept_rdy;
 int newsockfd, templen;
 int size, count, partial;
 char *target;
 struct sockaddr_in tempaddr;

 /* forever, accept sockets and receive data */
 for ( ; ; ) {
 temp_set = node_part_set;
 /* Determine how many sockets are ready to be accepted */
 /* FD_SETSIZE is defined in <sys/types.h> to be 200 */
 if ((accept_rdy = select( FD_SETSIZE, &temp_set, 0, 0, 0)) == -1) {
 if (errno != EINTR) {
 fprintf(stderr, "PM %d SERVER: error in select, errno = %d\n",
 mypart(), errno);
 perror("pm select");
 fflush(stderr);
 _killcube(0,0);
 exit(-1);
 } else {
 /* We were interrupted, try again */
 continue;
 }
 }
 for (i = 1; (accept_rdy != 0) && (i < FD_SETSIZE) ; i++) {
 /* Find the file descriptor that needs servicing */
 if ( FD_ISSET( i, &temp_set)) {
 /* temporary modification */
 /* accept_rdy--; */
 accept_rdy = 0;
 /* Examine each partition's array of fd's to find ready one */
 for (j = 0; j < num_parts; j++) {
 /* Skip examining our own partition */
 if (j == mypart() )
 continue;
 /* Since this matches our "base" socket, accept the socket */
 if (i == partition[j].socket) {
 templen = sizeof(tempaddr);
 newsockfd = accept(partition[j].socket,
 (struct sockaddr *)&tempaddr, &templen);
 FD_SET (newsockfd, &node_part_set);
 /* Found "base" socket, break out of for each part loop */
 break;
 } /* end if base socket */
 } /* end for check file descriptors in partition's array */
 /* If it wasn't a base socket, then need to receive data */
 if (j != num_parts) {
 continue;
 } else /* receive the data from the socket */ {
 /* First receive the size of the message */
 while (recv(i, &size, sizeof(size), 0) < 0) {
 if (errno != EINVAL) {
 fprintf(stderr, "PM %d SERVER: recv size err, errno=%d, fd=%d\n",
 mypart(), errno, i);
 fflush(stderr);
 _killcube(0,0);
 exit(-1);
 } else {
 fprintf(stderr, "PM %d SERVER: recv size err, errno=%d, fd=%d\n",
 mypart(), errno, i);
 fflush(stderr);
 }
 } /* end while recv msg */

 target = (char *) &buffer[nodes_in_part];

 count = 0;
 /* Now receive the message, it could come in pieces */
 while (count < size) {
 if ((partial = recv(i, target, size - count, 0)) < 0) {
 fprintf(stderr, "PM %d SERVER: Error recvng msg; errno=%d\n",
 mypart(), errno);
 fflush(stderr);
 exit(-1);
 }
 count += partial;
 target += partial;
 }
 /* Make sure all valid bits are cleared */
 memset(buffer[nodes_in_part].valid,0,
 sizeof(buffer[nodes_in_part].valid));
 /* Tell the node(s) the message is there */
 if (buffer[nodes_in_part].dnode == -1) {
 /* Broadcast the message to nodes in this partition */
 for (j=0; j < nodes_in_part; j++) {
 buffer[nodes_in_part].valid[j] = 1;
 }
 semcall_all(msgavail,nodes_in_part, 1);
 /* Wait until everyone receives the message */
 semcall_one(msgfree, nodes_in_part, -nodes_in_part);
 } else {
 if (mypart() != partof(buffer[nodes_in_part].dnode))
 {
 fprintf(stderr, "PM %d SERVER: Recvd msg for node %d not this partition\n",
 mypart(), buffer[nodes_in_part].dnode);
 fflush(stderr);
 } else {
 /* Point to point to another node in same partition */
 j = bufferof(buffer[nodes_in_part].dnode);
 buffer[nodes_in_part].valid[j] = 1;
 semcall_one(msgavail, j, 1);
 /* Wait until it is received */
 semcall_one(msgfree, nodes_in_part, -1);
 }
 } /* endif broadcast message */
 } /* endif receiving data from this socket */
 } /* endif this socket */
 } /* endfor */
 } /* end forever receive messages on sockets */
}
/* setup_client_sockets -- Setting up CLIENT sockets- for all partitions
** except ourself: Create a socket to send data. Look up address of host,
** place in the sockaddr_in structure. Determine appropriate PORT id (needs to
** match with SERVER). (If socket was in use in a prior iteration, it may not
** have been reset yet - therefore we loop a fixed number of times retrying.).
** Issue a connect for the socket */
static void setup_client_sockets(void)
{
 int i, j;
 struct hostent *hent;
 /* Establish socket communications with other partitions */
 for (i = 0; i < num_parts; i++) {
 /* Skip ourself */
 if (i == mypart () )

 continue;
 for (j = 0; j < NUM_TRIES; j++) {
 /* Create a CLIENT socket to send data */
 partition[i].socket = socket(AF_INET, SOCK_STREAM, 0);
 /* Lookup host address and place in the socket address structure */
 memset(&partition[i].addr, 0, sizeof(struct sockaddr_in));
 partition[i].addr.sin_family = AF_INET;
 if ((hent = gethostbyname(partition[i].name))
 == NULL) {
 fprintf(stderr,"PM %d CLIENT: No entry for %s in /etc/hosts\n",
 mypart(), partition[i].name);
 fflush(stderr);
 exit(100);
 }
 memcpy(&partition[i].addr.sin_addr, hent->h_addr,
 hent->h_length);
 partition[i].addr.sin_port = htons(PM_PORT + (i << 4) +
 mypart());
 /* Connect to the socket */
 if (connect(partition[i].socket, (struct sockaddr *)&partition[i].addr,
 sizeof(struct sockaddr_in)) < 0) {
 if (errno == ECONNREFUSED) {
 /* unsuccessful connect, sleep and try again */
 sleep(3);
 } else {
 /* another error occurred, quit trying to connect */
 j = NUM_TRIES;
 break;
 }
 } else {
 /* successful connect, break out of loop */
 break;
 } /* endif connect */
 } /* endfor NUM_TRIES */
 if (j == NUM_TRIES) {

 fprintf(stderr,
 "PM %d CLIENT: Unable to connect sock to %s, errno %d\n",
 mypart(), partition[i].name, errno);
 fflush(stderr);
 exit(100);
 }
 } /* end for setting up CLIENT sockets */
}
/* pm_client -- The PM CLIENT process sends data to partitions. Set up client
** sockets. Send messages over the sockets: Get message. Send message (if it
** is a broadcast message send it to all partitions, if not send it to
** appropriate partition). Acknowledge sending of message. Release buffer.
** Reset next message indicator. */
static void pm_client(void)
{
 int i, size;
 /* CLIENT- GO INTO INFINITE SENDING LOOP */
 /* Initial setting to indicate the next message has not been selected */
 next_message = -1;
 /* Forever, wait for messages to send over socket */
 for ( ; ; ) {
 /* Get the message */
 pm_getmsg();

 /* Determine where to send the message */
 if (buffer[next_message].dnode == -1) {
 /* BROADCAST MESSAGE, SEND TO ALL PARTITIONS */
 for (i = 0; i < numparts(); i++) {
 /* Don't send broadcast to self */
 if (i == mypart () )
 continue;
 /* First send the size of the message */
 size = buffer[next_message].length;
 if (send(partition[i].socket, &size, sizeof(size),0)
 < 0) {
 fprintf(stderr,
 "PM %d CLIENT: send to PM %d failed, errno=%d\n",
 mypart(), i, errno);
 fflush(stderr);
 exit(-1);
 }
 /* Then send the actual message */
 if (send(partition[i].socket,
 &buffer[next_message], size, 0) < 0) {
 fprintf(stderr,
 "PM %d CLIENT: send to PM %d failed, errno=%d\n",
 mypart(), i, errno);
 fflush(stderr);
 exit(-1);
 }
 } /* endfor SEND BROADCAST TO ALL PARTITIONS */
 } else {
 /* SEND TO A SPECIFIC PARTITION */
 /* First send the size of the message */
 size = buffer[next_message].length;
 i = partof(buffer[next_message].dnode);
 if (send(partition[i].socket, &size, sizeof(size),0)
 < 0) {
 fprintf(stderr,
 "PM %d CLIENT: send to PM %d failed, errno=%d\n",
 mypart(), i, errno);
 fflush(stderr);
 exit(-1);
 }
 /* Then send the actual message */
 if (send(partition[i].socket,
 &buffer[next_message], size, 0) < 0) {
 fprintf(stderr,
 "PM %d CLIENT: send to PM %d failed, errno=%d\n",
 mypart(), i, errno);
 fflush(stderr);
 exit(-1);
 }
 }
 /* FOR BOTH BROADCAST AND REGULAR MESSAGES */
 /* acknowledge the sending of the message */
 buffer[next_message].valid[mybuffer()] = 0;
 /* release (free) the buffer */
 semcall_one(msgfree, next_message, 1);
 /* reset next_message so the next getmsg will work */
 next_message = -1;
 } /* end forever CLIENT PROCESS sending messages over socket */
}

/***** Initialization and Termination routines *****/
/* abort_prog -- Clean up in the case of an error */
static void abort_prog(void)
{
 int i;
 /* Remove the sets of semaphores */
 if (pmserver_pid != 0) {
 if (msgavail != -1) {
 semctl(msgavail, 0, IPC_RMID, 0);
 msgavail = -1;
 }
 if (msgfree != -1) {
 semctl(msgfree, 0, IPC_RMID, 0);
 msgfree = -1;
 }
 }
 /* Remove the shared memory */
 if (buffer != NULL) {
 shmdt(buffer);
 buffer = NULL;
 }
 /* Only PM SERVER process should execute this code */
 if (pmserver_pid != 0) {
 if (shmid_m != -1) {
 shmctl(shmid_m, IPC_RMID, 0);
 shmid_m = -1;
 }
 }
 /* Close the sockets */
 for (i = 0; i < num_parts; i++) {
 if (i != mypart() ) {
 close (partition[i].socket);
 partition[i].socket = 0;
 }
 }
 /* Make sure all pending output gets out */
 fflush(stdout);
 fflush(stderr);
}
/* Handle termination signals */
void sig_terminate(int sig, int code, struct sigcontext *scp)
{
 int i;
 /* Send termination signal to each of PM SERVER's children */

 if (pmserver_pid != 0) {
 for (i = 0; i < child_index; i++) {
 kill(children[i], SIGTERM);
 }
 child_index = 0;
 kill(pmserver_pid, SIGTERM);
 }
 /* Clean up the use of semaphores and shared memory */
 abort_prog();
 exit(100);
}
/* Handle unexpected termination signals */
void unexpected_death(int sig, int code, struct sigcontext *scp)
{

 int statval;
 int waitpid;
 /* Only PM SERVER process should execute this code */
 if (pmserver_pid != 0) {
 waitpid = wait(&statval);
 if (waitpid < 0) {
 printf("Error determining who died unexpectedly. Errno=%d\n", errno);
 } else {
 if (WIFSIGNALED(statval) != 0) {
 printf("Process %d did not catch signal %d.\n",
 waitpid, WTERMSIG(statval));
 } else if (WIFSTOPPED(statval) != 0) {
 printf("Process %d stopped due to signal %d.\n",
 waitpid, WSTOPSIG(statval));
 } else if (WIFEXITED(statval) != 0) {
 /* Terminated normally with an exit code */
 }
 }
 }
 fflush(stdout);
}
/* killcube -- On abort, kill off all children on the hypercube partition */
int _killcube(int node, int pid)
{
 int i;
 int statval;
 int waitpid;

 /* Only PM SERVER process should execute this code */
 if (pmserver_pid != 0) {
 for (i = 0; i < child_index; i++) {
 kill(children[i], SIGTERM);
 }
 kill(pmserver_pid, SIGTERM);
 for (i = 0; i <= child_index; i++) {
 waitpid = wait(&statval);
 if (waitpid < 0) {
 /* No more children left */
 break;
 } else {
 if (WIFSIGNALED(statval) != 0) {
 printf("Process %d did not catch signal %d.\n",
 waitpid, WTERMSIG(statval));
 } else if (WIFSTOPPED(statval) != 0) {
 printf("Process %d stopped due to signal %d.\n",
 waitpid, WSTOPSIG(statval));
 } else if (WIFEXITED(statval) != 0) {
 /* Terminated normally with an exit code */
 }
 }
 }
 }
 /* Clean up after ourself */
 abort_prog();
 child_index = 0;

 return 0;
}
/* init_shared_mem -- Allocates a shared memory region. Sets pointer to region
** in this process's memory space and returns the shared memory identifier. */
static int init_shared_mem(void **pointer, int size, int key)
{
 int shmid;
 if ((shmid = shmget(key, size, 0666 | IPC_CREAT)) < 0) {
 printf("init_shm: allocation of shared memory failed. Errno=%d\n",errno);
 printf(" mynode=%d key=%d size=%d\n",my_node,key,size);
 _killcube(0,0);
 exit(-1);
 }
 *pointer = shmat(shmid, NULL, 0);
 return shmid;
}

/* init_semaphore -- Allocates a set of semaphores and initializes them */
static int init_semaphore(int *semid, int size, int value, int key)
{
 register int i;
 if ((*semid = semget(key, size, 0666 | IPC_CREAT)) < 0) {
 printf("init_sem: allocation of semaphores failed. Errno=%d\n",errno);
 printf(" mynode=%d key=%d size=%d\n",my_node,key,size);
 _killcube(0,0);
 exit(-1);
 }
 for (i = 0; i < size; i++) {
 if (semctl(*semid, i, SETVAL, value) < 0) {
 printf("init_sem: init of semaphores failed. Errno=%d\n",errno);
 printf(" mynode=%d offset=%d value=%d\n",my_node,i,value);
 _killcube(0,0);
 exit(-1);
 }
 }
 return *semid;
}
/* semcall_all -- Perform same operation on all elements of semaphore at once.*/
static int semcall_all(int semid, int size, int operation)
{
 struct sembuf sbuf[NUMBER_IN_PART+1];
 register int i;
 for (i = 0; i < size; i++) {
 sbuf[i].sem_num = i;
 sbuf[i].sem_op = operation;
 sbuf[i].sem_flg = 0;
 }
 while (semop(semid, sbuf, size) < 0) {
 /* repeat operation if interrupted */
 if (errno != EINTR) {
 printf("PM %d: Semaphore broadcast failed. Errno = %d\n",
 mypart(), errno);
 fflush(stdout);
 return -1;
 }
 }
 return 0;
}
/* semcall_one -- Perform an operation on an element of a semaphore. */

static int semcall_one(int semid, int num, int operation)
{
 struct sembuf sbuf;
 sbuf.sem_num = num;
 sbuf.sem_op = operation;
 sbuf.sem_flg = 0;
 while (semop(semid, &sbuf, 1) < 0) {
 /* repeat operation if interrupted */
 if (errno != EINTR) {
 printf("PM %d: Semaphore failed. Errno = %d\n", mypart(), errno);
 fflush(stdout);
 return -1;
 }
 }
 return 0;
}
/***** Environment Information (External and Internal) *****/
/* numnodes -- Returns the number of simulated nodes */
int numnodes(void)
{
 return num_nodes;
}
/* numparts -- number of partitions */
static int numparts(void)
{
 return num_parts;
}
/* mypart -- Partition this process is in */
static int mypart(void)
{
 return my_part;
}

/* partof -- Determines which partition a given node is a member of */
static int partof(int n)
{
 if (n == myhost()) {
 return 0;
 } else {
 return n / NUMBER_IN_PART;
 }
}
/* pm_partof -- Determines which subpartition a given node is a member of
** A -1 can be passed if a destination node is broadcast, return -1. */
static int pm_partof(int n)
{
 if (n == myhost()) {
 return 0;
 } else if (n == -1) {
 return -1;
 } else {
 return n / NUMBER_IN_PART;
 }
}
/* numbuffers -- Number of buffers in this partition */
static int numbuffers(void)
{
 if (mypart() == 0) {
 return (nodes_in_part + 2);

 } else {
 return (nodes_in_part + 1);
 }
}
/* mybuffer -- returns the index for this process's buffer */
static int mybuffer(void)
{
 return (nodes_in_part);
}

/* bufferof -- Returns the buffer offset of the given node. The PM is always
** the second-to-last buffer; the host is the last buffer in partition 0 */
static int bufferof(int n)
{
 if (mypart() != partof(n)) {
 return nodes_in_part; /* Return the buffer of PM */
 } else if (n == myhost()) {
 return nodes_in_part + 1; /* This partition, buffer of host */
 } else {
 return n % NUMBER_IN_PART; /* This partition, buffer of node */
 }
}
/* mynode -- Returns the node number for this process */
int mynode(void)
{
 return my_node;
}
/* myhost -- Returns the node number of the host */
int myhost(void)
{
 return numnodes();
}
/***** Communications *****/
/* pm_getmsg -- Wait until a message is available. This routine differs from
** getmsg, in that it checks to ensure that destination node is not in this
** partition. (Getmsg checks that current node equals destination node.)
** OUTPUT: next_message - set to the message found of the proper type */
static void pm_getmsg(void)
{
 int i;
 /* Only wait if a message is not already selected */
 if (next_message != -1) return;
 /* Wait for a message for me */
 semcall_one(msgavail, mybuffer(), -1);
 /* Search for those messages that are for me */
 for (i = 0; i < numbuffers(); i++) {
 if (buffer[i].valid[mybuffer()] != 0) {
 next_message = i;
 return;
 }
 }
}







[LISTING THREE]


/***** simulate.c *****/
/* These functions allow a UNIX system to simulate a hypercube environment. */
#include <stdio.h>
#include <ctype.h>
#include <sys/types.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/signal.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include "cube.h"

/* Prototypes */
char *getenv(char *variable);
void *shmat(int shmid, void *shmaddr, int shmflg);
char *strtok(char *, char *);
char *strcpy(char *, char *);
void *malloc(int size);
#define min(x,y) (((x) < (y)) ? (x) : (y))
int csend(int type, void *msg, int length, int target_node, int group);
int crecv(int type, void *buf, int len);
int killcube(int, int);
int numnodes(void);
int myhost(void);
int mynode(void);
int numparts(void);
int numbuffers(void);
int mybuffer(void);
int bufferof(int node);
int mypart(void);
int partof(int node);

/* Local, Private Information */
static int num_parts; /* number of partitions */
static int my_part; /* partition this process is in */
static int nodes_in_part; /* number of nodes in this partition */
static subpart *partition = NULL; /* list of partition information */
static int base; /* base key value for allocating shared data */
static int my_node; /* node number for this process */
static int my_group; /* group id for this process */
 /* There are two groups, host communications */
 /* and inter-node communications */
static int num_nodes; /* total number of nodes in all partitions */
static int msgavail = -1; /* semaphores indicating message is available */
static int msgfree = -1; /* semaphores indicating buffer is free */
static int next_message; /* which message is to be received next */
static int shmid_m = -1; /* id of shared area for messages */
static message *buffer = NULL;/* communication areas */
static int *children = NULL; /* process ids of all child processes */
static int child_index = 0; /* number of children created */

/***** Initialization and Termination routines *****/

/* abort_prog -- Clean up when the program terminates */
void abort_prog(void)
{
 /* Remove the sets of semaphores */
 if (mynode() == myhost()) {
 if (msgavail != -1) {
 semctl(msgavail, 0, IPC_RMID, 0);
 msgavail = -1;
 }
 if (msgfree != -1) {
 semctl(msgfree, 0, IPC_RMID, 0);
 msgfree = -1;
 }
 }
 /* Remove the shared memory */
 if (buffer != NULL) {
 shmdt(buffer);
 buffer = NULL;
 }
 if (mynode() == myhost()) {
 if (shmid_m != -1) {
 shmctl(shmid_m, IPC_RMID, 0);
 shmid_m = -1;
 }
 }
 /* Make sure all pending output gets out */
 fflush(stdout);
 fflush(stderr);
}
/* Handle termination signals */
void sig_terminate(int sig, int code, struct sigcontext *scp)
{
 if (mynode() == myhost()) {
 /* Pass on the termination signal to the node processes */
 killcube(0,0);
 } else {
 /* This is executed by the node processes */
 /* Clean up the use of semaphores and shared memory */
 abort_prog();
 }
 exit(100);
}
/* Handle unexpected termination signals. Used by the host process. */
void unexpected_death(int sig, int code, struct sigcontext *scp)
{
 int statval;
 int waitpid;
 waitpid = wait(&statval);
 if (waitpid < 0) {
 printf("Error determining who died unexpectedly. Errno=%d\n", errno);
 } else {
 if (WIFSIGNALED(statval) != 0) {
 printf("Process %d did not catch signal %d.\n",
 waitpid, WTERMSIG(statval));
 } else if (WIFSTOPPED(statval) != 0) {
 printf("Process %d stopped due to signal %d.\n",
 waitpid, WSTOPSIG(statval));
 } else if (WIFEXITED(statval) != 0) {
 /* Terminated normally with an exit code */
 }
 }
 fflush(stdout);
}
/* handler -- handles hypercube specific errors that do not map to UNIX. */
void handler(int type, void (*proc)())
{
 /* ignore this */
}
/* getcube -- Called by host process to gain possession of a partition in a
** hypercube. Note: Assuming getcube is only called once per host process. */
void getcube(char *cubename, char *cubetype, char *srmname, int keep,
 char *account)
{
 char size[8];
 int is_dimension = 0;
 int i;
 char *ptr;
 char *target;
 /* Pull out the requested number of nodes */
 ptr = cubetype;
 if (*ptr == 'd') {
 ptr++;
 is_dimension = 1;
 }
 target = size;
 i = 4;
 while (isdigit(*ptr) && (i-- != 0)) {
 *target++ = *ptr++;
 }

 *target = '\0';
 /* The rest of the parameters don't matter */
 /* Determine the total number of nodes */
 num_nodes = NUMBER_IN_PART; /* default size */
 sscanf(size,"%d",&num_nodes);
 if (is_dimension) {
 num_nodes = 1 << num_nodes;
 }
}
/* cubeinfo -- Passes back information about the partitions on a hypercube.
** Input: global= 0: current attached cube; 1: all cubes you own that were
** allocated by the current host; 2: all cubes on the system from which the
** command was executed; 3: how cubes are allocated on all SRMs; 4: one
** additional parameter (srmname) returns info for that SRM */
int cubeinfo(struct cubetable *ct, int numslots, int global, ...)
{
 /* returns the number of cubes for which information is available */
 /* Ignore this for now */
 return 0;
}
/* relcube -- release cube gained by the getcube call. */
void relcube(char *cubename)
{
 /* Ignore this for now */
}
/* killcube -- On abort, kill off all processes in the hypercube partition */

int killcube(int node, int pid)
{
 int i;
 int statval;
 int waitpid;
 /* Force everyone to terminate */
 for (i = 0; i < child_index; i++) {
 kill(children[i], SIGTERM);
 }

 /* Give the children a chance to terminate */
 if (child_index > 0) sleep(1);
 /* Wait for everyone to exit, check status in case */
 for (i = 0; i < child_index; i++) {
 waitpid = wait(&statval);
 if (waitpid < 0) {
 /* No more children left */
 break;
 } else {
 if (WIFSIGNALED(statval) != 0) {
 printf("Process %d did not catch signal %d.\n",
 waitpid, WTERMSIG(statval));
 } else if (WIFSTOPPED(statval) != 0) {
 printf("Process %d stopped due to signal %d.\n",
 waitpid, WSTOPSIG(statval));
 } else if (WIFEXITED(statval) != 0) {
 /* Terminated normally with an exit code */
 }
 }
 }
 /* Clean up after ourself */
 abort_prog();
 child_index = 0;
 return 0;
}
/* init_shared_mem -- Allocates a shared memory region. Sets pointer to region
** in this process's memory space and returns the shared memory identifier. */
static int init_shared_mem(void **pointer, int size, int key)
{
 int shmid;
 if ((shmid = shmget(key, size, 0666 | IPC_CREAT)) < 0) {
 printf("init: allocation of shared memory failed. Errno=%d\n",errno);
 printf(" mynode=%d key=%d size=%d\n",my_node,key,size);
 fflush(stdout);
 sig_terminate(0,0,NULL);
 }
 *pointer = shmat(shmid, NULL, 0);
 return shmid;
}
/* init_semaphore -- Allocates a set of semaphores and initializes them */
static int init_semaphore(int *semid, int size, int value, int key)
{
 register int i;
 if ((*semid = semget(key, size, 0666 | IPC_CREAT)) < 0) {
 printf("init: allocation of semaphores failed. Errno=%d\n",errno);
 printf(" mynode=%d key=%d size=%d\n",my_node,key,size);
 fflush(stdout);

 sig_terminate(0,0,NULL);
 }
 for (i = 0; i < size; i++) {
 if (semctl(*semid, i, SETVAL, value) < 0) {
 printf("init: initialization of semaphores failed. Errno=%d\n",errno);
 printf(" mynode=%d offset=%d value=%d\n",my_node,i,value);
 fflush(stdout);
 sig_terminate(0,0,NULL);
 }
 }
 return *semid;
}
/* semcall_all -- Perform same operation on all elements of a semaphore. */
static int semcall_all(int semid, int size, int operation)
{
 struct sembuf sbuf[NUMBER_IN_PART+1];
 register int i;
 for (i = 0; i < size; i++) {
 sbuf[i].sem_num = i;
 sbuf[i].sem_op = operation;
 sbuf[i].sem_flg = 0;
 }
 while (semop(semid, sbuf, size) < 0) {
 /* repeat operation if interrupted */
 if (errno != EINTR) {
 printf("%d: Semaphore broadcast failed. Errno = %d\n",mynode(),errno);
 abort_prog();
 exit(-1);
 }
 }
 return 0;
}
/* semcall_one -- Perform an operation on an element of a semaphore. */
static int semcall_one(int semid, int num, int operation)
{
 struct sembuf sbuf;
 sbuf.sem_num = num;
 sbuf.sem_op = operation;
 sbuf.sem_flg = 0;
 while (semop(semid, &sbuf, 1) < 0) {
 /* repeat operation if interrupted */
 if (errno != EINTR) {
 printf("%d: Semaphore failed. Errno = %d\n",mynode(), errno);
 abort_prog();
 exit(-1);
 }
 }
 return 0;
}
/* setpid -- Assigns a partition identifier to the simulated partition. */
int setpid(int id)
{
 my_group = id;
 return 0;
}
/* init_simulator -- Should be called near the beginning of an application
** before any hypercube-related functions are called. */
void init_simulator(void)
{

 register int i, pid;
 char filename[20];
 char *temp;
 static char env[256]; /* must be static */
 struct hostent *hent;
 /* parent cm will send child cm SIGINT when a CTRL-BREAK is pressed */
 signal(SIGINT,sig_terminate);
 signal(SIGTERM,sig_terminate);
 signal(SIGQUIT,sig_terminate);
 /* Pick up the base key value from the environment */
 if ((temp = getenv("SIM_INFO")) == NULL) {
 fprintf(stderr,"init_sim: Missing environment variable\n");
 fflush(stderr);
 exit(-1);
 }

 strcpy(env,temp);
 if ((temp = strtok(env,",")) == NULL) {
 fprintf(stderr, "init_sim: Missing information in environment variable\n");
 fflush(stderr);
 exit(-1);
 }
 sscanf(temp,"%d",&base);
 if ((temp = strtok(NULL,",")) == NULL) {
 fprintf(stderr,"init_sim: Missing node info in environment variable\n");
 fflush(stderr);
 exit(-1);
 }
 sscanf(temp,"%d",&my_node);
 if ((temp = strtok(NULL,",")) == NULL) {
 fprintf(stderr,"init_sim: Missing pid info in environment variable\n");
 fflush(stderr);
 exit(-1);
 }
 sscanf(temp,"%d",&my_group);
 if ((temp = strtok(NULL,",")) == NULL) {
 fprintf(stderr,
 "init_sim: Missing number of node info in environment variable\n");
 fflush(stderr);
 exit(-1);
 }
 sscanf(temp,"%d",&num_nodes);
 num_parts = (num_nodes + NUMBER_IN_PART - 1) / NUMBER_IN_PART;
 my_part = my_node / NUMBER_IN_PART;
 /* Calculate the number of nodes in this and remaining partitions */
 i = numnodes() - (mypart() * NUMBER_IN_PART);
 nodes_in_part = min(NUMBER_IN_PART, i);
 /* Allocate shared memory */
 shmid_m = init_shared_mem(&buffer, sizeof(message) * numbuffers(), base);
 /* Allocate communications semaphores */
 init_semaphore(&msgavail, numbuffers(), 0, base+10000);
 init_semaphore(&msgfree, numbuffers(), 0, base+20000);
}
/* load -- Should be called near the beginning of a host application before any
** hypercube-related functions are called, except for getcube. It will start
** the appropriate number of PMs on the appropriate systems (as read from the
** .pmrc file.) Parent process will be node 0, which has special roles on a
** hypercube. Remaining processes will be numbered consecutively. */
int load(char *filename, int which_node, int group_id)
{
 register int i, j, pid, size;
 char *argv[20];

 char base_string[20];
 char partition_number[20];
 char group_string[20];
 char number_of_nodes[20];
 char temp[256];
 char *ptr;
 struct servent *sp;
 FILE *fd;
 /* Allocate space for child pids */
 if (children == NULL) {
 if ((children = malloc(numnodes() * sizeof(int))) == NULL) {
 fprintf(stderr,"load: insufficient memory\n");
 fflush(stderr);
 return -1;
 }
 }
 /* parent will send us SIGINT when CTRL-BREAK is pressed */
 signal(SIGINT,sig_terminate);
 signal(SIGTERM,sig_terminate);
 signal(SIGQUIT,sig_terminate);
 signal(SIGCHLD, unexpected_death);
 base = getpid();
 num_parts = (num_nodes + NUMBER_IN_PART - 1) / NUMBER_IN_PART;
 if (partition == NULL) {
 if ((partition = malloc(num_parts * sizeof(subpart))) == NULL) {
 fprintf(stderr,"load: insufficient memory\n");
 fflush(stderr);
 return -1;
 }
 memset(partition, 0, num_parts * sizeof(subpart));
 if ((fd = fopen(".pmrc","r")) == NULL) {
 fprintf(stderr,"load: Missing configuration file \".pmrc\"\n");
 fflush(stderr);
 return -1;
 }
 for (i = 0; i < num_parts; i++) {
 temp[0] = '\0';
 fscanf(fd," %[^ \n] \n",temp);
 size = strlen(temp);
 if ((ptr = malloc(size+1)) == NULL) {
 fprintf(stderr,"load: Insufficient memory\n");
 fflush(stderr);
 return -1;
 }
 strcpy(ptr,temp);
 partition[i].name = ptr;
 }
 fclose(fd);
 }

 /* Host program's node number is the same as the number of nodes */
 my_node = numnodes();
 my_part = 0;
 /* Calculate the number of nodes in this and remaining partitions */
 i = numnodes() - (mypart() * NUMBER_IN_PART);

 nodes_in_part = min(NUMBER_IN_PART, i);
 /* Allocate shared memory */
 if (shmid_m == -1) {
 shmid_m = init_shared_mem(&buffer, sizeof(message) * numbuffers(),base);
 }
 memset(buffer,0,sizeof(message) * numbuffers());
 /* Allocate communications semaphores */
 if (msgavail == -1) {
 init_semaphore(&msgavail, numbuffers(), 0, base+10000);
 }
 if (msgfree == -1) {
 init_semaphore(&msgfree, numbuffers(), 0, base+20000);
 }
 /* Split into node processes */
 fflush(stdout);
 fflush(stderr);
 /* Start the local and remote Partition Managers */
 for (i = 0; i < num_parts; i++) {
 if ((pid = fork()) < 0) {
 /* Can't create all the children! */
 killcube(0,0);
 fprintf(stderr, "LOAD: unable to create Partition Managers\n");
 return -1;
 } else if (pid == 0) {
 /* I'm the child process */
 my_node = -1;
 /* Start the Partition Managers */
 if (i == 0) {
 argv[0] = "pm";
 argv[1] = filename;
 sprintf(partition_number, "%d", i);
 argv[2] = partition_number;
 sprintf(base_string,"%d", base);
 argv[3] = base_string;
 sprintf(group_string, "%d", group_id);
 argv[4] = group_string;
 sprintf(number_of_nodes, "%d", numnodes());
 argv[5] = number_of_nodes;
 for (i = 0; i < num_parts; i++) {
 argv[i+6] = partition[i].name;
 }
 argv[i+6] = NULL;
 execvp("pm",argv);
 /* If we get here, we had a problem */
 printf("execvp of PM 0 failed. errno=%d\n",errno);
 fflush(stdout);
 exit(-1);
 } else {
 argv[0] = "rsh";
 argv[1] = partition[i].name;
 argv[2] = "pm";
 argv[3] = filename;
 sprintf(partition_number, "%d", i);
 argv[4] = partition_number;
 sprintf(base_string,"%d", base);
 argv[5] = base_string;
 sprintf(group_string, "%d", group_id);
 argv[6] = group_string;
 sprintf(number_of_nodes, "%d", numnodes());

 argv[7] = number_of_nodes;
 for (i = 0; i < num_parts; i++) {
 argv[i+8] = partition[i].name;
 }
 argv[i+8] = NULL;
 execvp("rsh",argv);
 /* If we get here, we had a problem */
 printf("execvp of rsh failed. errno=%d\n",errno);
 fflush(stdout);
 exit(-1);
 }
 } else {
 /* I'm the parent process */
 children[child_index++] = pid;
 }
 }
}
/** Environment Information (External and Internal) **/
/* availmem -- returns amount of memory available */
int availmem(void)
{
 return 0;
}
/* nodedim -- Returns the dimension of the simulated hypercube */
int nodedim(void)
{
 unsigned int i, temp;
 temp = num_nodes;
 i = 0;
 while (temp > 1) {
 temp >>= 1;
 i++;
 }
 return i;
}
/* numnodes -- Returns the number of simulated nodes */
int numnodes(void)
{
 return num_nodes;
}
/* numparts -- number of simulator partitions */
static int numparts(void)
{
 return num_parts;
}
/* mypart -- Simulator partition this process is in */
static int mypart(void)
{
 return my_part;
}
/* partof -- Determines which simulator partition a given node is member of */
static int partof(int n)
{
 if (n == myhost()) {
 return 0;
 } else {

 return n / NUMBER_IN_PART;
 }

}
/* numbuffers -- Number of buffers in this simulator partition */
static int numbuffers(void)
{
 if (mypart() == 0) {
 return nodes_in_part + 2;
 } else {
 return nodes_in_part + 1;
 }
}
/* mybuffer -- returns the index for this process's buffer */
static int mybuffer(void)
{
 if (mynode() == myhost()) {
 return nodes_in_part + 1;
 } else {
 return mynode() % NUMBER_IN_PART;
 }
}
/* bufferof -- Returns the buffer offset of the given node. The host is always
** the last buffer in partition 0. The PM is always second to last buffer */
static int bufferof(int n)
{
 if (mypart() != partof(n)) {
 return nodes_in_part; /* Return the buffer of PM */
 } else if (n == myhost()) {
 return nodes_in_part + 1; /* This partition, buffer of host */
 } else {
 return n % NUMBER_IN_PART; /* This partition, buffer of node */
 }
}
/* mynode -- Returns the node number for this process */

int mynode(void)
{
 return my_node;
}
/* mypid -- Returns the group number */
int mypid(void)
{
 return my_group;
}
/* myhost -- Returns the node number of the host */
int myhost(void)
{
 return numnodes();
}
/** Communications **/
/* cread -- Special read for files on hypercube's high-speed disk system. We
just issue a standard read instead. */
int cread(int fd, void *buffer, int size)
{
 return read(fd, buffer, size);
}
/* gdsum -- Sum individual elements of an array on all processes */
void gdsum(double x[], long elements, double work[])
{
 register int i,j;
 double temp;

 if ((mybuffer()) == 0) {
 /* The first node in each partition sums the local data */
 if (nodes_in_part > 1) {
 /* Only sum when we aren't the only ones in the partition */
 for (i = 1; i < nodes_in_part; i++) {

 /* Get the next set of numbers to sum */
 crecv(-2, work, elements * sizeof(double));
 for (j = 0; j < elements; j++) {
 x[j] += work[j];
 }
 }
 }
 /* Node 0 sums for all partitions */
 if (mynode() == 0) {
 /* Only sum if there are more than one partition */
 if (numparts() > 1) {
 for (i = 1; i < numparts(); i++) {
 /* Get the next set of numbers to sum */
 crecv(-3, work, elements * sizeof(double));

 for (j = 0; j < elements; j++) {
 x[j] += work[j];
 }
 }
 }
 /* Only broadcast if there is more than one node */
 if (nodes_in_part > 1) {
 /* Broadcast the results */
 csend(-4,x,elements * sizeof(double),-1,mypid());
 }
 } else {
 /* Each partition needs to send the partial sum to node 0 */
 csend(-3,x,elements * sizeof(double),0,mypid());
 /* Wait for the answer */
 crecv(-4,x,elements * sizeof(double));
 }
 } else {
 /* Send the data to local node to do the summation */
 csend(-2,x,elements * sizeof(double),mypart()*4,mypid());
 /* Wait for the answer */
 crecv(-4,x,elements * sizeof(double));
 }
}
/* getmsg -- Wait until a message is available. OUTPUT: next_message, set to
** the message found of the proper type */
static void getmsg(void)
{
 int i;
 /* Only wait if a message is not already selected */
 if (next_message != -1) return;

 /* Wait for a message for me */
 semcall_one(msgavail, mybuffer(), -1);
 /* Search for those messages that are for me */
 for (i = 0; i < numbuffers(); i++) {
 if (buffer[i].valid[mybuffer()] == 1) {
 if ((buffer[i].dnode == mynode()) || (buffer[i].dnode == -1)) {
 next_message = i;

 return;
 }
 }
 }
}
/* cprobe -- Wait until a message of a specific type is available. OUTPUT:
** next_message, set to the message found of the proper type */
void cprobe(int type)
{
 int i,j;
 /* Make sure all pending writes in application have occured */
 fflush(stdout);
 fflush(stderr);
 /* See if a specific type was requested */
 if (type == -1) {
 getmsg();
 return;
 } else if ((next_message != -1) && (type == buffer[next_message].type)) {
 /* message was already located */
 return;
 } else {
 while (1) {
 /* Wait for a message for me */
 semcall_one(msgavail, mybuffer(), -1);
 /* Search for those messages that are for me and is the type I need */
 for (i = 0; i < numbuffers(); i++) {
 if (buffer[i].valid[mybuffer()] == 1) {
 if ((buffer[i].dnode == mynode()) || (buffer[i].dnode == -1)) {
 if (buffer[i].type == type) {
 next_message = i;
 /* Put back all skipped messages back */
 for (j = 0; j < numbuffers(); j++) {
 if (buffer[j].valid[mybuffer()] == 2) {
 buffer[j].valid[mybuffer()] = 1;
 semcall_one(msgavail, mybuffer(), 1);
 }

 }
 return;
 } else {
 /* Mark the message so that we don't look at it again */
 buffer[i].valid[mybuffer()] = 2;
 }
 }
 }
 }
 }
 }
}
/* infocount -- Return the length of the message that will be received. */
int infocount(void)
{
 getmsg();
 return buffer[next_message].t_length;
}
/* infonode -- Returns the node that sent the message */
int infonode(void)
{
 getmsg();

 return buffer[next_message].snode;
}
/* infopid -- Returns the group (pid) of the node that sent the message */
int infopid(void)
{
 getmsg();
 return buffer[next_message].spid;
}
/* csend -- Synchronous message sending between two nodes. If the target node
** number is -1, then the message is broadcasted to all nodes. Limitations:
** Assumes that the message buffer is free to use. In other words, if
** an asynchronous send was previously done, we assume that a msgwait
** was done to ensure the previous message reached its destination. */
int csend(int type, void *msg, int length, int target_node, int group)
{
 int i,j, sent_length = 0;
 char *source;
 i = mybuffer();
 /* Fill in the message */
 source = msg;
 buffer[i].type = type;
 buffer[i].dnode = target_node;
 buffer[i].spid = mypid();
 buffer[i].snode = mynode();
 buffer[i].t_length = length;
 while (length > 0) {
 /* Divide the message into smaller chunks */
 buffer[i].length = min(MAX_MESSAGE_SIZE, length);
 memcpy(buffer[i].msg, source, buffer[i].length);
 source += buffer[i].length;
 sent_length += buffer[i].length;
 length -= buffer[i].length;
 /* Tell the node(s) the message is there */
 if (target_node == -1) {
 /* Broadcast the message to nodes in this partition */
 /* and to the process manager */
 for (j=0; j < nodes_in_part + 1; j++) {
 buffer[i].valid[j] = 1;
 }
 semcall_all(msgavail,nodes_in_part+1, 1);
 /* Of course, we already have the message */
 semcall_one(msgavail,i, -1);
 /* Wait until everyone receives the message */
 semcall_one(msgfree, i, -nodes_in_part);
 } else {
 /* Point to point to another node */
 j = bufferof(target_node);
 buffer[i].valid[j] = 1;
 semcall_one(msgavail, j, 1);
 /* Wait until it is received */
 semcall_one(msgfree, i, -1);
 }
 }
 return sent_length;
}
/* crecv -- Synchronous message reception between two nodes. */
int crecv(int type, void *buf, int len)
{
 int i, recv_len = 0, copy_len = 0, temp_len, total_len;

 int recv_node;
 char *target;
 /* Get a message of this type */
 cprobe(type);
 target = buf;
 total_len = buffer[next_message].t_length;
 recv_node = buffer[next_message].snode;
 do {
 if (recv_node != buffer[next_message].snode) {
 /* Message is from another node, put off receiving */
 buffer[next_message].valid[mybuffer()] = 3;
 } else {
 /* Message is from same node, add it the previous messages */
 recv_len += buffer[next_message].length;
 temp_len = min(len - copy_len, buffer[next_message].length);
 if (temp_len > 0) {
 memcpy(target, buffer[next_message].msg, temp_len);
 target += temp_len;
 }
 copy_len += buffer[next_message].length;
 /* Acknowledge the receipt of the message */
 buffer[next_message].valid[mybuffer()] = 0;
 semcall_one(msgfree, next_message, 1);
 }
 /* Indicate that no message has been selected */
 next_message = -1;
 if (recv_len < total_len) {
 cprobe(type);
 }
 } while (recv_len < total_len);
 /* Scan buffers to restore any skipped messages */
 for (i = 0; i < numbuffers(); i++) {
 if (buffer[i].valid[mybuffer()] == 3) {
 buffer[i].valid[mybuffer()] = 1;
 semcall_one(msgavail, mybuffer(), 1);
 }
 }
 return total_len;
}
/* isend -- Asynchronous message sending between two nodes. If the target node
** number is -1, then the message is broadcasted to all nodes. Limitations:
** Assumes that the message buffer is free to use. In other words, if
** an asynchronous send was previously done, we assume that a msgwait
** was done to ensure the previous message reached its destination. */
int isend(int type, void *msg, int length, int target_node, int group)
{
 int i,j, sent_length = 0;
 char *source;
 i = mybuffer();
 source = msg;
 buffer[i].type = type;
 buffer[i].dnode = target_node;
 buffer[i].spid = mypid();
 buffer[i].snode = mynode();
 buffer[i].t_length = length;
 while (length > 0) {
 /* Divide the message into smaller chunks */
 buffer[i].length = min(MAX_MESSAGE_SIZE, length);
 memcpy(buffer[i].msg, source, buffer[i].length);
 source += buffer[i].length;

 sent_length += buffer[i].length;
 length -= buffer[i].length;
 /* Tell the node(s) the message is there */
 if (target_node == -1) {
 /* Broadcast the message to nodes in this partition */
 /* and to the process manager */
 for (j=0; j < nodes_in_part+1; j++) {
 buffer[i].valid[j] = 1;
 }
 semcall_all(msgavail,nodes_in_part+1, 1);
 /* Of course, we already have the message */
 semcall_one(msgavail,i, -1);
 /* Wait for acknowledge on all but the last part */
 if (length > 0) {
 /* Wait until everyone receives the message */
 semcall_one(msgfree, i, -nodes_in_part);
 }
 } else {
 /* Point to point to another node */
 j = bufferof(target_node);
 buffer[i].valid[j] = 1;
 semcall_one(msgavail, j, 1);
 /* Wait for acknowledge on all but the last part */
 if (length > 0) {
 /* Wait until it is received */
 semcall_one(msgfree, i, -1);
 }
 }

 }
 /* Return which buffer needs to be waited on */
 return i;
}
/* irecv -- Asynchronous message reception between two nodes. Returns message
** identifier for acknowledging the message. */
int irecv(int type, void *buf, int len)
{
 int mid;
 int recv_len = 0, copy_len = 0, temp_len, total_len;
 char *target;
 /* Get a message of this type */
 cprobe(type);
 mid = next_message;
 target = buf;
 total_len = buffer[next_message].t_length;
 do {
 recv_len += buffer[next_message].length;
 temp_len = min(len - copy_len, buffer[next_message].length);
 if (temp_len > 0) {
 memcpy(target, buffer[next_message].msg, temp_len);
 target += temp_len;
 }
 copy_len += buffer[next_message].length;
 /* Acknowledge all but last partial message */
 if (recv_len < total_len) {
 /* Acknowledge the receipt of the message */
 buffer[next_message].valid[mybuffer()] = 0;
 semcall_one(msgfree, next_message, 1);
 }

 /* Indicate that no message has been selected */
 next_message = -1;
 if (recv_len < total_len) {
 cprobe(type);
 }
 } while (recv_len < total_len);
 return mid;
}
/* msgwait -- Wait for a message to be received by the target node(s) */
void msgwait(int mid)

{
 if (mid == mybuffer()) {
 /* Then it was a send to another node */
 if (buffer[mid].dnode == -1) {
 /* Wait for everyone to receive the message */
 semcall_all(msgfree, mid, -nodes_in_part);
 } else {
 semcall_one(msgfree, mid, -1);
 }
 } else {
 /* It was a receive from another node */
 semcall_one(msgfree, mid, 1);
 }
}
/* flushmsg -- Forces the removal of pending messages to a node */
void flushmsg(int type, int target_node, int group)
{
 /* Do nothing for now */
 fflush(stdout);
 fflush(stderr);
}
/* mclock -- Return time in milliseconds. */
unsigned long mclock(void)
{
 time_t current_time;
 time(&current_time);
 return (unsigned long)current_time * 1000;
}























December, 1992
INSIDE THE ISO-9660 FILESYSTEM FORMAT


Untangling CD-ROM standards




William Frederick Jolitz and Lynne Greer Jolitz


Bill and Lynne are the developers of 386BSD and authors of the long-running
DDJ series, "Porting UNIX to the 386." They can be contacted at ljolitz
@cardio.ucsf.EDU or on CompuServe at 76703,4266.


Over the last ten years, we've gone from shuffling through floppies to using
hard disks and client/server networks to store-and-retrieve applications
software. However, since data sets in use on the PC have been doubling in size
every year, even hard disks are hard-pressed to meet demand. The need to
archive and retrieve ever-growing amounts of information, coupled with the
storage demands of today's software, has resulted in a search for better
storage technologies that can not only store larger amounts of data, but also
allow retrieval to be simple, error free, low cost, and timely. While fixed
disk-drive technology has kept up admirably with current requirements, the
pace of change in removable media technologies has been disappointing at best,
with one exception: CD-ROM.
Skeptics point out that CD-ROM technology is close to a decade old, claim it's
been made obsolete by read/write media and improvements in recording density,
and declare that CD-ROM drives and titles are only of interest to a limited
audience. Yet the economics of large-scale software distribution make CD-ROM an
excellent alternative to floppy distribution. Once a CD-ROM is mastered (at a
typical cost of $1200.00), each individual CD-ROM costs the developer
approximately $1.75. Instead of many failure-prone floppies, only one durable
CD-ROM need be supplied per customer, with room for expansion later if need
be.
Many firms with large software distributions or applications (Sun, Microsoft,
and ParcPlace, for instance) or large databases (such as the Oxford English
Dictionary) have moved to CD-ROM as their distribution media, and several PC
manufacturers now supply PCs with CD-ROM drives as standard equipment.


Differences Between CD and CD-ROM


The medium on which both audio-CD and data CD-ROM information is recorded is a
wonderful example of engineering simplicity--a simple circle of plastic
grooved to hold a fair portion of a gigabyte. Very little can go wrong: no
magnetic fields to be erased; no temperature-sensitive compound (like those
found in erasable-optical media) to be cooked or frozen; and no fingerprints on
the tracks to create errors.
However, CDs and CD-ROMs are not completely interchangeable. While you can
insert a CD into a CD-ROM player and plug in headphones to hear music, you
cannot read the CD's digital information directly off the disk. This is because
CDs and CD-ROMs are mastered in different formats. By the same token, while
you can "play" a CD-ROM in a CD player, the sound is not exactly
stimulating--all CD-ROMs play the same droning squeal. (It is possible to play
audio off a portion of a CD-ROM by arranging the formats appropriately.)


Some Things Never Change...


CD-ROM drives in general function like any other write-protected disk drive,
and at the device-driver level appear quite ordinary. (See the textbox
entitled, "An Overview of CD-ROM Hardware.") However, CDs (including CD-ROMs)
do not have a fixed number of sectors in a fixed-arm position. Instead, an
inward spiral of records is arranged to maintain minimal latency from record
to record. Thus, audio playback does not drop out or lag behind on obtaining
encoded data for the digital-to-analog converter. (If you "shake" a CD player
while it's operating, it will lose the track it's on, fall behind, and drop
out the signal.)
CDs are not random access, but rather sequential access, which is why they
tend to be slow--the head "hunts" to find the desired record. (The fastest
CD-ROMs have 260-280 millisecond access rates with 300 Kbytes/sec transfer
rates.) The records (or sectors, in the case of CD-ROMs) are indexed by track
and running time into the track in the same way cylinder, head, and sector
indices are used in a hard-disk drive. (With SCSI, these records are hidden by
"logical" address translation in the drive's controller.) Unlike with hard
drives, a separate head is not used for timing information, so each time a
CD-ROM head is moved for random access, the drive must search up and down the
spiral in the vicinity where the head settled to find the desired sector. This "latency penalty"
is a key difference between CD-ROM drives and other disk drives.
CD-ROM sector sizes are large, usually two Kbytes per sector, and larger
sector sizes are possible. As a result, the CD-ROM filesystem uses "logical"
sector sizes to standardize the increments to which the drive's contents can
be referred. Some SCSI CD-ROM drives have a feature that allows them to make a
drive appear to be at any selected block size (even ones less than the
physical sector size of the actual CD-ROM) to accommodate software which
insists on certain sector sizes.


Filesystem Organization


A CD-ROM may be mastered with any kind of information on it. (Sun Microsystems
uses the Berkeley UNIX UFS filesystems on many CD-ROMs, and CD-ROMs embedded
in equipment such as bitmaps for laser printers often have no filesystem
arrangement.) However, because CD-ROMs are especially suited to the volume
publishing of information, a standard filesystem useful across many kinds of
architectures and appliances (such as a CD-ROM viewer or CD-I player) is very
desirable. All early CD-ROM personal-computer applications used the High
Sierra format, which arranged file information in a dense, sequential layout
to minimize nonsequential access.
When a CD-ROM is browsed up and down directories, large delays occur as the
drive moves its heads. Reading any of the files in that same directory results
in much less delay, however. (This effect can best be seen on software-archive
disks, where the aggregate contents of files in a directory are large,
so the distance spanned by each directory is significant.)
The High Sierra filesystem format uses a hierarchical (eight levels of
directories deep) tree filesystem arrangement, similar to UNIX and MS-DOS. It
organizes information in a breadth-first traversal of this tree, with
individual files contiguously allocated in dense organization, to reduce
storage latency; see Figure 1.
Unlike UNIX and MS-DOS (and many other) filesystems which have blocks
allocated from a separate list of nonsequential disk blocks, High Sierra
stores file contents in a sequential extent-based arrangement, so that the
file contents appear on consecutive logical sectors. This reduces file-access
latency. Since the medium is read only, one does not have to "compact" out the
holes that grow in extent-based filesystems used for read/write purposes
(which justifies the nonsequential nature of block-allocated filesystems).
High Sierra has a minimal set of file attributes (directory or ordinary file
and time of recording) and name attributes (name, extension, and version). The
designers realized they could never get people to agree on a unified
definition of file attributes, so the minimum "common" information was
encoded, and a place for future optional extensions (system use area) was
defined for each file.
High Sierra was soon adopted (with changes) as an international standard
(ISO-9660-1988), and the ISO-9660 filesystem format is now used throughout the
industry. It's truly remarkable that a standard this significant was developed
so quickly and accurately.


ISO-9660 in Detail


An ISO-9660 CD-ROM is described in Figure 2. A reserved field at the beginning
of the disk is present for use in booting the CD-ROM on a computer.
Immediately afterwards, a series of volume descriptors details the contents
and kind of information contained on the disk (somewhat akin to the partition
table of MS-DOS).
A volume descriptor is broken up into two parts; one specific to the standard
itself (the type of volume descriptor), and the other detailing the
characteristics of the descriptor. The volume descriptor is constructed in
this manner so that if a program reading the disk does not understand a
particular descriptor, it can skip over it until it finds one it recognizes,
thus allowing the use of many different types of information on one CD-ROM.
However, it must have a primary descriptor describing the ISO-9660 filesystem,
and it must have an ending descriptor (a variable-length table which contains
information on how many other descriptors are present).
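The skip-until-recognized scan over volume descriptors can be sketched in C. This is an illustrative sketch rather than one of the article's listings: it assumes the descriptor area (which begins at logical sector 16 on disk) is already in memory, and it reuses the `VD_PRIMARY` and `VD_END` constants from Figure 3.

```c
#include <string.h>

#define ISO_SECTOR 2048
#define VD_PRIMARY 1
#define VD_END 255

/* Return the descriptor type of one 2048-byte sector, or -1 if the
   sector does not carry the ISO-9660 "CD001" signature. */
int iso_vd_type(const unsigned char *sec)
{
    if (memcmp(sec + 1, "CD001", 5) != 0)
        return -1;
    return sec[0];
}

/* Walk the descriptor area sector by sector, skipping descriptor types
   we do not recognize, until the terminator (VD_END) is found.
   Returns the sector index of the primary descriptor, or -1. */
int iso_find_primary(const unsigned char *area, int nsectors)
{
    int i, primary = -1;
    for (i = 0; i < nsectors; i++) {
        int t = iso_vd_type(area + (size_t)i * ISO_SECTOR);
        if (t == VD_END)
            break;
        if (t == VD_PRIMARY && primary == -1)
            primary = i;
    }
    return primary;
}
```

A reader that understands only the primary descriptor simply steps over everything else, which is exactly what lets one CD-ROM carry several kinds of information.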
It is possible to have many kinds of filesystems and information arrangements
on a single CD-ROM. However, while many other kinds of descriptors can be used
to optionally record non-ISO defined information contents, the primary volume
descriptor is always present.
In order to accommodate the two common byte orders, Big Endian (680x0) and
Little Endian (80x86), ISO-9660 has data types which allow either and
consequently are twice as big. The "least significant" half holds the Little
Endian and the "most significant" half the Big Endian. Thus, the 32-bit
integer (0x11223344) is represented as the byte sequence (0x44, 0x33, 0x22,
0x11, 0x11, 0x22, 0x33, 0x44)--essentially a binary palindrome.
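A sketch of how such both-byte-order fields can be read and written portably follows; the helper names are our own, not from the article's listings. Reading just the little-endian half is sufficient on any host:

```c
/* Decode an ISO-9660 "both-byte order" 32-bit field: four bytes of
   little-endian value followed by the same value in big-endian.
   Only the little-endian half needs to be examined. */
unsigned long iso_both32(const unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Encode a value into the 8-byte both-endian (palindrome) layout. */
void iso_put_both32(unsigned char *p, unsigned long v)
{
    p[0] = p[7] = (unsigned char)(v & 0xff);
    p[1] = p[6] = (unsigned char)((v >> 8) & 0xff);
    p[2] = p[5] = (unsigned char)((v >> 16) & 0xff);
    p[3] = p[4] = (unsigned char)((v >> 24) & 0xff);
}
```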


ISO-9660 Primary Volume Descriptor



The ISO-9660 primary volume descriptor describes the characteristics of the
ISO-standard filesystem information present on a given CD-ROM (refer to Figure
3 ). It acts much like the superblock of the UNIX filesystem, providing
details on the ISO-9660 compliant portions of the disk. While we can have many
kinds of filesystems on a single ISO-9660 CD-ROM, we can have only one
ISO-9660 file structure (found as the primary volume-descriptor type).
Figure 3: File structure of an ISO-9660 and High Sierra primary volume
descriptor.

 /* volume descriptor types -- type field of each descriptor */
 #define VD_PRIMARY 1
 #define VD_END 255

 /* ISO 9660 primary descriptor */
 #define ISODCL(from, to) (to - from + 1)

 #define ISO_STANDARD_ID "CD001"

 struct iso_primary_descriptor {
 char type [ISODCL ( 1, 1)];
 char id [ISODCL ( 2, 6)];
 char version [ISODCL ( 7, 7)];
 char reserved1 [ISODCL ( 8, 8)];
 char system_id [ISODCL ( 9, 40)]; /* achars */
 char volume_id [ISODCL ( 41, 72)]; /* dchars */
 char reserved2 [ISODCL ( 73, 80)];
 char volume_space_size [ISODCL ( 81, 88)];
 char reserved3 [ISODCL ( 89, 120)];
 char volume_set_size [ISODCL (121, 124)];
 char volume_sequence_number [ISODCL (125, 128)];
 char logical_block_size [ISODCL (129, 132)];
 char path_table_size [ISODCL (133, 140)];
 char type_1_path_table [ISODCL (141, 144)];
 char opt_type_1_path_table [ISODCL (145, 148)];
 char type_m_path_table [ISODCL (149, 152)];
 char opt_type_m_path_table [ISODCL (153, 156)];
 char root_directory_record [ISODCL (157, 190)];
 char volume_set_id [ISODCL (191, 318)]; /* dchars */
 char publisher_id [ISODCL (319, 446)]; /* achars */
 char preparer_id [ISODCL (447, 574)]; /* achars */
 char application_id [ISODCL (575, 702)]; /* achars */
 char copyright_file_id [ISODCL (703, 739)]; /* dchars */
 char abstract_file_id [ISODCL (740, 776)]; /* dchars */
 char bibliographic_file_id [ISODCL (777, 813)]; /* dchars */
 char creation_date [ISODCL (814, 830)];
 char modification_date [ISODCL (831, 847)];
 char expiration_date [ISODCL (848, 864)];
 char effective_date [ISODCL (865, 881)];
 char file_structure_version [ISODCL (882, 882)];
 char reserved4 [ISODCL (883, 883)];
 char application_data [ISODCL (884, 1395)];
 char reserved5 [ISODCL (1396, 2048)];

 };

 /* High Sierra format primary descriptor */
 #define HSFDCL(from, to) (to - from + 1)

 #define HSF_STANDARD_ID "CDROM"

 struct hsf_primary_descriptor {
 char volume_lbn [HSFDCL ( 1, 8)];
 char type [HSFDCL ( 9, 9)];
 char id [HSFDCL ( 10, 14)];
 char version [HSFDCL ( 15, 15)];

 char reserved1 [HSFDCL ( 16, 16)];
 char system_id [HSFDCL ( 17, 48)]; /* achars */
 char volume_id [HSFDCL ( 49, 80)]; /* dchars */
 char reserved2 [HSFDCL ( 81, 88)];
 char volume_space_size [HSFDCL ( 89, 96)];
 char reserved3 [HSFDCL ( 97, 128)];
 char volume_set_size [HSFDCL (129, 132)];
 char volume_sequence_number [HSFDCL (133, 136)];
 char logical_block_size [HSFDCL (137, 140)];
 char path_table_size [HSFDCL (141, 148)];
 char manditory_path_table_lsb [HSFDCL (149, 152)];
 char opt_path_table_lsb_1 [HSFDCL (153, 156)];
 char opt_path_table_lsb_2 [HSFDCL (157, 160)];
 char opt_path_table_lsb_3 [HSFDCL (161, 164)];
 char manditory_path_table_msb [HSFDCL (165, 168)];
 char opt_path_table_msb_1 [HSFDCL (169, 172)];
 char opt_path_table_msb_2 [HSFDCL (173, 176)];
 char opt_path_table_msb_3 [HSFDCL (177, 180)];
 char root_directory_record [HSFDCL (181, 214)];
 char volume_set_id [HSFDCL (215, 342)]; /* dchars */
 char publisher_id [HSFDCL (343, 470)]; /* achars */
 char preparer_id [HSFDCL (471, 598)]; /* achars */
 char application_id [HSFDCL (599, 726)]; /* achars */
 char copyright_file_id [HSFDCL (727, 758)]; /* dchars */
 char abstract_file_id [HSFDCL (759, 790)]; /* dchars */
 char creation_date [HSFDCL (791, 806)];
 char modification_date [HSFDCL (807, 822)];
 char expiration_date [HSFDCL (823, 838)];
 char effective_date [HSFDCL (839, 854)];
 char file_structure_version [HSFDCL (855, 855)];
 char reserved4 [HSFDCL (856, 856)];
 char application_data [HSFDCL (857, 1368)];
 char reserved5 [HSFDCL (1369, 2048)];
 };

Contained within the primary volume descriptor is the root-directory record
describing the location of the contiguous root directory. (As in UNIX,
directories appear as files for the operating system's special use.) Within
this region, directory entries are successively stored. The evaluation of the
ISO-9660 filenames is begun at this location. The root directory is stored as
is any other file--as an extent or sequential series of sectors that contains
each of the directory entries appearing in the root. In addition, since
ISO-9660 works by segmenting the CD-ROM into logical blocks, the size of these
blocks is found in the primary volume descriptor as well.


Directory-entry Format


Each directory entry begins with a length octet describing the size of the
entry. Entries themselves are of variable length, up to 255 octets in size.
Attributes for the file described by the directory entry are stored in the
directory entry itself (unlike UNIX).
In Figure 4, the root-directory entry is a variable-length object, so that the
name can be of variable length. (No other part in the directory entry is of
variable length.)
Figure 4: Data structure of a CD-ROM filesystem directory entry.

 /* CDROM file system directory entries */

 /* file flags: */
 #define CD_VISABLE 0x01 /* file name is hidden or visible to user */
 #define CD_DIRECTORY 0x02 /* file is a directory and contains entries */
 #define CD_ASSOCIATED 0x04 /* file is opaque to filesystem, visible
 to system implementation */
 #define CD_EAHSFRECORD 0x08 /* file has HSF extended attribute record
 fmt */
 #define CD_PROTECTION 0x10 /* uses extended attributes for protection */
 #define CD_ANOTHEREXTNT 0x80 /* file has at least one more extent */

 struct iso_directory_record {
 char length [ISODCL (1, 1)];
 char ext_attr_length [ISODCL (2, 2)];

 char extent [ISODCL (3, 10)];
 char size [ISODCL (11, 18)];
 char date [ISODCL (19, 25)];
 char flags [ISODCL (26, 26)];
 char file_unit_size [ISODCL (27, 27)];
 char interleave [ISODCL (28, 28)];
 char volume_sequence_number [ISODCL (29, 32)];
 char name_len [ISODCL (33, 33)];
 char name [0];

 };

 struct hsf_directory_record {

 char length [HSFDCL (1, 1)];
 char ext_attr_length [HSFDCL (2, 2)];
 char extent [HSFDCL (3, 10)];
 char size [HSFDCL (11, 18)];
 char date [HSFDCL (19, 24)];
 char flags [HSFDCL (25, 25)];
 char reserved1 [HSFDCL (26, 26)];
 char interleave_size [HSFDCL (27, 27)];
 char interleave [HSFDCL (28, 28)];
 char volume_sequence_number [HSFDCL (29, 32)];
 char name_len [HSFDCL (33, 33)];
 char name [0];
 };
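As an illustrative sketch (the helper names are ours), the extent and size of a directory record can be extracted by reading the little-endian half of each both-endian field at the octet positions Figure 4 defines: extent in octets 3-10 and size in octets 11-18 (0-based bytes 2 and 10):

```c
/* Logical block number of the file's extent: little-endian half of
   the both-endian field at octets 3-10 of the directory record. */
unsigned long dr_extent(const unsigned char *rec)
{
    return (unsigned long)rec[2] | ((unsigned long)rec[3] << 8)
         | ((unsigned long)rec[4] << 16) | ((unsigned long)rec[5] << 24);
}

/* File size in bytes: little-endian half of the both-endian field
   at octets 11-18 of the directory record. */
unsigned long dr_size(const unsigned char *rec)
{
    return (unsigned long)rec[10] | ((unsigned long)rec[11] << 8)
         | ((unsigned long)rec[12] << 16) | ((unsigned long)rec[13] << 24);
}
```

Because the file is a contiguous extent, these two numbers are all a reader needs to fetch the file's contents.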



File Attributes


File attributes are very simple in ISO-9660. The most important attribute
indicates whether the file is a directory or an ordinary file. Additional
attributes can make the file "invisible" to various programs, though it is
present nonetheless. These are used by some systems to store information
adjacent to "visible" files without letting the user note their presence. Date
and time stamps also exist for each file.
The ISO-9660 directory entries and attributes do not provide enough
information for UNIX file attributes to be reconstructed, and they require the
use of extensions (such as the Rock Ridge extensions) to be complete.


Filenames


Filenames in ISO-9660 correspond to a DOS-like representation, with an
uppercase, fixed-size base name, a delimiter (a period) to separate filenames
from the extension, and a three-letter extension name (also uppercase).
Following the extension, you can optionally add on a delimiter (a semicolon)
and a revision number of the file. (For example, a typical filename would be
FOO.BAR;1.) There are additional restrictions limiting the allowed
characters beyond the basic alphanumeric set.
The choice of filename is thus restricted to allow for the vast number of
different systems that existed at the time the standard was determined. While
the directory entries allow much larger names than this, the characteristics
and size of the filename were developed to achieve "level-one" compliance with
the original High Sierra format.
Unfortunately, many systems with ISO-9660 capability are not compatible with
these naming conventions. (For example, on a UNIX system, a semicolon is used
as a command delimiter in the shell, among other things.) Therefore, systems
programmers place code within the ISO filesystem to translate the name into
something more acceptable. (Again, for UNIX systems, uppercase letters are
translated to lower case, semicolons and trailing version numbers are removed,
modes are translated, and the ownership of the file is replicated from the
UNIX directory on which the ISO-9660 filesystem is mounted.)
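A minimal sketch of the name half of such a translation follows (the function name is ours; real implementations also map modes and ownership as described above):

```c
#include <ctype.h>
#include <string.h>

/* Translate an ISO-9660 name such as "FOO.BAR;1" into a UNIX-friendly
   form: fold to lower case and drop the ";version" suffix.
   `out` must hold at least len + 1 bytes. */
void iso_to_unix_name(const char *name, size_t len, char *out)
{
    size_t i;
    for (i = 0; i < len && name[i] != ';'; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    out[i] = '\0';
}
```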


File Pathname Traversal


There are two ways to locate a file on an ISO-9660 filesystem. One way is to
successively interpret the directory names and look through each directory
file structure to find the file (much the way MS-DOS and UNIX work to find a
file). The other way is through the use of a precompiled table of paths, where
all the entries are enumerated in the successive contents of a file with the
corresponding entries. (Since some systems do not have a mechanism for
wandering through directories, they obtain a match by consulting the table.)
While a large linear table seems a bit arcane, it can be of great value, as
you can quickly search without wandering across the disk (thus reducing seek
time).
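Conceptually, a path-table lookup is a flat search over (parent, name) pairs held in memory. The sketch below models only the idea; the actual on-disk path-table record layout differs and is not reproduced here:

```c
#include <string.h>

/* Each entry names one directory and points at the index of its
   parent entry, so a path like /SRC/SYS is resolved by repeated
   table searches instead of seeks across the disk. */
struct path_entry {
    const char *name;
    int parent;
};

/* Return the index of the entry with the given parent and name,
   or -1 if no such directory is recorded. */
int path_lookup(const struct path_entry *tab, int n,
                int parent, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (tab[i].parent == parent && strcmp(tab[i].name, name) == 0)
            return i;
    return -1;
}
```

Each component of a pathname costs one pass over an in-memory table rather than one head movement, which is the whole point of precompiling the paths.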


File Contents


The ISO-9660 standard says practically nothing about the contents of files
themselves--they can contain any kind of data one wishes to store. (The one
exception is that of "extended attributes," which will be discussed in a
future article.)



cdromcheck: Decoding a CD-ROM


Listing One (page 114) is cdromcheck, a simple C program that checks the
CD-ROM to determine if it is arranged in High Sierra or ISO-9660 filesystem
format and lists the volume descriptors present; see Listing Two, page 114.
One important difference between High Sierra and ISO-9660 formats is outlined
in a listing of the volume descriptor. The first CDROM examined, Rich Morin's
"Prime Time Freeware" vol 1.1 CDROM, lists a single ISO-9660 volume
descriptor. Discovery System's "CDROM Sampler," on the other hand, lists two
High Sierra volume descriptors, varying only in their volume logical-block
number. The High Sierra format listed replicated volume descriptors (though
not the data itself) in case a volume descriptor was damaged. This practice
has since been deemed unnecessary and is absent in the ISO-9660 standard.
There are three areas of difference between ISO-9660 format and High Sierra:
the format of the volume descriptors, the constant that signifies the kind of
volume descriptor, and a minor change to accommodate a time zone. It's a
tribute to the designers of the High Sierra filesystem that so few changes
were required to make it a standard.


cdromcat: Viewing a CD-ROM File


Listing Four (cdromcat.c), page 115, is a simple program to interpret the
CD-ROM filesystem and return the contents of a file. It consists of three
major sections:
1. A user-application section which allows the user to interact with the
remainder of cdromcat.
2. The filesystem primitives for the CD-ROM filesystem.
3. The object output routines which format and print the output (in this
example, the contents of the PTF CD-ROM examined with cdromcheck). Object
output routines are never used within the operating system itself in this
manner, but they are a mandatory component of any good user applications
program.
The filesystem primitives for the CD-ROM filesystem are the heart of the
program. First we check for the presence of a CD-ROM filesystem, then obtain
the blocks for a directory entry on the CD-ROM, search the contents of that
directory entry for a matching name, and finally look up the full pathname
component by component (translating it as we go), returning the directory
entry if found.
Throughout cdromcat.c, we use as a handle a machine-independent structure
that refers to the named file, with macros allowing for object
translation into either ISO-9660 or High Sierra filesystem format. Note that
filesystems are excellent vehicles for object-oriented programming (such as
coding in C++).
While both ISO-9660 and High Sierra filesystems allow for different byte
ordering, the trivial macros used in the header file definitions for our
CD-ROM filesystem (cdromfs.h, Listing Three, page 114) assume that cdromcat.c
is being run on a Little Endian system, in our case a 386/486 PC. We have left
it to the reader to enhance this program to run on a Big Endian (68000 or
SPARC) system.
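One portable approach (a sketch, not the macros actually used in cdromfs.h) is to assemble each value a byte at a time from the little-endian copy of ISO-9660's both-byte-order numeric fields, so the same code works on either kind of host:

```c
/* Byte-order-independent replacements for ISO_WD and ISO_HWD: build
 * the value from individual bytes instead of casting the field to an
 * integer pointer. The pointer is assumed to address the little-endian
 * copy of an ISO-9660 both-byte-order field. */
static unsigned iso_wd(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8)
         | ((unsigned)p[2] << 16) | ((unsigned)p[3] << 24);
}

static unsigned short iso_hwd(const unsigned char *p)
{
    return (unsigned short)(p[0] | (p[1] << 8));
}
```

On a 386, the compiler typically reduces these shifts to a plain load, so the portability costs little.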


386BSD isofs: ISO-9660 Filesystem


A complete implementation of the ISO-9660 filesystem can be found in 386BSD
Release 0.1 and all future releases. Initially done by Pace Willison, this
implementation allows CD-ROMs to be mounted as if they were an ordinary 386BSD
filesystem. After mounting, one can use the shell to browse directories
(through the use of the ls and cd commands).
Upon examination of the 386BSD version, it should become quite apparent that
writing a filesystem for inclusion in the 386BSD kernel is significantly
different from the little applications programs discussed earlier. The code in
the ISO-9660 filesystem implementation in 386BSD is really a series of methods
to interpret virtual filesystem node (vnode) operations given the underlying
virtual filesystem (VFS). These operations are specific kernel interfaces used
during all filename translations and operations within the kernel.
The VFS interface itself does not refer in any way to the actual physical
device, nor does the code within the ISO-9660 filesystem. That device-level
intimacy is delegated to the lower block I/O interface, beneath which the
decision is made to obtain the information from a device. Since the VFS and
the ISO-9660 filesystem code are not device specific (only filesystem
specific), you could actually substitute an ordinary disk drive for the CD-ROM
drive and they would work just the same.
The level of abstraction is simple and hierarchically arranged. At the top,
the kernel system calls deal with the files. Below this lies the VFS, its
virtual operations (vops) determining what specific filesystem type is
present. Below the VFS lies the filesystem itself, followed by the block I/O
interface to the device driver, which actually performs the operations on the
device that has the ISO-9660 filesystem recorded. This arrangement allows for
a more elegant and flexible mechanism than specific user applications.
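The dispatch idea behind vnode operations can be sketched as a table of function pointers that the VFS calls through without knowing which filesystem is underneath. The structure and names below are illustrative, not the actual 386BSD vnodeops interface:

```c
#include <stddef.h>

/* Illustrative vnode-operations table: the VFS dispatches through
 * these pointers; each filesystem supplies its own instance. (The
 * real 386BSD vnodeops structure has many more entries.) */
struct vnodeops {
    int (*vop_lookup)(const char *name);
    int (*vop_read)(void *buf, int len);
};

/* Toy ISO-9660 methods: lookup "succeeds" for any non-null name, and
 * read "transfers" the requested length. */
static int iso_lookup(const char *name) { return name != NULL; }
static int iso_read(void *buf, int len) { (void)buf; return len; }

/* An ISO-9660 "filesystem" is then just one instance of the table. */
static const struct vnodeops isofs_vnodeops = { iso_lookup, iso_read };
```

Swapping in a different table (say, for the native UNIX filesystem) changes the behavior without touching any caller above the VFS.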
386BSD Release 0.1 for the 386/486 AT PC (binary, source, and additional
packages) is currently available via anonymous ftp at agate.berkeley.edu (IP
address 128.32.136.1) or any of its mirror sites. You can obtain the Tiny
386BSD installation floppy to qualify, partition, download, and
install/extract the rest of the distribution as part of the DDJ Careware
Program. Simply send a formatted, error-free, high-density (either 3.5- or
5.25-inch) floppy and a SASE mailer to: Tiny 386BSD, Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and we will make a copy and send it back to
you. There is no charge for the service, but if you would like to slip in a
dollar or more for the Children's Support League of the East Bay (as part of
DDJ's Careware Program), we'll make sure it gets to the disadvantaged children
helped by this great charity!


CD-ROM Extensions


In a future article, we'll discuss extensions to CD-ROM, including the Rock
Ridge extensions (soon to become an IEEE standard), CD-I, and CD-ROM XA. With
extensions, you can make a CD-ROM appear native to a given target operating
system (such as a UNIX filesystem). By encoding these extensions under a
shared-use protocol, you can also have separate sets of attributes for the
same filesystem, thus allowing organization of extended information for different
systems (such as VMS and UNIX) in a nonconflicting way. We will also revisit
our cdromcat.c program (Listing Four) and revise it to read ISO-9660 CD-ROMs
with Rock Ridge extensions.


An Overview of CD-ROM Hardware


CD-ROM drives, like other write-protected disk drives, use a moving head
passing over a rotating medium to index and obtain a desired sector of
information. One significant difference between a CD-ROM and other drives,
however, lies in the access (or seek) time of the drive itself. Originally,
CD-ROM drives were built from the same drive mechanisms used in audio
products, and it took several seconds to scan across the disk to find a track
(still incredibly fast compared to an audio cassette).
Newer, high-end CD-ROM drives allow fractional-second (200-millisecond) seek
times--more appropriate for the pace of software applications. These drives
are slower than existing Winchester disk drives because the head assemblies
weigh more (lens system, laser, and photo diode) and require fine positioning.
Another limitation on access time is that data is recorded in a constant
bit-rate spiral, which hinders random indexing without additional rotational
searching.
Low-end drives accept naked CD-ROMs, just like an ordinary CD player. High-end
drives, in contrast, use a plastic carrier to prevent wear from CD-ROM
insertions.
Almost all CD-ROM drives offer a stereo audio jack that allows the drive to
play ordinary CDs. However, this does not necessarily mean that a CD-ROM drive
can be used to read a CD's digitally encoded audio information, although
CD-ROMs can contain voice tracks that can be played.
CD-ROM drives are usually interfaced in one of two ways: Early CD-ROMs used a
custom interface board sold with the unit; later drives used the Small
Computer Systems Interface (SCSI) to access the drive. While custom interfaces
(very similar to floppy-disk interfaces) are present to this day in low-end
units, SCSI units are becoming more popular as the demands for bandwidth and
interconnection outweigh the cost of a more elaborate interface.
--L.G.J. & W.F.J.

_INSIDE THE ISO-9660 FILESYSTEM FORMAT_
by William Frederick Jolitz and Lynne Greer Jolitz


[LISTING ONE]

/* cdromcheck: A simple program to check what kind of CDROM we have, and to
 * list the volume descriptors that are present, if any. */

#include <stdio.h>
#include "primary_descriptor"
#include "directory_entry"

#define VD_LSN 16 /* first logical sector of volume descriptor table */

#define CDROM_LSECSZ 2048 /* initial logical sector size of a CDROM */

char buffer[CDROM_LSECSZ];
static int hsffmt, isofmt;
void doiso(struct iso_primary_descriptor *, int);
void dohsf(struct hsf_primary_descriptor *, int);

#define HSF 1
#define ISO 2
int cdromfmt;

char *cdromfmtnames[] = {
 "unknown format",
 "High Sierra",
 "ISO - 9660"
};
char *voltypenames[] = {
 "Boot Record",
 "Standard File Structure",
 "Coded Character Set File Structure",
 "Unspecified File Structure",
};
#define NVOLTYPENAMES (sizeof(voltypenames)/sizeof(char *))
int
main(int argc, char *argv[])
{
 struct iso_primary_descriptor *ipd;
 struct hsf_primary_descriptor *hpd;
 int cdfd;

 cdfd = open("/dev/ras2d", 0);
 /* locate at the beginning of the descriptor table */
 lseek(cdfd, VD_LSN*CDROM_LSECSZ, SEEK_SET);
 ipd = (struct iso_primary_descriptor *) buffer;
 hpd = (struct hsf_primary_descriptor *) buffer;
 /* walk descriptor table */
 for(;;) {
 unsigned char type;
 read(cdfd, buffer, sizeof(buffer));
 /* determine ISO or HSF format of CDROM */
 if (cdromfmt == 0) {
 if (strncmp (ipd->id, ISO_STANDARD_ID, sizeof(ipd->id)) == 0)
 cdromfmt = ISO;
 if (strncmp (hpd->id, HSF_STANDARD_ID, sizeof(hpd->id)) == 0)
 cdromfmt = HSF;
 if (cdromfmt)
 printf("%s Volume Descriptors:\n", cdromfmtnames[cdromfmt]);
 else {
 printf("%s\n", cdromfmtnames[0]);
 exit(0);
 }
 }
 /* type of descriptor */
 if (cdromfmt == ISO)
 type = (unsigned char)ipd->type[0];
 else
 type = (unsigned char)hpd->type[0];

 /* type of volume */

 if (type < NVOLTYPENAMES)
 printf("\t%s\n", voltypenames[type]);
 else if (type != VD_END)
 printf("\t Reserved - %d\n", type);

 /* terminating volume */
 if (type == VD_END)
 break;
 /* ISO 9660 filestructure */
 if (cdromfmt == ISO && type == VD_PRIMARY
 && strncmp (ipd->id, ISO_STANDARD_ID, sizeof(ipd->id)) == 0) {
 doiso(ipd, cdfd);
 isofmt++;
 continue;
 }
 /* (obsolete) High Sierra filestructure */
 if (cdromfmt == HSF && type == VD_PRIMARY
 && strncmp (hpd->id, HSF_STANDARD_ID, sizeof(hpd->id)) == 0) {
 dohsf(hpd, cdfd);
 hsffmt++;
 continue;
 }
 printf("\n");
 }
 return (0);
}
char *iso_astring(char *, int len);
/* rude translation routines for interpreting strings, words, halfwords */
#define ISO_AS(s) (iso_astring(s, sizeof(s)))
#define ISO_WD(s) (*(unsigned *)(s))
#define ISO_HWD(s) (*(unsigned short *)(s))
/* dig out the details of a ISO - 9660 descriptor */
void
doiso(struct iso_primary_descriptor *ipd, int fd) {
 printf(" Volume ID:\t\t%s\n", ISO_AS(ipd->volume_id));
 printf(" Logical Block Size:\t%d\n", ISO_HWD(ipd->logical_block_size));
 printf(" Volume Set ID:\t\t%s\n", ISO_AS(ipd->volume_set_id));
 printf(" Publisher ID:\t\t%s\n", ISO_AS(ipd->publisher_id));
 printf(" Preparer ID:\t\t%s\n", ISO_AS(ipd->preparer_id));
 printf(" Application ID:\t\t%s\n", ISO_AS(ipd->application_id));
 printf(" Copyright File ID:\t%s\n", ISO_AS(ipd->copyright_file_id));
 printf(" Abstract File ID:\t%s\n", ISO_AS(ipd->abstract_file_id));
 printf(" Bibliographic File ID:\t%s\n", ISO_AS(ipd->bibliographic_file_id));
 printf(" Creation Date:\t\t%s\n", ISO_AS(ipd->creation_date));
 printf(" Modification Date:\t%s\n", ISO_AS(ipd->modification_date));
 printf(" Expiration Date:\t%s\n", ISO_AS(ipd->expiration_date));
 printf(" Effective Date:\t%s\n", ISO_AS(ipd->effective_date));
}
/* dig out the details of a High Sierra Descriptor */
void
dohsf(struct hsf_primary_descriptor *hpd, int fd) {
 printf(" Volume Logical Block Number:\t%d\n", ISO_WD(hpd->volume_lbn));
 printf(" Volume ID:\t\t%s\n", ISO_AS(hpd->volume_id));
 printf(" Logical Block Size:\t%d\n", ISO_HWD(hpd->logical_block_size));
 printf(" Volume Set ID:\t\t%s\n", ISO_AS(hpd->volume_set_id));
 printf(" Publisher ID:\t\t%s\n", ISO_AS(hpd->publisher_id));
 printf(" Preparer ID:\t\t%s\n", ISO_AS(hpd->preparer_id));
 printf(" Application ID:\t\t%s\n", ISO_AS(hpd->application_id));
 printf(" Copyright File ID:\t%s\n", ISO_AS(hpd->copyright_file_id));

 printf(" Abstract File ID:\t%s\n", ISO_AS(hpd->abstract_file_id));
 printf(" Creation Date:\t\t%s\n", ISO_AS(hpd->creation_date));
 printf(" Modification Date:\t%s\n", ISO_AS(hpd->modification_date));
 printf(" Expiration Date:\t%s\n", ISO_AS(hpd->expiration_date));
 printf(" Effective Date:\t%s\n", ISO_AS(hpd->effective_date));
}
static char __strbuf[200];
/* turn a blank padded character field into the null terminated strings
 that POSIX/UNIX/WHATSIX likes so much */
char *iso_astring(char *sp, int len) {
 bcopy(sp, __strbuf, len);
 __strbuf[len] = 0;
 for (sp = __strbuf + len - 1; sp > __strbuf ; sp--)
 if (*sp == ' ')
 *sp = 0;
 return(__strbuf);
}






[LISTING TWO]

From Rich Morin's "Prime Time Freeware", Vol 1.1 CDROM:

ISO - 9660 Volume Descriptors: Standard File Structure
 Volume ID: PTF_1_1
 Logical Block Size: 2048
 Volume Set ID:
 Publisher ID:
 Preparer ID: MERIDIAN_DATA_CD_PUBLISHER
 Application ID:
 Copyright File ID:
 Abstract File ID:
 Bibliographic File ID:
 Creation Date: 1992011101452700
 Modification Date: 1992011101452700
 Expiration Date: 0000000000000000
 Effective Date: 1992011101452700


From Discovery System's CD-ROM Sampler:

High Sierra Volume Descriptors:

 Standard File Structure
 Volume Logical Block Number: 16
 Volume ID: CDROM_SAMP1
 Logical Block Size: 2048
 Volume Set ID: CDROM_SAMP1
 Publisher ID: DISCOVERY
 Preparer ID: DISCOVERY
 Application ID:
 Copyright File ID:
 Abstract File ID:
 Creation Date: 1987111215553400
 Modification Date: 1987111215553400

 Expiration Date: 0000000000000000
 Effective Date: 0000000000000000

 Standard File Structure
 Volume Logical Block Number: 17
 Volume ID: CDROM_SAMP1
 Logical Block Size: 2048
 Volume Set ID: CDROM_SAMP1
 Publisher ID: DISCOVERY
 Preparer ID: DISCOVERY
 Application ID:
 Copyright File ID:
 Abstract File ID:
 Creation Date: 1987111215553400
 Modification Date: 1987111215553400
 Expiration Date: 0000000000000000
 Effective Date: 0000000000000000






[LISTING THREE]

/* cdromfs.h: various definitions for CDROM filesystems. */

#define VD_LSN 16 /* first logical sector of volume descriptor table */
#define CDROM_LSECSZ 2048 /* initial logical sector size of a CDROM */

#define HSF 1
#define ISO 2

char *cdromfmtnames[] = {
 "unknown format",
 "High Sierra",
 "ISO - 9660"
};
char *voltypenames[] = {
 "Boot Record",
 "Standard File Structure",
 "Coded Character Set File Structure",
 "Unspecified File Structure",
};
/* rude translation routines for interpreting strings, words, halfwords */
#define ISO_AS(s) (iso_astring(s, sizeof(s)))
#define ISO_WD(s) (*(unsigned *)(s))
#define ISO_HWD(s) (*(unsigned short *)(s))
#define ISO_BY(s) (*(unsigned char *)(s))

#define NVOLTYPENAMES (sizeof(voltypenames)/sizeof(char *))
struct cdromtime {
 unsigned char years; /* number of years since 1900 */
 unsigned char month; /* month of the year */
 unsigned char day; /* day of month */
 unsigned char hour; /* hour of day */
 unsigned char min; /* minute of the hour */
 unsigned char sec; /* second of the minute */
 unsigned char tz; /* timezones, in quarter hour increments */

 /* or, longitude in 3.75-degree increments */
};
#define CD_FLAGBITS "vdaEp m" /* file flag bits */
/* Handy macro's for block calculation */
#define lbntob(fs, n) ((fs)->lbs * (n))
#define btolbn(fs, n) ((n) / (fs)->lbs)
#define trunc_lbn(fs, n) ((n) - ((n) % (fs)->lbs))
#define roundup(n, d) ((((n) + (d) - 1) / (d)) * (d))







[LISTING FOUR]

/* cdromcat -- A simple program to interpret the CDROM filesystem and return
 * the contents of a file (directories are formatted and printed, files are
 * returned untranslated). */

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "primary_descriptor"
#include "directory_entry"
#include "cdromfs.h"

/* per filesystem information */
struct fs {
 char *name; /* kind of cdrom filesystem */
 int fd; /* open file descriptor */
 int lbs; /* logical block size */
 int type; /* which flavor */
} fsd;
/* filesystem directory entry */
struct directent {
 union fsdir {
 struct iso_directory_record iso_dir;
 struct hsf_directory_record hsf_dir;
 } fsd;
 /* actually, name contains name, reserved field, and extensions area */
 char name[255 - sizeof(union fsdir)];
} rootent;
/* filesystem volume descriptors */
union voldesc {
 struct iso_primary_descriptor iso_desc;
 struct hsf_primary_descriptor hsf_desc;
};
char *iso_astring(char *, int len);
char *cdrom_time(struct cdromtime *, int);
void printdirent(struct directent *, struct fs *);
void printdirents(struct directent *, struct fs *);
void printdirentheader(char *p);
int searchdirent(struct directent *, struct directent *, struct directent *,
 struct fs *);
void extractdirent(struct directent *, struct fs *);

int lookup(struct directent *, struct directent *, char *, struct fs *);
/* "fetch directory value" */
#define FDV(b, f, t) (((t) == ISO) ? (b)->fsd.iso_dir.f \
 : (b)->fsd.hsf_dir.f)
/* "fetch primary descriptor value" */
#define FPDV(b, f, t) (((t) == ISO) ? (b)->iso_desc.f \
 : (b)->hsf_desc.f)
/* user "application" program */
int
main(int argc, char *argv[])
{
 struct directent openfile;
 char pathname[80];
 /* open the CDROM device */
 if ((fsd.fd = open("/dev/ras2d", 0)) < 0) {
 perror("cdromcat");
 exit(1);
 }
 /* is there a filesystem we can understand here? */
 if (iscdromfs(&rootent, &fsd) == 0) {
 fprintf(stderr, "cdromcat: %s\n", fsd.name);
 exit(1);
 }
 /* print the contents of the root directory to give user a start */
 printf("Root Directory Listing:\n");
 printdirentheader("/");
 printdirents(&rootent, &fsd);
 /* print files on demand from user */
 for(;;){
 /* prompt user for name to locate */
 printf("Pathname to open? : ");
 fflush(stdout);
 /* obtain, if none, exit, else trim newline off */
 if (fgets(pathname, sizeof(pathname), stdin) == NULL)
 exit(0);
 pathname[strlen(pathname) - 1] = '\0';
 if (strlen(pathname) == 0)
 exit(0);
 /* lookup filename on CDROM */
 if (lookup(&rootent, &openfile, pathname, &fsd)){
 /* if a directory, format and list it */
 if (ISO_BY(FDV(&openfile, flags, fsd.type))
 & CD_DIRECTORY) {
 printdirentheader(pathname);
 printdirents(&openfile, &fsd);
 }
 /* if a file, print it on standard output */
 else
 extractdirent(&openfile, &fsd);
 } else
 printf("Not found.\n");
 }
 /* NOTREACHED */
}
/* ----------- Filesystem primitives ------------------- */
/* Check for the presence of a cdrom filesystem. If present, pass back
 * parameters for initialization, otherwise, pass back error. */
int
iscdromfs(struct directent *dp, struct fs *fs) {

 char buffer[CDROM_LSECSZ];
 union voldesc *vdp = (union voldesc *) buffer;
 /* locate at the beginning of the descriptor table */
 lseek(fs->fd, VD_LSN*CDROM_LSECSZ, SEEK_SET);
 /* walk descriptor table */
 for(;;) {
 unsigned char type = VD_END; /* break out if format stays unknown */
 /* obtain a descriptor */
 read(fs->fd, buffer, sizeof(buffer));
 /* determine ISO or HSF format of CDROM */
 if (fs->type == 0) {
 if (strncmp (vdp->iso_desc.id, ISO_STANDARD_ID,
 sizeof(vdp->iso_desc.id)) == 0)
 fs->type = ISO;
 if (strncmp (vdp->hsf_desc.id, HSF_STANDARD_ID,
 sizeof(vdp->hsf_desc.id)) == 0)
 fs->type = HSF;
 }
 /* if determined, obtain root directory entry */
 if (fs->type) {
 type = ISO_BY(FPDV(vdp, type, fs->type));
 if (type == VD_PRIMARY) {
 bcopy (
 (caddr_t) FPDV(vdp, root_directory_record, fs->type),
 (caddr_t)dp, sizeof (union fsdir));
 fs->lbs =
 ISO_HWD(FPDV(vdp, logical_block_size, fs->type));
 }
 }
 /* terminating volume */
 if (type == VD_END)
 break;
 }
 fs->name = cdromfmtnames[fs->type];
 return (fs->type);
}
/* Obtain a "logical", i.e. relative to the directory entries beginning
 * (or extent), block from the CDROM. */
int
getblkdirent(struct directent *dp, char *contents, long lbn, struct fs *fs) {
 long filesize = ISO_WD(FDV(dp, size, fs->type)),
 extent = ISO_WD(FDV(dp, extent, fs->type));
 if (lbntob(fs, lbn) > roundup(filesize, fs->lbs))
 return (0);
 /* perform logical to physical translation */
 (void) lseek(fs->fd, lbntob(fs, extent + lbn), SEEK_SET);
 /* obtain block */
 return (read(fs->fd, contents, fs->lbs) == fs->lbs);
}
/* Search the contents of this directory entry, known to be a directory
 * itself, looking for a component. If found, return the directory entry
 * associated with the component. */
int
searchdirent(struct directent *dp, struct directent *fdp,
 struct directent *compdp, struct fs *fs) {
 struct directent *ldp;
 long filesize = ISO_WD(FDV(dp, size, fs->type)),
 comp_namelen = ISO_BY(FDV(compdp, name_len, fs->type)),
 lbn = 0, cnt;

 char *buffer = (char *) malloc(fs->lbs);
 while (getblkdirent(dp, buffer, lbn, fs)) {
 cnt = filesize > fs->lbs ? fs->lbs : filesize;
 filesize -= cnt;
 ldp = (struct directent *) buffer;
 /* have we a record to match? */
 while (cnt > sizeof (union fsdir)) {
 long entlen, namelen;
 /* match against component's name and name length */
 entlen = ISO_BY(FDV(ldp, length, fs->type));
 namelen = ISO_BY(FDV(ldp, name_len, fs->type));
 if (entlen >= comp_namelen + sizeof(union fsdir) && namelen == comp_namelen
 && strncmp(FDV(ldp,name,fs->type),
 FDV(compdp,name,fs->type), namelen) == 0) {
 bcopy ((caddr_t)ldp, (caddr_t)fdp, entlen);
 bcopy ((caddr_t)ldp, (caddr_t)compdp, entlen);
 free(buffer);
 return 1;
 } else {
 cnt -= entlen;
 ldp = (struct directent *)
 (((char *) ldp) + entlen);
 }
 }
 if (filesize == 0) break;
 lbn++;
 }
 free(buffer);
 return 0;
}
/* Lookup the pathname by interpreting the directory structure of the CDROM
 * element by element, returning a directory entry if found. Name translation
 * occurs here, out of the null terminated path name string. This routine
 * works by recursion. */
int
lookup(struct directent *dp, struct directent *fdp, char *pathname,
 struct fs *fs) {
 struct directent *ldp;
 struct directent thiscomp;
 char *nextcomp;
 unsigned len;
 /* break off the next component of the pathname */
 if ((nextcomp = strchr(pathname, '/')) == NULL)
 nextcomp = strchr(pathname, '\0');
 /* construct an entry for this component to match */
 ISO_BY(FDV(&thiscomp, name_len, fs->type)) = len = nextcomp - pathname;
 bcopy(pathname, thiscomp.name, len);
 /* attempt a match, returning component if found */
 if (searchdirent(dp, fdp, &thiscomp, fs)){
 /* if no more components, return found value */
 if (*nextcomp == '\0')
 return 1;
 /* otherwise, if this component is a directory,
 * recursively satisfy lookup */
 if (ISO_BY(FDV(&thiscomp, flags, fs->type)) & CD_DIRECTORY)
 return (lookup(&thiscomp, fdp, nextcomp + 1, fs));
 }
 /* no match, or more path components below a non-directory: fail */
 return (0);
}
/* --------------- object output routines for application ------------ */
/* Extract the entire contents of a directory entry and write this on
 * standard output. */
void
extractdirent(struct directent *dp, struct fs *fs) {
 long filesize = ISO_WD(FDV(dp, size, fs->type)),
 lbn = 0, cnt;
 char *buffer = (char *) malloc(fs->lbs);
 /* iterate over all contents of the directory entry */
 while (getblkdirent(dp, buffer, lbn, fs)) {
 /* write out the valid portion of this logical block */
 cnt = filesize > fs->lbs ? fs->lbs : filesize;
 (void) write (1, buffer, cnt);
 /* next one? */
 lbn++;
 filesize -= cnt;
 if (filesize == 0) break;
 }
 free(buffer);
}
/* Print directory header */
void
printdirentheader(char *path) {
 printf("Directory(%s):\n", path);
 printf("Flags:\tsize date sysa name\n");
}
/* Print all entries in the directory. */
void
printdirents(struct directent *dp, struct fs *fs) {
 struct directent *ldp;
 long filesize = ISO_WD(FDV(dp, size, fs->type)),
 lbn = 0, cnt;
 char *buffer = (char *) malloc(fs->lbs);
 while (getblkdirent(dp, buffer, lbn, fs)) {
 long entlen, namelen;
 cnt = filesize > fs->lbs ? fs->lbs : filesize;
 filesize -= cnt;
 ldp = (struct directent *) buffer;
 entlen = ISO_BY(FDV(ldp, length, fs->type));
 namelen = ISO_BY(FDV(ldp, name_len, fs->type));
 /* have we a record to match? */
 while (cnt > sizeof (union fsdir) && entlen && namelen) {
 printdirent(ldp, fs);
 /* next entry? */
 cnt -= entlen;
 ldp = (struct directent *) (((char *) ldp) + entlen);
 entlen = ISO_BY(FDV(ldp, length, fs->type));
 namelen = ISO_BY(FDV(ldp, name_len, fs->type));
 }
 if (filesize == 0) break;
 lbn++;
 }
 free(buffer);
}
/* Print a directent on output, formatted. */
void
printdirent(struct directent *dp, struct fs *fs) {

 unsigned extattlen;
 unsigned fbname, name_len, entlen, enttaken;
 /* mode flags */
 prmodes(ISO_BY(FDV(dp, flags, fs->type)));
/* Note: this feature of HSF is not used because of lack of semantic def. */
#ifdef whybother
 extattlen = ISO_BY(FDV(dp, ext_attr_length, fs->type));
 if (extattlen)
 printf(" E%3d", extattlen);
 else
 printf(" ");
#endif
 /* size */
 printf("\t%6d", ISO_WD(FDV(dp, size, fs->type)));
 /* time */
 printf(" %s",
 cdrom_time((struct cdromtime *) FDV(dp, date, fs->type),fs->type));
 /* compensate for reserved field used to word align directory entry */
 entlen = ISO_BY(FDV(dp, length, fs->type));
 name_len = ISO_BY(FDV(dp, name_len, fs->type));
 enttaken = sizeof(union fsdir) + name_len;
 if (enttaken & 1)
 enttaken++;
 fbname = ISO_BY(FDV(dp, name, fs->type));
 entlen -= enttaken;
 /* print size of CDROM Extensions field if present */
 if (entlen)
 printf(" %3d", entlen);
 else
 printf(" ");
 /* finally print name. compensate for unprintable names */
 if (name_len == 1 && fbname <= 1) {
 printf("\t%s\n", (fbname == 0) ? "." : "..");
 } else
 printf("\t%s\n",
 iso_astring(FDV(dp, name, fs->type), name_len));
}
/* print CDROM file modes */
prmodes(f) {
 int i;
 for(i=0; i < 8; i++) {
 if(CD_FLAGBITS[i] == ' ')
 continue;
 if(f & (1<<i))
 putchar(CD_FLAGBITS[i]);
 else
 putchar('-');
 }
 putchar(' ');
}
/* attempt to print a CDROM file's creation time */
char *
cdrom_time(struct cdromtime *crt, int type) {
 struct tm tm;
 static char buf[32];
 char *fmt;
 /* step 1. convert into a ANSI C time description */
 tm.tm_sec = crt->sec;
 tm.tm_min = crt->min;

 tm.tm_hour = crt->hour;
 tm.tm_mday = crt->day;
 /* month starts with 1 */
 tm.tm_mon = crt->month - 1;
 tm.tm_year = crt->years;
 tm.tm_isdst = 0;
/* Note: not all ISO-9660 disks have correct timezone field */
#ifdef whybother
 /* ISO has time zone as 7th octet, HSF does not */
 if (type == ISO) {
 tm.tm_gmtoff = crt->tz*15*60;
 tm.tm_zone = timezone(crt->tz*15, 0);
 fmt = "%b %e %H:%M:%S %Z %Y";
 } else
#endif
 fmt = "%b %e %H:%M:%S %Y";
 /* step 2. use ANSI C standard function to format time properly */
 (void)strftime(buf, sizeof(buf), fmt, &tm);
 return (buf);
}
static char __strbuf[200];
/* turn a blank padded character field into the null terminated strings
 that POSIX/UNIX/WHATSIX likes so much */
char *iso_astring(char *sp, int len) {
 bcopy(sp, __strbuf, len);
 __strbuf[len] = 0;
 for (sp = __strbuf + len - 1; sp > __strbuf ; sp--)
 if (*sp == ' ')
 *sp = 0;
 return(__strbuf);
}








[LISTING FIVE]

bill 1 % cdromcat
Root Directory Listing:
Directory(/):
Flags: size date sysa name
-d---- 2048 Jul 27 12:49:16 1992 .
-d---- 2048 Jul 27 12:49:16 1992 ..
------ 394127 Jul 24 01:21:06 1992 0.ALL;1
------ 289565 Jul 24 01:21:06 1992 0.ASK;1
------ 2454 Jul 23 23:57:56 1992 0.DOC;1
-d---- 2048 Jul 27 12:50:06 1992 A2Z
-d---- 2048 Jul 27 12:50:07 1992 AI
-d---- 2048 Jul 27 12:50:07 1992 ARCHIVE
-d---- 2048 Jul 27 12:50:08 1992 CAD
-d---- 2048 Jul 27 12:50:08 1992 DATABASE
-d---- 2048 Jul 27 12:50:08 1992 DATACOMM
-d---- 2048 Jul 27 12:50:10 1992 DESKTOP
-d---- 2048 Jul 27 12:50:10 1992 DOCPREP
-d---- 2048 Jul 27 12:50:13 1992 GAME

-d---- 2048 Jul 27 12:50:13 1992 GRAPHICS
-d---- 2048 Jul 27 12:50:16 1992 LANGUAGE
-d---- 2048 Jul 27 12:50:27 1992 MATH
-d---- 2048 Jul 27 12:50:27 1992 MISC
-d---- 2048 Jul 27 12:50:28 1992 MUSIC
-d---- 2048 Jul 27 12:50:29 1992 OS
-d---- 2048 Jul 27 12:50:29 1992 PGM_TOOL
-d---- 2048 Jul 27 12:50:30 1992 SCIENCE
Pathname to open? : 0.DOC;1
Topic: (/)

Description: This is the top level directory of the PTF disc.

Notes:
<... contents of file ... >

Pathname to open? : OS
Directory(OS):
Flags: size date sysa name
-d---- 2048 Jul 27 12:50:29 1992 .
-d---- 2048 Jul 27 12:49:16 1992 ..
------ 593 Jul 23 23:53:40 1992 0.DOC;1
-d---- 2048 Jul 27 12:50:29 1992 CONDOR
-d---- 2048 Jul 27 12:50:29 1992 MACH
-d---- 2048 Jul 27 12:50:29 1992 MDQS
-d---- 2048 Jul 27 12:50:29 1992 PLAN_9
Pathname to open? : OS/PLAN_9
Directory(OS/PLAN_9):
Flags: size date sysa name
-d---- 2048 Jul 27 12:50:29 1992 .
-d---- 2048 Jul 27 12:50:29 1992 ..
------ 732 Jul 23 23:41:38 1992 0.DOC;1
------ 31 Jul 23 23:03:30 1992 0.LST;1
------ 230093 Jul 23 23:03:30 1992 PAPERS.ATZ;1
------ 472 Jul 23 23:03:30 1992 PAPERS.LTV;1
Pathname to open? :
bill 2 %

























December, 1992
BLOBS AND OBJECT-ORIENTED DATABASE ENGINES


Storing image and sound data


 This article contains the following executables: DFLT15.ARC D15TX.ARC


Sam Felton


Sam is a software engineer at Raima Corporation and can be contacted at 1605
NW Sammamish Road, #200, Issaquah, WA 98027-5378.


One of the most valuable tools in any system or application designer's kit is
a database engine--a library that, when linked into your application, gives
your software the power of a database management system without having to
write one yourself.
Utilizing object-oriented programming techniques to design a database engine
gives you a set of base classes from which you can inherit a DBMS. But how do
you use such a tool and what does it give you?
Among other things, this tool gives your objects persistence, meaning that
objects "persist" within a database or file, even when the software that
created them is not running. Thus, interface objects can be stored in a
database as resources, and you can display, modify, and replace them as
needed. Retrieval is easy--by key or by set, depending on the database model
you choose. You can even create new objects by inheriting from more than one
ancestor class and adding specific new functionality yourself.
Suppose, however, you want to store sound and image data in your database. No
problem, right? A simple index should allow fast lookup and playback of this
data.
Indexing is simple. But what about the data? You have no idea at compile time
what size or form that data will take. How do you store an object of varying
size when your database expects fixed-length records? And then there is the
format--different methods must be developed to handle either the video or the
sound. Ideally, you would want to create a single generic method for each
activity; that is, Store() to place the item in the database, Retrieve() to
get it out, and so on.
Thus, the GUI milieu has provided us with a new challenge--storing image and
sound data. A new approach must be found.


The Revenge of the BLOB


The obvious approach to storing images and sound is to find a process of
storing generic, variable-length objects in a database. In this article I'll
demonstrate exactly that--a method for efficient, generic, large-entity
storage.
In modern database parlance, sound and video data fall under the category
known as "binary large objects," or BLOB for short. A BLOB is basically a
large data stream in any format which is to be stored in the database.
One approach to storing BLOBs is to use an object-oriented database engine in
conjunction with an object-oriented language. To illustrate how you can do
this, I'll use as an example the Raima Object Manager (ROM), an
object-oriented database engine my company produces. ROM is capable of storing
records in both indexed and network-model methods. The network model is
essential to my approach of storing BLOBs because it easily manages the
relationship of items in a 1:n cardinality; more importantly, those items can
be accessed with considerable speed. The mechanics are outside the scope of
this article; suffice it to say that it is faster to traverse a linked list
than to look up items in an index, and we need to be able to do both.
In addition to ROM, the other tools on my workbench include Borland's C++
compiler for MS-DOS and Windows (you could also use Microsoft's C/C++ or any
other C++ 2.1-compliant compiler or preprocessor) and for interface design,
the Zinc Interface Library (ZIL).


Grasping the Amorphous


The two most widely used conceptual models in database design are the
relational and the network model. In relational systems, two-dimensional
tables consisting of attributes (columns) and instances (rows or tuples) are
defined. These tables (called "relations") are linked through indexing on
individual columns that serve as keys by which you may locate rows in that table.
The network model, on the other hand, uses entities called "records"
(equivalent to rows in the relational model) that contain fields in which data
is contained. They roughly resemble the Pascal RECORD or the C struct. These
records are linked via the use of sets (typically a linked-list structure).
These sets have a cardinality of 1:n--that is, one owner record may have many
member records.
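In code terms, a network-model set behaves much like an intrusive linked list. Here is a minimal sketch of one owner record chaining through its member records (invented names, not Raima's implementation):

```cpp
// Sketch of a network-model set with 1:n cardinality: the owner keeps a
// pointer chain through its members, so they can be walked in insertion
// order without any index lookup.
struct Member {
    int data;
    Member* next;   // the set-connection overhead: a link to the next member
};

struct Owner {
    Member* first;
    Member* last;
    Owner() : first(0), last(0) {}
    void Connect(Member* m) {            // "order last", as in the schema
        m->next = 0;
        if (last) last->next = m; else first = m;
        last = m;
    }
    int Members() const {                // count members by walking the chain
        int n = 0;
        for (Member* m = first; m; m = m->next) n++;
        return n;
    }
};
```

Traversing such a chain is a pointer dereference per member, which is why set navigation outruns repeated index lookups.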
In ROM, we use both models to organize our data to build a database of keyed
records, each of which will manage a multimember set. Each member of the set
will contain tagless fields consisting of a different-sized, fixed-length
array of bytes. To store the BLOB, we'll use a best-fit slicing algorithm to
divide it up into chunks of 500, 2000, and 4000 bytes. (After adding a few
bytes for set-connection overhead, these sizes equate to commonly found
disk-cluster sizes; thus, we can reduce the amount of external fragmentation
created on the disk.)
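The slicing step itself can be sketched as a greedy split, largest chunk first. This is hypothetical code (the function name SliceBlob is invented), using the record sizes from the schema in Listing One:

```cpp
#include <cstddef>
#include <vector>

// Sketch of best-fit slicing: greedily split a BLOB of arbitrary length
// into chunks drawn from the fixed record sizes in the schema (4000, 2000,
// and 512 bytes), largest first. The final chunk may come up short; in
// practice it would still occupy the smallest record size.
std::vector<std::size_t> SliceBlob(std::size_t total) {
    static const std::size_t sizes[] = { 4000, 2000, 512 };
    std::vector<std::size_t> chunks;
    while (total > 0) {
        std::size_t pick = sizes[2];          // smallest by default
        for (int i = 0; i < 3; i++) {
            if (sizes[i] <= total) { pick = sizes[i]; break; }
        }
        if (pick > total) pick = total;       // final short chunk
        chunks.push_back(pick);
        total -= pick;
    }
    return chunks;
}
```

A 6500-byte bitmap, for instance, would be cut into one 4000-byte chunk, one 2000-byte chunk, and a 500-byte remainder.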
Next, we create a set which links the manager object (DISP) with each record
(BITSREC 1, 2, and 3); see Figure 1. We can use indexed access (for the DISP
manager object, so that we may call up bitmaps by their ID) and network
access (for the BITSREC objects, so that we may reassemble them quickly). The
schema definition of this database is in Listing One (page 120), written in
Raima Data Definition Language (RDDL), a language that is C-like in its
definition of records. Any atomic C data type can be used in an RDDL
record--short, long, double, and so on.
In Listing One I've defined "tagless" fields in each record. This signifies to
the DDL processor that the record is one large buffer, containing no
individual fields. The tagless field definitions declare a two-dimensional
array of char as the storage buffer in the record because the DDL processor
assumes that a one-dimensional array of bytes is a string, in which case C
string-handling rules take effect. In this case, we do not want to treat a
BLOB as a string!
The DDL processor reads this file and produces two files from it: blob.dbd,
the database descriptor file that contains the structures the library objects
use to access the database; and blob.h, which contains C structs that are the
exact equivalent of the record structures in the database; see Listing Two,
page 120. (This file is a C .H file instead of a C++ .HPP file because the
Raima Database Manager, a non-object-oriented predecessor to ROM, uses this
same file to allow C programmers to access the database.) I'll use multiple
inheritance to incorporate these structs into our classes.
The next step is to define C++ classes. ROM provides an OmBitmap class
specifically for Microsoft Windows bitmaps. This class is specially designed
to manage BLOBs containing Windows bitmaps, and to be able to store and
retrieve bitmaps to and from the clipboard, as well as Show() the bitmaps onto
a given window handle. The OmBitmap class is actually derived from WinBlob,
which derives from OmBlob, which derives from a multimember sethandler class
called Polymorph (so named because multimember sets are inherently
polymorphic). The OmBitmap class definition is shown in Listing Three, page
120.
From the OmBitmap class, we get all the tools necessary to manage the multiple
BLOB chunks, which, combined, form the body of the bitmap. Note that the
OmBitmap has methods, such as LoadFromClip and CopyToHandle, that make it
simple to have a Paste command that can read the Windows clipboard, and paste
the bitmap right into our BLOB storage. These members are protected for the
obvious reason that outsiders should not, as a rule, be manipulating access to
the clipboard.


BLOBing for Bitmaps


The first thing to do is get the data into the database. To demonstrate this,
I'll use code taken from examples in the ROM documentation. Since this code is
fully documented and described in the toolkit, I'll only show portions here.
Let's start with Listing Four, page 120.
Besides cheating on Windows again by using the winio functions, we have
subclassed a StoreDb object to form our database, called "BitBlob." The
database, once it has been created, can be instantiated inside the
StoreTask class.
The purpose of the StoreTask class is to provide session-by-session control
over the databases and to make sure we can manage locking between concurrent
tasks. Since this is C++, however, additional management features can be
included. For example, if we decide that we need multiple databases, we can
manage their opening and closing here, as well as selecting their open modes,
and so on. The task can act as a way of controlling accesses by MDI client
windows, as well, if we choose to use that environment.
Next, notice the definition of the Display class, derived from both StoreObj
and the DDLP-produced struct, Disp. Disp effectively contains our data, as shown
in Listing Two. The resultant object instance of this class will amount to
displayable bitmaps; notice the Show() method, which displays the name of the
key (DispName).
The STOREDIN macro associates an object with a particular database by
assigning its private database pointer to the StoreDb-derived class specified.
This is a convenient way to associate an object with its database.
The OWNEROF macro generates several member functions: the overloaded operator
functions >>, which enable navigation to a given member object via the proper
set; and the Members function, which returns the count of members in the set.
Note in the .CPP file that the main portion of the code is located in the
RunProgram member of the StoreTask-derived class. RunProgram does a number of
things, all from its menu. The 's' case calls the Save function. After you have
copied the bitmap data into the BLOB's buffer area, you perform a UserNew to
create corresponding instance records in the database. Then, you call Save to
fill those records. The Save function divides BLOBs into their corresponding
chunks, and then fills the database records as it goes, until it runs out of
data. In doing so, you pass in a reference to a Display object,
so that it knows which record instance is to manage it.
As a result of this operation, we've now stored an object and named it in the
key portion of the Disp class, so it can be retrieved. This is done in the
'f' case by getting a name, creating a KeyObj (which is really just a key
instance), assigning the character string to search for, and passing the whole
thing to Find, the Disp object's member function, which searches for that
particular bitmap. Show, the BLOB's member function, displays it.
This illustrates keyed access. But what if we only want to display the BLOB
bitmaps in order?

Listings Five (page 121) and Six (page 158) do this. The slide viewer uses ZIL
to create a window in which we may display slides. As you can see from the
code, we create a special window subclass in which we instantiate the task and
objects.
We then associate the objects with actual persistent items by using the
overloaded operator [], which provides sequential access to instance records
and binds each memory object to a specific one. To
display, merely convert the window's coordinates to logical coordinates, pass
them to the Show member function, and blast away. The bitmap appears in the
window.
The EVT_NEXT and EVT_PREVIOUS events call yet another pair of overloaded
operators, increment (++) and decrement (--). These operators have
been overloaded to cause the memory object to associate itself with the
sequentially following or preceding instance record in the database. It is
very simple to read the code, and the actions of the operators are fairly
intuitive. This makes the code much easier to write and maintain.
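Stripped of the database machinery, the idea behind those operators can be sketched as a record cursor. The names here (Cursor, Okay, Current) are invented for illustration; ROM's real classes differ:

```cpp
#include <vector>

// Sketch of operator-overloaded navigation: ++ and -- move a cursor to the
// sequentially next or previous instance record; Okay() reports whether the
// last move stayed in range, mirroring the display.Okay() checks above.
class Cursor {
    const std::vector<int>* recs;   // stand-in for the database's records
    int pos;
    bool ok;
public:
    Cursor(const std::vector<int>& r) : recs(&r), pos(0), ok(!r.empty()) {}
    Cursor& operator++(int) { ok = (++pos < (int)recs->size()); return *this; }
    Cursor& operator--(int) { ok = (--pos >= 0); return *this; }
    bool Okay() const { return ok; }
    int Current() const { return (*recs)[pos]; }
};
```

With this shape, event handlers reduce to `cursor++; if (cursor.Okay()) show(cursor.Current());`, which is exactly the readability win claimed above.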


AfterBytes


As you can see, there are several ways to utilize BLOB objects. The important
thing to remember is that the implementation is crucial. With the method
defined by ROM, pieces are reassembled quickly and it's easy to add to or
subtract from an existing BLOB's size and reorganize its indexing.
_BLOBS AND OBJECT-ORIENTED DATABASE ENGINES_
by Sam Felton


[LISTING ONE]

/* blob.ddl--contains the schema definition, written in Raima Data
Definition Language */

database BLOB [4048] {
 /* here we tell it into which file to put each object */
 data file "blob.d01" contains Disp;
 data file "blob.d02" contains BitsRec1;
 data file "blob.d03" contains BitsRec2;
 data file "blob.d04" contains BitsRec3;
 key file "blob.key" contains DispName;

 record Disp
 {
 key char DispName[21]; /* Name of our bitmap */
 }
 record BitsRec1 /* BLOB chunks... */
 {
 char bit1[1][4000];
 }
 record BitsRec2
 {
 char bit2[1][2000];
 }
 record BitsRec3
 {
 char bit3[1][512];
 }
 set Disp_BitMap
 {
 order last;
 owner Disp;
 member BitsRec1;
 member BitsRec2;
 member BitsRec3;
 }
}






[LISTING TWO]


/* Raima Data Manager Version 3.21A */
/* database blob record/key structure declarations */
struct disp {
 char dispname[21];
};
struct bitsrec1 {
 char bit1[1][4000];
};
struct bitsrec2 {
 char bit2[1][2000];
};
struct bitsrec3 {
 char bit3[1][512];
};
/* record, field and set table entry definitions */
/* File Id Constants */
/* Record Name Constants */
#define DISP 10000
#define BITSREC1 10001
#define BITSREC2 10002
#define BITSREC3 10003
/* Field Name Constants */
#define DISPNAME 0L
#define BIT1 1000L
#define BIT2 2000L
#define BIT3 3000L
/* Field Sizes */
#define SIZEOF_DISPNAME 21
#define SIZEOF_BIT1 4000
#define SIZEOF_BIT2 2000
#define SIZEOF_BIT3 512
/* Set Name Constants */
#define DISP_BITMAP 20000







[LISTING THREE]

// OMBITMAP.HPP-- C++ header file for Object Manager, an object storage class
// library/ODBMS -- written by Paul Gallagher 1992 -- for C++ 2.0 or above
// Copyright (C)1991 Raima Corporation All Rights Reserved
#ifdef WINDOWS
#ifndef OMBITMAP_HPP
#define OMBITMAP_HPP

#include <winblob.hpp>

class OM_EXPORT OmBitmap : public WinBlob
{
protected:
 BITMAP DB_FAR * Bmap; // stored in Blob header
 int GetHeader(Pvoid p);
 int SetHeader(Pvoid, long len);
 T_F LoadFromClip(HANDLE hdata);
 HANDLE CopyToHandle(); // used to copy to clipboard

public:
 OmBitmap(HWND h,int set): WinBlob(h, set) { Bmap = new BITMAP; }
 ~OmBitmap() { delete Bmap; };
 WORD ClipFormat() { return CF_BITMAP; }
 void Show(short xStart, short yStart);
};
#endif // OMBITMAP_HPP
#endif // WINDOWS






[LISTING FOUR]

// BITMAP.HPP
//---- Illustrates the creation and use of an OmBitMap class ---
#include <storedb.hpp> // task/database ObjectManager includes
#include <storeobj.hpp>
#include "blob.h" // blob database structures from DDLP
#include <keyobj.hpp>
#include <ombitmap.hpp>

extern "C" {
#include "..\winio\iosetup.h"
};
void RunProgram(HWND h); // called from main
 //----- Database class definitions for blob database -----
class BitBlob : public StoreDb
{
public:
 // CONSTRUCTOR
 BitBlob();
 DEFINE_DB_LOCATOR;
};
 //----- Task class definitions -----
class BitBlobTask : public StoreTask
{ BitBlob BlobDb;
};
class MyBitmaps;
 //-----Display class--The "owner" of the MyBitmap; contains name of bitmap
class Display : public StoreObj, public Disp
{
private:
 int RecType() { return DISP; }
public:
 // Display() default constructor is Keyed.
 Display() : StoreObj(KeyObj(DISPNAME)) {}
 void Show() { printf("%s\n",DispName); }
 T_F UserNew();
 STOREDIN(BitBlob);
 OWNEROF(MyBitmaps,DISP_BITMAP);
};
 //----- derive BitMap class -----
class MyBitmaps : public OmBitmap
{
private:
public:

 MyBitmaps(HWND h) : OmBitmap(h, DISP_BITMAP) {}
 ~MyBitmaps() {}
 void Show(short xStart, short yStart,char DB_FAR *name );
 STOREDIN(BitBlob);
 MEMBEROF(Display,DISP_BITMAP);
};

// BITMAP.CPP
#include "bitmap.hpp"
//----- Source for WinMain and Run Program -----
DB_INIT(BitBlob); // For Borland, Initialize database pointer
extern "C" {
int PASCAL WinMain(HANDLE hInstance, HANDLE hPrevInstance,
 LPSTR /*lpCmdLine*/, int nCmdShow)
{
 // ... (code elided in the printed listing)
 BitBlobTask task; // Instantiate a Task
 hw= winio_hwnd();
 if (task.Okay()) // make sure task setup was okay
 RunProgram(hw);
 winio_close();
 return 0;
}
};
// Ok, it's not "pure Windows," but it's easier to understand. Using winio
// instead of real windows, but this could be replaced with a basic message
// loop....
 void
RunProgram(HWND hw)
{
 char tmp[60];
 Display display;
 MyBitmaps bm(hw);
 bm.Init(); // Post-constructor setup routine
 for (int i=0; i < 10 ; i++)
 printf("\n");
 do
 {
 printf("a) Show all f) find l) list c) clip p) paste\n"
        "s) save d) display D) DELETE q) quit : ");
 gets(tmp);
 if (*tmp == 'q')
 break;
 switch (*tmp)
 {
 // ... (code elided in the printed listing)
 case 's': // SAVE THE BITMAP UNDER A NEW NAME
 {
 display.UserNew(); // CREATE A NEW PERSISTENT PART
 printf("Storing\n");
 bm.Save((StoreObj DB_FAR &)display); // SAVE MEMORY PART
 // INTO PERSISTENT PART
 break;
 }
 case 'd': // DISPLAY THE BITMAP IN MEMORY
 { printf("Display\n");

 bm.Show(0,0,(char DB_FAR *)display.DispName);
 gets(tmp);
 break;
 }
 case 'f': // FIND A BITMAP BY NAME AND READ IT IN
 { printf("Enter Bitmap name : ");
 gets(tmp);
 display.Find(KeyObj(DISPNAME,tmp));
 if (display.Okay())
 {
 display >> bm;
 bm.Show(0,0,(char DB_FAR *)display.DispName);
 }
 break;
 }
 }
 } while (1);
}
// ---- Objects Source--outputs the name of the bitmap over the bitmap
 void
 MyBitmaps ::
Show(short xStart, short yStart, char DB_FAR *name )
{
 OmBitmap:: Show(xStart,yStart); // Call Base class to show bitmap
 HDC hDC; // Get a window DC
 hDC = GetDC(hwind);
 TextOut(hDC,xStart,yStart,name,_fstrlen(name)); // Print the name
 ReleaseDC(hwind,hDC);
}
 T_F
 Display ::
UserNew()
{
 printf("Enter Display Name > ");
 gets (DispName);
 NewObj();
 return True;
}
 // Database BitBlob Constructor-must be in directory with the blob database
BitBlob :: BitBlob() : StoreDb("blob",PDB_LOCATOR)
{ if (Open() != True)
 printf("Error Opening Tutorial Database, Has it been initialized?\n");
}





[LISTING FIVE]

// tech.hpp - class definitions for slide player
// Revision history: -- 1.00 SPF

#ifndef TECH_HPP

#define TECH_HPP
#define EVT_QUIT 10000 // our event mapping
#define EVT_NEXT 10001
#define EVT_PREVIOUS 10002

#define EVT_FIRST 10003

#include <storedb.hpp> // task/database ObjectManager includes
#include <storeobj.hpp>
#include "blob.h" // blob database structures from DDLP
#include <keyobj.hpp>
#include <ombitmap.hpp>

//// These should look familiar - they are essentially stolen from the
//// bitmap.hpp file, with slight exceptions
 // Database class definitions for blob database
class BitBlob : public StoreDb
{
public:
 // CONSTRUCTOR
 BitBlob();
 DEFINE_DB_LOCATOR;
};
 //---- Task class definitions ----
class BitBlobTask : public StoreTask
{ BitBlob BlobDb;
};
class MyBitmaps;
 //---- Display class--"owner" of MyBitmap ----
class Display : public StoreObj, public disp
{
private:
 int RecType() { return DISP; }
public:
 // Display() default constructor is Keyed.
 Display() : StoreObj(KeyObj(DISPNAME)) {}
 STOREDIN(BitBlob);
 OWNEROF(MyBitmaps,DISP_BITMAP);
};
 //---- derive BitMap class ----
class MyBitmaps : public OmBitmap
{
private:
public:
 MyBitmaps(HWND h) : OmBitmap(h, DISP_BITMAP) {}
 ~MyBitmaps() {}
 STOREDIN(BitBlob);
 MEMBEROF(Display,DISP_BITMAP);
};
// Subclass of UIW_WINDOW for my own purposes
// Welcome to ZIL, folks!
class EXPORT MY_WINDOW : public UIW_WINDOW
{
public:
 MY_WINDOW();
 virtual ~MY_WINDOW() {}
 EVENT_TYPE Event(const UI_EVENT &event); // our local event processor
private:
 static EVENT_TYPE FirstEvent(UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode);
 static EVENT_TYPE QuitEvent(UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode);
 static EVENT_TYPE NextEvent(UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode);

 static EVENT_TYPE PreviousEvent(UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode);
 POINT display_origin; // a place to start the bitmap display
 BitBlobTask task; // our StoreTask subclass
 Display display; // our display manager object
 MyBitmaps bm; // our OmBitmap subclass (bitmap data)
};
#endif //TECH_HPP







[LISTING SIX]

// tech.cpp - methods for slide viewer program. This is a Bitmap slideshow
// presentation program. It uses Raima Object Manager and the Zinc Interface
// Library (ZIL). Revision history: 1.0 SPF

#define USE_MAIN_WIN TRUE
#define USE_RAW_KEYS

// tell compiler to use pre-compiled headers...
#pragma hdrfile
#include <dos.h>
#include <stdlib.h>
#include <errno.h>
#include <iostream.h>
#include <ui_win.hpp>
#include <ui_evt.hpp>
#include <stdio.h>
#include <string.h>
#include <process.h>
#include <dir.h>
#include "dlg.hpp"
#include "tech.hpp"
#pragma hdrstop

DB_INIT(BitBlob); // For Borland and many UNIX compilers/preprocessors, we
 // must initialize static pointers via a prototype before they can
 // be referenced anywhere (even in a constructor). Here, we
 // initialize the pointer to the database in the StoreTask item
 // This function is automated via a macro in Object Manager.
 // Database BitBlob Constructor
 // must be in the directory with the blob database
BitBlob :: BitBlob() : StoreDb("blob",PDB_LOCATOR)
{ if (Open() != True)
 printf("Error Opening Tutorial Database, Has it been initialized?\n");
}
// Constructor for our slide display window, Zinc style.
MY_WINDOW::MY_WINDOW()
 :UIW_WINDOW("DLG.DAT~MAIN_WIN"),
 bm(this->screenID) // screenID is the HWND in the object
{
 // Set the user functions to the pop-up items.
 UIW_POP_UP_ITEM *popup;
 unsigned short the_item;

 // Change point of display origin for bitmap from Client Coordinates to
 // Screen Coordinates for use by the Show() command
 display_origin.x = 5;
 display_origin.y = 10;
 ClientToScreen(this->screenID,&display_origin);
 // I know these look repetitive, but they are only done once,
 // so why waste the call overhead?
 the_item = QUIT_ITEM;
 popup = (UIW_POP_UP_ITEM *)this->Information(GET_NUMBERID_OBJECT,&the_item);
 popup->userFunction = MY_WINDOW::QuitEvent;
 popup->HotKey('Q');
 the_item = NEXT_SLIDE;
 popup = (UIW_POP_UP_ITEM *)this->Information(GET_NUMBERID_OBJECT,&the_item);
 popup->userFunction = MY_WINDOW::NextEvent;
 popup->HotKey('N');
 the_item = PREVIOUS_SLIDE;
 popup = (UIW_POP_UP_ITEM *)this->Information(GET_NUMBERID_OBJECT,&the_item);
 popup->userFunction = MY_WINDOW::PreviousEvent;
 popup->HotKey('P');
 the_item = FIRST_SLIDE;
 popup = (UIW_POP_UP_ITEM *)this->Information(GET_NUMBERID_OBJECT,&the_item);
 popup->userFunction = MY_WINDOW::FirstEvent;
 popup->HotKey('F');
 bm.Init(); //here, we initialize the BLOB instance - this must
 // be done AFTER instantiation is complete
}
/////////////////////////////////////EVENTS//////////////////////////////////
// First slide displayed
#pragma argsused
EVENT_TYPE MY_WINDOW::FirstEvent (UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode)
{
 UI_EVENT tEvent;
 tEvent = event;
 if (ccode != L_SELECT)
 return ccode;
 tEvent.type = EVT_FIRST;
 object->eventManager->Put(tEvent);
 return ccode;
}
// Next slide displayed
#pragma argsused
EVENT_TYPE MY_WINDOW::NextEvent (UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode)
{
 UI_EVENT tEvent;
 tEvent = event;
 if (ccode != L_SELECT)
 return ccode;
 tEvent.type = EVT_NEXT;
 object->eventManager->Put(tEvent);
 return ccode;
}
//Previous slide displayed
#pragma argsused
EVENT_TYPE MY_WINDOW::PreviousEvent (UI_WINDOW_OBJECT *object,
 UI_EVENT &event, EVENT_TYPE ccode)
{
 UI_EVENT tEvent;

 tEvent = event;
 if (ccode != L_SELECT)
 return ccode;
 tEvent.type = EVT_PREVIOUS;
 object->eventManager->Put(tEvent);
 return ccode;
}
// QuitEvent is cancel - bail out immediately
#pragma argsused
EVENT_TYPE MY_WINDOW::QuitEvent(UI_WINDOW_OBJECT *object, UI_EVENT &event,
 EVENT_TYPE ccode)
{
 UI_EVENT tEvent;
 tEvent = event;
 if (ccode != L_SELECT)
 return ccode;
 tEvent.type = EVT_QUIT;
 object->eventManager->Put(tEvent);
 return ccode;
}
// This is our window's event switch...
EVENT_TYPE MY_WINDOW::Event(const UI_EVENT &event)
{
 EVENT_TYPE ccode = event.type;
 const UI_EVENT exit_event(L_EXIT);
 const UI_EVENT redisp(S_REDISPLAY);
 switch(ccode)
 {
 case EVT_QUIT:
 eventManager->Put(exit_event);
 break;
 case EVT_FIRST:
 display[FIRST]; // get our first bitmap
 if(display.Okay())
 {
 this->Event(redisp);
 display >> bm; // navigate to blob
 // display it at these coords.
 bm.Show(display_origin.x, display_origin.y);
 }
 break;
 case EVT_NEXT:
 display++; // next bitmap... etc...
 if(display.Okay())
 {
 this->Event(redisp);
 display >> bm;
 bm.Show(display_origin.x, display_origin.y);
 }
 break;
 case EVT_PREVIOUS:
 display--; // previous bitmap... etc...
 if(display.Okay())
 {
 this->Event(redisp);
 display >> bm;
 bm.Show(display_origin.x, display_origin.y);
 }
 break;

 default:
 ccode = UIW_WINDOW::Event(event);
 };
 return ccode;
}
// WinMain, mostly Zinc stuff...
#pragma argsused
int PASCAL WinMain(HANDLE hInstance, HANDLE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow)
{
 // Initialize the display
 UI_DISPLAY *display = new UI_MSWINDOWS_DISPLAY(hInstance,
 hPrevInstance, nCmdShow);
 // Create the event manager and add devices.
 UI_EVENT_MANAGER *eventManager = new UI_EVENT_MANAGER(display);
 *eventManager
 + new UID_KEYBOARD
 + new UID_MOUSE
 + new UID_CURSOR;
 // Create the window manager.
 UI_WINDOW_MANAGER *windowManager =
 new UI_WINDOW_MANAGER(display, eventManager);
 // Initialize error system, attach to UI_WINDOW_OBJECT static member
 UI_WINDOW_OBJECT::errorSystem = new UI_ERROR_SYSTEM;
 // Create our main window
 *windowManager + new MY_WINDOW;
 // Wait for user response.
 EVENT_TYPE ccode;
 do
 {
 UI_EVENT event;
 eventManager->Get(event, Q_NORMAL);
 ccode = windowManager->Event(event);
 } while (ccode != L_EXIT && ccode != S_NO_OBJECT);
 // Clean up.
 delete UI_WINDOW_OBJECT::errorSystem;
 UI_WINDOW_OBJECT::errorSystem = NULL;
 delete windowManager;
 delete eventManager;
 delete display;
 return (0);
}




















December, 1992
A CURMUDGERY ON PROGRAMMING LANGUAGE TRENDS


What next?




Scott Guthery


Scott is a scientific advisor at Schlumberger's Austin System Center in
Austin, Texas, where he was the chief software architect of Schlumberger's
family of wellsite data-acquisition systems. He has a PhD in probability and
statistics from Michigan State University and has been programming since 1957.
He can be reached via e-mail at guthery@asc.slb.com.


The current problems of software development and maintenance are primarily
those of scale--hundreds of programmers scattered over tens of years tending
to millions of lines of code. Problems like these aren't addressed by trendy
tinkering with programming languages.
There's a programming folk theorem that what feels good to one programmer
writing a 10,000-line program in isolation will be good for 100 programmers
writing a one-million-line program as a team. Experience has shown this is
false. In fact, features like self-modifying code, signature coercions, and
programmer-defined operator overloading make a language feel good to the lone
programmer but get in the way of effective team programming.
Historically we've focused on the productivity of individual programmers and
developed tools to make this person more productive. We reasoned that if each
member of a team were more productive, then the team would be more productive,
too. But the product of a team is supposed to be something more than the sum
of its members' products. If technology increases the productivity of
individual team members but frustrates the crystallization of the team, it
cannot address the real problems of software engineering.
In short, a chemical plant isn't simply a scaled-up copy of the chemist's
laboratory; a one-million-line program isn't a concatenation of 10,000-line
programs; and a programming team isn't just a midnight hacker with a lot of
fingers.


Translating the Problem Becomes the Problem


Object-oriented programming will soon join the modular programming of the
'60s, portable programming of the '70s, and structured programming of the '80s
on the paradigm compost heap. One after the other, these breakthroughs to
common sense have been converted into reasons to engage in massive translation
of perfectly functional code.
These pitches always have a convincing ring. "It slices, it dices, and it does
it all with modular components so you can add extensions later." But before we
get extensions, a brand-new tool is introduced, and herein lies the real rub.
Sooner or later, the folks who pay the bills are going to ask if buying the
new code-o-matic is really a major advance over the knife; if translating the
software problem from one language to another contributes to solving the
problem, or has itself become part of the problem. While we've told the bill
payers about information hiding and how we can change the implementation
without changing the interface, it always seems we have to rewrite the whole
thing. "This will be the last time...honest" is one of programming's big lies.
We've yet to compute whether any of these rewriting frenzies have paid for
themselves because we're afraid the people paying the bills won't like the
answer. If you take into account programmer time and capital resources
(computers, disk space, and the like) and ignore opportunity loss (programs
and features that could have been written and sold or used in the "old"
programming language), I seriously doubt that these programming fads have been
cost effective. What they have done is enrich and entrench the programming
priesthood.
Can you imagine chemists changing the symbols of the elements or the diagrams
of organic molecules as often as we change programming languages? Can you
imagine mathematicians changing their notation every ten years and the
confusion it would cause? Does this constant change of notation and
methodology account for some of software's chaos? Does := == = = <- or not?


C++ is a Black Hole for Programs


Programmers perpetually pretend (at least in front of their bosses) that the
current programming fad is the last, and that although they were wrong the
last five times, this system rewrite is really the last one. They rarely ask
the question, "How hard is it going to be to free my programs from this
language fad and move them to the next one or back to the previous one?"
Object-oriented methodology has made a number of positive contributions to how
we think about programming and programming languages. However, the de facto
object-oriented programming language winner, C++, is not one of them. How hard
is it going to be to free our programs from C++? I believe that C++ will leave
behind a bigger software mess than PL/I, Ada, and Lisp combined, for two
reasons: 1. We lack a complete understanding of the discipline needed to use
object-oriented programming effectively; and 2. C++ has a feature-creep
history, nasty syntax, and complex semantics. And when we decide to escape
from C++, the following land mines await us.
Programming with puns. Overloaded operators, multiple inheritance, and coerced
function calls will be big problems for the C++-to-whatever team. The meaning
of any symbol and where control goes when it is encountered depends in complex
ways on the entire corpus of code present during compilation and linking.
These language-extension and polymorphic-pun features of C++ make reading and
debugging C++ difficult, particularly when the code has been written by a team
of programmers. They may make translation--particularly automatic
translation--impossible to any practical extent.
while(C) {version = C++}. C++ is turning out to be a royal dynasty of
programming languages rather than just one well-defined programming language.
Its syntax and semantics have changed repeatedly over the ten-plus years of
its adolescence. To free a program from C++, you'll have to know which version
of C++ and maybe even which C++ compiler it was written for. This was true of
"classic" languages too, but the differences were apparent in the code syntax
itself. With C++, the differences are often hidden in the semantics--how the
same symbols are interpreted differently by the compilers and the run-time
engine. (By the way, I hear there's a new version due out real soon now with
templates, exceptions, and threads. Do you mean the American or European
version?)
Anything you can do I can do IsA. Object-oriented programming methodology is
an open invitation to gratuitous generalization and the growing of formal
gardens of complexity kudzu. Indeed some OO wizards recommend that you add
code and levels of generality to object-oriented programs simply on the
grounds of completeness and possible future need. But the thrust of commercial
programming is to do more with less, not less with more. The idea is to
synthesize, to make programs smaller, to refine and focus them. If what you
really want is an omelette, you don't start with a Fabergé egg.


An Object is Just a TAB Card


Admiral Grace Hopper, America's first and finest software engineer, said that
we really don't do anything new in computing, we just give new names to old
ideas. Consider the following:
  Hollerith       Date        Stroustrup
  ----------------------------------------
  Field           Column      Attribute
  Drum Card       Table       Class
  Card            Row         Object
  Gang Punch      Join        Inherit
What is this concept that we rediscover with such regularity? In its purest
form it is the familiar entity-attribute-relation (E-A-R) diagram. OOP is
simply the realization of this idea in a programming language; the relational
data model realizes it in a data store; and TAB cards and tabulating machines
realize it in hardware.
In drawing up an E-A-R diagram for a program (doing object-oriented design, if
you must be seen at a table at Maxim's), you're led inexorably toward a study
of the problem and its context and away from the solution and its technology.
This--and not multiply inherited, polymorphic, virtual mix-in friends--is the
key insight of object-oriented methodology: Use E-A-R diagrams to do program
design. After all, the understanding of and insight into the problem is the
wellspring of program simplicity and implementation efficiency. A good
architect studies bricks once, but studies people every time he designs a new
building.
Programming languages have not proven to be particularly good tools for
understanding problems. Or, said another way, the better a programming
language is at problem description and study, the worse it is as a programming
language. Furthermore, using the same notation to record your understanding of
a problem and the program that's going to harness that understanding may not
be a good idea. (There's probably a reason a printed circuit board or a chip
mask doesn't look like a schematic diagram.)
Call it something new (like an "object") if you must, but understand that the
idea has been around for a long time. Don't be afraid to learn about the idea
by studying it under its older names. And while you're dipping into
programming history, keep in mind that the whole point of the Jacquard loom
was software reuse.


Object A+Object B=Object C


So how do programming at scale and E-A-R diagrams come together? How do you
build large programs with E-A-R diagrams?

Object-oriented programming's use of E-A-R diagrams provides us with only one
constructor--the "is a" (IsA) relation. Unfortunately, this really isn't the
kind of constructor we need to build big programs. It's flat--it never gets
off the ground. A generic car is about the same size as a 1957 Chevrolet, and
a motorized transportation device is probably about the same size as a generic
car. IsA lets us build vaulting conceptual castles, but it doesn't let us
build parts to build parts with which to build real castles. And we need
fleets, traffic jams, and parking lots of cars and cities with car fleets,
traffic jams, and parking lots.
To do object-oriented programming at scale, we need more relations (the "R" in
E-A-R), particularly the "part of" (PartOf) relation. Furthermore, to make
PartOf work right, connecting a bunch of objects should lead to another
object. In E-A-R talk, such a composite entity is called a "container." It is
still a first-class entity which can participate in still further relations,
including more PartOf relations. In its Hopper-esque rush to be new, OOP
hasn't stopped to learn from its roots.
E-A-R languages also have to externalize; that is, they must give back to the
programmer the relation-following machinery. In fact, I'd argue that the
"methods" used to follow relations between objects are more important for
building big programs than the methods used to compute some feature within an
object. To convert an E-A-R diagram to a flowchart, just replace each box with
a line and each line with a box. Code is nothing more than the way we maintain
relations between data.


Saturday Morning at the Hardware Store


My father was a weekend handyman. Every Saturday morning started with a trip
to the hardware store to find exactly the right tools to do the weekend's
project. Thirty-degree ratchet wrench, underwater silicone glue, ultrasonic
varnish remover. Every task had to have just the right tool to accomplish it.
On Monday morning, Mom called the local professional handyman, who came with
the same set of well-worn tools that he took to every job to clean up and
finish up Dad's mess.
Programmers by disposition are enamored of technology. This is a strength when
it comes to abandoning old, inefficient ways and adopting new, more efficient
ways. But it's a weakness when new software tools arrive faster than we can
learn to use them effectively and we spend too much time at the wrong end of
their learning curves. The benefits of a truly effective new technology are
foregone when, in the rush to try out the next shiny whiz-bang tool, we
abandon the current one.


...And Use Your Napkin


Most new programming-language technology is either yesterday's forgotten
failures or today's common sense. Here's a sampling of the latter:
Seek constructs that help with problems of scale; features that just gild the
putty waste your time and usually make matters worse.
Study the problem and understand its context; it is here and not in your
high-tech toolkit that elegant solutions will be found.
Assemble programs and systems from big parts; try to write as little code as
possible; work the code, not the debugger.
Pick your programming tools carefully, change them infrequently, and make sure
you get to the top of the learning curve.


Another Curmudgeon Reflects


Al Stevens
Editor's Note: Scott's curmudgery on language trends caused DDJ contributing
editor Al Stevens to look back over his years at the keyboard. Here's his
reaction to Scott's article; we'd like to hear yours.
Scott Guthery's article, "A Curmudgery on Programming Language Trends," stirred
feelings of deja vu. Scott's sentiments recall similar beatings that occurred
in my breast with some regularity over the years. There have been a lot of
those years. Scott and I started at about the same time in the late '50s: the
Eisenhower years, cars with fins, duck tails, white bucks, Thunder Road, Bo
Diddley, and Bird. His words move me to consider why those of our generation
so rigorously resist change. I've lived and suffered through almost every
paradigm shift known to the commercial data-processing craft. I rebelled at
almost every one. I was almost always wrong.
My first job was as a "card walloper," an operator in an EAM shop. EAM means
"electronic accounting machine," and it refers to a class of punched-card
processing machines. We programmed them by wiring plug boards, and we operated
them by hand-feeding cards into their hoppers and removing those cards from
their stackers. We were in the data flow. I liked that work. We designed,
programmed, and operated systems, often completing all three steps in the
course of one duty shift.
When they made me a programmer on the IBM 650 and 1401, I accepted because it
was an honor to be selected and it paid more, but I didn't like the idea.
Computer programmers wore ties, sat at desks, worked normal hours, and
operated the machines only when they were debugging programs. I didn't want to
be so far away from the hardware. Once the programmer's job was done, someone
else got to play with the gear. I learned machine code and then the SOAP II
and Autocoder assembly languages. I didn't like assembly language. It took
away the fun because it calculated the operation codes and operand addresses,
something we had been doing for ourselves. Assembly language was supposed to
make us more productive, though.
Soon we had a 1410, which had a complex input-output architecture. IBM brought
a macro system named IOCS (Input-Output Control System) into our Autocoder
shop. It would make programming easier, they said, make us more productive.
The macros took care of reading and writing card, disk, printer, and tape
records and testing for errors. This is no good, we said. They're taking the
fun away again. We're supposed to forget about how the electromechanical
devices work and concentrate more on the program's purpose.
Next came Cobol. It would not only make us more productive, it would increase
the number of computers that we could program. I didn't like it. I thought it
was awful, not for the reasons that programmers don't like Cobol now, but
because it completely divorced the programmer from the hardware. How could you
write an efficient program if you didn't manipulate the registers, if you
didn't even know what they were?
Along with Cobol came the Closed Shop paradigm. It was supposed to make the
computer, not us, more productive. Programmers were no longer allowed to see
the machine, much less operate it. We passed decks of cards through a
window--a real hole in the wall, not a Window--and got source, output, and
memory-dump listings back through a window. These were the dark days of
programming. This was one of the few rebellions that was in the right. It
drastically reduced programmer productivity because we dealt with one bug per
throughput, usually measured in days, all of which was okay to the empty suits
because it gave the illusion of optimized computer usage, a more costly and
valued commodity than programmer hours.
The Closed Shop paradigm was in tandem with the Big System paradigm.
Management was enamored with the flawed idea that a computer software system
should support every functional and performance requirement they could dream
up, and that its first version should do it all. No requirement was
negotiable, and none could be postponed for a later version. We saw, and still
see, mainframe installations that cannot run uninterrupted for more than a few
hours or days without the attention of a staff of systems programmers,
database administrators, and applications programmers.
Minicomputers and, later, microcomputers both returned us to the old ways.
They had primitive operating systems and assemblers and, at first, no
so-called "high-level" languages. Programmers from my generation loved them.
We had been spoiled--had gotten lazy and fat--by the compilers, database
managers, and automatic flowcharters, and we fled back to our Spartan roots in
droves. But ontogeny recapitulates philology, and both times the return to the
golden years was short-lived.
Both structured programming and the relational-database model made sense right
out of the chute because they directly addressed and partially solved common
and persistent design and development problems, and their superiority was
obvious. They were valid, and they worked. They have problems of their own,
but it's worth it to benefit from what they offer.
I looked at object-oriented programming in the middle '80s. I saw nothing
compelling. The problems I dealt with did not cry for a change in methods. I
wrote programs in C and the programs worked. Why change? But when it was
inevitable that the C++ star would rise, another look was called for. I was
skeptical. Without experience in other object-oriented languages, I found in
C++ a better method for expressing programs. Part of that is due to the
notational improvements that C++ brings to C. Another part is the ability to
build new data types that behave as if they belong to the language. Still
another part is the lure to regard and express software solutions from the
perspective of the data types rather than the functions.
Like C, C++ offers the programmer more freedom than discipline. But discipline
in programming is a personal matter, and without it a programmer in any
language is going to build bad programs, and a team of programmers is going to
build bad software systems. Which brings us to the real problem: C++ and
object-oriented programming do not correct the Big System paradigm. If you
must build a big system, then no language, no tool, no paradigm is going to
guarantee success. The size of the system, the complexity of the relationships
between data entities, and the number of people doing the design and
construction are what get in the way.
For example, I spent 1974 to 1978 in an office building in Roslyn, Virginia
helping to build a network of ten minicomputers with 20 microcomputer
workstations to process text-message traffic. Dozens of people and several
contractors and government agencies were involved. In those four years, the
project saw slipped schedules, blown budgets, and changes in direction. During
that same time, I observed from my window the ground breaking, construction,
and completion of a Hyatt hotel, several new office high-rises, and the new
underground Metro mass-transit system. When I left, the hotel, office
buildings, and subway were done and operating, but the computer system was
still foundering.
Despite all the attempts and all the new and improved methodologies, we have
never figured out how to manage big software projects to a successful,
planned, and orderly completion. The failure of a new paradigm to solve what
has never been solved before does not by itself invalidate the paradigm.
Scott observes that chemists and mathematicians do not change their notations
as frequently as we do. Neither do construction engineers, doctors, astronomers,
geologists, biologists, and archeologists. Consider the age of their sciences
and that of our craft. Their notations are many hundreds of years old; ours is
less than 50. When Scott and I started programming, the craft was less than
ten years old. There were as many programmers in the world as attend a
typical small trade show today. The computing power that then filled a room
now fits in your pocket. The rapid changes in technology and the sheer number
of people involved demand and assure that the tools and techniques will change
and improve. The programming industry will grow. We cannot afford to stop
growing with it.







December, 1992
PROGRAMMING PARADIGMS


Recognition: Ink, Speech, and Otherwise




Michael Swaine


Lexicus is a small entrepreneurial firm chartered to provide the most natural
handwriting-recognition systems available. I talked with Lexicus president and
cofounder Ronjon Nag about the company, the product, the technology, and about
the new data type of ink.
Since the company's founding about a year ago, Lexicus has produced a
recognizer that can recognize cursive, print, and mixed handwriting. Two
versions of the recognizer, in fact: one for PenPoint and one for Windows.
Both products are currently in beta, and Lexicus hopes to have them in
products early next year. They're looking at other platforms as well.
One strength of Lexicus's recognizer is that it is a software-only solution.
Among the (achieved) design goals were the goals of having it run on an 80386
system with no hardware accelerators or chips and no special hardware, and of
having it work at the level of the operating system, so that it can provide
consistent, automatic handwriting recognition for all applications running
under the operating system.
It's a young company, privately funded, based in Palo Alto, California. That
puts it in Silicon Valley, but perhaps more significantly, it puts it within
walking distance of Stanford University. Lexicus has strong academic roots.
The staff of fewer than ten people is a degree-heavy bunch of computer
scientists, engineers, and psychologists. Just who fits in which category is
hard to say: Most of these people have cross-disciplinary backgrounds and
interests. They're all PhDs, from Harvard, MIT, Stanford, Cambridge, and
Oxford, and they tend to have impressive academic credentials. There is one
Rhodes scholar among them, and one Harkness scholar (a Harkness scholarship is
like a Rhodes scholarship in reverse; it sends a Briton to America to study).
The Harkness scholar is Ronjon Nag. After earning a PhD in speech recognition
from Cambridge University, he went on to work as a management consultant in
London. He came to America three years ago, picked up an MBA at MIT's Sloan
School of Management, and moved on to Stanford University's psychology
department. It was here that he met Lexicus cofounder Chris Kortge, the prime
inventor of Lexicus's handwriting-recognition technology.
DDJ: Your background is in speech recognition. But when you started a company,
you focused on handwriting recognition. Why?
RN: Speech [is] the glamour area for pattern-recognition scientists. Most
people doing recognition [are] involved in speech rather than handwriting,
which has been regarded as rather a fringe subject in the last few years. But
now, with the advent of pen computers, it's attracting a little more
attention. We saw this as an area where we could get in very quickly and make
a sensible product. We didn't have to develop the operating system. We didn't
have to build the hardware. And, even though the market's been slow recently,
we have less responsibility to create the market than the much larger players.
This is not the case in speech.
DDJ: Most handwriting-recognition approaches require printed characters as
input, but you went straight for cursive recognition. In fact, your system
deals with mixed cursive and printed characters. Given that printing-only
recognizers still don't do a perfect job, why did you start with this harder
problem?
RN: Why we did cursive handwriting is that there seemed to be a sufficiently
large number of players managing the print-recognition problem, and [that]
problem is fairly well explored in the academic literature. If you look at the
cursive-recognition problem, it's hardly been touched relative to print
recognition or even speech recognition.
DDJ: Don't you sort of steer away from the term "cursive," though?
RN: Lexicus is trying to do what we call "natural recognition."
First-generation recognizers could only do print. In fact, very early ones
could only do block caps. We consider ourselves a second-generation recognizer
company, trying to produce [a recognizer for] print, cursive, or a combination
of both. Most people write as a combination of both. So that's what people
want: a recognizer that recognizes their natural handwriting.
DDJ: Their own natural handwriting, or anyone's? Where do you stand on the
writer-independent versus writer-dependent dimension?
RN: Our approach is to produce one that is as writer independent as possible,
that works out of the box in the first instance. We'll be working on training
to increase that accuracy for any particular user, but we've placed very high
importance on it working straight away in the store or as soon as the person
has opened the box.
DDJ: But isn't writer-dependent training the way to go to squeeze out the
greatest accuracy possible?
RN: Training is definitely a way to go. But there may be environments where
training is just not possible. Where a machine may be shared amongst a number
of people, where people haven't got time to train, or where a machine may be
stationary and people come up to it with no prior experience. For people who
use it all the time, that's when you have to use training to get that extra
few percent accuracy.
DDJ: I know you won't talk about your algorithms, but can you characterize
them? Are they refinements of work we might find in the academic literature?
Are they purely your invention?
RN: We have a number of recognition algorithms that we have very strong
expertise in, within the members of our group. In general, any of the
recognition algorithms that are out there in the literature, we have somebody
who is an expert in it. And we have our own proprietary work as well.
DDJ: Well, we can talk about the algorithms that are in the academic
literature, anyway. What kinds of generally known algorithms are there for
handwriting recognition?
RN: The traditional techniques for doing ordinary handprint recognition that
are in the literature revolve around neural networks, hidden Markov models,
fuzzy logic, clustering algorithms; I've also seen dynamic programming
approaches. Unfortunately, if you try to implement one of these published
algorithms, they'll get you 75 percent accuracy or whatever on some sort of
good data set. It takes a lot more effort to make a real product.
DDJ: Part of your approach is dictionary based. You use different techniques
for recognizing cursive and printed characters. You bring expertise with
different algorithms to bear. So would it be fair to characterize your
approach as a hybrid?
RN: What we usually say is that we use multiple sources of information and
multiple techniques to solve the problem. The problem is so difficult that you
have to use whatever information you have. Some people ask us how much of the
work is done by the dictionary and how much by the letter recognizer, and
really, that assumes certain things about the way we're doing it. And, without
telling people how we do it, it just doesn't work like that. It works in a
very integrated way.
DDJ: Tell me a little about the business. There are fewer than ten of you and
you're all pretty much straight out of academia. How does that affect the way
you function as a business?
RN: We're not very like the typical start-up, where you have a finance guy, a
marketing guy, a CEO, a technical guy, and a programmer, that kind of thing.
We have a very strong academic collegiate base: Everyone's a PhD. We also
stress people who have a multidisciplinary environment. I have a business
background and also a technology background, at the PhD level. And that's
where we differ a little from classic start-up companies. We sort of think
from a systems point of view, an integrated point of view, and have
cross-disciplinary working methods. So everyone gets involved in marketing
decisions and technical decisions, and they can do that because they all have
the capability to contribute at that level.
DDJ: So how does that work in practice? Do you sit around a table and make
group decisions?
RN: Well, we have structured meetings and unstructured meetings. Doing things
that are unstructured you typically sit around a table. But we have some
structured methodologies of trying to brainstorm ideas out and those take a
long time, the structured methodologies.
DDJ: Who besides you has a background in psychology?
RN: Chris Kortge. He also has a background in computer science. Although it's
getting to the state where it's very difficult to distinguish between the
disciplines. Computer science is entering into areas of psychology, and
psychology is entering into areas of computer science via the AI community,
mainly driven by activity in neural networks, I guess.
DDJ: Ah, yes, neural networks. Don't you have some connection with Rumelhart
at Stanford?
RN: Right. Both Chris and I were affiliated with Dave Rumelhart at Stanford. I
was a visiting scholar and Chris was a graduate student of Rumelhart's.
Rumelhart has been a major driving force in pushing psychology to useful
applications.
DDJ: Before we leave the academic connections, I'm curious how that is working
out, coming from an academic background. In starting a business, are there
problems with that?
RN: I think we're what's fashionably known as a "learning organization." I had
some business background as a management consultant advising CEOs of large
companies. But that's a very different thing from running a small company,
where you have to do all the nuts and bolts yourself. Things that you think of
as managed by somebody else, like phones, you have to have somebody within the
company do. That sort of draws attention from developing creative products. In
some ways it's difficult, and in other ways it's an advantage, because we can
act very, very quickly, much more quickly than a more structured organization.
But we've got a number of very talented advisors to help us in situations
where there is difficulty.
DDJ: So you feel that you're a nimble company?
RN: The market is pretty dynamic and unpredictable at the moment. What we've
gained is a nimbleness, an ability to adapt very quickly. We're not set in our
ways. If a suitable opportunity came up, we would drop everything and go and
do it, if it was a sufficient incentive.
DDJ: You're working with a number of much larger companies. What's that like?
RN: Usually when we roll into a large company, we get to a pretty high level
pretty quickly because of the nature of the product we have, which is not
typical with most startups, where you have to have a really hard sell just to
get through the door. But if you are talking to a pen company, having cursive
handwriting recognition is so unique that you get a lot of attention.
DDJ: Large companies are becoming more open to acquiring technology now,
aren't they?
RN: It's something that larger companies have to face up to, that many of the
innovative technologies are being done by very small companies and if they
want to stay competitive they have to form strategic alliances with those
companies that are coming up with the innovations. It's a lot of fun. We get
sort of ego boosting when we visit these large companies.
DDJ: Do you find that different companies take different approaches--like
wanting to buy things outright, say, versus saying, "We don't want to give you
any money up front, but we'll talk royalties"?
RN: It depends on their own situation. Why do they want the technology? Do
they want it because they think it's so good that nobody else can beat it? Do
they want it because it's not very good but they need somewhere to start from
and then they'll make it good? Or do they just want it for convenience? The
cost structure of their product doesn't want to handle a royalty burden, they
just want to have an up-front cost, and that's it? Mostly we're talking
royalties at this point, rather than outright purchases. And that's usually
what most companies want to do. One or two might prefer to buy it outright,
but usually the sums are not large enough for us to consider it.
DDJ: Let's talk about ink as a data type.
RN: When you think about it, there are a number of dimensions to ink as data.
First of all there are the actual characteristics of ink itself. At the
simplest level it is just a bitmap, just points at particular coordinate
positions. At the next level, you may have the thickness of the ink. The next
dimension is the time information: In which order is the ink actually put onto
the page or onto the notebook or the tablet. So those are obvious direct
physical attributes. But then maybe there are other types of information that
you can think of, which have [affected] how people think of ink.
DDJ: Like what?
RN: There are a number of things that you can do with ink if you know that you
can switch between ink and its interpretation and back again. One scenario is
you have a page where [you have written a letter], and a program goes through
and translates each word into its appropriate text translation. Now if you
keep the ink, you can go back and see what the word actually looked like. You
can imagine this as your computer secretary. Normally, you might scribble down
a letter and hand it over to your secretary, who would type it up. But she
can't read your writing, so she does her best....
DDJ: ...makes some guesses...
RN: ... makes some guesses, just like a recognition algorithm does. Now you go
back to what you wrote. You still may not be able to read what you wrote, but
it may give you some clues as to what it was. Another scenario is that you may
write 25 pages of notes on some topic, and you may not want to have it all
translated. But say you're looking for the note on Lexicus, for example. In
this scenario, you get your program to translate all 25 pages. Now it won't
get all 25 pages right, but hopefully you'll have written "Lexicus" one or two
times in your notes, and you can call up that page. If you're thinking ahead,
you may even have a keyword heading for each page of notes as a search item.
And that brings us along to the other [aspect] of ink as a data type, and that
is how ink data can be linked to other kinds of data.

DDJ: For example, by linking whole pages of uninterpreted ink by one or two
keywords translated from the handwriting.
RN: You could also go to the next level. Linking leads you into language.
Language will in the future be the way to get extremely high-accuracy
recognition. At the moment, Lexicus uses a dictionary to increase the accuracy
of its cursive-recognition system. That gets you so far. The next level would
be language, where you can have grammars. This has been successfully applied
to speech recognition, where it's quite common to get the accuracy up using a
language model, trying to work out which word follows another word.
DDJ: Sounds good. What's the holdup?
RN: Unfortunately, these things take a lot of memory. It's unlikely [that we
will] see this appearing until memory prices get even cheaper. But this is a
natural progression for increasing the power of ink in applications.
DDJ: You're not talking about the power of the data type of ink per se, but
about ink linked to other representations: text, pictorial, language.
RN: Rather than concentrate on ink as a pure data type, one should think of
ink as a data type that is linked to all these other types of data. That's
where people can make the most impact.






December, 1992
C PROGRAMMING


CUA++




Al Stevens


I am writing this in September, having just returned from C++ in Action, the
Santa Clara, California edition of the conference whose name keeps changing.
The one I attended six months ago in New Jersey was C/C++ in Action. They've
dropped the C part. Seems like nobody wants to talk about C anymore. That's
OK, it's all been said. Before that it was C/C++ at Work. I gave a talk on C++
persistent objects, the essence of which you will find in my article,
"Persistent Objects in C++," elsewhere in this issue. The conference and
exposition were well attended, which is well deserved. Interest in C++
continues to heat up, and C++ in Action, under the steerage of technical
director Chris Skelly, covers the subject comprehensively and with many
relevant and fascinating sessions. I was not there long enough to attend as
many of them as I wanted, and they always manage to schedule the ones that
interest me most at the same time, but I did make it to P.J. (Bill) Plauger's
"C++ for Pragmatists" keynote address, which he whimsically subtitled "Making
the Move from C++ to C." Bill always manages to speak straight to the heart of
a programmer's concerns, at the same time delighting us by bumping off a
graven image or two.


D-Flat++ on the Desktop


Last month we began the D-Flat++ project by showing the header files that
describe the desktop and hardware device classes. A DF++ application program
starts with an application window on a desktop. The application instantiates
the application window and associates a menu and menu command member functions
with the application window. The application window connects itself to the
global desktop, which contains the device objects for the user interface. This
month we will look at the .CPP source files that contain the member functions
for the desktop and its devices.
Listing One, page 154, is desktop.cpp, which contains the code for the desktop
itself. It begins by establishing the exception-handling logic for
out-of-memory conditions. Although contemporary C++ literature defines
exception handling, no PC compiler implements it yet. The set_new_handler
function has been in C++ for a while, however, and it provides a place for
programs to go when the system runs out of memory. Borland C++ implements
set_new_handler according to C++ tradition, but Microsoft C++ uses its own
nonconventional format, so desktop.cpp starts out with a compile-time
conditional to comply with whichever compiler you are using. As of this
writing, I am developing with the Borland compiler and then porting the code
to the Microsoft compiler, which is usually a seamless port.
Each DF++ application starts with a default Desktop object named,
appropriately, "desktop." When the application declares an application window,
the application-window constructor associates itself with the global desktop
object. The Desktop class definition includes embedded objects for the devices
on the desktop. The Desktop class constructor sets the default window pointers
to null and establishes the address of the out-of-memory exception handler,
which exits from the program. Then the constructor tells the cursor object to
hide itself. Since the desktop object is global, its constructor runs ahead of
the main function. The destructor runs when main returns and tells the cursor
object to show itself again, the assumption being that the program runs from
the command line where the cursor is needed and that individual document
windows will show the cursor if they need it.
The Desktop::DispatchEvents method calls the DispatchEvents method for each
device object that can generate event messages. The keyboard, mouse, and clock
can generate such events, so each has a function to dispatch its events to
whichever window should receive them.
Listings Two through Seven are screen.cpp, mouse.cpp, cursor.cpp,
keyboard.cpp, speaker.cpp, and clock.cpp. These modules contain the member
functions for the device classes defined in the header files from last month.
The screen object. The screen.cpp file (Listing Two, page 154) manages output
to the screen as well as reading and writing video memory. The Screen
constructor determines the display type and video mode, page, and memory
address. It reaches into the PC's BIOS RAM Data Area to get the current
character height and width of the screen. This code, like that of the other
modules this month, is heavily dependent on the system hardware. The rest of
DF++ is as independent of hardware as possible. There is a lot of low-level
code in screen.cpp. The module directly accesses video memory and uses the
INT86 function to make video BIOS calls.
The class includes member functions to test for an EGA or VGA monitor, which
will determine the screen formats that you may select. There is a function to
scroll a portion of the screen and functions to read and write individual
characters of video memory based on screen coordinates. There is also a
function to write a string to video memory, which uses a block move rather
than a series of character moves. This is a performance strategy. Whenever it
can, DF++ writes to the screen one line at a time.
The last two functions get and put video memory in large blocks. The blocks
are described by Rect objects passed as parameters. The buffers are assumed to
be holders for video memory that can fit into the rectangles.
The mouse object. The mouse.cpp file (Listing Three, page 154) contains the
code to manage the Mouse object. The constructor determines if a mouse device
driver is installed by looking at the mouse's interrupt vector. If the vector
is 0 or if it points to an iret machine-language operation code, the mouse
driver is not running, and the program will not attempt to use the mouse.
The mouse constructor and destructor save and restore the mouse state. I'm not
sure how necessary that is unless you use DF++ to develop a TSR. Why anyone
would use C++ to write TSRs is beyond me, though. That'd be like driving to
work in the Concorde. Or George Bush campaigning from the back of a train.
(Uh-oh, I spoke too soon. There he is on TV, hanging off the back of his
private railroad car hollering at a crowd of bewildered country folk and
trying to look like HST. No way, Mr. President.)
The other functions get and set the mouse cursor position, show and hide the
mouse cursor, determine if the mouse has moved since the last time the program
checked, see if the button has been pressed or released, and describe the
mouse's boundaries of travel on the screen. The DispatchEvent function, called
from the Desktop object's DispatchEvents function, calls functions to dispatch
events related to button presses and releases and mouse movements.
The MouseWindow function determines which window should receive the mouse
event. If a window has captured the focus, the event goes to that window
regardless of the mouse cursor's position. Otherwise, the event goes to
whichever window the mouse cursor is in. The global inWindow function, in
another source file, returns a window pointer given a set of coordinates. It
determines the window by observing its position relative to other windows that
share the same coordinates but that are overlaid. The foremost window is the
one returned.
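The front-to-back search can be sketched in a few lines of portable C++. The
Rect, Win, and inWindow shapes below are stand-ins for the DF++ types, not the
library's actual declarations:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for DF++'s Rect and window list.
struct Rect {
    int left, top, right, bottom;
    bool Contains(int x, int y) const
        { return x >= left && x <= right && y >= top && y <= bottom; }
};
struct Win { Rect rc; };

// Windows are kept back-to-front; the foremost window containing
// the point wins, so scan from the end of the list.
Win *inWindow(std::vector<Win> &wins, int x, int y)
{
    for (std::size_t i = wins.size(); i-- > 0; )
        if (wins[i].rc.Contains(x, y))
            return &wins[i];
    return nullptr;   // point is on the bare desktop
}
```

Scanning from the end of a back-to-front list is what makes an overlaid
foremost window win the tie.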
The DispatchLeftButton function controls the typematic-like behavior of the
mouse. If the user holds the mouse button down for a while, the program sends
repeated left-button events to the window. The DispatchRelease function not
only dispatches a release event, but sets a timer. If a second release event
occurs before the timer runs down, the function dispatches a double-click
event. The DispatchMove function sends a move event if the mouse cursor is in
a position other than where it was the last time the program looked at it.
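The release/double-click decision can be sketched with the BIOS tick timer
replaced by a caller-supplied millisecond clock. The class name, method names,
and the 250-ms double-click window are assumptions for illustration, not
DF++'s actual identifiers or timings:

```cpp
// Sketch of double-click detection: a release at the same spot
// while the double-click timer is still running counts as a
// double click; otherwise the release re-arms the timer.
class ClickTracker {
    long doubleDeadline = -1;       // -1: timer not running
    int lastX = -1, lastY = -1;
    enum { DOUBLE_MS = 250 };       // assumed double-click window
public:
    // Returns 2 for a double click, 1 for a plain release.
    int Release(int x, int y, long nowMs) {
        if (x == lastX && y == lastY &&
            doubleDeadline >= 0 && nowMs <= doubleDeadline) {
            doubleDeadline = -1;    // consume the pending click
            lastX = lastY = -1;
            return 2;               // double click
        }
        doubleDeadline = nowMs + DOUBLE_MS;  // arm the timer
        lastX = x; lastY = y;
        return 1;                   // single release
    }
};
```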
The cursor object. Cursor.cpp (Listing Four, page 155) contains the code to
show, hide, move, read, and size the screen's keyboard cursor. These are all
standard BIOS operations except for saving and restoring the cursor. Its shape
and state of visibility are on a stack. Windows that use the cursor save the
cursor before they do anything with it, and they restore it when they are
done. Since this process can occur in nested windows, the stack assures that
the cursor is restored to its correct configuration as different windows open
and close, acquiring and releasing the focus.
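The discipline reduces to a LIFO stack of (shape, position) pairs: push on
Save, pop on Restore. The names and the default shape value in this sketch
are illustrative, not the Cursor class's actual members:

```cpp
#include <stack>
#include <utility>

// Sketch of the cursor save/restore stack: each window pushes the
// cursor state on entry and pops it on exit, so nested windows
// always unwind to the correct configuration.
class CursorStack {
    std::stack<std::pair<unsigned, unsigned>> saves;
    unsigned shape = 0x0607, pos = 0;   // assumed current state
public:
    void Set(unsigned s, unsigned p) { shape = s; pos = p; }
    unsigned Shape() const { return shape; }
    void Save()    { saves.push({shape, pos}); }
    void Restore() {
        if (!saves.empty()) {           // ignore unmatched restores
            shape = saves.top().first;
            pos = saves.top().second;
            saves.pop();
        }
    }
};
```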
The keyboard object. Keyboard.cpp (Listing Five, page 156) contains the member
functions for the Keyboard object. Its DispatchEvent function determines
which window should get a keyboard event by using the following rules of
precedence. If a window has captured the focus, that window gets the event.
Otherwise, the event goes to the window that has the focus, if any. If no
window has the focus, the event goes to the application window.
The Keyboard object sends two events, one when the user presses a key and one
when the shift status changes. The class also provides the AltConvert function
to convert Alt+ keys to their ASCII letter or number equivalent.
The speaker object. Speaker.cpp (Listing Six, page 156) contains the Beep
function to make a sound through the computer's audio system. The function
drives the system's speaker by programming a frequency, starting the sound,
and leaving it on for two ticks of the system clock. The resulting buzz is
used by DF++ to tell the user that a selection is not valid.
The clock object. Clock.cpp (Listing Seven, page 156) sends the clock-tick
event to the application window every 19 ticks of the system's interval timer.
This approximates one event per second, but it is not precise because timer
ticks occur 18.2 times per second. The purpose of the event is to allow the
application window's status bar to display the time of day without having to
poll the system's time-of-day clock continuously.
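The imprecision is easy to quantify: 19 ticks at 18.2 ticks per second spans
about 1.044 real seconds, so the status clock's updates run roughly 4 percent
slow. A one-line sketch (the function name is mine, not DF++'s):

```cpp
// Real time between clock events: ticks per event divided by the
// PC interval timer's tick rate (18.2 ticks per second).
double EventPeriodSeconds(int ticksPerEvent, double ticksPerSecond)
{
    return ticksPerEvent / ticksPerSecond;
}
```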


How to Get the Source Code


D-Flat++ is available in its first incarnation from the DDJ Forum libraries on
CompuServe or directly from me. Send a stamped, self-addressed diskette mailer
and a formatted 360K or 720K diskette to me at D-Flat++, Dr. Dobb's Journal,
411 Borel Ave., San Mateo, CA, 94402-3522. Specify that you want D-Flat++. If
you want D-Flat as well, specify that, too. The software is free, but if you
wish, you can participate in our Careware program by including a dollar, which
I will donate to the Brevard County Food Bank.
The first version of DF++ is minimal at best, and should serve to acquaint you
with the structure of the class library and how to use it. You could develop a
small application with it, but there is more to come.


Book Report: Windows++


For a long time I've wanted to build a class library similar to D-Flat++ but
for Microsoft Windows. That's not a new idea. There are already many such
libraries: Borland's ObjectWindows Library (OWL) and the Microsoft
Foundation Classes (MFC) are two prominent examples. Nonetheless, such a
library, free of the bulk that usually accompanies a commercial product that
covers all bases, would make for an interesting project to develop and write
about. I've put it off because I've had no Windows programs to write. I never
really considered doing the library from this column because I do not want to
bind the column to the Windows platform for that long. It would make a good
book, however. And so it has.
Windows++, by Paul DiLascia (Addison-Wesley, 1992), is such a book. This is
the kind of programmer's book that I like. It teaches in an orderly
progression of lessons that build upon one another, and it includes a large
body of C++ source code that the reader can use to develop real programs. It's
a perfect stocking stuffer for the programmers on your list.
The book contains a full class library that encapsulates enough of the Windows
API to satisfy most applications. Where the library is incomplete, it tells
you how to enhance the library to include what you need. Chapter 1 introduces
the Windows++ class library, explains how the author came to develop it, and
shows you what a "hello, world" program will look like after you learn how
to use the library. Chapter 2 is about C++, but you really need a better
introduction if you do not already know the language. It's difficult to find a
reason for this chapter. If you already know C++, you do not need it. If you
do not know C++, you need more than what you get.
Chapter 3 regresses by writing the same "hello, world" program in C using the
Windows API and the SDK. The author assumes that you understand Windows
programming and that you will understand that program. He wants you to
contrast the complexity of the C version with that of the simpler C++ version
from Chapter 1. Then, step by step, he rewrites the C program, moving a piece
of it at a time into the encapsulated C++ class library. In the final step,
you are back to the original C++ program, which shows you that much of the
complexity of the Windows API is routine, and that the complexity can be
hidden in the encapsulated classes.
Chapter 4 is a gem. It describes the ridiculous state of Windows memory
management and the absurd hoops that programmers jump through to use different
memory models, and then it proceeds to solve the problem. You not only learn
how to solve it, you get the solution encapsulated in the class library so you
can forget about it.
Chapter 5 presents a couple of example programs, each one revealing new things
about Windows programming and about the Windows++ class library. Chapter 6
shows you how to manage mundane tasks such as disabling menu commands. You
learn to encapsulate routine user-interface procedures such as the ubiquitous
File Open dialog box and the Edit menu commands. Windows++ includes
encapsulation of multiple document-interface windows as well, and Chapter 6
covers that, too.
Dialog boxes get a full chapter of their own. You learn not only how to
include dialog boxes in your Windows++ programs but how the classes work to
implement the dialog boxes and their controls. The subject of graphics has its
own chapter, too. Early in the book, the author claims that Windows++ is
superior to Borland's OWL. Graphics is an example of that. Windows++ includes
classes to support the Windows graphics device interface (GDI), and OWL does
not.
A Windows++ bonus is the chapter on creating a dynamic link library (DLL) for
the class library. Besides explaining how DLLs themselves work and then
putting the Windows++ class library into a DLL, the chapter is a guide to
building DLLs for other class libraries as well, which is an important feature
for developers of programs that run in a multitasking environment.
I compiled all the Windows++ example programs and ran them. The Mandelbrot
program kills my system, making Windows behave erratically, but the author
could not reproduce the problem on his computer and had not heard of that
problem from anyone else. Everything else worked without a hitch. The book
delivers what it promises. It is unique. No other work that I have seen comes
close to what Windows++ brings to the C++ Windows programmer. Even if you use
a different class library or application framework for Windows development,
this book is invaluable for showing you how it all fits together.
The book does not claim to cover everything there is to know about Windows
programming, and the class library does not encapsulate all the features
available to a Windows programmer. There is no mention of using the
communications ports in a Windows program, a sure candidate for the kind of
encapsulation that Windows++ provides. The book ignores dynamic data exchange
(DDE) and object linking and embedding (OLE), two prominent Windows features.
Although the class library supports clipboard operations, the book gives one
brief paragraph to the subject, referring the reader to the source code for
details.

My only serious criticism is of a characteristic that I've been seeing a lot
of in current computer books. Although I like the breezy, informal style that
runs throughout the book, sometimes it gets too cute and the author too
chummy. Mind you, columnists can get away with it because you know us, but in
a technical book or article those occasional familiarities deflect our
attention from the subject, and for the moment we have to deal with the author
rather than with what he or she is teaching. As in all social circumstances
where people interact, the stranger must gain acceptance before revealing the
more personal sides of his or her personality. After I had talked to Paul for
a while, his book was easier to read because I knew his voice. Before that,
however, his unnecessary use of slang and clever asides seemed strained, and
it detracted from what is an otherwise excellent and relevant book.


Coming Attractions


In the coming months we'll look at how the DF++ application window works, how
to build menus and dialog boxes, and an example application or two. I've
developed a program for making textmode screen snapshots on a LaserJet, and
I'll be showing you that as well.
_C PROGRAMMING COLUMN_
by Al Stevens


[LISTING ONE]

// --------------- desktop.cpp

#include <new.h>
#include "desktop.h"

DeskTop desktop;

#ifdef MSC
int NoMemory(unsigned int)
{
 exit(-1);
 return 0;
}
#else
void NoMemory()
{
 exit(-1);
}
#endif
DeskTop::DeskTop()
{
 apwnd = infocus = firstcapture = focuscapture = NULL;
#ifdef MSC
 _set_new_handler(&NoMemory);
#else
 set_new_handler(NoMemory);
#endif
 syscursor.Hide();
}
DeskTop::~DeskTop()
{
 syscursor.Show();
}
Bool DeskTop::DispatchEvents()
{
 syskeyboard.DispatchEvent();
 sysmouse.DispatchEvent();
 sysclock.DispatchEvent();
 return (Bool) (apwnd != NULL);
}









[LISTING TWO]

// ----------- screen.cpp

#include <string.h>
#include "desktop.h"

Screen::Screen()
{
 if (isEGA() || isVGA()) {
 // --- turn blinking off
 regs.x.ax = 0x1003;
 regs.h.bl = 0;
 int86(VIDEO, &regs, &regs);
 }
 // ---- get the video mode and page
 regs.h.ah = 15;
 int86(VIDEO, &regs, &regs);
 mode = regs.h.al;
 page = regs.x.bx;
 page &= 0xff00;
 mode &= 0x7f;
 // ---- Monochrome Display Adaptor or text mode
 if (isMono())
 address = 0xb000;
 else
 // ------ Text mode
 address = 0xb800 + page;
 width = *(unsigned char far *)( MK_FP(0x40,0x4a) );
 if (isVGA() || isEGA())
 height = *(unsigned char far *)( MK_FP(0x40,0x84) )+1;
 else
 height = 25;
}
// ---- test for EGA
Bool Screen::isEGA(void)
{
 if (isVGA())
 return False;
 regs.h.ah = 0x12;
 regs.h.bl = 0x10;
 int86(VIDEO, &regs, &regs);
 return (Bool) (regs.h.bl != 0x10);
}
// ---- test for VGA
Bool Screen::isVGA(void)
{
 regs.x.ax = 0x1a00;
 int86(VIDEO, &regs, &regs);
 return (Bool) (regs.h.al == 0x1a && regs.h.bl > 6);
}
// --------- scroll the screen d: 1 = up, 0 = dn
void Screen::Scroll(Rect &rc, int d, int fg, int bg)
{
 desktop.mouse().Hide();
 regs.h.cl = rc.Left();
 regs.h.ch = rc.Top();

 regs.h.dl = rc.Right();
 regs.h.dh = rc.Bottom();
 regs.h.bh = clr(fg,bg);
 regs.h.ah = 7 - d;
 regs.h.al = 1;
 int86(VIDEO, &regs, &regs);
 desktop.mouse().Show();
}
// -------- read a character of video memory
unsigned int Screen::GetVideoChar(int x, int y)
{
 int c;
 desktop.mouse().Hide();
 c = peek(address, vad(x,y));
 desktop.mouse().Show();
 return c & 255;
}
// -------- write a character of video memory
void Screen::PutVideoChar(int x, int y, unsigned int c)
{
 if (x < width && y < height) {
 desktop.mouse().Hide();
 poke(address, vad(x,y), c);
 desktop.mouse().Show();
 }
}
// --------- Write a string to video memory
void Screen::WriteVideoString(char *s,int x,int y,int fg,int bg)
{
 if (x < width && y < height) {
 int len = strlen(s);
 int *ln = new int[len];
 int *cp1 = ln;
 int col = clr(fg,bg) << 8;
 while (*s) {
 *cp1++ = (*s & 255) | col;
 s++;
 }
 if (x + len >= width)
 len = width - x;
 desktop.mouse().Hide();
 movedata(FP_SEG(ln),FP_OFF(ln),address,vad(x,y),len*2);
 desktop.mouse().Show();
 delete [] ln;
 }
}
// -- read a rectangle of video memory into a save buffer
void Screen::GetBuffer(Rect &rc, char *bf)
{
 if (rc.Left() >= width || rc.Top() >= height)
 return;
 int ht = rc.Bottom()-rc.Top()+1;
 int bytes_row = (rc.Right()-rc.Left()+1) * 2;
 unsigned vadr = vad(rc.Left(), rc.Top());
 desktop.mouse().Hide();
 while (ht--) {
 movedata(address, vadr, FP_SEG(bf),
 FP_OFF(bf), bytes_row);
 vadr += width*2;

 bf = (char far *)bf + bytes_row;
 }
 desktop.mouse().Show();
}
// -- write a rectangle of video memory from a save buffer
void Screen::PutBuffer(Rect &rc, char *bf)
{
 if (rc.Left() >= width || rc.Top() >= height)
 return;
 int ht = rc.Bottom()-rc.Top()+1;
 int bytes_row = (rc.Right()-rc.Left()+1) * 2;
 unsigned vadr = vad(rc.Left(), rc.Top());
 desktop.mouse().Hide();
 while (ht--) {
 movedata(FP_SEG(bf), FP_OFF(bf), address,
 vadr, bytes_row);
 vadr += width*2;
 bf += bytes_row;
 }
 desktop.mouse().Show();
}








[LISTING THREE]

// ------------- mouse.cpp

#include <stdio.h>
#include "desktop.h"

// -------- mouse constructor
Mouse::Mouse()
{
 // ------- see if mouse driver is installed
 unsigned char far *ms;
 ms = (unsigned char far *)
 MK_FP(peek(0, MOUSE*4+2), peek(0, MOUSE*4));
 // --- if the interrupt vector is null or points to an iret,
 // the mouse driver is not installed
 installed = (Bool) (ms != NULL && *ms != 0xcf);

 if (installed) {
 // --- get the mouse state buffer size
 CallMouse(BUFFSIZE);
 statebuffer = new char[regs.x.bx];
 // --- save the mouse state
 CallMouse(SAVESTATE, 0, 0,
 FP_OFF(statebuffer), FP_SEG(statebuffer));
 // --- reset the mouse
 CallMouse(RESETMOUSE);
 prevx = prevy =
 clickx = clicky =
 releasex = releasey = -1;

 SetTravel(0, desktop.screen().Width()-1, 0,
 desktop.screen().Height()-1);
 }
}
Mouse::~Mouse()
{
 if (installed) {
 Hide();
 // --- restore the mouse state
 CallMouse(RESTORESTATE, 0, 0,
 FP_OFF(statebuffer), FP_SEG(statebuffer));
 delete [] statebuffer;
 }
}
void Mouse::GetPosition(int &mx, int &my)
{
 mx = my = 0;
 if (installed) {
 CallMouse(READMOUSE);
 mx = regs.x.cx/8;
 my = regs.x.dx/8;
 if (desktop.screen().Width() == 40)
 mx /= 2;
 }
}
void Mouse::SetPosition(int x, int y)
{
 if (installed) {
 if (desktop.screen().Width() == 40)
 x *= 2;
 CallMouse(SETPOSITION,0,x*8,y*8);
 }
}
Bool Mouse::Moved()
{
 int x, y;
 Bool rtn = False;
 if (installed) {
 GetPosition(x, y);
 rtn = (Bool) (x != prevx || y != prevy);
 prevx = x;
 prevy = y;
 }
 return rtn;
}
void Mouse::Show()
{
 if (installed)
 CallMouse(SHOWMOUSE);
}
void Mouse::Hide()
{
 if (installed)
 CallMouse(HIDEMOUSE);
}
Bool Mouse::LeftButton()
{
 Bool rtn = False;
 if (installed) {

 CallMouse(READMOUSE);
 rtn = (Bool) ((regs.x.bx & 1) == 1);
 }
 return rtn;
}
Bool Mouse::ButtonReleased()
{
 Bool rtn = False;
 if (installed) {
 CallMouse(BUTTONRELEASED);
 rtn = (Bool) (regs.x.bx != 0);
 }
 return rtn;
}
void Mouse::SetTravel(int minx, int maxx, int miny, int maxy)
{
 if (installed) {
 if (desktop.screen().Width() == 40) {
 minx *= 2;
 maxx *= 2;
 }
 CallMouse(XLIMIT, 0, minx*8, maxx*8);
 CallMouse(YLIMIT, 0, miny*8, maxy*8);
 }
}
void Mouse::CallMouse(int m1,int m2,int m3,int m4, unsigned es)
{
 struct SREGS sregs;
 segread(&sregs);
 if (es != 0)
 sregs.es = es;
 regs.x.dx = m4;
 regs.x.cx = m3;
 regs.x.bx = m2;
 regs.x.ax = m1;
 int86x(MOUSE, &regs, &regs, &sregs);
}
// ------ get the window to send mouse events
DFWindow *Mouse::MouseWindow(int mx, int my)
{
 DFWindow *Mwnd;
 if (desktop.FocusCapture() != NULL)
 Mwnd = desktop.FocusCapture();
 else
 Mwnd = inWindow(mx, my);
 return Mwnd;
}
void Mouse::DispatchRelease()
{
 if (ButtonReleased()) {
 int mx, my;
 GetPosition(mx, my);
 DFWindow *Mwnd;
 if ((Mwnd = MouseWindow(mx, my)) == NULL)
 return;
 // ------- disable typematic check
 clickx = clicky = -1;
 delaytimer.DisableTimer();
 // ------- the button was released

 if (mx == releasex && my == releasey) {
 // ---- same position as last left button release
 if (doubletimer.TimerRunning()) {
 // -- second click before double timeout
 doubletimer.DisableTimer();
 Mwnd->DoubleClick(mx, my);
 releasex = releasey = -1;
 }
 }
 else {
 doubletimer.SetTimer(DOUBLETICKS);
 Mwnd->ButtonReleased(mx, my);
 releasex = mx;
 releasey = my;
 }
 }
}
void Mouse::DispatchMove()
{
 if (Moved()) {
 int mx, my;
 GetPosition(mx, my);
 DFWindow *Mwnd;
 if ((Mwnd = MouseWindow(mx, my)) != NULL) {
 Mwnd->MouseMoved(mx, my);
 clickx = clicky = -1;
 }
 }
}
void Mouse::DispatchLeftButton()
{
 if (LeftButton()) {
 int mx, my;
 GetPosition(mx, my);
 DFWindow *Mwnd;
 if ((Mwnd = MouseWindow(mx, my)) == NULL)
 return;
 if (mx == clickx && my == clicky) {
 if (delaytimer.TimedOut()) {
 // ---- button held down a while
 delaytimer.SetTimer(DELAYTICKS);
 // ---- post a typematic-like button
 Mwnd->LeftButton(mx, my);
 }
 }
 else {
 // --------- new button press
 delaytimer.SetTimer(FIRSTDELAY);
 if (Mwnd->SetFocus())
 Mwnd->LeftButton(mx, my);
 clickx = mx;
 clicky = my;
 }
 }
}
// -------- dispatch mouse events
void Mouse::DispatchEvent()
{
 DispatchRelease();

 DispatchMove();
 DispatchLeftButton();
}








[LISTING FOUR]

// ------------ cursor.cpp

#include <dos.h>
#include "cursor.h"
#include "desktop.h"

Cursor::Cursor()
{
 cs = 0;
 Save();
}
Cursor::~Cursor()
{
 Restore();
}
// ------ get cursor shape and position
void Cursor::GetCursor()
{
 regs.h.ah = READCURSOR;
 regs.x.bx = desktop.screen().Page();
 int86(VIDEO, &regs, &regs);
}
// -------- get the current cursor position
void Cursor::GetPosition(int &x, int &y)
{
 GetCursor();
 x = regs.h.dl;
 y = regs.h.dh;
}
// ------ position the cursor
void Cursor::SetPosition(int x, int y)
{
 regs.x.dx = ((y << 8) & 0xff00) + x;
 regs.h.ah = SETCURSOR;
 regs.x.bx = desktop.screen().Page();
 int86(VIDEO, &regs, &regs);
}
// ------ save the current cursor configuration
void Cursor::Save()
{
 if (cs < MAXSAVES) {
 GetCursor();
 cursorshape[cs] = regs.x.cx;
 cursorpos[cs] = regs.x.dx;
 cs++;
 }

}
// ---- restore the saved cursor configuration
void Cursor::Restore()
{
 if (cs) {
 --cs;
 regs.x.dx = cursorpos[cs];
 regs.h.ah = SETCURSOR;
 regs.x.bx = desktop.screen().Page();
 int86(VIDEO, &regs, &regs);
 SetType(cursorshape[cs]);
 }
}
/* ---- set the cursor type ---- */
void Cursor::SetType(unsigned t)
{
 regs.h.ah = SETCURSORTYPE;
 regs.x.bx = desktop.screen().Page();
 regs.x.cx = t;
 int86(VIDEO, &regs, &regs);
}
/* ----- swap the cursor stack ------- */
void Cursor::SwapStack()
{
 if (cs > 1) {
 swap(cursorpos[cs-2], cursorpos[cs-1]);
 swap(cursorshape[cs-2], cursorshape[cs-1]);
 }
}
/* ------ hide the cursor ------ */
void Cursor::Hide()
{
 GetCursor();
 regs.h.ch = HIDECURSOR;
 regs.h.ah = SETCURSORTYPE;
 int86(VIDEO, &regs, &regs);
}
/* ------ show the cursor ------ */
void Cursor::Show()
{
 GetCursor();
 regs.h.ch &= ~HIDECURSOR;
 regs.h.ah = SETCURSORTYPE;
 int86(VIDEO, &regs, &regs);
}







[LISTING FIVE]

// ----------- keyboard.cpp

#include <stdio.h>
#include <bios.h>
#include <dos.h>

#include "desktop.h"

/* ----- table of alt keys for finding shortcut keys ----- */
static int altconvert[] = {
 ALT_A,ALT_B,ALT_C,ALT_D,ALT_E,ALT_F,ALT_G,ALT_H,
 ALT_I,ALT_J,ALT_K,ALT_L,ALT_M,ALT_N,ALT_O,ALT_P,
 ALT_Q,ALT_R,ALT_S,ALT_T,ALT_U,ALT_V,ALT_W,ALT_X,
 ALT_Y,ALT_Z,ALT_0,ALT_1,ALT_2,ALT_3,ALT_4,ALT_5,
 ALT_6,ALT_7,ALT_8,ALT_9
};
/* ---- Test for keystroke ---- */
#ifndef MSC
Bool Keyboard::KeyHit()
{
 _AH = 1;
 geninterrupt(KEYBRD);
 return (Bool)((_FLAGS & ZEROFLAG) == 0);
}
#else
Bool Keyboard::KeyHit()
{
 return (Bool) (bioskey(1) != 0);
}
#endif
/* ---- Read a keystroke ---- */
int Keyboard::GetKey()
{
 int c;
 while (KeyHit() == False)
 ;
 if (((c = bioskey(0)) & 0xff) == 0)
 c = (c >> 8) | 0x1080;
 else
 c &= 0xff;
 return c & 0x10ff;
}
/* ---------- read the keyboard shift status --------- */
int Keyboard::GetShift()
{
 regs.h.ah = 2;
 int86(KEYBRD, &regs, &regs);
 return regs.h.al;
}
/* ------ convert an Alt+ key to its letter equivalent ----- */
int Keyboard::AltConvert(int c)
{
 int i, a = 0;
 for (i = 0; i < 36; i++)
 if (c == altconvert[i])
 break;
 if (i < 26)
 a = 'a' + i;
 else if (i < 36)
 a = '0' + i - 26;
 return a;
}
Bool Keyboard::ShiftChanged()
{
 int sk = GetShift();

 Bool rtn = (Bool) (sk != shift);
 shift = sk;
 return rtn;
}
// ------ dispatch keyboard events
void Keyboard::DispatchEvent()
{
 // ---- find window for keyboard events
 DFWindow *Kwnd = desktop.FocusCapture() ?
 desktop.FocusCapture() :
 desktop.InFocus() ?
 desktop.InFocus() : desktop.ApplWnd();
 if (ShiftChanged())
 // ---- the shift status changed
 Kwnd->ShiftChanged(GetShift());
 if (KeyHit())
 // --- a key was pressed
 Kwnd->Keyboard(GetKey());
}








[LISTING SIX]

// -------- speaker.cpp

#include <dos.h>
#include <conio.h>
#include "speaker.h"
#include "dflatdef.h"

// -------- sound a tone
void Speaker::Beep()
{
 outp(0x43, 0xb6); // program the frequency
 outp(0x42, (int) (COUNT % 256));
 outp(0x42, (int) (COUNT / 256));
 outp(0x61, inp(0x61) | 3); // start the sound
 // -------- wait two clock ticks
 const int far *clk = (int far *) MK_FP(0x40,0x6c);
 int then = *clk+2;
 while (*clk < then)
 ;
 outp(0x61, inp(0x61) & ~3); // stop the sound
}






[LISTING SEVEN]

// ------- clock.cpp


#include "desktop.h"

Clock::Clock()
{
 clocktimer.SetTimer(0);
}
void Clock::DispatchEvent()
{
 if (clocktimer.TimedOut()) {
 // -------- reset the timer
 clocktimer.SetTimer(19); // 19 ticks is about one second
 // -------- post the clock event
 if (desktop.ApplWnd() != NULL)
 desktop.ApplWnd()->ClockTick();
 }
}





December, 1992
STRUCTURED PROGRAMMING


Toss It in the Cart!




Jeff Duntemann KG7JF


For those (few) of you whose neighborhoods haven't been invaded yet, let me
introduce you to a phenomenon: Retailing Writ Large. Price Club and its clones
like Sam's Club and Price Savers are popping up everywhere you look, with
60-foot corrugated steel ceilings and boxes of Honey Nut Cheerios so big that
when you get them home you realize they don't fit in any of your kitchen
cabinets.
On Sunday, it's the 21st century bazaar, with throngs of people pushing
massive carts down aisles wider than the street I grew up on, feverishly
grabbing toilet paper by the 64-pak and five-gallon buckets of barbecue sauce.
At every turn are line reps plying you with free samples of 36-grain Energy
Bars, Endless Rainbow Gourmet Jelly Beans, and Mamacita Rosita's Quik-Frozen
Taco Mix. Forget lunch. You'll be stuffed before you get halfway down Aisle 1.
This all may or may not be a good idea; I keep thinking a lot of barbecue
sauce must go bad before your typical family of four can get through it. On
the other hand, it's plainly the future: low prices, pleasant and disciplined
(if somewhat scarce) young personnel, and most of the necessities of life
amidst a sprinkle of its luxuries in a highly calculated mix. It works like
this: You go into Price Club just to get a gallon of milk and a box of
Cheerios, cheap. But in the process, you pass by a multitude of other good
things that you use every day at astonishing prices, and one by one they start
flying off the shelves into your cart. Then before you know it, you start
tossing in jugs of Chivas Regal, wristwatches, Madonna CDs, a silk shirt or
two, and then a streamlined resin chaise lounge recliner. I've managed to get
out of there for $50.00 on a good day, but my friend Pat Thurman WA9NGP says
he rarely makes it to the door for less than $100.00. It is to boggle.
Like almost anything else you examine closely on this most-interesting planet,
there is a lesson in the warehouse clubs: Get people in the door with good
prices, put them in a buying frenzy, and they'll buy lots more stuff than they
ever intended to. And I point it out here because it looks like software has
begun to be sold in much the same way.
I've been speaking with a company called SofSource, which distributes software
in an interesting fashion. They take simple, self-explanatory, mostly
horizontal applications and video games and package them up in minimalist
fashion for display in mass-market outlets. The package price on these items
is typically $5.95 or $6.95. The idea is that if you've already thrown 64
rolls of toilet paper and a silk shirt in your shopping cart, what's another
$6.00 for a piece of software? Toss it in the cart!
It seems to work. According to SofSource, they sell thousands of copies of a
package every month for the first 18 months or so that it's on the shelves.
Depending on what the package is, they tell me authors get between $3000.00
and $5000.00 per month on their royalty scale while the package is selling
briskly. That's not riches, but riches aren't the idea anymore--the way I see
it, programmers should quit killing themselves trying to become millionaires
and perhaps spend a little more time with their kids instead.


The Distribution Problem


I get a lot of mail from software authors, asking me to look at their program
on the enclosed disk and suggest how they could market it. The programs are
often astonishingly good, but with zero resources to get them to market, the
authors are right in assuming they don't have much of a chance. I have to grit
my teeth and tell them so, most of the time. Then I suggest they release the
product as shareware.
I sometimes hear back from them, telling me bitterly that shareware simply
doesn't work--they've tried it. The only response is that shareware does
work--statistically--but that not everybody wins, and the factors that dictate
who wins and who doesn't are most obscure. Worse, many of the most significant
factors, like whether or not John Dvorak plugs a shareware product in one of
his columns, are utterly up to fate and dumb luck.
(I've never released anything as shareware per se. I have released a couple of
things as "swapware;" that is, if you like my software, don't send me
money--send me ten bucks' worth of something I can use. Usually I specify
nuts, bolts, tools, or electronic parts, and sunuvugun, every so often someone
mails me five pounds of resistors or a broken cordless phone. The swapware
concept works well for resistors, which I use a lot of. Would it work for
food? That is, if you like my software, please send me a case of Honey Nut
Cheerios, or ten cans of chunk light tuna. Electronic barter. Might be worth a
try!)
What shareware addresses is the difficulty of trying out software before you
buy it. Basically you take it, try it, and if you like it you pay for it. The
weakness here is that the distribution system is utterly automatic and
accidental: People passing your software around, uploading it to BBSs, giving
it to their friends, and so on. Worse, the number of people who use shareware
without paying for it is very high by most estimates.
What SofSource is doing is putting the price point of software so far below
the threshold of pain that if you don't like it, hey, what the hell. You're
only out six bucks. And everybody who tries it pays. The distribution system
is deliberate and methodical, not flukily automatic, and it taps into that
very human ability to go into a spending frenzy when surrounded by too many
goods piled too high in the air.
I honestly don't know how well SofSource works yet. But by the time you read
this, my mortgage-calculator application (a seriously mutated and fleshed-out
descendent of HCALC) will be heading into the SofSource distribution channel.
I'll let you know what happens. In the meantime, if you'd like to hear more
about SofSource, contact Bob Falk in El Paso, who does most of their
acquisitions. (See the product box included in this column.)


Turbo Vision Resources


In the process of turning HCALC.PAS into a commercial application, I learned a
lot more about Turbo Vision than I had intended to. Some was good, some was
... well, marginal, but I have to keep weighing my difficulties against the
challenge of duplicating what TV does on my own.
One of the truly good things about Turbo Vision that I hadn't paid much
attention to before is the notion of resources. A resource sounds mysterious
(and the TV Guide gives the idea five pages, period), but it's far less
mysterious and potentially far more useful than the Borland documentation lets
on.
Some people have characterized resources as random-access streams, but that's
only about a third of the truth. A resource is a random-access stream keyed by
a text string. You can think of a resource as a black box containing named
objects, and when you pass the resource a string "P_F_SLOAN" you will get back
an object that had earlier been stored under the name "P_F_SLOAN," or else you
will be told that no such object exists in the resource. You don't have to
fuss with the search or worry about the internal representation of any of the
data in question. You simply have to call the resource's Get method, and the
search is done for you.
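In modern C++ the "black box of named objects" idea reduces to a map from
string keys to polymorphic objects. This sketch is an in-memory analogy, not
Borland's on-disk resource format; all of the names here are hypothetical:

```cpp
#include <map>
#include <memory>
#include <string>

// A minimal base class standing in for Borland's TObject.
struct TObj { virtual ~TObj() {} };

// A "resource": named storage for polymorphic objects. Turbo
// Vision performs the same keyed lookup against an indexed stream
// on disk; here the store is just a map in memory.
class Resource {
    std::map<std::string, std::unique_ptr<TObj>> items;
public:
    void Put(const std::string &key, std::unique_ptr<TObj> obj)
        { items[key] = std::move(obj); }
    // Get returns the stored object, or nullptr if no such name.
    TObj *Get(const std::string &key) {
        auto it = items.find(key);
        return it == items.end() ? nullptr : it->second.get();
    }
};
```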
Beneath the surface, a resource is something like an ISAM manager for an
ordinary Turbo Vision stream. The string keys are stored in a special
string-collection class used only to index streams. The resource stores its
objects on a stream, and therefore it is polymorphic: The stream can contain
any object type ultimately descended from Borland's standard TObject type. The
objects stored in a resource need not all be of the same type, nor do
you necessarily need to know the type of the object when you request it--or
get it back. You only need to know its name.
One nonobvious requirement is that the application that reads an object in
from a resource must contain the code comprising any object stored in a
resource. The code proper is not stored out to the resource file; only the
object's state--the contents of its fields--is stored on disk. The code
containing an object's methods must have been linked into the application when
the program was compiled, if the application is later to read an object from
any stream or resource. I've made this point before, but people still seem to
be confused about it. Registration of types with streams and resources is how
a stream or a resource connects the state of an object read in from disk with
the object's code that already exists in memory.


Gee, Am I Doing OODBM?


As I'll begin explaining shortly, resources are mostly used to contain program
user-interface elements like menu bars, dialog boxes, and string lists.
However, nothing limits you to UI elements. There's no
reason you can't build a simple database application around resources, using
some selected unique string field in an object as the index name.
The advantage there would be that you could sculpt a database "record"--an
object class, actually--to precisely fit the needs of the data it contains,
rather than having to massage the data to fit some sort of
all-things-to-all-data record format.
For example: Suppose you wanted to create a database to log your collection of
books, records, and videotapes. The three categories are similar enough to
allow a single database to make sense, but some significant differences make
representation of all three categories in a single record format close to
impossible. Books have an ISBN and a single byline. A record is almost
always an anthology of works with two different types of "author," the creator
and the performer, that may be different for each work on the record. Records
have no standard registration number like the ISBN. Videotapes have a running
time figure that books do not. CDs have the "AAD" designator that should be
stored somewhere. You get my drift. You can define three classes to model
books, records, and videotapes separately, and then store out objects of those
classes at random to the same resource file. The only element that they must
have in common is a text string to act as the access key.
The limitation to this concept is that all key strings must be present in
memory at all times that a resource file is to be accessed. The key strings
are stored in a Turbo Vision collection, and the collection has no "virtual"
capability; that is, it can't keep parts of itself on disk and only load a
portion of itself into memory at one time. The longer your key strings are,
the more memory the resource as a whole will occupy, and pretty quickly you're
going to have a monster object eating up a major slice of your heap.
This, I suspect, is why Borland hasn't really mentioned the use of resources
as database objects, even though that's pretty much what they are. A sharp
person could certainly derive a virtual resource object in which the string
collection portion of the resource was kept on disk and intelligently buffered
to memory. If any of you have done this, or have seen a commercial or
shareware library that does this, I'd like to hear about it.
Nor is a resource really object-oriented database management. At most I would
call resources polymorphic ISAM. The research people haven't yet come to any
crisp consensus as to what OODBM really is, and until they do, I suspect I'm
going to keep my databases relational.


A Bin of Interchangeable Parts


Borland's intended use of resources is to add flexibility to applications by
allowing them to load user-interface objects from disk files. A menu bar is an
object, and can be stored in a resource under a name like BEGINNER. The same
file can contain two similar menu bar objects named INTERMED and ADVANCED. You
can at any time during an application's execution dispose of the current menu
bar, load any of the three menu bars from the resource file, and then insert
the new menu bar into the desktop. With almost no hassle at all, you're able
to present three different levels of menus on command: one for beginners, one
for intermediate users, and one for wizards. (I've seen the FastBack backup
utility do this sort of thing most effectively, though not using Turbo
Vision.)
A resource thus becomes a bin of interchangeable parts for an application.
Menu bars graded by expertise, dialog boxes with "extra" controls for expert
users or sysops, string lists for different human languages, all selectable
anytime at run time--it's a heady concept that I've only begun to explore.
By storing UI components in a resource and loading them as needed, you can
also exile the code that actually configures the resource off into a separate
module or utility. If you still have my HCALC.PAS source code somewhere, take
a look in the TMortgageApp.Init constructor. The bulk of the code in that
constructor builds two dialog boxes, which are then tethered to pointers and
held for further use. If those two dialog boxes had been stored out to a
resource, the code to create them would be unnecessary. All you'd need is a
couple of lines to open a resource and then load the dialog boxes from the
resource. Add to that the ability to exile all those convoluted Lisp-like
constructor calls that build menu bars, and you can get a lot of unnecessary
and mysterious-looking stuff out of your application entirely.

There is a catch--and the catch, of course, is that you have to build the
resources somewhere. You can lift the code that builds a resource out into an
application-specific custom resource-creator program, or you can use a
commercial resource editor. There are a number of these in the Windows
programming marketplace, but so far only one for Turbo Vision: Blaise
Computing's Turbo Vision Development Toolkit (TVDT).


Drag-and-Drop Resources


I commented on Blaise's TVDT shortly after it appeared as "nice to have." I'll
change the perspective slightly by saying that if you intend to use resources
with your TV apps, it becomes absolutely essential. In one evening's work, I
replaced all of my programmatically generated UI objects with resources
created in the TVDT, and exiled more than 370 lines of source code from my
mortgage-calculator application.
TVDT lets you define a UI element visually by pulling boxes and controls
around on the screen, and then saves the resource to a resource file when
you've decided it's the way you want it. A dialog box, for example, begins as
a plain rectangle. You can tug on the corner with the mouse cursor to change
its size and proportions. You can drop controls like buttons, static text, and
input lines onto the dialog box, then drag them around and change their labels
until they meet your needs.
You define a menu bar by filling out a little form something like a
spreadsheet, with the menu-item text, shortcut labels, shortcut-key codes, and
the numeric value of the command to be generated by that menu item. Once
you've filled out the form, you can "run" the menu to see how it will actually
look and operate on your application's screen.


Using Blaise Resources


The TVDT is more than just a resource generator. Blaise includes a few Pascal
units (with full source code, bravo!) that add a number of features to your
applications when added to your USES statement. One is BApp, a unit that must
be placed in the USES statement after Borland's App unit, and replaces some
(but not all) of the code in App. App has to be there, and be there first!
Among other things, BApp saves the underlying DOS text screen and screen mode
when you execute your TV app, and before returning to DOS, it courteously
restores what was there at invocation time.
But mostly, what the Blaise units do is handle TV resources efficiently and
quickly. It's not difficult. Here are some pointers on doing it right:
1. Before the application object is initialized but after all streamable
objects have been registered, call the resource's constructor to build the
resource on the heap and connect it with a specific resource-stream filename.
The main program block of my mortgage application now looks like Example 1(a).
Example 1: Using Blaise's Turbo Vision Development Toolkit: (a) Main program
block; (b) working with a resource; (c) executing the dialog box; (d) calling
a destructor.

 (a)

 BEGIN
   RegisterAllTypes;
   ResFile.Init(New(PResStream,
     Init('MORTGAGE.BRS',
       stOpenRead, 1024)));
   MortCalc.Init;
   MortCalc.Run;
   MortCalc.Done;
   ResFile.Done;
 END.

 (b)

 ExtraRangeDialog :=
   PDialog(bAppResFile.Get('EXTRA_RANGE'));

 (c)

 Control :=
   DeskTop^.ExecView(ExtraRangeDialog);

 (d)

 Dispose(ExtraRangeDialog, Done);

2. At the point where you want to read a chosen resource into memory for use,
use the resource's Get method and request the resource by name, as in Example
1(b).
Here, the string key by which the resource object locates the dialog box in
its stream is EXTRA_RANGE. (Blaise always puts the name of a resource in upper
case. This is just a custom; there's nothing compulsory about it.)
3. Execute the dialog box (or whatever sort of object) the same way you would
an object of that class constructed programmatically in your application; see
Example 1(c).
4. After you're through using the object, get it out of memory by calling its
destructor through Dispose, as in Example 1(d).
That's nearly all there is to it. The major difference in usage between dialog
boxes constructed programmatically and dialog boxes read from a resource file
is that because you can read them from the resource file at any time, you don't
need to keep them hanging around on the heap, taking up memory you could use
for other things. Pull 'em in, give 'em their default values (if any), execute
'em, pull out the user's responses, and then dispose of them.
The same general mechanism is used for menu bars, string lists, or other UI
objects.


The Command Constant Problem


The TVDT resource editor has no way of examining your source code or compiled
units. This leads to two logistical problems, one of which Blaise solved
cleverly and another that they didn't solve at all.

Bad news first. When you define a menu bar using the resource editor, you must
put the numeric literal value of all the commands you want the various menu
items to issue when selected. In nearly every case, you'll define a command as
a simple numeric constant somewhere in your application, for example,
cmCloseAll=197;.
This definition isn't available to the resource editor. You must put the
literal 197 on the line defining the menu item that issues the command when
selected.
The problem here is that you're defining this command in two places: in your
source and in the resource editor. If you change one and not the other, your
menu may think it is issuing a CloseAll command and then issue a
PrintWindowSummary command instead. Pray that this little item doesn't drive
you nuts, like it drove me!
There's no easy answer to this problem, short of forcing the resource editor
to parse Turbo Pascal source code. That's a tall order, and I don't expect to
see it. I will, however, rejoice if I do.


Shoehorning


The other problem is that the resource editor can only assume the presence of
the standard Borland controls like TButton, TInputLine, TCheckBoxes, and so on
that everyone who owns Turbo Pascal 6.0 already has. It can't read source code
or compiled units, so it can't take into account custom controls that you buy
from others or write for yourself.
A good example is Allen Bauer's FInput formatted line-input control that I
used in HCALC.PAS. I wanted to use the TFInputLine class from within the
resource editor, but the editor had no way to know that TFInputLine existed.
What to do?
Blaise solved this one cleverly, with a technique they call "shoehorning." The
idea is to "save room" in a dialog box for a custom control by laying out a
standard Borland control in the same spot to hold its place. After you've
loaded the dialog box into your application, you instantiate your desired
custom control. You then call a special Blaise-supplied function that
substitutes a pointer to the custom control for the dialog box's original
pointer to the standard Borland control. This function, bShoeHorn, does the
actual shoehorning. A method from my mortgage-calculator program demonstrates
this process, and is given in Listing One.
The Blaise documentation is a little sparse, and it doesn't detail what
assumptions are made about the custom control and what it can and should not
do. I didn't push my luck, and I think a rule of thumb might be that you
should only shoehorn a descendent of a standard control into the standard
control's place in a dialog box. In other words, don't try to shoehorn a fancy
input line into the space held by a TButton. I've gotten into enough trouble
wildly polymorphing to suspect it's not quite as exact a science as we've been
led to believe.


No Shortage of Vision


There are a number of other useful features in TVDT, including a special "beta
test" version of BApp that builds some debugging features into your app that
can be stripped out later on, simply by removing the BetabApp unit name from
your USES statement. Elegant! All in all, it's a terrific package and a
must-have if you're doing any serious development in Turbo Vision.
You probably know by now, but Borland has just announced Borland Pascal 7.0,
replacing the second-longest-lived version of the compiler. (As best I can
tell, Turbo Pascal 3.0 lived the longest.) I've only begun to look it over,
and I'll have more to say in an upcoming column. But it's fair to say you
won't be disappointed.
I've been sticking with Turbo Vision, in part because it's a complicated
subject that nobody else seems to be talking about at all, and in part because
once I set it aside, I don't expect to go back to it for awhile. There's a new
Paradox Engine that I'm itching to experiment with and tell you about, and
then, with some trepidation, I think I may move on to Windows programming for
another long spell.
It used to be I could discuss six topics in one column. Now it takes six
columns to do one topic. If programming didn't accomplish so much, I'd be
complaining a lot more about how complicated it's become.


Products Mentioned


SofSource, 6285 Escondido Drive, El Paso, TX 79912; 915-584-7670
Turbo Vision Development Toolkit 2.0, Blaise Computing Inc., 819 Bancroft Way,
Berkeley, CA 94710; 510-540-5441; $169.00
_STRUCTURED PROGRAMMING_
by Jeff Duntemann


[LISTING ONE]

PROCEDURE TMortgageView.ExtraPrincipalRange;

VAR
 ExtraRangeDialog : PDialog;
 ExtraPrincipalData : ExtraPrincipalRangeDialogData;
 FromPaymentLine,
 ToPaymentLine,
 DollarsInputLine : PFInputLine;
 R : TRect;
 Control : Word;
 View : PView;

BEGIN
 { Instantiate the resource-based EXTRA_RANGE dialog box from MORTGAGE.BRS:}
 ExtraRangeDialog := PDialog(bAppResFile.Get('EXTRA_RANGE'));

 { Create and shoehorn the three FInputLine controls: }
 R.Assign(0,0,0,0);
 DollarsInputLine := New(PFInputLine,Init(R,8,DRealSet,DReal,2));
 View := bShoeHorn(ExtraRangeDialog,DollarsInputLine);

 ToPaymentLine := New(PFInputLine,Init(R,3,DUnSignedSet,DInteger,0));
 View := bShoeHorn(ExtraRangeDialog,ToPaymentLine);


 FromPaymentLine := New(PFInputLine,Init(R,3,DUnSignedSet,DInteger,0));
 View := bShoeHorn(ExtraRangeDialog,FromPaymentLine);

 { Set the default values for the dialog through SetData: }
 ExtraPrincipalData.FromPaymentNumber := 0;
 ExtraPrincipalData.ToPaymentNumber := 0;
 ExtraPrincipalData.ExtraDollars := 0.00;
 ExtraRangeDialog^.SetData(ExtraPrincipalData);
 Control := Desktop^.ExecView(ExtraRangeDialog);

 IF Control <> cmCancel THEN { Update the active mortgage window: }
 BEGIN
 { Get data from the extra principal dialog: }
 ExtraRangeDialog^.GetData(ExtraPrincipalData);
 WorkingBox^.Show;
 WITH ExtraPrincipalData DO
 Mortgage.RangeExtraPrincipal(FromPaymentNumber,
 ToPaymentNumber,
 ExtraDollars);
 WorkingBox^.Hide;
 Redraw; { Redraw the mortgage window }
 END;
 Dispose(ExtraRangeDialog,Done);
END;





December, 1992
GRAPHICS PROGRAMMING


Moving, Faster Lines, and Page Flipping


 This article contains the following executables: XSHRP21.ZIP


Michael Abrash


As I write this, the wife, the kid, and I are in the throes of yet another
lightning-quick transcontinental move, this time to Redmond to work for You
Know Who. Moving is never fun, but what makes it worse for us is the pets.
Getting them into kennels and to the airport is hard; there's always the
possibility that they might not be allowed to fly because of the weather; and,
worst of all, they might not make it. Animals don't usually end up injured or
dead, but it does happen.
In a (not notably successful) effort to cheer me up about the prospect of
shipping my animals, a friend told me the following story, which he swears
actually happened to a friend of his. I don't know--to me, it has the sound of
an urban legend, which is to say it makes a good story, but you can never
track down the person it really happened to; it's always a friend of a friend.
But maybe it is true, and anyway, it's a good story.
This friend of a friend (henceforth referred to as FOF) worked in an
air-freight terminal. Consequently, he handled a lot of animals, which was
fine by him, because he liked animals; in fact, he had quite a few cats at
home. You can imagine his dismay when, one day, he took a kennel off the plane
to find that the cat it carried was quite thoroughly dead. (No, it wasn't
resting; this cat was bloody deceased.)
FOF knew how upset the owner would be, and came up with a plan to make
everything better. At home, he had a cat of the same size, shape, and
markings. He would substitute that cat, and since all cats treat all humans
with equal disdain, the owner would never know the difference, and would never
suffer the trauma of the loss of her cat. So FOF drove home, got his cat, put
it in the kennel, and waited for the owner to show up--at which point, she
took one look at the kennel and said, "This isn't my cat. My cat is dead."
As it turned out, she had shipped her recently deceased feline home to be
buried. History does not record how FOF dug himself out of this one.
Okay, but what's the point? The point is, if it isn't broken, don't fix it.
And if it is broken, maybe that's all right, too. Which brings us, neat as a
pin, to the topic of drawing lines in a serious hurry.


Fast Run-length Slice Line Drawing


Last month, we examined the principles of run-length slice line drawing, which
draws lines a run rather than a pixel at a time, a run being a series of
pixels along the major (longer) axis. I concluded by promising a fast
assembler version for this month. Listing One (page 159) is the promised code,
in a form that's plug-compatible with the C code from last month.
Your first question is likely to be the following: Just how fast is Listing
One? Is it optimized to the hilt, or just pretty fast? The quick answer is:
It's fast. Listing One draws lines at a rate of nearly 1 million pixels per
second on my 486/33, and is capable of still faster drawing, as I'll discuss
shortly. (The heavily optimized AutoCAD line-drawing code that I mentioned
last month drew 150,000 pixels per second on an EGA in a 386/16, and I thought
I had died and gone to Heaven. Such is progress.) The full answer is a more
complicated one, and ties in to the principle that if it is broken, maybe
that's okay--and to the principle of looking before you leap, also known as
profiling before you optimize.
When I went to speed up run-length slice lines, I initially manually converted
the C code from last month into assembler. Then I streamlined the register
usage and used REP STOS wherever possible. Listing One is that code. At that
point, line drawing was surely faster, although I didn't know exactly how much
faster. Equally surely, there were significant optimizations yet to be made,
and I was itching to get on to them, for they were a lot more interesting than
a basic C-to-assembler port.
Ego intervened at this point, however. I wanted to know how much of a speed-up
I had already gotten, so I timed the performance of the C code vs. the
assembler code. To my horror, I found that I had not gotten even a two-times
improvement! I couldn't understand how that could be--the C code was decidedly
unoptimized--until I hit on the idea of measuring the maximum memory speed of
the VGA to which I was drawing.
Bingo. The Paradise VGA in my 486/33 is fast for a single display-memory
write, because it buffers the data, lets the CPU go on its merry way, and
finishes the write when display memory is ready. However, the maximum rate at
which data can be written to the adapter turns out to be no more than one byte
every microsecond. Put another way, you can only write one byte to this
adapter every 33 clock cycles on a 486/33. Therefore, no matter how fast I
made the line-drawing code, it could never draw more than 1,000,000 pixels per
second in 256-color mode in my system. The C code was already drawing at about
half that rate, so the potential speed-up for the assembler code was limited
to a maximum of two times, which is pretty close to what Listing One did, in
fact, achieve. When I compared the C and assembler implementations drawing to
normal system (nondisplay) memory, I found that the assembler code was
actually four times as fast as the C code.
In fact, Listing One draws lines at about 92 percent of the maximum possible
rate in my system--that is, it draws very nearly as fast as the VGA hardware
will allow. All the optimization in the world would get me less than 10
percent faster line drawing--and that only if I eliminated all overhead, an
unlikely proposition at best. The code isn't fully optimized, but so what?
Now it's true that faster line-drawing code would likely be more beneficial on
faster VGAs, especially local-bus VGAs, and in slower systems. For that
reason, I'll list a variety of potential optimizations to Listing One. On the
other hand, it's also true that Listing One is capable of drawing lines at a
rate of 2.2 million pixels per second on a 486/33, given fast enough VGA
memory, so it should be able to drive almost any non-local-bus VGA at nearly
full speed. In short, Listing One is very fast, and, in many systems, further
optimization is basically a waste of time.
Profile before you optimize.


Further Optimizations


Following is a quick tour of some of the many possible further optimizations
to Listing One.
The run-handling loops could be unrolled more than the current two times.
However, bear in mind that a two-times unrolling gets more than half the
maximum unrolling benefit with less overhead than a more heavily unrolled
loop.
BX could be freed up in the Y-major code by breaking out separate loops for X
advances of 1 and -1. DX could be freed up by using AH as the counter for the
run loops, although this would limit the maximum line length that could be
handled. The freed registers could be used to keep more of the whole-step and
error variables in registers. Alternatively, the freed registers could be used
to implement more esoteric approaches like unrolling the Y-major inner loop;
such unrolling could take advantage of the knowledge that only two run lengths
are possible for any given line. Strangely enough, on the 486 it might also be
worth unrolling the X-major inner loop, which consists of REP STOSB, because
of the slow start-up time of REP relative to the speed of branching on that
processor.
Special code could be implemented for lines with integral slopes, because all
runs are exactly the same length in such lines. Also, the X-major code could
try to write an aligned word at a time to display memory whenever possible;
this would improve the maximum possible performance on some 16-bit VGAs.
One weakness of Listing One is that for lines with slopes between 0.5 and 2,
the average run length is less than two, rendering run-length slicing
ineffective. This can be remedied by viewing lines in that range as being
composed of diagonal, rather than horizontal or vertical runs. I haven't space
to discuss this, but it's not very complicated, and it guarantees a minimum
run length of 2. That renders run drawing considerably more efficient, and
makes techniques such as unrolling the inner run-drawing loops more
attractive.
Finally, be aware that run-length slice drawing is best for long lines,
because it has more and slower setup than standard Bresenham's, including a
divide. Run-length slice is great for 100-pixel lines, but not necessarily for
20-pixel lines, and it's a sure thing that it's not terrific for 3-pixel
lines. Both approaches will work, but if line-drawing performance is critical,
whether you'll want to use run-length slice or standard Bresenham's depends on
the typical lengths of the lines you'll be drawing. For lines of widely
varying lengths, you might want to implement both approaches, and choose the
best one for each line, depending on the line length--assuming, of course,
that your display memory is fast enough and your application demanding enough
to make that level of optimization worthwhile.


An Interesting Twist on Page Flipping


I've spent a fair amount of time exploring various ways to do animation. (See,
for example, my July, August, and September 1991 DDJ columns, as well as those
in the January through April 1992 issues.) I thought I had pegged all the
possible ways to do animation: exclusive-ORing; simply drawing and erasing
objects; drawing objects with a blank fringe to erase them at their old
locations as they're drawn; page flipping; and, finally, drawing to local
memory and copying the dirty (modified) rectangles to the screen.
To my surprise, someone threw me an interesting and useful twist on animation
the other day, a cross between page flipping and dirty-rectangle animation.
That someone was Serge Mathieu of Concepteva Inc., in Rosemere, Quebec, who
informed me that he designs everything "from a game 'point de vue'."
In normal page flipping, you display one page while you update the other page.
Then you display the new page while you update the other. This works fine, but
the need to keep two pages current can make for a lot of bookkeeping and
possibly extra drawing, especially in applications where only some of the
objects are redrawn each time.
Serge didn't care to do all that bookkeeping in his animation applications, so
he came up with the following approach (which I've reworded, amplified and
slightly modified):
1. Set the start address to display page 0.
2. Draw to page 1.
3. Set the start address to display page 1 (the newly drawn page), then wait
for the leading edge of vertical sync, at which point the page has flipped and
it's safe to modify page 0.
4. Copy, via the latches, from page 1 to page 0 the areas that changed from
the last screen to the current one.
5. Set the start address to display page 0, which is now identical to page 1,
then wait for the leading edge of vertical sync, at which point the page has
flipped and it's safe to modify page 1.
6. Go to step 2.

The great benefit of Serge's approach is that the only page that is ever
actually drawn to (as opposed to block-copied to) is page 1. Only one page
needs to be maintained, and the complications of maintaining two separate
pages vanish entirely. The performance of Serge's approach may be better or
worse than standard page flipping, depending on whether a lot of extra work is
required to maintain two pages or not. My guess is that Serge's approach will
usually be slower, owing to the considerable amount of display-memory copying
involved, and also to the double page-flip per frame. There's no doubt,
however, that Serge's approach is simpler, and the resultant display quality
is every bit as good as standard page flipping. Given page flipping's fair
degree of complication, this approach is a valuable tool, especially for
less-experienced animation programmers.
An interesting variation on Serge's approach doesn't page flip or wait for
vertical sync:
1. Set the start address to display page 0.
2. Draw to page 1.
3. Copy, via the latches, the areas that changed from the last screen to the
current one from page 1 to page 0.
4. Go to step 2.
This approach totally eliminates page flipping, which can consume a great deal
of time. The downside is that images may shear for one frame if they're only
partially copied when the raster beam reaches them. This approach is basically
a standard dirty-rectangle approach, except that the drawing buffer is stored
in display memory, rather than in system memory. Whether this technique is
faster than drawing to system memory depends on whether the benefit you get
from the VGA's hardware, such as the Bit Mask, the ALUs, and especially the
latches (for copying the dirty rectangles) is sufficient to outweigh the extra
display-memory accesses involved in drawing, since display memory is
notoriously slow.
Finally, I'd like to point out that in any scheme that involves changing the
display-memory start address, a clever trick can potentially reduce the time
spent waiting for pages to flip. Normally, it's necessary to wait for display
enable to be active, then set the two start address registers, and finally
wait for vertical sync to be active, so you know the new start address has
taken effect. The start-address registers must never be set around the time
vertical sync is active (the new start address is accepted at either the start
or end of vertical sync on the EGAs and VGAs I'm familiar with), because it
would then be possible to load a half-changed start address (one register
loaded, the other not yet loaded), and the screen would jump for a frame.
Avoiding this condition is the motivation for waiting for display enable,
because display enable is active only when vertical sync is not active and
will not become active for a long while.
Suppose, however, that you arrange your page start addresses so that they both
have a low-byte value of 0 (page 0 starts at 0000h, and page 1 starts at
8000h, for example). Page flipping can then be done simply by setting the new
high byte of the start address, then waiting for the leading edge of vertical
sync. This eliminates the need to wait for display enable (the two bytes of
the start address can never be mismatched); page flipping will often involve
less waiting, because display enable becomes inactive long before vertical
sync becomes active. Using the above approach reclaims all the time between
the end of display enable and the start of vertical sync for doing useful
work. (The steps I've given for Serge's animation approach assume that the
single-byte approach is in use; that's why display enable is never waited
for.)


Thanks


I took another bundle of reader contributions from the X-Sharp careware over
to the Vermont Association for the Blind the other day. They were very
grateful. Thanks to all of you who have helped so far.
_GRAPHICS PROGRAMMING_
by Michael Abrash


[LISTING ONE]

; Fast run length slice line drawing implementation for mode 0x13, the VGA's
; 320x200 256-color mode.
; Draws a line between the specified endpoints in color Color.
; C near-callable as:
; void LineDraw(int XStart, int YStart, int XEnd, int YEnd, int Color)
; Tested with TASM 3.0.

SCREEN_WIDTH equ 320
SCREEN_SEGMENT equ 0a000h
 .model small
 .code

; Parameters to call.
parms struc
 dw ? ;pushed BP
 dw ? ;pushed return address
XStart dw ? ;X start coordinate of line
YStart dw ? ;Y start coordinate of line
XEnd dw ? ;X end coordinate of line
YEnd dw ? ;Y end coordinate of line
Color db ? ;color in which to draw line
 db ? ;dummy byte because Color is really a word
parms ends

; Local variables.
AdjUp equ -2 ;error term adjust up on each advance
AdjDown equ -4 ;error term adjust down when error term turns over
WholeStep equ -6 ;minimum run length
XAdvance equ -8 ;1 or -1, for direction in which X advances
LOCAL_SIZE equ 8
 public _LineDraw
_LineDraw proc near
 cld
 push bp ;preserve caller's stack frame
 mov bp,sp ;point to our stack frame
 sub sp,LOCAL_SIZE ;allocate space for local variables
 push si ;preserve C register variables
 push di

 push ds ;preserve caller's DS
; We'll draw top to bottom, to reduce the number of cases we have to handle,
; and to make lines between the same endpoints always draw the same pixels.
 mov ax,[bp].YStart
 cmp ax,[bp].YEnd
 jle LineIsTopToBottom
 xchg [bp].YEnd,ax ;swap endpoints
 mov [bp].YStart,ax
 mov bx,[bp].XStart
 xchg [bp].XEnd,bx
 mov [bp].XStart,bx
LineIsTopToBottom:
; Point DI to the first pixel to draw.
 mov dx,SCREEN_WIDTH
 mul dx ;YStart * SCREEN_WIDTH
 mov si,[bp].XStart
 mov di,si
 add di,ax ;DI = YStart * SCREEN_WIDTH + XStart
 ; = offset of initial pixel
; Figure out how far we're going vertically (guaranteed to be positive).
 mov cx,[bp].YEnd
 sub cx,[bp].YStart ;CX = YDelta
; Figure out whether we're going left or right, and how far we're going
; horizontally. In the process, special-case vertical lines, for speed and
; to avoid nasty boundary conditions and division by 0.
 mov dx,[bp].XEnd
 sub dx,si ;XDelta
 jnz NotVerticalLine ;XDelta == 0 means vertical line
 mov ax,SCREEN_SEGMENT
 mov ds,ax ;point DS:DI to the first byte to draw
 mov al,[bp].Color
VLoop:
 mov [di],al
 add di,SCREEN_WIDTH
 dec cx
 jns VLoop
 jmp Done
; Special-case code for horizontal lines.
 align 2
IsHorizontalLine:
 mov ax,SCREEN_SEGMENT
 mov es,ax ;point ES:DI to the first byte to draw
 mov al,[bp].Color
 mov ah,al ;duplicate in high byte for word access
 and bx,bx ;left to right?
 jns DirSet ;yes
 sub di,dx ;currently right to left, point to left end so we
 ; can go left to right (avoids unpleasantness with
 ; right to left REP STOSW)
DirSet:
 mov cx,dx
 inc cx ;# of pixels to draw
 shr cx,1 ;# of words to draw
 rep stosw ;do as many words as possible
 adc cx,cx
 rep stosb ;do the odd byte, if there is one
 jmp Done

; Special-case code for diagonal lines.
 align 2
IsDiagonalLine:
 mov ax,SCREEN_SEGMENT
 mov ds,ax ;point DS:DI to the first byte to draw
 mov al,[bp].Color
 add bx,SCREEN_WIDTH ;advance distance from one pixel to next
DLoop:
 mov [di],al
 add di,bx
 dec cx
 jns DLoop
 jmp Done

 align 2
NotVerticalLine:
 mov bx,1 ;assume left to right, so XAdvance = 1
 ;***leaves flags unchanged***
 jns LeftToRight ;left to right, all set
 neg bx ;right to left, so XAdvance = -1
 neg dx ;XDelta
LeftToRight:
; Special-case horizontal lines.
 and cx,cx ;YDelta == 0?
 jz IsHorizontalLine ;yes
; Special-case diagonal lines.
 cmp cx,dx ;YDelta == XDelta?
 jz IsDiagonalLine ;yes
; Determine whether the line is X or Y major, and handle accordingly.
 cmp dx,cx
 jae XMajor
 jmp YMajor
; X-major (more horizontal than vertical) line.
 align 2
XMajor:
 mov ax,SCREEN_SEGMENT
 mov es,ax ;point ES:DI to the first byte to draw
 and bx,bx ;left to right?
 jns DFSet ;yes, CLD is already set
 std ;right to left, so draw backwards
DFSet:
 mov ax,dx ;XDelta
 sub dx,dx ;prepare for division
 div cx ;AX = XDelta/YDelta
 ; (minimum # of pixels in a run in this line)
 ;DX = XDelta % YDelta
 mov bx,dx ;error term adjust each time Y steps by 1;
 add bx,bx ; used to tell when one extra pixel should be
 mov [bp].AdjUp,bx ; drawn as part of a run, to account for
 ; fractional steps along the X axis per
 ; 1-pixel steps along Y
 mov si,cx ;error term adjust when the error term turns
 add si,si ; over, used to factor out the X step made at
 mov [bp].AdjDown,si ; that time
; Initial error term; reflects an initial step of 0.5 along the Y axis.
 sub dx,si ;(XDelta % YDelta) - (YDelta * 2)
 ;DX = initial error term
; The initial and last runs are partial, because Y advances only 0.5 for
; these runs, rather than 1. Divide one full run, plus the initial pixel,
; between the initial and last runs.
 mov si,cx ;SI = YDelta
 mov cx,ax ;whole step (minimum run length)
 shr cx,1
 inc cx ;initial pixel count = (whole step / 2) + 1;
 ; (may be adjusted later). This is also the
 ; final run pixel count
 push cx ;remember final run pixel count for later
; If the basic run length is even and there's no fractional advance, we have
; one pixel that could go to either the initial or last partial run, which
; we'll arbitrarily allocate to the last run.
; If there is an odd number of pixels per run, we have one pixel that can't
; be allocated to either the initial or last partial run, so we'll add 0.5 to
; the error term so this pixel will be handled by the normal full-run loop.
 add dx,si ;assume odd length, add YDelta to error term
 ; (add 0.5 of a pixel to the error term)
 test al,1 ;is run length even?
 jnz XMajorAdjustDone ;no, already did work for odd case, all set
 sub dx,si ;length is even, undo odd stuff we just did
 and bx,bx ;is the adjust up equal to 0?
 jnz XMajorAdjustDone ;no (don't need to check for odd length,
 ; because of the above test)
 dec cx ;both conditions met; make initial run 1
 ; shorter
XMajorAdjustDone:
 mov [bp].WholeStep,ax ;whole step (minimum run length)
 mov al,[bp].Color ;AL = drawing color
; Draw the first, partial run of pixels.
 rep stosb ;draw the initial run
 add di,SCREEN_WIDTH ;advance along the minor axis (Y)
; Draw all full runs.
 cmp si,1 ;are there more than 2 scans, so there are
 ; some full runs? (SI = # scans - 1)
 jna XMajorDrawLast ;no, no full runs
 dec dx ;adjust error term by -1 so we can use
 ; carry test
 shr si,1 ;convert from scan to scan-pair count
 jnc XMajorFullRunsOddEntry ;if there is an odd number of scans,
 ; do the odd scan now
XMajorFullRunsLoop:
 mov cx,[bp].WholeStep ;run is at least this long
 add dx,bx ;advance the error term and add an extra
 jnc XMajorNoExtra ; pixel if the error term so indicates
 inc cx ;one extra pixel in run
 sub dx,[bp].AdjDown ;reset the error term
XMajorNoExtra:
 rep stosb ;draw this scan line's run
 add di,SCREEN_WIDTH ;advance along the minor axis (Y)
XMajorFullRunsOddEntry: ;enter loop here if there is an odd number
 ; of full runs
 mov cx,[bp].WholeStep ;run is at least this long
 add dx,bx ;advance the error term and add an extra
 jnc XMajorNoExtra2 ; pixel if the error term so indicates
 inc cx ;one extra pixel in run
 sub dx,[bp].AdjDown ;reset the error term
XMajorNoExtra2:
 rep stosb ;draw this scan line's run
 add di,SCREEN_WIDTH ;advance along the minor axis (Y)


 dec si
 jnz XMajorFullRunsLoop
; Draw the final run of pixels.
XMajorDrawLast:
 pop cx ;get back the final run pixel length
 rep stosb ;draw the final run

 cld ;restore normal direction flag
 jmp Done
; Y-major (more vertical than horizontal) line.
 align 2
YMajor:
 mov [bp].XAdvance,bx ;remember which way X advances
 mov ax,SCREEN_SEGMENT
 mov ds,ax ;point DS:DI to the first byte to draw
 mov ax,cx ;YDelta
 mov cx,dx ;XDelta
 sub dx,dx ;prepare for division
 div cx ;AX = YDelta/XDelta
 ; (minimum # of pixels in a run in this line)
 ;DX = YDelta % XDelta
 mov bx,dx ;error term adjust each time X steps by 1;
 add bx,bx ; used to tell when one extra pixel should be
 mov [bp].AdjUp,bx ; drawn as part of a run, to account for
 ; fractional steps along the Y axis per
 ; 1-pixel steps along X
 mov si,cx ;error term adjust when the error term turns
 add si,si ; over, used to factor out the Y step made at
 mov [bp].AdjDown,si ; that time

; Initial error term; reflects an initial step of 0.5 along the X axis.
 sub dx,si ;(YDelta % XDelta) - (XDelta * 2)
 ;DX = initial error term
; The initial and last runs are partial, because X advances only 0.5 for
; these runs, rather than 1. Divide one full run, plus the initial pixel,
; between the initial and last runs.
 mov si,cx ;SI = XDelta
 mov cx,ax ;whole step (minimum run length)
 shr cx,1
 inc cx ;initial pixel count = (whole step / 2) + 1;
 ; (may be adjusted later)
 push cx ;remember final run pixel count for later

; If the basic run length is even and there's no fractional advance, we have
; one pixel that could go to either the initial or last partial run, which
; we'll arbitrarily allocate to the last run.
; If there is an odd number of pixels per run, we have one pixel that can't
; be allocated to either the initial or last partial run, so we'll add 0.5 to
; the error term so this pixel will be handled by the normal full-run loop.
 add dx,si ;assume odd length, add XDelta to error term
 test al,1 ;is run length even?
 jnz YMajorAdjustDone ;no, already did work for odd case, all set
 sub dx,si ;length is even, undo odd stuff we just did
 and bx,bx ;is the adjust up equal to 0?
 jnz YMajorAdjustDone ;no (don't need to check for odd length,
 ; because of the above test)
 dec cx ;both conditions met; make initial run 1
 ; shorter
YMajorAdjustDone:

 mov [bp].WholeStep,ax ;whole step (minimum run length)
 mov al,[bp].Color ;AL = drawing color
 mov bx,[bp].XAdvance ;which way X advances
; Draw the first, partial run of pixels.
YMajorFirstLoop:
 mov [di],al ;draw the pixel
 add di,SCREEN_WIDTH ;advance along the major axis (Y)
 dec cx
 jnz YMajorFirstLoop
 add di,bx ;advance along the minor axis (X)
; Draw all full runs.
 cmp si,1 ;# of full runs. Are there more than 2
 ; columns, so there are some full runs?
 ; (SI = # columns - 1)
 jna YMajorDrawLast ;no, no full runs
 dec dx ;adjust error term by -1 so we can use
 ; carry test
 shr si,1 ;convert from column to column-pair count
 jnc YMajorFullRunsOddEntry ;if there is an odd number of
 ; columns, do the odd column now
YMajorFullRunsLoop:
 mov cx,[bp].WholeStep ;run is at least this long
 add dx,[bp].AdjUp ;advance the error term and add an extra
 jnc YMajorNoExtra ; pixel if the error term so indicates
 inc cx ;one extra pixel in run
 sub dx,[bp].AdjDown ;reset the error term
YMajorNoExtra:
 ;draw the run
YMajorRunLoop:
 mov [di],al ;draw the pixel
 add di,SCREEN_WIDTH ;advance along the major axis (Y)
 dec cx
 jnz YMajorRunLoop
 add di,bx ;advance along the minor axis (X)
YMajorFullRunsOddEntry: ;enter loop here if there is an odd number
 ; of full runs
 mov cx,[bp].WholeStep ;run is at least this long
 add dx,[bp].AdjUp ;advance the error term and add an extra
 jnc YMajorNoExtra2 ; pixel if the error term so indicates
 inc cx ;one extra pixel in run
 sub dx,[bp].AdjDown ;reset the error term
YMajorNoExtra2:
 ;draw the run
YMajorRunLoop2:
 mov [di],al ;draw the pixel
 add di,SCREEN_WIDTH ;advance along the major axis (Y)
 dec cx
 jnz YMajorRunLoop2
 add di,bx ;advance along the minor axis (X)

 dec si
 jnz YMajorFullRunsLoop
; Draw the final run of pixels.
YMajorDrawLast:
 pop cx ;get back the final run pixel length
YMajorLastLoop:
 mov [di],al ;draw the pixel
 add di,SCREEN_WIDTH ;advance along the major axis (Y)
 dec cx

 jnz YMajorLastLoop
Done:
 pop ds ;restore caller's DS
 pop di
 pop si ;restore C register variables
 mov sp,bp ;deallocate local variables
 pop bp ;restore caller's stack frame
 ret
_LineDraw endp
 end
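For readers who find the assembly hard to follow, here is a hypothetical C rendering of the same run-length slice algorithm, reconstructed from the listing's comments rather than taken from Abrash's code. It draws into a flat byte buffer instead of VGA memory, with the same special cases (vertical, horizontal, diagonal) and the same error-term bookkeeping for the X-major and Y-major cases:

```c
#define SCREEN_WIDTH 320

/* Run-length slice line draw into a flat 320x200 byte buffer.
   A sketch based on the listing's comments, not Abrash's own code. */
static void line_draw(unsigned char *buf, int x0, int y0, int x1, int y1,
                      unsigned char color)
{
    int t, i, xadvance, xdelta, ydelta;

    /* Draw top to bottom, as in the listing. */
    if (y0 > y1) { t = y0; y0 = y1; y1 = t; t = x0; x0 = x1; x1 = t; }
    ydelta = y1 - y0;
    xdelta = x1 - x0;
    if (xdelta < 0) { xadvance = -1; xdelta = -xdelta; } else xadvance = 1;

    unsigned char *p = buf + (long)y0 * SCREEN_WIDTH + x0;

    if (xdelta == 0) {                      /* vertical */
        for (t = 0; t <= ydelta; t++) { *p = color; p += SCREEN_WIDTH; }
        return;
    }
    if (ydelta == 0) {                      /* horizontal */
        for (t = 0; t <= xdelta; t++) { *p = color; p += xadvance; }
        return;
    }
    if (xdelta == ydelta) {                 /* diagonal */
        for (t = 0; t <= ydelta; t++) { *p = color; p += SCREEN_WIDTH + xadvance; }
        return;
    }
    if (xdelta >= ydelta) {                 /* X major */
        int wholestep = xdelta / ydelta;          /* minimum run length   */
        int adjup     = (xdelta % ydelta) * 2;    /* error adjust up      */
        int adjdown   = ydelta * 2;               /* error adjust down    */
        int error     = (xdelta % ydelta) - adjdown;
        int initial   = (wholestep / 2) + 1;      /* first partial run    */
        int final     = initial;                  /* last partial run     */
        if (!(wholestep & 1) && adjup == 0) initial--;
        if (wholestep & 1) error += ydelta;       /* odd length: +0.5     */
        for (t = 0; t < initial; t++) { *p = color; p += xadvance; }
        p += SCREEN_WIDTH;
        for (i = 0; i < ydelta - 1; i++) {        /* full runs            */
            int run = wholestep;
            error += adjup;
            if (error > 0) { run++; error -= adjdown; }
            for (t = 0; t < run; t++) { *p = color; p += xadvance; }
            p += SCREEN_WIDTH;
        }
        for (t = 0; t < final; t++) { *p = color; p += xadvance; }
    } else {                                /* Y major: symmetric */
        int wholestep = ydelta / xdelta;
        int adjup     = (ydelta % xdelta) * 2;
        int adjdown   = xdelta * 2;
        int error     = (ydelta % xdelta) - adjdown;
        int initial   = (wholestep / 2) + 1;
        int final     = initial;
        if (!(wholestep & 1) && adjup == 0) initial--;
        if (wholestep & 1) error += xdelta;
        for (t = 0; t < initial; t++) { *p = color; p += SCREEN_WIDTH; }
        p += xadvance;
        for (i = 0; i < xdelta - 1; i++) {
            int run = wholestep;
            error += adjup;
            if (error > 0) { run++; error -= adjdown; }
            for (t = 0; t < run; t++) { *p = color; p += SCREEN_WIDTH; }
            p += xadvance;
        }
        for (t = 0; t < final; t++) { *p = color; p += SCREEN_WIDTH; }
    }
}
```

As in the listing, the division happens once per line rather than once per pixel; the inner loops just store runs of bytes, which is what makes the slice approach fast.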


December, 1992
PROGRAMMER'S BOOKSHELF


Wake Up and Smell the Working Set




Andrew Schulman


For years, microcomputer programming has been dominated by small amounts of
memory. On the PC, for example, a lot of programming was oriented toward
working within or around the 640K barrier. Now in more and more places, these
barriers are lifting. With the widespread use of Windows Enhanced mode, the
standard microcomputer has megabytes of readily accessible memory, a larger
pool of virtual memory, and even a flat-memory model.
Yet, in the same way that a citizen of the former Soviet Union might still
hang on to a Lenin pin, programmers still cling to the old ways. In Windows
programming circles, for example, the little discussion of performance one
finds seems to be dominated by memory-management considerations that don't
make much sense anymore. Windows Enhanced mode, OS/2 2.0, and Win32/NT are all
demand-paged virtual-memory systems. Yet the majority of Windows programming
books are still filled with dire predictions of what will happen if you don't
keep your segments discardable, movable, and small.
The PC world isn't 640K anymore, and in more and more places it's not chopped
into 64K pieces anymore, either. It's really time to wake up, smell the
coffee, and throw out all the old baggage.
Oddly enough, though, we don't really need any "new ideas." In fact, with the
increasing popularity of flat-memory models and demand-paged virtual memory,
it's time to dust off your old college textbooks. Why? Because PC systems are
finally starting to resemble the way we were taught computers are supposed to
work!
Except that, rather than dusting off your old textbooks, I would suggest
picking up a new one.
About halfway through writing an article on demand-paged virtual memory in
Windows Enhanced mode for Microsoft Systems Journal, I realized that if the
article was going to have any substance at all, it would have to discuss (or
at least be based on some awareness of) virtual memory in general, not just
the way it happens to be implemented in one mode of one version of one
Microsoft product.
So I started going through my book collection, looking for background reading
on demand-paged virtual memory. Many of the books Ray Duncan and I have
reviewed in "Programmer's Bookshelf"--Dewar and Smosna's Microprocessors: A
Programmer's View (reviewed in DDJ, September 1990), Hennessy and Patterson's
Computer Architecture: A Quantitative Approach (October 1990), and Tanenbaum's
Modern Operating Systems (May and June 1992)--discuss virtual memory. There's
a ton of literature available on this subject.
The nicest discussion of the subject, though, and the most useful to a
programmer rather than a chip designer, was the 25-page section on virtual
memory I found in a book from 1990 that we somehow haven't reviewed here yet:
Harold Stone's High-Performance Computer Architecture. It may be a little
strange to examine this book now, particularly since Hennessy and Patterson's
book seems to have blown away everything else in this field. But I was
surprised to find Stone's discussion of virtual memory and other
topics--including cache memory, pipelining, and multiprocessors--gave me more
of what I as a programmer actually needed to know than Hennessy and
Patterson's wonderful, definitive work.
Why should a programmer care about this stuff in the first place? After all,
demand-paged virtual memory is supposed to be transparent! You can dereference
a pointer to access a page of memory, even if that page is currently on disk;
a page-fault handler within the operating system will take care of loading the
page without you being aware of it. Obviously, the presence of virtual memory
is no more relevant to your average programmer than is, say, the presence of
an instruction prefetch queue on the processor.
Unfortunately, disk access is several orders of magnitude slower than access
to main memory. Consequently, there is one overwhelming reason why programmers
must understand the workings of "transparent" virtual memory: performance.
The reason for most programmers to study virtual memory, then, is so they can
understand its performance implications for their software. Stone's
High-Performance Computer Architecture does a great job of drawing out just
these implications. Rather than merely describing how virtual memory works, he
presents a detailed performance model in terms understandable to applications
programmers as well as systems designers. The very simple "working set"
concept is key here, and is of course discussed in every other book on the
subject, but somehow Stone manages to convey this concept in a way that is
genuinely helpful rather than merely informative.
For a few years, programmers working with systems such as Windows and OS/2
were worrying themselves silly with rules about segment sizes and segment
attributes. One venerable Microsoft University lecturer pounded into the minds
of an entire generation of Windows programmers the need to, "Keep your
segments as small as possible, as discardable as possible, and as unlocked as
possible."
Well, in Windows Enhanced mode, in OS/2 2.0, and in Win32/NT, all this pretty
much goes out the window. What replaces these old, outmoded ideas of how to
get good performance? The even older notion of "working set." We had a period
of a few years in which we had to do memory management and the like in very
odd ways, and that period is now thankfully coming to a close.
"Working set" is really just the 90/10 rule expressed in a different way. The
90/10 rule states that you spend 90 percent of your development effort on 10
percent of your code; it also states that 90 percent of a program's running
time is spent in only 10 percent of its code. This is the whole basis
for virtual memory: Potentially, a program can run at full speed with only 10
percent of itself--or whatever the working set is--loaded into memory at any
given time. Unlike that nasty segment stuff, the programmer does not specify
any of this in advance. The operating system "discovers" a program's working
set on-the-fly, through page faults.
As Stone shows, paged virtual memory depends on the fact that all programs
have reasonably sized working sets or "footprints"; that is, that all programs
can run for a while with only discrete, page-sized bits of themselves in
memory at any given time. All programs?! Well, that's the problem: A
virtual-memory operating system can't know in advance how all programs, or
even how one program, will behave. All it knows is the probable behavior of
the average program. The average program will behave well under virtual
memory.
Your program may not, however, if the way it accesses memory doesn't
correspond to the model that virtual memory is based on. The model is simple:
If your program accesses x[i] at time t, then it is very likely to refer to
x[i+1] or x[i-1] at time t + 1.
What began as a simple statistical description of the behavior of programs has
now been turned into a prescription: Your software had better behave this way.
Hence the great relevance of the section on virtual memory in Stone's book,
such as his discussion of "Improving Program Locality," to programmers
interested in getting decent performance in Windows, OS/2, or any other
demand-paged system.
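The locality prescription is easy to illustrate. In the hypothetical C fragment below (the names are invented here, not taken from Stone's book), both functions compute the same sum over the same matrix. The row-major version accesses memory sequentially, x[i] then x[i+1], so its working set at any instant is a handful of pages; the column-major version strides through memory and sweeps the array's entire set of pages on every pass:

```c
#define N 128

/* Row-major traversal: consecutive accesses are adjacent in memory,
   so only a few pages are "live" at any instant. */
static long sum_rows(int (*m)[N])
{
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal of the same data: each access strides
   N * sizeof(int) bytes, so every pass touches every row's page --
   a much larger working set, and many more page faults when the
   array doesn't fit in memory. */
static long sum_cols(int (*m)[N])
{
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```

Both loops do identical arithmetic; only the order of access differs, which is exactly why locality is a property the programmer, not the compiler or the operating system, controls.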
I've focused here on virtual memory, but this is just one section of
High-Performance Computer Architecture. The 100-page chapter on memory-system
design is actually largely devoted to a detailed analysis of cache memory:
cache analysis, cache writes, replacement policies, performance metrics, and
so on. A lot of the cache-memory discussion sounds just like the virtual
memory discussion, except for one small thing: Virtual memory involves hitting
the disk, and disks are very slow. This one point makes virtual memory and
cache memory fundamentally different. Other chapters in the book discuss
pipelining, numerics, vector computers, multiprocessing, and multiprocessor
algorithms. The chapters on multiprocessing are noteworthy for their sensible
position that, until the communication and synchronization overhead of
multiprocessing is reduced, multiprocessor systems are likely to involve just
a handful of processors, not the 1000-processor behemoths one might imagine.
I read the second edition of this book when it first came out in 1990 and,
frankly, I didn't get much out of it at the time. Yet, as I've tried to
indicate here, when I picked it up again in late 1992, much of it seemed
amazingly relevant to my daily work. Material that once would have seemed
unfortunately irrelevant to daily PC programming practice is becoming more
important every day. Why? Because the 32-bit Intel architecture and the
operating systems sold on top of it are becoming more and more like other
32-bit systems every day.
In fact, Intel might even come to regret the day it started pushing 32 bits,
because 32-bit code is portable to other architectures in a way that segmented
16-bit code never was. Well, that is pretty unlikely, but certainly PC
programmers will increasingly be able to benefit from the lessons learned on
other processors and other operating systems, and from textbooks such as
High-Performance Computer Architecture.


December, 1992
OF INTEREST





The new Microsoft C/C++ Browser Toolkit lets you access and manipulate the
(.BSC) files that make up the Microsoft C/C++ 7.0 browser database, which
contains information about program symbols, including symbol references,
function call trees, and definition tables. The toolkit's browser library (BSC.LIB)
contains functions that let you search for symbol matches, generate class
trees, translate decorated names, and obtain index ranges for an object. You
can also take advantage of the information in the database to develop custom
applications using the new APIs.
C/C++ 7.0 registered users can download the toolkit from the Microsoft
developer's forum on CompuServe or order it from Microsoft for $29.00. Reader
service no. 20.
Microsoft Corp. One Microsoft Way Redmond, WA 98052-6399 206-882-8080
Napier Graphics has released NAPCAD/3D, a three-dimensional computer-aided
design program for Windows that includes C++ source. NAPCAD supports lines,
cubic spline curves, circles, polygons, cones, cylinders, spheres, 3-D surface
patches, sweeps, and extrusions. Up to three viewports can be active at one
time, so you can view a model at different angles and sizes. There are
user-definable fonts and symbol libraries, and you can shade a model using up
to eight light sources. Models can be divided into up to 256 layers and can be
printed, either shaded or as wire-frame, to any Windows-supported graphics
printer.
NAPCAD was written using Borland's C++ and Object Windows, and the source code
is fully commented. The price is $99.00 and a math coprocessor is recommended.
Reader service no. 21.
Napier Graphics 3212 20th Street SE Auburn, WA 98002 800-800-1961
Andsor Research has released the Andsor Database Engine, a Windows add-on
library for adding advanced database-management capabilities to any program
through DLL function calls. You can replace many sections of an application,
such as reports, queries, and updating operations, with simple procedures
written in ADL, a database language. These procedures can be embedded in the
program, but are usually stored in the database itself, resulting in smaller,
simpler Windows programs which need contain only function calls to execute the
stored procedures. All programs that access the database can use the same
procedures. An interactive database-management utility is included for
creating, testing, and modifying the stored procedures without recompiling the
program.
A full set of traditional ISAM file operations is also provided. The
programmer controls the database with code and variables that reside in the
program and calls the DLL functions to perform one operation at a time.
DDJ spoke with Brent Eldstrom, president of Custom Software Solutions in
Prince Albert, Saskatchewan, who develops Windows applications for the
insurance industry. Brent needs to include many support files with his
applications, and by using the Andsor engine they can all be kept in a single
DOS file. "Because of this," he said, "the engine has excellent ease of
use--there's no opening and closing of files because DOS sees it as only one
file."
The engine can be used from C, Visual Basic, and all languages or front ends
with DLL interfaces. It sells for $149.00. Reader service no. 22.
Andsor Research Inc. 390 Bay Street, Suite 2000 Toronto, Ontario Canada M5H
2Y2 800-766-1141 or 416-245-8073
DAB is a new distributed application-builder toolset from Real-Time
Intelligent Systems. It provides automatic network-wide data exchange between
programs running on PCs connected by LANs and serial data links. DAB
facilitates parallel processing and building cooperatively used applications.
DAB establishes a multiprocessing environment in each PC: A DOS application
program runs in the foreground and the communications software runs
automatically in the background. The programs cooperate on a peer-to-peer
basis: DAB automatically encodes data records into messages, handles all
message transmission and routing between computers, and decodes messages into
data records for the recipient program. The communications protocol lets data
links be broken and connected again without data loss.
DAB identifies each computer via an electronic identification key that plugs
into the computer's parallel port. The data-distribution network is formed
automatically and user programs communicate via data-distribution nodes that
automatically relay data throughout the network to all programs that need it.
The price is $499.00 and includes callable libraries for Microsoft's C
compiler and Borland's C++ compiler. Reader service no. 23.
Real-Time Intelligent Systems Corp. 30 Sever Street Worcester, MA 01609
508-752-5567
New from Knowledge Garden is KPWin++, a facility that in conjunction with
KnowledgePro for Windows affords a high-level, object-oriented development
environment for C++. KPWin++ reads code written in Knowledge Garden's KPWin or
Revelation Technologies' Open-Insight environments and generates ANSI-standard
compilable C++ code.
KPWin includes over 300 high-level commands for manipulating Window objects,
data, text, colors, fonts, and images. You can access Windows multimedia and
pen functions via calls to the Windows API. There is access to DLLs at the C
pointer level, and DDE support and drag-and-drop functionality are included as
well. KPWin also affords list-handling commands, multiple inheritance, classes
and subclasses, text-file indexing and hypertext, and demons and backward
chaining.
Using the Microsoft C/C++ 7.0 compiler, you can modify the code, link to
third-party libraries, and compile to create high-performance executable
files. Support is also planned for compilers from Borland, Watcom, and
Zortech. KPWin++ costs $895.00, $695.00 for registered KPWin and OpenInsight
users. Reader service no. 24.
Knowledge Garden Inc. 12-8 Technology Drive Setauket, NY 11733 516-246-5400
Access Softek has announced 16 graphics import filters for Windows that plug
into any Windows application, allowing it to import graphics. The filters
translate in a one-step process directly into the Windows MetaFile or
QuickDraw for the Macintosh, unlike others, which translate first into an
intermediate format. As a result, those from Access Softek tend to be faster
and more accurate.
The interface used for graphics import is the Aldus interface, which contains
only three function calls. (This is the only graphics-import standard
currently in widespread use; the filters also, however, support the proposed
TWAIN API spec.) For more information on graphic import filters, see Evangelo
Prodromou's article, "Graphics Import Filters for Windows Applications" in the
July 1992 DDJ. Contact Access Softek for pricing. Reader service no. 25.
Access Softek 2550 9th Street, #206 Berkeley, CA 94710 510-848-0606
Sierra Systems has released version 3.00 of its C cross-compiler and
language-development system for Motorola 68000 microprocessors. The new
version includes a hole-compression optimization utility that operates as
follows: The linker determines on the first pass when a more compact
addressing mode can be used and passes this information to the assembler,
which makes a second pass and shrinks the appropriate address holes; the
application is then relinked.
Optimization techniques such as loop invariant removal, common subexpression
removal, global register allocation by coloring, and register scorecarding are
included, obviating the need for assembly language. Also new to this version
are: inclusion of the Phar Lap DOS-Extender, floating-point enhancements,
enhanced implementation of multiplication by constant, a string-copy function,
and improved function-call stack cleanup.
Sierra C includes an optimizing compiler, assembler, linker, absolute address
mapper, object librarian and source archiver, symbol-table listing utility,
object-code size utility, command driver, parallel downloader, serial
downloader, and runtime library. It costs $2000.00 for DOS or OS/2 and
$3500.00 for UNIX workstations. Reader service no. 26.
Sierra Systems 6728 Evergreen Avenue Oakland, CA 94611 800-776-4888 or
510-339-8200
The Video Electronics Standards Association (VESA) has ratified the VESA Local
Bus standard. This standard is designed to bring workstation-level performance
to the PC. The VL-Bus removes many bottlenecks, allowing peripherals to
operate at the system's native speed and enabling data transfer between
peripherals and the system at maximum speed.
The VL-Bus standard provides a low-cost, extendible, and portable local bus
design that lets systems and peripherals from different manufacturers work
together seamlessly. The standard is a chip-level specification as well as a
connector definition and is designed to work with many different processors,
including any 80x86, MIPS R4000, and the upcoming P5. Reader service no. 27.
VESA 2150 North First Street, Suite 440 San Jose, CA 95131-2020 408-435-0333
Fuzz-C is Byte Craft's new C preprocessor for fuzzy logic, a tool for
implementing fuzzy logic on embedded microprocessors. The preprocessor reduces
the learning curve for implementing control functions in fuzzy logic: It scans
the application source code for fuzzy logic membership and consequence
functions and translates them into C, leaving the rest of the source code
untouched. With Fuzz-C, C and fuzzy logic can call functions in each other and
share data variables.
Fuzz-C is priced at $149.00. Reader service no. 28.
Byte Craft Ltd. 421 King Street North Waterloo, Ontario Canada N2J 4E4
519-888-6911
The C EXECUTIVE real-time operating-system kernel from JMI will now be
included in the FUSION TCP/IP software from Network Research. (See the
December 1991 "Of Interest" column.) C EXECUTIVE offers a complete execution
environment for multitasking applications written in C. The package includes a
complete I/O subsystem for standard input/output, a set of device drivers
written in C, and a complete ROMable runtime C library.
C EXECUTIVE offers an optional file system, CE-DOSFILE*, that replicates the
DOS file structure on external media and allows any microprocessor to read and
write DOS disks online. Also included is a system debugger, CEVIEW*, which
provides a flexible system-level debugging tool for testing on the target
system.
Contact JMI for pricing for the combined C EXECUTIVE and FUSION TCP/IP. Reader
service no. 29.
JMI Software Consultants Inc. P.O. Box 481 Spring House, PA 19477 215-628-0840
The Geodyssey Environmental GIS Research Grants Initiative is a $1 million
grant program from Autodesk and the Environmental Systems Research Institute
(ESRI). The program is intended to further the use of low-cost geographic
information systems (GIS) and CAD tools for environmental research and
management. The program is sponsored and administered by the International
Geographic Information Foundation, which will award 100 ArcCAD, AutoCAD, and
ArcView for Windows software packages and training to academicians and
researchers within the next year. ArcCAD is a desktop GIS tool that uses the
AutoCAD data model and includes full spatial-analysis capabilities and a
dBase-compatible attribute DBMS; ArcView is a geographic query and display
package for Windows.
For more information, contact Autodesk. Reader service no. 30.
Autodesk Inc. 2320 Marinship Way Sausalito, CA 94965 415-332-2344
C++VS from Perennial and UNIX System Laboratories is a verification suite for
evaluating a C++ compiler's conformance to the language standard. C++VS
consists of three main parts: a driver that manages automatic test-case
execution; test cases, including error and library tests; and templates for
developing user-specific tests. Over 20,000 test cases are included. A basic
license for the test suite costs $28,000.00. Reader service no. 31.
Perennial Inc. 4699 Old Ironsides Drive, Suite 210 Santa Clara, CA 95054
408-748-2900
Dashboard Software has released DASHBoard, a utility for Windows and OS/2
developers that displays the vital parameters of a program while it is
running. DASHBoard finds program bugs and helps uncover and correct
performance problems. The control panel lets you set up permanent displays of
crucial parameters and include a customized DASHBoard window in programs you
distribute. The utility allows you to examine each variable and track its
value by bringing up a DASHBoard window along with your program window. You
can display the data as text or a graph and examine how fast a variable
changes or display its average, maximum, or minimum value. These variables can
be accessed via DDE from any Windows application.
DASHBoard works with C, C++, and all languages that access DLLs. DASHBoard for
Windows costs $129.00. Reader service no. 32.
Dashboard Software 4 Louis Avenue Monsey, NY 10952
Prentice Hall has begun a new series called the Open Systems Library. The
series will focus on task-oriented solutions for UNIX programmers, system
developers, and system administrators. The first two titles are UNIX System V
Print Service Administration, edited by Sally Brownin, and UNIX System V NFS
Administration, edited by Debra Herman. The books cost $27.00 and $24.95,
respectively. Reader service no. 33.
Prentice Hall 113 Sylvan Avenue, Route 9W Englewood Cliffs, NJ 07632
201-592-2348
Now available from Virtual Technologies is the SENTINEL debugging environment
for UNIX. This tool supports runtime verification of pointer usage and dynamic
memory allocation in both C and C++. It traps memory errors, traces stack, and
reports the source file, function name, and line number of erroneous code. A
discretionary activation feature can enable or disable debug output at compile
or run time to avoid impacting performance. The debugger also aids in tracking
down and determining the cause of memory leaks.
SENTINEL costs $195.00 for Intel's 80x86, $395.00 for Sun environments, and
$495.00 for HP environments. Reader service no. 34.
Virtual Technologies Inc. 46030 Manekin Plaza, Suite 160 Dulles, VA 20166
703-430-9247
Windows Sockets APIs are in the news with two new releases from Distinct and
Frontier Technologies. Distinct's TCP/IP for Windows now includes the Windows
Sockets API. This standard API can be used by developers of shrink-wrapped
software, while those developing more technical products have access to
monitoring, debugging, data capture, and internal data structures. Also new to
this version is the ONC RPC/XDR toolkit, including features such as an RPC
server and an enhanced RPC client that run over UDP or TCP. A Windows RPCGEN
program allows you to generate XDR routines directly on the PC for
applications distributed over PC-only or heterogeneous networks. In addition
to SLIP protocol support, there is now the more secure Point-to-Point Protocol
for data transfer over serial lines.

Distinct TCP/IP supports Packet, NDIS, and ODI drivers. The price of the SDK
is $495.00. Reader service no. 35.
The Frontier developer's toolkit provides the sockets-library interface for
the Windows environment for Frontier's Super-TCP for Windows. This allows
custom applications to connect to a TCP/IP network, open connections, transmit
and receive data, and disconnect from the network easily. The toolkit includes
the following: a Windows TCP/IP kernel with a PING application; Borland C++
libraries; sample source code; and a copy of the Windows Sockets API technical
specification. The socket-interface DLL design takes advantage of Windows'
memory-management services, allows the interface to be used simultaneously by
several applications, and avoids taking up base memory.
Super-TCP for Windows supports NDIS and Packet drivers and costs $595.00. Reader
service no. 36.
Frontier Technologies Corp. 10201 N. Port Washington Road Mequon, WI 53092
414-241-4555
Distinct Corp. P.O. Box 3410 Saratoga, CA 95070-1410 408-741-0781

























































December, 1992
SWAINE'S FLAMES


Politically Speaking




Michael Swaine


The election's over, and what have we learned? First-time voters had the
opportunity to learn the physical pleasures of voting: the use of the large
muscles of the arm, the guillotine-like action of the voting machine, the
satisfying ka-chunk as they punch in their votes. They might wish that,
instead of punching holes next to the names of people they decide they can
tolerate, they could punch out the names of those they can't, but that's a
quibble. By and large, the user interface seems well suited to the task, or at
least to the state of mind that most voters bring to the task. Let's hope that
the voting machine is not replaced any time soon by the TV remote.
Speaking of TV remotes, we need to give some thought to the politics of these
devices before they take over our lives completely. Before remotes, there was
a democracy, or at least an anarchy, to changing channels that is lost now
that the device can be hidden in the couch cushions. I suppose we've gone too
far to turn back now, in which case the only way to return some democracy to
the process may be to give everyone their own remote.
Speaking of democracy and process, this election also taught us some things
about how people become candidates. We saw how quickly a grass-roots campaign
can put a popular individual on the ballot in all 50 states, how effectively a
few million dollars can put him in contention, and how equivocally his
equivocation can take him out again. That's probably too many lessons for one
candidacy. Certainly too much equivocation.
Speaking of equivocation, John Sculley is waffling about PDAs, or personal
digital assistants. He now feels that these handheld electronic address books
are actually executive toys, as most people who have seen the things knew from
the start, rather than a key element in Apple's strategy to get into the home.
Meanwhile, Kaleida, the joint Apple-IBM venture in multimedia technology, is
under intense pressure to produce the software tools that will let Apple and
its Asian partners put CD-ROM-based products and titles into the home next
year. Sculley seems to have learned an important lesson in family values: If
you want an American family to plunk down thousands of dollars on an
electronic gadget during a recession, it had better be capable of showing
Michael Jackson videos.
Speaking of family values, we heard a lot during the campaign about the
influence of special interests on the political process, but I was surprised
to see how bluntly one of those special interests was identified: Christian
fundamentalism. I thought it was political suicide in this country to hold
churches accountable for their political actions. It's not artistic suicide,
of course, and Irish singer Sinead O'Connor proved this by tearing up a
picture of the Pope on American television. Dramatic, yes, but I still prefer
the ka-chunking hole-punch gesture.
Speaking of gestures, I recently saw a demonstration of yet another
beta-version pen computer. What made this demo interesting was the feedback
from the audience, which included at least one professional writer. She
couldn't see writing on a pen computer regularly, but saw great potential in
it as an editing device. Potential is the word: One has to overlook a lot of
errors on the part of the recognizer in these demos, and thus far I have not
seen recognition good enough to use routinely for writing or editing. The
writer really perked up, though, when the developer talked about using a few
character-recognizer-interpreted keywords to link files of mostly
uninterpreted ink. This possibility intrigued me as well, since it presents a
way to create hypertext links among documents without requiring any better
recognition rates than we have today. This could be a huge opportunity, and I
expect it to make a lot of money for someone.
Speaking of making a lot of money, Bill Gates became the richest person in
America this year, topping $6 billion in net worth, which seems excessive.
Seriously, what can you do with $6 billion? Well, yes, there is 1996. That was
one of the lessons we learned from this election, wasn't it, that earned
wealth qualifies one for public service. So, let's see, Bill is three times as
qualified as that mere $2 billion man, Perot, if we listen to the vote of the
cash register. Ka-chunk?










































Special Issue, 1992
EDITORIAL


Doing the Wrong Thing




Ray Valdes


Two years ago, in Seattle for a programming conference, I took a cab from the
airport. The driver, a burly fellow with tattoos on both forearms, noticed my
paraphernalia and said: "You have a computer? I'm running Windows 3.0 in
Enhanced Mode and really love it." It was then I knew, a full year before the
release of 3.1, that Windows would fully dominate the world of desktops.
I haven't talked to any cabbies who are using templates, exceptions, or
runtime type identification, but I don't consider the question, "Will C++ take
over the world?" an open issue any more.
Remember "The Year of the LAN"? For the last ten years, next year was
predicted to be it. The Year of the LAN never actually arrived, but today you
look around and see networks in place everywhere.
So it is with C++. Although no one has predicted "The Year of C++" (and such a
year may never be a visible entity anyway), the time is not far off when using
C for mainstream application programming will be as quaint as using assembler.
Ten years ago everyone in the mainstream was coming to grips with C (which was
by then old hat to researchers), but all successful programs were still in
assembler (Lotus 1-2-3, WordPerfect, WordStar, NetWare, DOS). Now mainstream
programmers are absorbing C++ (which is now old hat to researchers and
academics), even though successful programs are still mostly written in C
(PageMaker, Excel, Word, Windows).
So the question is not "Will C++ take over the world?" but rather, "Why?"
Given other alternatives, why choose a language that's been called "one of
the most grotesque and cryptic programming languages man has ever created"
(Ray Duncan, PC Magazine, August 1991)?
I must confess I'm with Ray on this one. I've been using C for 12 years, and
object-oriented languages (Smalltalk, Objective C) on occasion since 1984.
I've no quarrel with the basic OOP goals of encapsulation, information hiding,
reuse of code via inheritance, and polymorphism; that's what I try to achieve
with my C coding style. And I realize that C can't tackle these problems
unless the language is extended. It's the resulting melange that rubs me the
wrong way--despite the great care and deliberation exercised by Bjarne and
company in making changes to the language.
Certainly there are some who feel as I do. Tom Cargill writes: "If you think
C++ is not overly complicated, just what is a 'protected abstract virtual base
pure virtual private destructor,' and when was the last time you needed one?"
(C++ Journal, Fall 1990). David Smythe says, "C++ is incredibly complex... The
number of caveats which productive C++ programmers must grasp is simply too
large" (C++ Report, February 1989).
I recently met with Michael Tiemann, author of GnuC++, president of Cygnus
(which sells support for GnuC++), and one of the handful of people who've
actually implemented a C++ compiler. The meeting was not on the subject of
language design, but he nevertheless volunteered the opinion that C++ was
grotesque and baroque when compared to cleaner languages such as Smalltalk and
Objective-C.
Yet despite these remarks, all the people I've quoted are actively using or
working with C++. Tiemann's company is a leading player in C++ for embedded
systems; Cargill just published a well-received book, C++ Programming Style;
Duncan completed an illuminating three-part series on C++ for PC Magazine; and
so on. To understand why this makes eminent sense, I reread a prescient paper
by Dick Gabriel, long-time Lisp hacker extraordinaire and founder of Lucid,
which sells (among other things) a C++ compiler.
This classic paper, entitled "Good News, Bad News, and How to Win Big," was
circulated on the Internet last year. Gabriel begins by talking about the
MIT/Stanford style of design, an approach that can be called "Do the right
thing." The tenets of this philosophy are simplicity, correctness,
consistency, and completeness. This design approach holds all these qualities
to be much more important than easing the programmer's burden in implementing
a given design. Results of this design approach include the Scheme language,
CommonLisp (with CLOS extensions), and the ITS operating system used to run
the PDP10s at MIT's AI lab.
In contrast to this philosophy, there's what Gabriel calls the
"worse-is-better" approach--also known as the "New Jersey" school of design
(after AT&T's Bell Labs facilities in that state). In the New Jersey approach,
simplicity, correctness, consistency, and completeness are all laudable goals;
but all can, on occasion, be sacrificed. Gabriel continues:
[In the New Jersey approach] it is more important for the implementation to
be simple than the interface... It is slightly better to be simple than
correct.... Consistency can be sacrificed for simplicity in some cases, but it
is better to drop those parts of the design that deal with less common
circumstances than to introduce either implementational complexity or
inconsistency. Completeness must be sacrificed whenever implementation
simplicity is jeopardized. Consistency can be sacrificed to achieve
completeness if simplicity is retained; especially worthless is consistency of
interface.... The programmer is conditioned to sacrifice some safety,
convenience, and hassle to get good performance....
In other words, if doing the right thing becomes too complex, punt, and let
the user of the API (or programming tool or language) bear the burden. Gabriel
considers UNIX and C to be the exemplars of this design method, with some
distaste.
But he then goes on to say:
However, I believe that worse-is-better, even in its strawman form, has better
survival characteristics than the right thing, and that the New Jersey
approach when used for software is a better approach than the MIT approach....
UNIX and C are the ultimate computer viruses.... It is important to remember
that the initial virus has to be basically good.... Once the virus has spread,
there will be pressure to improve it, possibly by increasing its functionality
closer to 90%, but users have already been conditioned to accept worse than
the right thing. Therefore, the worse-is-better software first will gain
acceptance, second will condition its users to expect less, and third will be
improved to a point that is almost the right thing.
Gabriel's dispassionate analysis arrives at its logical conclusion: "The good
news is that in 1995 we will have a good operating system and programming
language; the bad news is that they will be UNIX and C++."
This explains why any new programs I write over the coming year will be in
C++. I don't think I'll be alone in this regard.































Special Issue, 1992
A CONVERSATION WITH BJARNE STROUSTRUP


The designer of C++ looks at where the language is going




Al Stevens


Al is a DDJ contributing editor and can be contacted through the DDJ offices
at 411 Borel Ave., San Mateo, CA 94402.


In late 1989, Al Stevens stopped by AT&T's Bell Labs and chatted with language
designers Bjarne Stroustrup, designer of C++, and Dennis Ritchie, known for
his seminal work on C. The interview that ensued appeared in Dr. Dobb's C
Sourcebook for the 1990s. Al recently made a return trip to Bell Labs to visit
Bjarne and picked up where they left off.
--Editors
DDJ: Since our last interview in 1989, there have been many new developments
in C++, most prominently the convening of the ANSI X3J16 committee. What is
your role in standardizing the language you designed?
BS: Basically two roles. The formal role is that I am the chairman of the
working group concerning extensions. We try and make policy for what can be
accepted and what can't be accepted, and we try to look into specific
proposals that come, trying to make sure that we don't have the language sink
under an avalanche of good ideas. That's actually a very difficult role,
partly because most of the ideas coming in are good. But if we took all the
good ideas, the language would sink without a trace.
I take all the opportunities I can to tell the story about the good ship Wasa.
It's a ship that was built in Sweden at the time when they had big wooden
battleships, and it was going to be the best and most beautiful battleship in
the world. It was even named after the royal family. They got it halfway built
when the king noticed that the opposition was building ships with two gun
decks instead of just one. This could be rather embarrassing because if the
Wasa came next to one of these new two-gun deck battleships, the Wasa would
soon be on the bottom of the sea with a lot of holes in it. So they added a
few features. They added another gun deck, they added lots more beautiful
statues. The king apparently was reasonably happy, but I hear the designer
went halfway mad and died out of panic over what had been done. But whatever
the case, the Wasa only made it halfway across the Stockholm harbor before it
keeled over with a gust of wind and killed 50 of the people on board. You
should go to Stockholm to see the Wasa; it is most impressive. I tell this
story to people who want just another feature.
Avoiding featurism is not that easy, so one of the things I do is try to keep
the language coherent, try and make sure that the features accepted actually
fit into the language instead of being warts on the side. Of course, everybody
claims that their feature fits into the language, and it's not a wart. You can
argue for every feature--and people do.
In general, I try to take as much part in the standards process as I can, and
try and bring a longer perspective to things. I've been involved in this for
12 or 13 years, and some people came in a couple of months ago. I try and
point out that there's a history here, try to point out that some of these
problems and suggestions we've seen before, try to point out that the whole
world is not a Cray, that the whole world is not a PC, try to point out that
not every programmer is a C programmer, not every programmer thinks that
Pascal is the greatest thing since hot dinners; I basically try to balance
things out. That's another sort of role that is hard for a lot of other people
to play. For that reason it sort of fell on me because I was there. One of the
things that's important in this context is trying to keep the language from
mutating into something very strange, keeping C++ as close to C as possible
but no closer. That is a policy that allows us to gain the benefits from C,
from the C standardization, and from the C experience without closing the path
of evolution and of using new techniques.
DDJ: When do you foresee the publication of a formal standard?
BS: We have to go for public review in late '93 according to plans, and then
it takes many months for people to come in with comments, and it takes at
least a year to work over the comments and work them into the standard. So,
maybe '95 or something like that. Now, people think that's forever. But what
they don't realize is that that is actually an incredibly ambitious program
for an ANSI and ISO standardization. C took seven years.
People should realize that there is a very important role for the standards
committee in addition to producing the standard. That is to act as a forum
where people can meet, discuss things, and agree on things at least until they
agree on something else. It allows the community to pull together around a
draft standard and settle issues instead of having people sitting, one guy in
California, one guy in Seattle, one guy in London, and making their own
decisions about what really should have been in and what really was meant by
this or that sentence in the ARM [Editor's note: The Annotated C++ Reference
Manual, Ellis and Stroustrup, Addison-Wesley, 1990]. So, I think that the
ANSI/ISO committee is very, very important as a forum for discussing issues
and for disseminating knowledge and disseminating techniques. In that sense,
the standardization effort started to do good at least a year before the
ANSI/ISO committee was convened. Because that was when the community started
talking seriously about should the language be standardized, when should it be
standardized, how should we standardize it. People started talking together.
To a large extent the ARM was the first effect of what was coming. It was
written with the knowledge that a document would be needed. If you look at the
acknowledgment list, there are more than a hundred people, and you didn't get
onto that acknowledgment list just out of politeness. These are people who did
some work.
DDJ: Can you predict how close to the ARM the standard will be?
BS: Many of the words will be different because one of the things we need is
more precision. But the spirit of the language will be the same, and many of
the details will be the same. There are millions of lines of C++ out there,
and most of them will still run in five or ten years. I don't see any major
language extensions coming--but you can have a nice little debate about what
is major. I most certainly don't see any major incompatible changes being
done. There are things in the C++ language that I don't like. There are things
in the C++ language that I've tried to remove on several occasions. But I
don't think that we can do such things. I'd like to get rid of the declarator
syntax, but I know I can't. It's not even going to be seriously discussed
because it's there, and even an ANSI/ISO committee can't do anything about it
at this stage. We know how to live with it and so we'll curse a bit, but we
will actually not make a change. Compatibility and stability are very
important goals.
DDJ: Will Cfront continue to evolve during the standardization process?
BS: Sure. I expect that Cfront and all the other C++ compilers will continue
to evolve. Some of them will die for various reasons. At least they might
conceivably--I'm not sure that any of them will. But since there are so many
players and since there is a fair amount of commercial competition, it
wouldn't be surprising if some of them didn't survive or if some that are
being built don't quite make it to completion. I think Cfront still has a
niche. Cfront was designed as something very portable, reasonably correct, as
a tool for giving people the ability to port to something new, to use C++ on
platforms where there wasn't any specific native C++ compiler. And with the
hope that eventually somebody would--in each ecological niche--build something
that could take advantage of local quirks to do a better job. But Cfront was
built to be hard to beat--to force competitors to do better--and it will act
that role out for some time yet. Even when all the specialist niches have been
taken over, some version of Cfront may still have a role as a vehicle for
cross-platform portability. Some versions of Cfront actually have "gone
native" as they start taking advantage of specific environments. The Cfront
version that goes on the HPs, for instance, has native code generation and
integration with an environment. Similarly, the Cfront version that is part of
Centerline's (Saber's) environment doesn't look or feel like Cfront, but it
is. Versions of Cfront adapted to local environments may have a long life.
DDJ: Are you still actively involved in modifying and maintaining Cfront?
BS: No, but I use Cfront as an experimental tool. I feel uncomfortable talking
about language extensions without having implemented them. For example, the
series of ideas for runtime type identification I implemented using Cfront. So
for me it's an experimental tool--and of course my compiler for everyday work.
DDJ: Have there been major standards issues where you and the committee have
disagreed?
BS: I don't think so. There has certainly never been a situation where here's
the committee and here I am. There have been situations where part of the
committee goes one way and part of the committee goes another way. The most
major of those was the so-called "great debate" over termination semantics for
exceptions, which we worked on for a long time before accepting the
semantics described in the ARM by a great majority. In discussions, of course,
I'm in the minority some of the time and in the majority at other times.
Working things out takes time, and you work it out. I don't get my way all the
time, but I've got a pretty good track record simply because I can work on
this slightly more full time than a lot of people, and to some extent because
I'm more willing to compromise. I don't get along very well with "true
believers," and I feel that a lot of compromise is necessary when you are
dealing with something as big as C++ with so many interests involved. For
example, one of my language-extension proposals was shot down in flames. That
is probably a good lesson for some people who want to add features to C++.
Apparently good ideas don't always work, and even some of mine go down in
flames. Typically because they ought to go down in flames. In the case I'm
thinking of, it's just good we noticed the problems before we voted on it.
DDJ: You attend software development conferences, often give keynote
addresses, and talk to the programmers. Is the mood of the programming
community shifting closer toward acceptance of C++ than it was three years
ago?
BS: I really try to go away from my office only once every second month for a
conference or something like that. And that means with ANSI standardization in
the works, I probably only make three or four major trips a year. It's not
exactly as if I was a traveling circus. But yes, I go and I try to keep my ear
to the ground and understand what people think. One thing that's a problem
with conferences--or talking to people in research departments, or reading the
net--is that there are a lot of people who are at such places because they
like new ideas, because they really want to know the latest or because they
really know what the latest trend is. It's actually much more interesting and
much harder to talk to people working on software "in the trenches." At
conferences and such I like to be lurking around in a corner arguing with
somebody who's not up on the latest trend and seeing what their problems are.
It's much more interesting hearing about problems than hearing--for the
thousandth time--the latest solution looking for a problem.
DDJ: In your keynote addresses you still recommend that the shift from C to
C++ should be a gradual one, with programmers learning the improvements a
small step at a time. Do you foresee a day when object-oriented programming is
the first natural way that students learn to write software?
BS: Yes, I think so. I'm talking to mostly fairly experienced programmers who
are coming aboard to C++ and object-oriented tools and techniques from a
background in something like Pascal, Modula-2, C, or Fortran. For such people
it makes a lot of sense to leverage what they know and move them along a
gradual introduction to the ideas. That's what we know how to do, that's what
we've done, that's what we've seen successes with, and we haven't seen the
disasters that have been noted as coming from the other "100 percent OOP
now!" direction. People express fear that programmers never get to the
object-oriented stage by "going slowly." That's just not my experience.
Sometimes people say that they've been using C++ for half a year and they're
not fully OOP yet. So what? After half a year people are much further on than
they used to be and after a year to a year and a half you will see them all
the way. "Going slowly" also allows people to gain a much greater appreciation
of where the different approaches work best.
Now, the question is not so much will there come a day when another approach
is more appropriate because there will. The question is, "What will it take to
make the other approach reasonable?" Partly it will take that a lot of the
professional programmers have already made it most of the way so they can
start. In other words, the gradual approach will have brought the majority of
programmers to the point where they are ready for the OOP and data-abstraction
techniques. But it also takes better libraries and better environments than
are common these days. In particular, we're only just seeing the beginnings of
decent environments supporting C++ programming. We have the UNIX/PC
traditional environments, which, from sort of a detached point of view, are
so-so. Of course, they're much better than they used to be, but still we're
only a very small step along the way, and we have the Smalltalk
implementations that are sort of great in some ways but do not support a
statically typed language all that well and have a tendency to lock people
into a small environment. What I would like to see is a programming
environment that really understands the language--and by that I primarily mean
the static type system--and that really can be helpful in finding things in
debugging, in designing, in displaying the structure of programs, in
performance measurements. One of the sources of problems with object-oriented
programming has been people going overboard with purity and cleanliness, and
creating very slow monsters. I think the only thing that really will bring
that to an end is the ability to do decent profiling simply. We will need
better libraries, because if you want to start out with object-oriented
programming, you need a good library to support the concepts.
With C++ quite often people as a first exercise write a string class and as a
second exercise try to do a graphics system. That is very challenging and
might be very nice for a professional programmer, but it's not the best way of
teaching the average student programmer. What we need is an environment that
has a very good string class that you can take apart and look at and has a
very nice graphics system so that you never care about Windows or X again
unless you absolutely want to. So I think the two things needed to be able to
start with object-oriented programming are an environment and a library. Then
we could also do with better textbooks for beginners and we could do with
better design books for slightly more advanced people. All the components of
what I'm talking about are "almost there." We have decent textbooks for
beginners, although I'd like to see them better. We have decent design books,
although I'd like to see them better. We have decent environments, although
they're not as good or as widespread as I'd like them to be. Give it another
couple of years, five years maybe, and we'll actually have all these bits and
pieces together. It will be most interesting. Then, I don't think there will
be any real discussion about what the right approach is. People who want to
jump straight to "true object-oriented programming" now have simply
underestimated the size of the job and have overestimated the ability of the
current support for doing so. It's alright to jump, but we have to land on our
feet.
DDJ: In your presentations you joke about C++ and call it a "strongly hyped"
language....
BS: No, I don't. I joke about people having strongly hyped languages and about
people using hype and exaggerated claims about the languages to get users, and
using advertising gimmicks instead of logical argument and solid case studies.
I am against that behavior even when done by C++ proponents. Maybe it's worth
remembering that for many, many years the only language in the field of
object-oriented programming that did not have advertising and marketing and
all of that was C++. C++ became the most used object-oriented language before
the first C++ marketing campaign and the first paid C++ advertising appeared.
DDJ: Nonetheless, there has been hype. Do you think it has hindered or
advanced the cause of C++?
BS: I think all hype has the effect of giving a temporary advantage which
turns into a larger disadvantage soon afterwards. People don't deliver what
they promise. I think that C++ has promised much less than the opposition and
delivered a much larger fraction of what was promised.
DDJ: The ARM identifies templates and exception handling as experimental.
Compilers and translators now exist that implement templates in close
compliance with the ARM's definition. Are the template and exception-handling
designs reasonably firm now?
BS: Oh, definitely. They're marked "experimental" only because you have a
first or a second printing. In the third and later printings, the
"experimental" has gone and instead there's a note saying that this text was
voted into the language by the standards committee complete with the dates of
when it happened. Yes, the designs are firm. They have been implemented. Minor
details will be elaborated and made more precise, reflecting the experience
from implementation. There may even be minor changes. I wouldn't be surprised,
but I do expect my old code to keep running. I am quite unhappy about
standardizing things where I don't have personal implementation experience,
but I seem to have been reasonably lucky with exception handling and
templates. They can be implemented, and they have been implemented as
specified. I have talked to several of the implementors and gotten some of the
darker corners pinned down and some of the things where the text is either so
short that people could wonder if it was really there or where it could be
read in several ways. I think we'll see some clarification but you have to be
a language lawyer to even find the spots.
DDJ: What are the major additions Version 3.0 adds to C++?
BS: Version 3.0 is the language described in the ARM and in my second edition
[Editor's note: The C++ Programming Language, second edition, Addison-Wesley,
1991], minus exception handling. Version 2.0 didn't have exceptions and didn't
have templates, and there was a variety of minor facilities and clarifications
that hadn't been implemented, but they are very minor. The new thing in 3.0 is
templates, and the next wave of C++ compilers will be exception handling. I
hear that IBM has actually started shipping their compiler for the RS/6000
series, which supports both templates and exception handling. If so, it's the
first generally released one.
DDJ: Have you used any of the C++ compilers on the PC, and can you comment on
them?
BS: People always ask me that. And people keep sending me implementations--and
often two weeks later their marketing department asks what I think about it. I
have a simple defense. I don't have any hardware that can run that stuff. I
never load them. Actually, I would like to play with them because I hear that
they are very good, but I hardly dare to do it: It would cost me too much
time.
It is amusing to hear these discussions about whether X is better than Y
because, for most users, X and Y are simply beyond what they've had before
with C and other languages by a large factor. They have major squabbles about
the last fraction of a percent and usually about very obscure things. I think
the technical departments of just about everybody in the C++ field are better
than what people have been used to in the C field. Marketing is scrambling to
find ways of saying they're much better than competitors, but they're all so
good that that's a hard job.
DDJ: Some C++-style conventions are emerging and being taught, particularly in
the area of class design. For example, we are told to place public members
before private members; to have no public data members; to code inline member
functions outside of the class declaration; and so on. How important are such
conventions, and should they be formalized?
BS: Style is important, but it's very hard to legislate. It's like saying, "Do
you like dark chocolate or milk chocolate?" I prefer dark chocolate,
especially with nuts, but it doesn't mean that I should legislate that you
have to eat it. From a language designer's point of view, I try to teach a
little bit by example, and I try not to make statements of the form, "Thou
shall not," or "Thou shall." If you look at some of the style guides, you
often find lists of things you must and must not do, which are simply
transcriptions from where my second edition says, "You might like to do that,"
and "Only do this if you know what you are doing." If you have a software shop
with a group of people who have to work together, it's a good idea to sit down
and make a set of rules. It's good to look at something that has evolved over
time to have some experience. You can look at the style used in my books or
somebody else's book. Start from one of them, evolve from one of them, try to
have a house style, and don't get upset about spelling rules and whether
public or private comes first. That doesn't matter very much. It matters much
more whether the programmers understand what an abstract class is and what
you use it for. What are the pros and cons of having a rooted hierarchy? When
do you want a concrete class? Things like that, which are not usually in the
style guides, are much more important for writing successful programs. The
trouble with a style guide, especially if it's enforced by a non-C++
programmer, is that it favors form over substance. A good style guide would
free people from thinking too much about stylistic issues. I dislike style
guides that think they can force people to do what's right because a style
guide can't.
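The conventions the question lists -- public members first, no public data members, inline member functions coded outside the class declaration -- can be sketched with a small hypothetical class (the Stack below is invented for the example):

```cpp
// Public interface first, data private, inline members defined
// outside the class body. Stack is a hypothetical example class.
#include <cstddef>

class Stack {
public:
    Stack();
    bool empty() const;
    void push(int v);   // no overflow check, to keep the sketch short
    int pop();
private:
    enum { MAX = 64 };
    int data[MAX];
    std::size_t top;
};

inline Stack::Stack() : top(0) {}
inline bool Stack::empty() const { return top == 0; }
inline void Stack::push(int v) { data[top++] = v; }
inline int Stack::pop() { return data[--top]; }
```

Keeping the inline bodies out of the class declaration leaves the interface readable at a glance, which is the point of the convention.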
DDJ: Programmers and managers often ask, "When should we use C and when should
we use C++?" Given a competent and motivated design and programming team and
the availability of both languages, are there circumstances in which C is a
more appropriate tool than C++?
BS: Given all that you say, no. My usual answer to that question is, "Look at
the tools." If the C++ tools are there, use C++. Otherwise use C. You may not
be able to make the transition tomorrow, but that's usually because you need
to install the tools. C++ is a better C. Just go use it. Notice that my answer
is different than the one I gave some years ago. Things have matured.
DDJ: Let's play a phrase-association game. I'll mention some areas of interest
to C++ and object-oriented programmers and ask you to share your thoughts on
them with us. Concurrency.
BS: It's been well known for the last 30 years or so that next year
concurrency will be very important. I did my PhD on concurrency issues, and I
came out of Cambridge knowing 40 ways of getting it wrong. I was sure that I
did not know a single way of getting it right for everybody. From that
observation came the policy for C++ that we don't put concurrency features
into the language. We try to provide libraries and primitives that allow you
to build concurrency features--systems for a particular application. This has
been done. A simple and primitive example was the task library which I wrote
for simulation and which was later adapted for robot programming. There are
other examples of concurrency-support libraries. I have not seen as much as I
would like in that respect, but I would like to see libraries supporting a
variety of notions of concurrency for C++. I consciously decided that I didn't
want a medium-level facility like Ada's, which pleases teachers but is too
high-level for the kernel hackers and too low-level for the database guys.
DDJ: Persistent objects.
BS: It's easy to say and hard to do because persistence covers everything
from writing an object out to disk and getting it back to having a full-blown
distributed object-oriented database system with concurrency control,
transaction logging, etc., for multiple users. It is an area where a lot of
work will be done. We have several systems supporting various forms of
persistence for C++. But again, I strongly suspect that it's not a language
issue. It's an environment issue; it's a library issue; it's a tools issue.
The work that we are doing on runtime type identification will provide some
support for people who want to do databases and object I/O. There will
probably be a standard form for tying in added information about object layout
and such.
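The simple end of that spectrum -- writing an object out to disk and getting it back -- can be sketched as follows; the Point class, its text format, and the roundtrip helper are invented for the illustration and are not from any particular persistence system:

```cpp
#include <fstream>

// An object that knows how to write its state to a stream and
// restore it later: the most primitive form of persistence.
struct Point {
    int x, y;
    void save(std::ostream& os) const { os << x << ' ' << y; }
    void load(std::istream& is) { is >> x >> y; }
};

// Write p to a file, then read it back into a fresh object.
inline Point roundtrip(const Point& p, const char* file) {
    {
        std::ofstream out(file);
        p.save(out);
    }                          // destructor flushes and closes the file
    Point q = { 0, 0 };
    std::ifstream in(file);
    q.load(in);
    return q;
}
```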

DDJ: That old chestnut, portability.
BS: Portability is an economic issue. You're portable if it's cheaper to
modify the program to run somewhere else than it is to build a new program
from scratch. The best definition of portability is that it's much cheaper to
port it. People who are thinking about 100 percent portability are deluding
themselves. You can only get 100 percent portability provided you port between
two systems that are very similar. And we can't get everything in the world
very similar in that sense. I don't think the environment of a Cray should be
the same as the environment of a 386 to the point where you could guarantee
that every program that ran on the one would run on the other. The people who
use the one kind of machine have one kind of concerns, and the people on the
other have other kinds of concerns, so portability is always something that
will take a certain amount of work. Though we'll always curse and say that's
too much, I suspect that over the years it will steadily become better. But
people should realize that 100 percent portability is a very dangerous idea,
because that's sort of a return to the Middle Ages where everything is what it
was in my father's time, and nothing changes. That's a caricature of the
Middle Ages, true. It wasn't like that. If it had been, we would never have
gotten the Renaissance. We don't really want the world to be so uniform that
100 percent portability could be guaranteed. We don't want to live in such a
world.
DDJ: Standard class libraries for fundamental data structures.
BS: One of the things that I regret about C++ and its development was that I
didn't take a half-year break just before I published my first book to write
the libraries I planned. It would have been very useful to get just the
basics. We have I/O. I had complex in as an example. It's not very important
except as an example and to some mathematicians and engineers. It would have
been nice to get lists--arrays with bounds checking--and especially
associative arrays. They're the most useful single data type there is. Just a
few things
like that would have created immense leverage for people building more
extensive libraries. Beyond that, I have my doubts about libraries intended to
work for "everybody." We need libraries that are standardized for a given
domain. A standard library for workstations, maybe, one for certain kinds of
mathematicians, one for Frenchmen, and another for Japanese dealing with text
processing. A standard library for a certain industry such as CAD/CAM or
telecommunications. I don't believe in intergalactic standardization. The
world is just too big. You have to have room for innovation in many places,
and you have to understand that not everybody can agree on everything. The
basis for agreement in C++ for the moment is too low, but there is a lot of
healthy diversity also in the C++ world. The standards committee is looking
into this problem. One reason I didn't solve the problem in '85 and one reason
that the standards committee hasn't gone further was that we lacked model
libraries that didn't lock everybody into doing things the "one right way." A
lot of people were looking to Smalltalk and the single root hierarchy as the
standard way, and it was clear to us that we couldn't do that because that
imposes a fixed cost on everybody. You can never get a really fast complex
number if you have to fit into a library hierarchy that has a class object at
the bottom. You can't get a real small complex number. You can't get a complex
number that will have layout compatibility for an array of complex numbers
with Fortran. In other words, you damage the
ability to invent independently and to cooperate with the rest of the world in
your own language or in some other language. We needed a model of libraries
that didn't have that problem. I think we have one now. The library-design
chapter of my second edition discusses the classification of classes by
saying, "We have concrete classes." These are classes that are meant to behave
the way the built-in types are. It's things like a complex number or string.
Nothing fancy. They're meant to be reused as building blocks for something
else, but not through derivation. They're meant to be focused on one problem,
not generalized. They're meant to be efficient in time and space. They are
very much like integers. They are free-standing for that same reason. They are
not part of a framework. At the next level we can have abstract classes. You
have an abstract interface, and you can use hand-coded implementations for
them through derivation, or you can use concrete data types to implement them.
The abstract data types then become a way to tie in separately compiled and
separately developed classes--separately developed code to have a common
interface to use. Importantly you can do it after the event, after classes
have been developed separately. What is wrong about the class hierarchies is
that you have to design the hierarchy first and then fit things into them.
With the abstract classes you can have all the bits and pieces out in the
world and then you can tie them together after the event. I think you'll see a
lot of libraries that build along that idea, and I think you might see a
little bit of work in the standards committee in providing a set of classes
that can be used to do something like that.
I'll give you an example. Say that I want to write my program so that I can
read data structures--files and such. I can build myself an abstract iterator
class that gives me the next element. I can then tie it to either a Borland
container or a Microsoft container, depending on which way the wind is blowing
or which day of the week it is or maybe even use both of them in the same
program. Importantly, I can write my program independently of which of the
so-called "standard libraries" we are using. On the other hand, I'm not trying
to dig into their business. I'm not trying to replace their library or
replicate the work they've already done. I'm just trying to provide an
interface between what they offer and what my program uses. That kind of thing
is very important. In the context of libraries, there were language feature
reasons for not succeeding in '85. If you read the paper I wrote in early '86,
"What is Object-oriented Programming?" you'll see I bemoan the absence of
templates as a way of expressing container classes and operations that are
parameterized by types. I don't think we could do a really good job before we
had templates. Now we have them.
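That iterator idea can be sketched as follows; VendorAList stands in for any one vendor's container, and all the names are invented for the example:

```cpp
// The abstract interface: application code is written against
// Iterator and never sees a particular vendor's container.
class Iterator {
public:
    virtual ~Iterator() {}
    virtual bool next(int& elem) = 0;   // false when exhausted
};

// A hypothetical vendor container with its own interface...
class VendorAList {
public:
    VendorAList(const int* p, int n) : p_(p), n_(n) {}
    const int* data() const { return p_; }
    int size() const { return n_; }
private:
    const int* p_;
    int n_;
};

// ...and an adapter, written after the fact, that ties the
// vendor's container to the common interface through derivation.
class VendorAIter : public Iterator {
public:
    VendorAIter(const VendorAList& c) : c_(c), i_(0) {}
    bool next(int& elem) {
        if (i_ >= c_.size()) return false;
        elem = c_.data()[i_++];
        return true;
    }
private:
    const VendorAList& c_;
    int i_;
};

// Application code depends only on the abstract class.
int sum(Iterator& it) {
    int total = 0, e;
    while (it.next(e)) total += e;
    return total;
}
```

A second adapter for another vendor's container could be added the same way, and sum would work with either: the separately developed pieces are tied together after the event.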
DDJ: C++ CASE tools.
BS: I don't have much personal experience. There seems to be lots of them.
There is one kind of CASE tool which I have seen, and I'll draw a deliberate
caricature here and hope that nobody will recognize their own product in it.
In the caricature, you have a bunch of managers who have decided that
programmers are evil, and you have to produce a straitjacket for them, so
that they cannot think, and if they think, they have no way of expressing what
they thought. They should just provide the code that designers wearing suits
have come up with and specified. One of the worst things you can have is
language dependence, so everything has to be language independent, meaning you
have to program in the common subset of C and Cobol that these managers and
designers knew when they were young, back in the Middle Ages. This is the
caricature, and this is what gives CASE a bad name.
The last time I looked around a trade show, I didn't see anything offered that
fit that model very well. Everybody was trying to be more flexible, everybody
was trying to understand that there were such things as classes, and that
types had some place in a programmer's conceptual world, and that class
hierarchies were important and such. To the extent that CASE tools actually
help programmers instead of just being a straitjacket and stifling anything
new, it's good. You can do nice things. But I still feel that
there may be a legacy somewhere that's very worried about what the programmers
might do. There are a lot of people in the computing industry who don't like,
understand, or trust programmers. Traditionally, one of the slogans in C is,
"Trust the programmer." To some extent, C++ has that in it still. I do not
consider programmers inherently evil or stupid, and I think any system, be it
a language or a CASE system, should leave open the possibility for showing
initiative and doing things in slightly unusual ways because, relative to one
language or one CASE system, most of the world is unusual. For a better idea
of what I think about design issues and how they relate to language issues,
read chapters 11 and 12 of my second edition.
DDJ: Critics of C++ tend to also be critics of, or at least holdouts from,
object-oriented programming. They often point out imperfections in the C++
language. What, in your opinion, are its major weaknesses?
BS: First a general observation. Since C++ is so much out in front, it seems
that everybody agrees that it's wrong and not good enough. That's the only
thing they agree on. Some people want some other object-oriented language,
somebody wants a 4GL, somebody wants a functional language, somebody wants
something completely different, somebody wants to stay back in the C and
assembly-code world. But everybody agrees that C++ is not the answer. This is
the nature of being out in front. A friend of mine once said, "You know when
you're out in front. The guy out in front has the arrows in his back."
C first and then C++ are among the few languages that say, "Yeah, we're not
perfect." There are a lot of trade-offs here. This is the real world. There
are some things we don't like, and we can't do anything about them. There are
some things we don't like, and it's too hard to do anything about them.
Perfection, in some language theoretical sense, is not an aim of C++. Utility
is. C++ allows you to write programs faster than you can do in C. It allows
you to build better tools. It allows you to write programs that run faster
than anything comparable in terms of expressiveness. It allows you to
structure programs well. It allows you to deliver on time and within budget.
In many cases it allows you to support various design strategies. It allows
you to fit in an environment that is not meant specifically for an
object-oriented system. It allows you to coexist with C and Cobol and Fortran.
That's what matters, not whether I know a better syntax for declaring
variables than the one we've got in C++. I knew that 15 years ago, but it's a
second-order issue. So my answer is, "It works."
DDJ: Do you have any opinions about what form the next major programming
paradigm might take?
BS: Not really, I spend too much time in the trenches getting C++-related
stuff to work. I don't think that whatever the next great wave is will sweep
away everything in front of it. Like all the other good ideas, they tend to
get absorbed and adapted. Ideas that are totally alien from what we have now
don't make it out of a cult world. Structured programming worked as far as it
fitted into ordinary programming. Data abstraction worked as far as it fitted
in with other programming techniques. Object-oriented programming is going to
work as far as it fits in with the rest of the world. Whatever comes next--I
would like to see some rule-based systems, I would like to see some parts of a
system functional, and such--will fit in with other things. Otherwise they
will stay largely unused. People forget that the idea of having exactly one
language used for everything is sort of strange. I don't think a single
language can serve all uses even for one model of the adventurous programmer.
We'll always see places for many languages, some of them very specialized. We
will see experimental languages that you hope will do things much better than
C++ in some areas. What comes next will absorb from these experiments.
DDJ: Will you continue to contribute to language design and development beyond
C++?
BS: I don't think so. I got into this because I needed some tools. In a couple
of years I will have them--10 or 15 years later than I thought. After that I
don't quite know what I'm going to do. I have no wish to become a professional
language designer. It doesn't seem a good idea. I am currently trying to use
what little time I get to move away from language design and get more into
tool design. To my mind that's not all that different because the language and
its compiler are just prominent tools. So I'll go back and dream up some
different kinds of tools, maybe to do with programming. But, language design?
Nah.

Special Issue, 1992
STANDARD C++: A STATUS REPORT


Defining a language standard




Dan Saks


Dan is the founder of Saks & Associates. He serves as secretary of the ANSI
and ISO C++ standards committees. He is also contributing editor for The C
Users Journal and columnist for The C++ Report. He and Thomas Plum are
coauthors of C++ Programming Guidelines, and codevelopers of Suite++: The Plum
Hall Validation Suite for C++. You can reach him at 393 Leander Dr.,
Springfield, OH 45504, by phone at 513-324-3604, or at dsaks@wittenberg.edu.


The ANSI C++ technical committee, X3J16, has been working on a formal C++
standard for almost three years. But the committee has yet to release a draft
standard to the public, and an official standard is still several years away.
Nonetheless, the committee's decisions have already affected the C++ compilers
you use today, and will certainly shape the compilers and libraries you use in
the future.
In this article, I'll explain how the C++ language definition is changing as
it evolves into a standard. I'll cover the committee's major technical
decisions and describe various problems that are yet unsolved. I'll also look
at the prospects for a standard C++ library.


An International Standard


X3J16 is an ANSI technical committee, but it isn't writing just the U.S.
national standard for C++. X3J16 is working closely with the ISO C++ Working
Group, WG21, to develop an international C++ standard. (The working group's
full name is ISO/IEC JTC1/SC22/WG21. See the text box entitled, "Who's
Standardizing C++?" for a more detailed explanation of the standardization
process.) International programming-language standards typically start as
national (read "ANSI") standards. But U.S. programming-language standards
reflect American natural language and culture. Although programmers around the
world may tolerate programming languages with English keywords, many need to
express parts of their programs, like string literals, in their native
languages. Not unreasonably, many also want to write identifiers and comments
in their own language.
But many natural languages, even those based on the Roman alphabet, use more
than just the 26 characters in the English alphabet. European keyboards have
letters with accents and umlauts (like å and ö, respectively), combination
letters (Æ), and other characters. Japanese Kanji keyboards have hundreds of
keys for composing thousands of different characters.
European computer systems usually make room for native-language characters by
omitting certain punctuation characters. For example, Danish keyboards,
displays, and printers replace the [ and ] with Æ and Å, respectively. The C
program in Example 1(a) comes out on a Danish printer looking like Example
1(b).
Example 1: (a) C program using U.S. ASCII; (b) the same program using Danish
ISO 646; (c) the same program using trigraphs; (d) the same program using the
new digraphs.

 (a)
 int main(int argc, char *argv[])
 {
 if (argc < 1 || *argv[1] == '\0') return 0;
 printf("Hello, %s\n", argv[1]);
 return 0;
 }

 (b)
 int main(int argc, char *argvÆÅ)
 æ
 if (argc < 1 øø *argvÆ1Å == 'Ø0') return 0;
 printf ("Hello, %sØn", argvÆ1Å);
 return 0;
 å

 (c)
 int main(int argc, char *argv??(??))
 ??<
 if (argc < 1 ??!??! *argv??(1??) == '??/0') return 0;
 printf ("Hello, %s??/n", argv??(1??));
 return 0;
 ??>

 (d)
 int main(int argc, char *argv<::>)
 <%
 if (argc < 1 or *argv<:1:> == '??/0') return 0;
 printf ("Hello, %s??/n", argv<:1:>);
 return 0;
 %>

In a serious attempt to accommodate the needs of C programmers in non-English
speaking cultures, the ANSI C committee added the wide character type wchar_t,
multibyte character literals, trigraphs, and locales. But their efforts fell a
little short. ISO adopted ANSI C as the ISO standard, but the ISO C working
group spent another few years preparing an addendum to the standard that,
among other things, provides better support for linguistic and cultural
variations.
Even before X3J16's first meeting in December 1989, SC22 expressed interest in
an international standard for C++. However, some SC22 members were concerned
that previous ANSI programming-language standards, including C, didn't meet
the needs of the international community, even though those ANSI standards
were adopted as ISO standards. Sympathizing with their concern, X3J16 decided
to try to produce a standard that ISO would accept without change as an
international standard.
At SC22's request, X3J16 wrote a project plan for SC22 to create WG21 to work
with X3J16 in developing the C++ standard. Also, X3 changed X3J16's charter
from type D (domestic standards development) to type I (international
standards development). This means that X3J16 is developing the ISO C++
standard with the intent that it will also become the ANSI standard.
WG21 held its first meeting in June 1991. All X3J16 meetings since have been
joint meetings with WG21. I call the joint committee "WG21+X3J16." We meet
three times a year, typically in March, July, and November. Each meeting lasts
four and one-half days.


The Working Paper


Programming-language standards aren't written from scratch; the committee
starts with one or more "base documents." X3J16 selected two base documents:
The AT&T C++ 2.1 PRM (Product Reference Manual).
The ANSI C Standard.
Many committee members wanted to use the Annotated C++ Reference Manual (the
ARM) as the first base document. The ARM is an updated version of the PRM
sprinkled with annotations and commentary that elaborate and clarify the PRM.
While the ARM includes complete chapters on templates (Chapter 14) and
exception handling (Chapter 15), the PRM has only empty place holders.
However, AT&T (which owns the copyrights for both the ARM and the PRM) gave
X3J16 permission to use only the PRM (not the ARM) for the standard. Thus,
although we often think of the ARM as the base document, strictly speaking,
the annotations and commentary aren't included.
When the committee selected the base documents, there was no ISO C standard.
We decided that when the ISO C standard becomes available it will be our third
base document. The ISO C standard turned out to be the same as ANSI C, but ISO
C will soon have an addendum that WG21+X3J16 will have to consider.
The current draft of the C++ standard has no formal standing. We don't even
call it the "draft;" we call it the "Working Paper." The project editor used
the PRM as the first version of the Working Paper, and has spliced parts of
the C standard in as needed.
The editor produces a new Working Paper three times a year, about two months
before each meeting. At each meeting, the committee approves the document as
the basis for future work. Someday we'll approve the Working Paper as a draft
and make it available for public comment. I don't know when that day will be.
More Details.


Templates


AT&T C++ 2.1 does not implement templates, so the PRM does not describe them.
However, the ARM describes templates (although the description is labeled
"commentary"). The committee added the ARM's chapter on templates (less the
annotations) to the Working Paper. At the time, there weren't any commercially
available compilers supporting templates. Now there are several. The remaining
discussion of templates assumes you are familiar with basic template features.
[Editor's Note: For more information on templates, see "Templates in C++" by
Nicholas Wilt on page 29 of this issue.]
X3J16's Formal Syntax working group identified problems with the template
syntax. All the problems stem from the choice of < and > as delimiters for
template argument lists. A C++ parser might have trouble distinguishing when
< and > are delimiters and when they are operators.
For example, the formal parameter of a template can be a type, as in:
 template <class T> class list;
or it can be a value, as in:
 template <int n> class buffer;
For class buffer, the actual argument in a template instantiation can be any
integer expression, as in:
 buffer<10> b1;

 buffer<2*BUFSIZ> b2;
It can even be an expression containing the < and > operators, as in:
 buffer<x>y>z> b3;
A C++ parser must be prepared to look arbitrarily far ahead to determine that
x>y>z is the template argument. The committee briefly considered using
parentheses, braces, or brackets for template argument-list delimiters, but
decided to stick with < and >. The Formal Syntax group has suggested alternative
grammars for C++ that correct the problem, but the committee hasn't selected
one yet. In the meantime, if you limit your template arguments to simple
expressions, you shouldn't have any trouble with today's compilers.
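One such simple workaround is to parenthesize any template argument that itself contains < or >; the values below are invented for the illustration and chosen so the argument evaluates to 1:

```cpp
// A template with a non-type parameter, as in the article's buffer.
template <int n> struct buffer { char b[n]; };

const int x = 8, y = 4, z = 0;

// buffer<x>y>z> b3;        // requires arbitrary lookahead to see
//                          // that x>y>z is the whole argument
buffer<(x > y > z)> b3;     // parentheses remove the ambiguity:
                            // (8 > 4) > 0 is 1, so this is buffer<1>
```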


Exceptions


As with templates, the PRM doesn't have a chapter on exceptions, but the ARM
does. So the committee adopted the ARM's Chapter 15 on exception handling,
less the annotations, of course. Because exception handling is not yet
generally available, I'll take a moment to explain it.
Exception handling is a mechanism for responding to events that may disrupt
the normal flow of a program. The C++ exception-handling mechanism is designed
to handle synchronous exceptions, like resource-allocation failures or values
out of range, rather than asynchronous events like device interrupts. The
syntax for exception handling uses three new keywords -- catch, throw, and
try. My stock example in Figure 1 shows how exceptions work.
Figure 1: Exception handling in C++.

 int f()
 {
     try
     {
         // the compound statement part
         int n = g();
         // ...
         return n;
     }
     catch (int x)       // a catch clause
     {
         cerr << "number " << x << " happened\n";
         return x;
     }
     catch (char *x)     // another catch clause
     {
         // respond in some other way ...
         return -1;
     }
 }

 int g()
 {
     return h();
 }

 int h()
 {
     if (something_wrong)
         throw 2;
     // keep going ...
 }

The entire body of function f is something called a try block. A try block
consists of a compound statement followed by one or more catch clauses. The
catch clauses handle exceptions that may occur while executing the compound
statement. Executing a throw expression triggers ("throws") an exception. The
throw may occur in the compound statement itself or in functions called from
the compound statement. This particular try block in f calls g, which calls h,
which conditionally throws an exception.
Throwing the exception terminates the execution of both h and g, and returns
control to a catch clause in f. f has two catch clauses, only one of which can
handle the exception. The program selects the catch clause by matching the
type of operand in throw with the type of expression in catch. In Figure 1,
the operand of the throw expression in h is of type int, so the first catch
clause in f catches that exception.
Some of you may recognize that exception handling is similar to the
functionality of the standard C setjmp and longjmp functions. However,
exceptions are safer and more powerful than setjmp and longjmp. If either g or
h declared local objects with destructors, throwing an exception in h invokes
the destructors for those local objects as it "unwinds" the stack on the way
back to the catch clause in f. longjmp merely discards intervening stack
frames as it returns to the setjmp point without calling destructors, leaving
resources used by local objects in an uncertain state. Also, C++ exceptions
can throw objects of any type to a handler, but longjmp only transmits integer
values.
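The unwinding behavior is easy to demonstrate. In the sketch below (the names Resource, events, and run are mine, invented for illustration), each local object logs its construction and destruction, showing that both destructors run, in reverse order of construction, before control reaches the catch clause:

```cpp
#include <string>

// Event log: '+' records a construction, '-' a destruction.
static std::string events;

struct Resource {
    std::string name;
    explicit Resource(const std::string &n) : name(n) { events += "+" + n; }
    ~Resource() { events += "-" + name; }
};

static int h(bool something_wrong)
{
    Resource local("h");        // destroyed while the stack unwinds
    if (something_wrong)
        throw 2;                // terminates h and g on the way to the handler
    return 0;
}

static int g(bool w)
{
    Resource local("g");
    return h(w);
}

// Returns the value delivered to the catch clause.
static int run()
{
    try {
        return g(true);
    } catch (int x) {
        return x;               // by now both destructors have already run
    }
}
```

After run() returns 2, the log reads "+g+h-h-g": the locals of h and g were destroyed during unwinding, something longjmp would simply have skipped.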
The committee had little doubt about the need for exception handling in C++,
but there was considerable debate about the underlying execution model. The
question was whether exception handling should only support termination, as
described in the ARM, or support resumption instead. With resumption, a catch
clause can return control directly to the throw point after handling the
exception. With termination, the only way you can return to the throw point is
by repeating the normal flow of execution. Bjarne Stroustrup summarized the
question as: Does throwing an exception mean "get out" or "get help"? The
committee opted for simplicity and stayed with the termination model.
When we added exception handling, we also relaxed the rules for matching throw
expressions with catch clauses to allow a wider combination of type matches.
We also added a small section on access rules for thrown objects.
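To give a feel for how throw-to-catch type matching works beyond exact matches, one familiar case (already present in the ARM, and preserved by the committee's relaxed rules) is that a handler for a public base class also catches a thrown object of a derived class. The types Error and IOError below are my own, purely for illustration:

```cpp
// Illustrative exception types (not from any standard library).
struct Error { int code; };
struct IOError : Error { };     // publicly derived from Error

static int classify()
{
    try {
        IOError e;
        e.code = 7;
        throw e;                // throws an IOError object
    } catch (const Error &err) {
        return err.code;        // the Error handler matches IOError too
    }
}
```

Here classify() returns 7: the thrown IOError is caught by reference to its base class Error.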


New Tokens and Keywords


As I explained previously, programmers in non-English cultures have an added
burden programming in C++ because C++ uses punctuation characters that have
been replaced by native-language characters. (C programmers have this same
problem.)
C++ uses ASCII as its character set. ASCII is the U.S. variant of the ISO 646
standard character set. ISO 646 uses fewer character codes than ASCII. Each
national variant of ISO 646 can use the unused codes for native-language
characters or symbols. In ASCII, characters like {}[]^~ occupy the unused ISO
646 codes, and C++ uses these characters heavily.
C++ programming on systems using national variants of ISO 646 might be easier
if programmers could write C++ programs using only the invariant ISO 646
character set, avoiding the troublesome characters.
Standard C's trigraphs don't offer a particularly readable solution to this
problem. Trigraphs are three-character sequences that are alternative
representations for the troublesome characters. For example, the trigraphs for
[ and ] are ??( and ??), respectively. Using trigraphs, the C program in
Example 1(a) looks like Example 1(c).
But trigraphs were never intended for humans to compose C source code. They
were designed to aid mechanical translation into C.
Keld Simonsen, the Danish representative to the ISO C and C++ Working Groups,
devised a set of digraphs (two-character symbols) and new keywords as
alternate spellings for the offending C and C++ symbols, shown in Table 1.
Using these symbols, Example 1(a) looks like Example 1(d). Notice that you
still need the trigraphs inside the character and string literals.
Table 1: New digraphs and keywords.

 Existing Alternate
 -----------------------

 [        <:
 ]        :>
 {        <%
 }        %>
 &        bitand
 &&       and
 |        bitor
 ||       or
 ^        xor
 ~        compl
 &=       and_eq
 |=       or_eq
 ^=       xor_eq
 !        not
 !=       not_eq


The ISO C addendum specifies the identifiers in Table 1 as macros defined in a
new standard header, iso646.h. Thus, you can continue using those identifiers
as user-defined identifiers in C as long as you don't include iso646.h.
However, the C++ Working Paper adds the identifiers to the set of reserved
words. This means that, at some future date, you will not be able to use those
identifiers as user-defined identifiers in any C++ program. Consider yourself
warned.
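Here is what a fragment might look like written entirely in the invariant subset, using the alternates from Table 1. (In C++ the digraphs and keyword spellings are part of the language itself; a C program would get the identifiers by including iso646.h. The function flags and the array table are my own illustrations.)

```cpp
// <: :> stand for [ ], and <% %> stand for { }.
static int table<:3:> = <% 10, 20, 30 %>;    // int table[3] = { 10, 20, 30 };

static int flags(int a, int b)
<%
    if (a and not b)             // 'and' and 'not' spell && and !
        return a bitor 0x10;     // 'bitor' spells |
    return a xor b;              // 'xor' spells ^
%>
```

The compiler treats these spellings as ordinary tokens, so flags(1, 0) yields 0x11 and flags(2, 3) yields 2 ^ 3, exactly as if the usual punctuation had been written.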


Core Language Issues


New features to C++ draw a lot of attention, but the real work of the
standards committee is ironing out the flaws in the language description.
These flaws include inconsistencies, ambiguities, and minor omissions. Here
are a few of the flaws in the ARM corrected by the Working Paper.
The name S in struct S {...}; is called a "tag name." In C, tags are not type
names. That is, you cannot declare S x;. You must write it as struct S x;. In
C, you can turn a tag into a type name using a typedef, as in typedef struct S
S; and then you can write just S x;.
In C++, tag names are automatically both tag names and type names. For
compatibility with C, C++ accepts (and ignores) typedefs that equate type
names with tag names.
Some C programmers mimic the C++ behavior by always declaring their structs
using typedef struct S {...} S;.
Other C programmers don't even bother with the tag name and write
typedef struct {...} S;.
Section 7.1.3 of the ARM states, "An unnamed class defined in a typedef gets
its typedef name as its name. For example, typedef struct {/* ... */} S; //
the struct is named S." But it's not clear whether a member function with the
same name as the typedef is a constructor. In other words, given the code in
Example 2(a), is A::A a constructor?
The committee decided that the answer is no. The commentary in the ARM makes
it clear that this rule was only meant to give the class a name for linkage.
So the committee changed the rule to say, "An unnamed class defined in a
typedef gets its typedef name as its name for linkage purposes." Our intent is
that the previous declaration should be equivalent to Example 2(b), making it
clear that A::A can't be a constructor. We also intended that this class not
have a destructor, but those words are not yet in the Working Paper.
Example 2: (a) An unnamed class defined in a typedef; (b) equivalent to 2(a)
using a dummy typedef name.

 (a)
 typedef struct {
     A();
 } A;

 (b)

 struct dummy_name {
     A();
 };
 typedef struct dummy_name A;

Another problem in the ARM appears in Section 6.7. It says, "An auto variable
constructed under a condition is destroyed under that condition and cannot be
accessed outside that condition." In Example 3, you cannot access j after the
first if statement, because either j was never created (if i is 0), or j has
already been destroyed. This rule creates the only situation in C++ where the
lifetime of a named object ends before it goes out of scope. That is, you can
see j but you can't touch it.
Example 3: Case where an auto variable j is constructed within a conditional
statement.

 if (i)
     for (int j = 0; j < 100; j++) {
         // ...
     }
 if (j != 100) // error: access outside condition
     // ...

The Working Paper eliminates this anomaly by changing the rules to say the
statement in an if, if-else (both branches), switch, while, do, or for
statement implicitly defines a local scope. Example 4(a) is now equivalent to
Example 4(b).
Example 4: (a) Under new rules, the statement in a conditional implicitly
defines a local scope; (b) equivalent to Example 4(a) under the new rules.

 (a)
 if (i)
     for (int j = 0; j < 100; j++) {
         // ...
     }

 (b)
 if (i) {
     for (int j = 0; j < 100; j++) {
         // ...
     }
 }
 // j is no longer in scope



Name Lookup



The committee's Core Language working group has spent a great deal of time
trying to pin down the rules for looking up names (identifiers) as they are
referenced. The working group used the example in Figure 2 to illustrate the
problem.
Figure 2: Name Lookup example.

 1: struct X {
 2:     static int i;
 3:     struct Y {
 4:         int i;
 5:         void f();
 6:     };
 7: };
 8: int i;
 9: void X::Y::f() {
 10:     i = 5;
 11: }

The question is, to which declaration of i does i = 5; on line 10 refer? It
could be the X::i on line 2, or X::Y::i on line 4, or the global ::i on line
8. The ARM doesn't say. The Core Language working group informally agreed that
the answer is X::Y::i. They also agreed that if you comment out line 4, then
the answer is X::i.
The committee has yet to formalize these rules, but it appears that they will
be something like the following. To look up a name inside the definition of an
out-of-line member function:
1. Look in the local scope.
2. For each class name from left to right in the explicitly qualified name of
the member function, look in the scope of that class. That is, for X::Y::f,
look in Y, then look in X.
3. Look in successive enclosing scopes, moving outward to file scope.
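The informally agreed answer for Figure 2 can be expressed as a sketch you can check: the assignment in X::Y::f reaches X::Y::i (found in step 2 before X::i or ::i), leaving the other two untouched:

```cpp
int i = 0;                  // ::i      (Figure 2, line 8)

struct X {
    static int i;           // X::i     (line 2)
    struct Y {
        int i;              // X::Y::i  (line 4)
        void f();
    };
};

int X::i = 0;

void X::Y::f()
{
    i = 5;                  // step 2 finds X::Y::i first
}
```

After calling f on an X::Y object, that object's i member is 5 while ::i and X::i remain 0.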
The committee has resolved other problems with friend declarations and the
"rewriting" rule for inline member functions. But, there are still many other
details to work out. For example, inline friend function declarations raise
other unanswered questions. Consider the example in Figure 3, in which
function X::f is defined as a friend inline in Y. Amazingly, the ARM doesn't
prohibit this definition. So, which i does i = 5; on line 8 refer to? The ARM
doesn't say, leaving each implementation to decide for itself. The committee
is considering simply banning inline definition of friend functions to avoid
this problem.
Figure 3: Inline friend-function example.

 1: int i;
 2: struct X {
 3:     void f();
 4: };
 5: struct Y {
 6:     static int i;
 7:     friend void X::f() {
 8:         i = 5;
 9:     }
 10: };



The Standard Library


Although many C++ users would like the C++ standard to include an extensive
class library, that's not likely to happen. The job is just too big.
WG21+X3J16 has wisely limited itself to a few critical library components:
Language support. Functions and classes required for runtime support of the
C++ language. This includes standard implementations for the free-store
management functions defined in the header new.h: new, delete, and possibly
set_new_handler. It also includes exception-handling support defined in a new
header exception.h: functions terminate, set_terminate, unexpected,
set_unexpected, and a class SUE all similar to those described in Section
15.6c of the ARM.
Input/Output. A simplified version of the iostream library distributed with
AT&T's cfront 2.0, with additional support for wide characters, multibyte
strings, and locales. The working specification does not use templates, but
does use exception handling.
Standard C. The Standard C Library adapted to C++.
Strings. One or more classes to support variable-length strings. Look for
strings of char (ordinary "narrow" characters) and wchar_t (wide characters).
Simple foundation classes. Classes like bit sets, bit strings, and a template
for general dynamic arrays.
At present, the Working Paper only includes the C++ version of the Standard C
Library. Some aspects of the C library don't mesh well with C++, so the C++
standard makes some minor adjustments.
For example, the C declaration for the strchr function in string.h opens a
"hole" in the library's type safety. The C declaration for strchr is:
 char *strchr(const char *s, int c);
strchr returns the address of a character in the string addressed by s (or a
null pointer). This means that strchr returns the address of a constant
character as the address of a modifiable character. This allows accidents like
that in Example 5, which tries modifying name even though it's declared
constant. memchr, strpbrk, strrchr, and strstr share this problem.
Example 5: Using the C version of strchr, a program can accidentally alter a
const char.

 const char name[] = "Nancy";
 char c;

 ...
 *strchr(name, c) = tolower(c);

The C++ library declares each of these functions as an overloaded pair with
extern "C++" linkage and self-consistent argument and return types. For
example, the C++ library declares strchr as in Example 6.

Example 6: The C++ library declares strchr as shown here.

 extern "C++" const char *strchr(const char *s, int c);
 extern "C++" char *strchr( char *s, int c);



Minor Extensions


WG21+X3J16 has added a variety of minor extensions to C++. The details are
rather long, so here's a quick summary:
The Working Paper relaxes the restriction that the return type of a virtual
function in a derived class must be the same as the return type of the
base-class function that it overrides.
You can now overload operators on enumerations.
wchar_t is now a reserved word. It represents a distinct integral type for
overloading purposes.
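For example, the relaxed rule lets you give an enumeration its own cyclic increment. (Color and its values below are my own illustration, not anything from the Working Paper.)

```cpp
enum Color { red, yellow, green };

// Prefix ++ that wraps around at the end of the enumeration.
static Color &operator++(Color &c)
{
    c = (c == green) ? red : Color(c + 1);
    return c;
}
```

Starting from red, three applications of ++ step through yellow and green and back to red.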


So When is it Coming?


The C++ standard is shaping up nicely, but it's still years away. I hesitate
to pick a year.
If you would like to join X3J16, contact Stephen D. Clamage, Vice-Chair,
TauMetric Corp., 8765 Fletcher Pkwy., Suite 301, La Mesa, CA 91942
(619-697-7607 or steve@taumet.com); or, for frequent reports on the standard,
refer to my column in The C++ Report.


References


American National Standard X3.159-1989--Programming Language C.
Ellis, Margaret A. and Bjarne Stroustrup. The Annotated C++ Reference Manual.
Reading, MA: Addison-Wesley, 1990.
Unix System V AT&T C++ Language System Release 2.1 Product Reference Manual,
Select Code 307-159.


Who's Standardizing C++


ANSI is the American National Standards Institute, a trade association that
sets industrial standards for a wide range of products, such as bar codes,
bicycle helmets, heating and air-conditioning equipment, and plywood. ANSI is
not a government agency and its standards are not binding by law, unless, of
course, a governmental agency adopts an ANSI standard for regulatory or
procurement purposes.
ANSI doesn't actually write standards; it establishes procedures for writing
and approving standards, and then delegates most of the work to
industry-specific standards committees. X3 is the ANSI-accredited committee
that administers standards development for information processing systems.
CBEMA (the Computer and Business Equipment Manufacturers Association) funds
and staffs X3's offices at its Washington, DC headquarters. X3 chartered X3J16
to develop an ANSI standard for C++.
ANSI and many other national standards bodies are members of ISO (the
International Organization for Standardization). ISO works jointly with yet
another standards group, IEC (the International Electrotechnical Commission).
ISO and
IEC formed a joint technical committee, JTC1, to standardize information
technology. JTC1's subcommittee SC22 oversees the development of international
programming-language standards. SC22 created WG21 to develop the international
C++ standard.
X3J16's officers are:
Dmitry Lenkov (Hewlett-Packard), chair.
Stephen D. Clamage (TauMetric), vice-chair.
Dan Saks (Saks & Associates), secretary.
Jonathan Shopiro (AT&T/USL), project editor.
Thomas Plum (Plum Hall), international representative (representing ANSI at
WG21 meetings).
Steve Carter (Bellcore), WG21 convenor.
Bjarne Stroustrup, inventor of C++, is an alternate representative of AT&T.
WG21+X3J16 does most of its technical work through working groups that analyze
technical issues and make recommendations for resolving them. The working
groups are:
C Compatibility. Identifies conflicts between Standard C and C++, and attempts
to distinguish those conflicts that are necessary and must be documented from
those that might be eliminated.
Core Language. Studies problems in the syntax and semantics of the C++
language itself.
Environments. Considers issues in specifying the translation and execution
environments for C++ programs, such as program startup and termination,
translation limits, and hosted vs. freestanding implementations.
Extensions. Evaluates proposals for extensions to C++.
Formal Syntax. Identifies ambiguities, inconsistencies, and errors in the
grammatical specification of the language.
Libraries. Drafts proposed specifications for components in the standard C++
library. (The C++ PRM contains no library specification.)
--D.S.









Special Issue, 1992
A C++ BEAUTIFIER


From C++ to C and back




Tim Maher


Dr. Maher is the founder and head of Consultix, which specializes in UNIX and
C training. He can be contacted at P.O. Box 70563, Seattle, WA 98107-0563, or
via UUCP {uunet, uw-beaver}!timji!tim, CompuServe 72460,3050, Internet
tim@timji.celestial.com, or ATTMail as tmaher.


UNIX tool developers have been slow to provide "beautifiers" for C++.
Consequently, some programmers have resorted to posting C++ reworkings of the
BSD indent utility in the Usenet news, violating federal copyright law in the
process.
My alternative approach to writing a beautifier was to apply the UNIX "filter"
model to the problem. As Figure 1 illustrates, this entailed using a
preprocessor to disguise C++ as C, using standard C tools to effect
beautification, and then using a post-processor to convert the disguised C++
back to its original form. In this article, I'll describe the programs and the
testing procedure, and show sample inputs and outputs.
Listing One (page 27) shows the Bourne-shell driver program, which is linked
to the names c++cb and c++indent. It is used like cb or indent, according to
its invocation name. The case construct starting on line 6 determines which
name was used. If it was c++cb, the appropriate argument parsing is performed
in lines 8-14. Line 16 composes the command that will eventually beautify the
preprocessed text and place it into the file named in $BEAUT.
If the program is invoked as c++indent, arguments are parsed (lines 18-23)
according to the invocation format shown on the 4.2 BSD man page for indent:
input file, optional output file, and other processing options. Line 24
composes the appropriate beautification command for this beautifier. Note that
most keywords that occur in C++ but not in C are listed in connection with
typedef (-T) options, to inform indent of words requiring special treatment.
The keyword operator is purposely omitted because it appears only as a
function-name prefix, and therefore has no special formatting implications.
Line 29 uses a Bourne-shell "special substitution" to assign a default
"symbol" to the environment variable CppSYM, if it is not inherited with a
non-null value from the invoking shell. The code in lines 30 and 31 causes an
error message and exit if the text to be beautified happens to already contain
this symbol, which is reserved for selective introduction by the preprocessor,
c++encode. If the program exits here, you should set the environment variable
CppSYM to some other valid C identifier at the invoking shell level, and try
again.
The preprocessor, c++encode, is called on line 32, followed by the invocation
of the beautification command composed previously. The eval prefix is needed
to allow the shell to properly recognize metacharacters that appear after
substitution for the variable BEAUTIFY has been performed. The disguised code
of the original C++ program is then reconstituted in lines 35-36 by a sed
command that reverses the effects of the preprocessor. It does this by
removing the C-comment "wrappers" that encapsulate inline comments and
reconverting the various occurrences of CppSYM to the colon expressions they
represent.
Listing Two (page 27) shows the preprocessor program, c++encode.c, prototyped
in nawk but converted to C for a 40-fold speed improvement. This program does
some elementary parsing of the C++ program, and then selective encoding of C++
sequences that were empirically determined to generate syntax errors after C
beautification. The parsing step is required to allow discrimination between
the C++ sequences subject to beautification (which need C-language encoding)
and literal sequences that occur in quoted strings and comments. (Early
experience showed that syntax errors could be introduced during beautification
if C++ sequences in quotes or comments were encoded.)
The for loop starting in line 38 examines each character in the current input
line according to the current mode (that is, inside a comment, inside a quoted
string), and the code in lines 42-63 detects the onset and termination of
these special modes. In some cases, special output sequences are generated to
disguise the input. For instance, if the onset of an inline comment is
detected (line 51), the program generates the initiating sequence for a C
comment (line 53). In line 68, strings generated in this manner are sent to
the output; then continue is executed to initiate the processing of the next
input character.
Processing of unquoted and uncommented text is undertaken in lines 70-82. The
C++ scope qualifier (::) is replaced in lines 71-72 by the symbol set in the
invoking shell script, and the colon used to express derivations is replaced
by a slightly different string based on that symbol (lines 73-81), so that the
post-processor can differentiate between these cases. In lines 84-85, any
generated text is sent to the output, or else the literal input text is sent
out. The C-comment terminator for an encapsulated inline comment is provided
in lines 87-88. The terminating sequence is preceded by a hyphen to flag it as
eligible for post-processing reconversion to the inline form. At the end of
the input file, the program will issue an error message (lines 91-95) if any
special mode (single-quotes, for instance) is still in effect.
Example 1(a) shows a sample fragment of (nonsense) C++ code, in dire need of
beautification; Example 1(b) shows what can happen if the C-language
beautifier cb is directly used to reformat C++ code. What was formerly an
inline comment in Example 1(a), line 5, is now three lines of syntactic
nonsense (lines 7-9); what was a pair of double-colons is now a pair of
single-colons on different lines (14-15), and all other inline comments (lines
19, 22) are ruined by the imposition of a space between their slashes. The
havoc wreaked on the last line, an inline comment containing special
characters, adds insult to injury.
Example 1: Effects of processing C++ code by cb, c++encode, c++cb, and
c++indent: (a) Syntactically correct but ugly C++ code fragment; (b) C++ code
after processing by cb -s; (c) C++ code after processing by c++encode; (d) C++
code after processing by c++cb; (e) C++ code after processing by c++cb -s; (f)
C++ code after processing by c++indent.

 (a)

 $ cat sample.cc

 1 class C : virtual public F { public:
 2 C(int x):F(x, 100u) { } // note; arg 2: unsigned
 3 C func(double* = 0); /* defined below */
 4 };
 5 C C::func(double*x) { if (x) // commentary
 6 { } // inline comment with funny stuff- // : ::
 7 }

 (b)

 $ cb sample.cc

 1 class C :
 2 virtual public F {
 3 public:
 4 C(int x):
 5 F(x, 100u) {
 6 }
 7 // note;
 8 arg 2:
 9 unsigned
 10 C func(double * = 0); /* defined below */
 11 };
 12
 13
 14 C C:
 15 :

 16 func(double*x)
 17 {
 18 if (x)
 19 / / commentary
 20 {
 21 }
 22 / / inline comment with funny stuff - / / :
 23 :
 24 :
 25 }
 26
 27

 (c)

 $ c++encode sample.cc

 1 class C__42_virtual public F { public:
 2 C(int x):F(x, 100u) { } /*- note; arg 2: unsigned -*/
 3 C func(double* = 0); /* defined below */
 4 };
 5 C C_42func(double*x) { if (x) /*- commentary -*/
 6 { } /*- inline comment with funny stuff- // : :: -*/
 7 }

 (d)

 $ c++cb sample.cc

 1 class C : virtual public F {
 2 public:
 3 C(int x):
 4 F(x, 100u) {
 5 } // note; arg 2: unsigned
 6 C func(double* = 0); /* defined below */
 7 };
 8 C C::func(double*x) {
 9 if (x) // commentary
 10 {
 11 } // inline comment with funny stuff- // : ::
 12 }

 (e)

 $ c++cb -s sample.cc

 1 class C : virtual public F {
 2 public:
 3 C(int x):
 4 F(x, 100u) {
 5 } // note; arg 2: unsigned
 6 C func(double * = 0); /* defined below */
 7 };
 8
 9
 10 C C::func(double*x)
 11 {
 12 if (x) /*- commentary -*/ {
 13 } // inline comment with funny stuff- // : ::

 14 }
 15
 16

 (f)

 $ c++indent sample.cc; cat sample.cc

 1 class C : virtual public F {
 2 public:
 3 C(int x):F(x, 100 u) {
 4 } // note; arg 2: unsigned
 5 C func(double *= 0); /* defined below */
 6 };
 7 C
 8 C::func(double *x)
 9 {
 10 if (x) { // commentary
 11 } // inline comment with funny stuff- // : ::
 12 }

Example 1(c) shows that c++encode replaced C : virtual by C__42_virtual (line
1), thereby disguising a uniquely C++ sequence as a valid C identifier. In
like fashion, the sequence C::func was disguised as C_42func (line 5). Note
that the colon in line 2 was not replaced because it was not surrounded by
spaces. (To give you control over the results of beautification, I decided
that stand-alone colons would be encoded, rendering them invisible to the
beautifier, whereas embedded colons would be subject to beautifier
manipulations.) The only other change is that all inline comments were
converted to C comments. The hyphen following the C-comment initiator (/*)
warns indent not to monkey with the enclosed text; the hyphen preceding the
comment terminator tells the post-processor to consider reconverting it to
inline form.
Example 1(d) shows the effect of running c++cb on the code sample of Example
1(a), which preprocesses using c++encode, beautifies using cb, and then
post-processes to reinstate the disguised C++ code using sed (lines 35-36,
Listing One). The result is an attractive and syntactically correct C++
program. (If the line-split introduced after the colon of line 3 is not
desired, you can put spaces around those colons to prevent splitting, as in
line 1.)
Example 1(e) shows the results of running c++cb -s on this code. They are
significantly different from those shown in Example 1(d) only in line 12. The
beautifier chose to place program code (a "{" symbol) behind the C comment on
this line, which had been converted from an inline comment by the
preprocessor. Because reconverting such a comment to the inline form would
effectively comment out the following program code, the post-processing stage
(lines 35-36, Listing One) uses the simple strategy of only reconverting
inlined comments that end at the line's end. Example 1(f) shows the results of
running c++indent on this code. They are cosmetically quite different from the
results produced by c++cb. Unfortunately, there are also material
differences--syntax errors were introduced on lines 3 and 5 (discussed
presently).


Testing


I tested both c++cb and c++indent under SunOS 4.1.1b using the NIH Class
Library--367 files containing over 55,000 lines of code. (c++cb was also
tested under System V R3.2, which lacks indent.) The c++cb program was run
both with and without the strict processing option (-s), and c++indent was run
without any added options.
Although most of the files were successfully beautified without trouble, some
problems did arise. Specifically, in one case c++cb without options separated
the components of the C++ insertion operator ("<<"). Manual patching was
required to correct the problem. For its part, indent (invoked by c++indent)
always inserted a space between the components of new-style constants; that
is, 100u became 100 u, as in Example 1(f), line 3.
Another problem was that in one case a pointer operator was erroneously
attached to a following equal sign, as in Example 1(f) line 5. This problem
can easily be avoided through routine use of dummy variable names--changing
double* = 0 to double *x = 0. Manual patching was required to permit
compilation.
For the 55,000 lines comprising the NIH Class Library, the manual
post-beautification cleanup process amounted to two lines for c++indent, and
one line for c++cb. Thus, although the technique is not perfect, good results
can be achieved with typical code, and the mistakes that do occur are easily
identified and corrected.


Conclusion


Some final notes on the use of c++indent are in order. First of all, it is no
accident that beautification via indent produced both the most attractive and
the most troublesome results. It is precisely because indent takes so many
liberties in reformatting the (disguised) code for maximum visual appeal that
it sometimes introduces errors. Nonetheless, if my experience with the NIH
Class Library is representative, and I think it is, problems with c++indent
are likely to be few and far between.
Secondly, certain options available in some versions of indent must be
avoided. In particular, the -troff option should not be used, because it is
fundamentally incompatible with the decoding technique; as an alternative,
beautify first, and run vgrind afterwards. The -st option ("standard output")
must also be avoided, because it is incompatible with the design of c++indent.
Finally, because no attempt was made to test all of the myriad processing
options of indent, c++indent users should be wary of unexpected results when
straying from the defaults.
_A C++ BEAUTIFIER_
by Tim Maher


[LISTING 1]

 1 :
 2 # @(#) c++cb,c++indent - driver program for C++ beautification
 3 # Tim Maher, CONSULTIX, 11/9/91. (206) 781-UNIX
 4 #
 5 ENCODED=/tmp/c++encode_$$ BEAUT=/tmp/beaut_$$ # temp files
 6 case "$0" in # use cb or indent, depending on invocation name
 7 *c++cb) OUT="" # for cb, all arguments are optional
 8 while test $# -gt 0 # separate options from filenames
 9 do case "$1" in # "l" option takes argument
 10 -[!l]) OPTS="$OPTS $1"; shift ;;
 11 -l) OPTS="$OPTS $1 $2"; shift 2 ;;
 12 *) break # end of options

 13 esac
 14 done # if no filename, copy stdin, and set filename arg
 15 test -z "$*" && cat > /tmp/c++cb_$$ && set /tmp/c++cb_$$
 16 BEAUTIFY="cb $OPTS $ENCODED > $BEAUT"; INPUT=$* ;;
 17 *c++indent) # not a filter; needs filename arg
 18 INPUT=${1:?"Usage: $0 infile [outfile] [flags]"}; shift
 19 case "$1" in # next arg would be output filename or flag
 20 "") ;; # no second arg is okay too
 21 [!-]*) OUT=$1; shift # set output filename
 22 esac
 23 OPTS="$*"; : ${OUT:=$INPUT} # if no outfile, use input name
 24 BEAUTIFY="indent $ENCODED $BEAUT $OPTS -Tasm -Tbool -Tcatch
 25 -Tclass -Tconst -Tdelete -Tdo -Tfriend -Tinline -Tnew
 26 -Tprivate -Tprotected -Tpublic -Tsigned -Ttemplate -Tthrow
 27 -Ttry -Tvirtual -Tvolatile" # -Toperator omitted
 28 esac
 29 : ${CppSYM:=_42}; export CppSYM; # set up disguising string
 30 name=`grep -l "$CppSYM" $INPUT` && # exit if symbol in input
 31 { echo Error- \"$CppSYM\" appears in $name >&2; exit 100; }
 32 c++encode $INPUT > $ENCODED || exit 100
 33 if eval $BEAUTIFY # beautify, leaving output in $BEAUT
 34 then trap "" 2 3 15; # reconstruct C++ after beautification
 35 sed -e 's+/\*-\(.*\) -\*/$+//\1+' -e "s/_${CppSYM}_/ : /g" \
 36 -e "s/$CppSYM/::/g" < $BEAUT > $OUT
 37 else echo "$0: error code $? from $BEAUTIFY" >&2; exit 200
 38 fi
 39 rm -f /tmp/*_$$; # clean-up temp files



[LISTING 2]

 1 /* @(#) c++encode.c: Tim Maher, 11/9/91, tim@Timji.Celestial.Com
 2 C++ syntax disguiser, allowing beautification via cb, indent */
 3 #include <stdio.h>
 4 #include <string.h> /* Use this line for AT&T systems */
 5 /* #include <strings.h> /* Use this line for BSD systems */
 6 char *getenv();
 7 #define MAX 512 /* Maximum length for C++ input line */
 8 #define F 0 /* FALSE */
 9 #define T 1 /* TRUE */
 10
 11 main (argc, argv) int argc; char *argv[]; {
 12 int filenum, linenum, cnum, modes, exit();
 13 int sq, dq, cc, ic, ch, nx, max, knt;
 14 char line[MAX], out[10], *sym;
 15 FILE *handle;
 16
 17 if (argc < 2) {
 18 fprintf(stderr, "Usage: %s file1 [file2 . . .]\n",
 19 argv[0]); exit(1);
 20 }
 21 /* use default "sym" string if none supplied */
 22 if ((sym=getenv("CppSYM")) == (char *)0) sym="_42";
 23 for (filenum = 1; filenum < argc; filenum++) {
 24 handle = (filenum == 1 ? fopen(argv[filenum], "r") :
 25 freopen(argv[filenum], "r", handle));
 26 if (handle == (FILE *)0) {
 27 fprintf(stderr, "%s: fopen() error\n",argv[0]);

 28 exit (50);
 29 }
 30 linenum = 0; sq = dq = cc = ic = F;
 31 while (fgets(line, MAX, handle) != (char *) 0) {
 32 max=strlen(line)-2; /* ignore NL; 0-based index */
 33 if (max == MAX - 3) {
 34 fprintf(stderr,"%s: increase MAX\n", argv[0]);
 35 exit(100);
 36 }
 37 linenum++;
 38 for (cnum = 0; cnum <= max; cnum++) {
 39 ch = line[cnum];
 40 sprintf(out,"%c",ch); /* default out = ch */
 41 nx = (cnum == max) ? '\0' : line[cnum+1];
 42 if (cc) { /* in C comment mode */
 43 if ('*' == ch && nx == '/') {
 44 /* C comment ends */
 45 cc = F; strcpy(out, "*/"); cnum++;
 46 }
 47 } else if ('/' == ch && nx == '*' && !sq && !dq) {
 48 /* C comment starts */
 49 cc = T; strcpy(out, "/*"); cnum++;
 50 } else if (ic) ; /* in inline comment */
 51 else if ('/' == ch && nx == '/' && !sq && !dq) {
 52 /* inline comment starts */
 53 ic = T; sprintf(out, "/*-"); cnum++;
 54 } else if ('\\' == ch) { /* quote by BS */
 55 if (cnum != max) { /* next char is quoted */
 56 sprintf(out, "%c%c", ch, nx); cnum++;
 57 }
 58 } else if (dq) { /* in double-quotes */
 59 if ('"' == ch) dq = F; /* DQ string ends */
 60 } else if (sq) { /* in single-quotes */
 61 if (ch == '\'') sq = F; /* SQ string ends */
 62 } else if ('\'' == ch) sq = T; /* SQ starts */
 63 else if ('"' == ch) dq = T; /* DQ starts */
 64 else strcpy(out,""); /* no output created */
 65 /* OUTPUT OF QUOTED AND COMMENTED TEXT */
 66 if (strcmp(out,"") != 0) { /* non-null string */
 67 /* print and then process next char */
 68 printf("%s", out); continue;
 69 } /* PROCESS UNQUOTED AND UNCOMMENTED TEXT */
 70 /* process scope qualifier; :: -> sym */
 71 if (ch == ':' && nx == ':') {
 72 strcpy(out, sym); cnum++;
 73 } else if (ch == ' ' && nx == ':') {
 74 /* handle derivation; SP:SP OR SP:NL */
 75 if (cnum == max) { /* line ends with : */
 76 sprintf(out, "%s%s%s", "_", sym, "_");
 77 cnum += 1; /* SP:NL -> _sym_NL */
 78 } else if (line[cnum+2] == ' ') {
 79 sprintf(out, "%s%s%s", "_", sym, "_");
 80 cnum += 2; /* SP:SP -> _sym_ */
 81 }
 82 } /* FOLLOWING SECTION DOES ALL NORMAL OUTPUT */
 83 /* print prepared string, or literal input ch */
 84 if (strcmp(out, "") != 0) printf("%s", out);
 85 else printf("%c", ch);
 86 } /* END OF FOR-CNUM LOOP; FINISHED WITH LINE */

 87 if (ic) { /* inlines end with EOL, so turn off */
 88 ic = F; printf(" -*/\n");
 89 } else printf("\n"); /* NL to end this line */
 90 } /* END OF WHILE LOOP */
 91 if (sq || dq || ic || cc) {
 92 fprintf(stderr,"%s: %s%s; sq=%d dq=%d ic=%d cc=%d\n",
 93 argv[0], "ERROR- altered mode at EOF for file ",
 94 argv[filenum], sq,dq,ic,cc); exit (200);
 95 }
 96 } /* END OF FOR FILENUM LOOP */
 97 exit (0);
 98 }


Example 1:

(a)
$ cat sample.cc

 1 class C : virtual public F { public:
 2 C(int x):F(x, 100u) { } // note; arg 2: unsigned
 3 C func(double* = 0); /* defined below */
 4 };
 5 C C::func(double*x) { if (x) // commentary
 6 { } // inline comment with funny stuff- // : ::
 7 }

(b)
$ cb sample.cc

 1 class C :
 2 virtual public F {
 3 public:
 4 C(int x):
 5 F(x, 100u) {
 6 }
 7 / / note;
 8 arg 2:
 9 unsigned
 10 C func(double * = 0); /* defined below */
 11 };
 12
 13
 14 C C:
 15 :
 16 func(double*x)
 17 {
 18 if (x)
 19 / / commentary
 20 {
 21 }
 22 / / inline comment with funny stuff - / / :
 23 :
 24 :
 25 }
 26
 27

(c)

$ c++encode sample.cc

 1 class C__42_virtual public F { public:
 2 C(int x):F(x, 100u) { } /*- note; arg 2: unsigned -*/
 3 C func(double* = 0); /* defined below */
 4 };
 5 C C_42func(double*x) { if (x) /*- commentary -*/
 6 { } /*- inline comment with funny stuff- // : :: -*/
 7 }


(d)
$ c++cb sample.cc

 1 class C : virtual public F {
 2 public:
 3 C(int x):
 4 F(x, 100u) {
 5 } // note; arg 2: unsigned
 6 C func(double* = 0); /* defined below */
 7 };
 8 C C::func(double*x) {

 9 if (x) // commentary
 10 {
 11 } // inline comment with funny stuff- // : ::
 12 }

(e)
$ c++cb -s sample.cc

 1 class C : virtual public F {
 2 public:
 3 C(int x):
 4 F(x, 100u) {
 5 } // note; arg 2: unsigned
 6 C func(double * = 0); /* defined below */
 7 };
 8
 9
 10 C C::func(double*x)
 11 {
 12 if (x) /*- commentary -*/ {
 13 } // inline comment with funny stuff- // : ::
 14 }
 15
 16

(f)
$ c++indent sample.cc; cat sample.cc

 1 class C : virtual public F {
 2 public:
 3 C(int x):F(x, 100 u) {
 4 } // note; arg 2: unsigned
 5 C func(double *= 0); /* defined below */
 6 };
 7 C
 8 C::func(double *x)

 9 {
 10 if (x) { // commentary
 11 } // inline comment with funny stuff- // : ::
 12 }





Special Issue, 1992
TEMPLATES IN C++


Function and class templates are powerful features




Nicholas Wilt


Nicholas is a software engineer working in the Boston area. His interests
include computer graphics, C++, and assembler programming. He can be reached
through the DDJ offices.


Templates are one of C++'s most powerful features in that they allow you to
define the "shape" of a function or class and leave the implementation
specifics to the compiler.
Function templates can be used to describe algorithms defined on a wide
variety of types. By using a function template to define a sorting algorithm,
for example, the type to sort is left as a parameter. Then, whenever the
template is invoked, a sort function for that type is generated by the
compiler.
Class templates can be used to define a class's structure in terms of the
template arguments. For every different set of template arguments given to a
class template, the compiler creates a new class. This is an especially useful
feature for container classes; when you define a class template for
BinaryTree, for example, it becomes easy to declare BinaryTree of int, float,
or a user-defined type.
Compared to the way C++ handles inheritance and polymorphism, templates are
relatively simple, especially once you come to grips with their syntax. The
examples in this article were developed with Borland C++ 3.0, but with a
minimum of effort they should work with other C++ implementations that support
templates.


Function Templates


Selection, like sorting, is defined on all ordered types. Selection takes an
array of N elements and an index i, and returns the ith element in the sorted
order of the array. To compute the median of an N-element input array, for
example, you select the element represented by N/2. The minimum is selection
of the 0th element; the maximum is selection of the element represented by (N
- 1). (Note that arrays are numbered from 0.)
Given the definition of the selection problem and its relation to the sorted
order of the array, programmers often make the mistake of performing selection
by sorting the array, then returning the ith element. This technique works,
but it does more work than needed. It is more efficient to repeatedly
partition the array and consider only the partition that contains the ith
element. A good discussion of a randomized, expected linear time-selection
algorithm is presented in Introduction to Algorithms by Thomas Cormen, Charles
Leiserson, and Ronald Rivest (MIT Press, 1990). This algorithm is usually more
efficient than one that sorts the input array.
In pretemplate C++, there were two (unattractive) alternatives for
implementing selection:
A generic implementation, in which a variety of types can be selected with a
single routine. The qsort() library routine in ANSI C takes this approach.
A variety of selection routines, one for each type you wish to select.
The generic approach is not type-safe; if you confuse integers with
floating-point numbers, the compiler is unable to warn you. It is also
inefficient, since a generic pointer to a function is typically called every
time two elements need to be compared.
The variety approach is equally unattractive because it involves code
duplication. Errors are likely to be introduced during code duplication, and
errors in the routine itself get propagated across all the routines.
Templates offer a solution that resolves both concerns. By allowing you to
write the function once in a general form, they enforce type safety and avoid
source-code duplication.


Syntax


To declare a function template, you write something like Example 1. Listing
One (page 32) shows an implementation of selection that uses templates. Our
template-based Select function is defined in Example 2(a). The resulting code
looks remarkably similar to the pseudocode found in textbooks on algorithms.
Select is a function that returns the ith element of an array of Ts. What are
Ts? Ts are anything that can be sorted! They are a parameter of the function
template.
Example 1: Declaring a function template.

 template<template parameters>
 return value // From here on we define
 name (function parameters) { // the function in terms of
 function body // the template parameters.
 }

Example 2: (a) Defining a template-based Select function; (b) calling the
Select function; (c) defining a template like this means it will not be
invoked for integers.

 (a)

 template<class T> T
 Select(T *base, int n, int i)
 {
 // Implementation
 }

 (b)


 int *arr, n;
 // ... allocate array and set n
 // Set penultimate to the second-largest element in arr.
 int penultimate = Select(arr, n, n - 2);

 (c)

 int
 Select(int *x, int n, int i)
 {
 // Own function definition of Select<int>
 }

To use Select, just call it with a pointer to an ordered type. To select the
second-largest element in an array of integers, for example, call Select, as
shown in Example 2(b). Since Select uses the < operator to determine how to
order the array, operator < must be defined for the class passed to Select.
Operator < is built in for primitive types such as ints and floats, so we did
not need to do anything special to select from an array of integers.
Calling the template-based Select function is like invoking a macro. Earlier,
we called Select with a pointer to int. This tells the compiler we need an
instance of Select with T set to int. As with a macro invocation, the compiler
replaces each mention of T (the name of the template parameter) with int. The
resulting instance of Select takes an int* parameter, declares local instances
of int, uses the integer < operator to compare elements in the input array,
and returns int.
If the user were to call Select with a float* parameter, the compiler would
create an instance of Select taking parameters (float*, const int, int) and
replace T with float everywhere in the function declaration. This function
instance is distinct from the one for integers; the entry point, parameters,
and code generated are completely separate. In the executable, the only thing
the two functions int Select(int*, const int, int) and float Select(float*,
const int, int) have in common is that they both perform selection.
What happens if you define your own Select function with the same parameters
as an instance of the Select template? If you were to define, say, Example
2(c), then the Select template would no longer be invoked for integers.
Instead, the user-defined Select function will be used.
Select can be used on arrays of user-defined classes (let's call them
MyClass), as well as on built-in types. Just pass a pointer to the array of
MyClass you want to select from, making sure operator < is defined for
MyClass.
Listing Two (page 32) illustrates the use of Select. It has an example of
selecting on arrays of a user-defined class, as well as the built-in int type.


Class Templates


You can specify templates for classes as well as functions. The most common
use for a class template is a container class such as a linked list or binary
heap. Let's take binary trees as an example. A binary tree can contain any
ordered type, so you can have binary trees of integers, floating-point
numbers, or other objects. In C++ without templates, you have two choices for
implementing a binary-tree class: Implement a generalized BinaryTree class
that takes a generic representation of the objects to manipulate, or implement
a specific BinaryTree class for each type of object you wish to put in a
binary tree.
These are the same problems faced by the selection example given earlier. A
method that combines the best of both worlds (without templates) is to
implement a generalized class, then derive "Binary tree of integer," "Binary
tree of float," and so on from the base binary-tree class and make sure the
member functions of the derived classes are type-safe.
Templates provide a better approach. By specifying the template for a
BinaryTree class, you can define the class in terms of the template
parameters. Thus, a parameterized binary tree-node class can be defined, as in
Example 3(a). Note that the above definition is not of a class. There is no
class called BinaryNode; BinaryNode is merely a template describing a family
of classes. This family of classes has names like BinaryNode<int>,
BinaryNode<float>, BinaryNode<UserDefinedType>, and so on, and every one of
them is distinct. That is why the left and right pointers in the class
definition in Example 3(a) are pointers to BinaryNode<T>, rather than pointers
to BinaryNode.
Example 3: (a) Defining a parameterized binary node class; (b) using classes
generated from templates as base classes; (c) a vector class might take two
parameters: the type to be a vector of and the number of elements in the
vector.

 (a)

 template<class T>
 class BinaryNode {
 T x; // Contents of node
 BinaryNode<T> *left, *right; // Left and right children
 //etc.
 };

 (b)

 // Class template for a binary tree node including a
 // pointer to the parent.
 template<class T>
 class BinaryNodeWithParent : public BinaryNode<T> {
 BinaryNodeWithParent<T> *parent;
 // etc.
 };

 (c)

 template<class T, int Size>
 class Vector {
 T x[Size]; // Vector contains Size T's.
 // etc.
 };

Classes generated from templates can be used as base classes; in fact, class
templates can inherit from other class templates. By extending the BinaryNode
example, we might have Example 3(b). Listing Three (page 32) gives a minimal
template-based BinaryTree class. The binary-tree data structure is described
in any good algorithms text; it supports a variety of operations on ordered
types in logarithmic time on average. Our BinaryTree class supports insertion,
traversal, and query. Although many other operations are defined on binary
trees, they are omitted for lack of space. It may be instructive to extend our
BinaryTree class to support other operations such as minimum, maximum, and
deletion.
Listing Four (page 33) is a typical C++ program that uses the BinaryTree
class. It simply inserts ten random numbers in the range [0,99] into a
BinaryTree <int>, then displays the pre-, in-, and post-order traversal orders
of the tree.
So far, we have concentrated on the most common type of template, which
takes a single class parameter. There can be multiple template parameters, and
they need not be general classes. A vector class might take two parameters:
the type to be a vector of and the number of elements in the vector; see
Example 3(c).

The class template described here could reuse code for two-element vectors of
integer (perhaps for screen coordinates in a graphics program) and
three-element vectors of float (perhaps for world coordinates in a
three-dimensional sketching program).


Conclusion


Templates open up a whole new world for C++ programmers. They allow for
compact and efficient implementation of container classes and other
parameterized types. They also allow general, efficient implementation of
algorithms with a minimum of code duplication. Look for C++ class libraries to
change dramatically as templates become widely supported. Many of the current
approaches to flexibility are based on inheritance and have become outmoded
with the advent of templates.
_TEMPLATES IN C++_
by Nicholas Wilt

[LISTING ONE]
// select.h -- Template-based C++ implementation of the randomized
// selection algorithm from _Introduction to Algorithms_ by Cormen,
// Leiserson and Rivest (pg. 187). Copyright (C) 1992 by Nicholas Wilt.
// All rights reserved.

template<class T> T
Select(T *base, const int n, int inx)
{
 int left = 0;
 int right = n;

 // This while loop terminates when base[left..right]
 // defines a 1-element array.
 while (right != left + 1) {
 // Partition about the q'th element of the array
 int q = left + rand() % (right - left);

 T t = base[q];
 base[q] = base[left];
 base[left] = t;

 // Partition about base[left]; all elements less than
 // base[left] get swapped into the left-hand side of
 // the array.
 int i = left - 1;
 int j = right;
 while (j >= i) {
 while (j > 0 && t < base[--j]);
 while (i < (right-1) && base[++i] < t);
 if (i < j) {
 T t = base[i];
 base[i] = base[j];
 base[j] = t;
 }
 }
 // Now focus attention on the partition containing
 // the order statistic we're interested in.
 if (inx <= j - left)
 // Throw away the right-hand partition; we know it
 // doesn't contain the i'th order statistic.
 right = j + 1;
 else {
 // Throw away the left-hand partition; we know it
 // doesn't contain the i'th order statistic.
 // Now we're looking for the inx - j - left + 1'th
 // order statistic of the right-hand partition.
 inx -= j - left + 1;
 left = j + 1;

 }
 }
 // base[left] is the element we want, return it.
 return base[left];
}


[LISTING TWO]

// select.cpp -- Demonstrates the use of the Select function template on
// arrays of integers and Point2D's. It generates a random array of
// integers, then enumerates the order statistics from 0..n-1. This will
// print out the members of the array in sorted order. This isn't a
// suggestion for how to use Select (obviously it's more efficient to sort
// the array if you want to print out the members in sorted order!) but it's
// good to enumerate them all to verify that the algorithm is working properly.
// The other class the program deals with is a user-defined Point2D class.
// This is a two-dimensional point defined by two floating-point numbers.
// The ordering chosen, defined by the operator< for the class, is a
// standard topological ordering from computational geometry.
// Copyright (C) 1992 by Nicholas Wilt. All rights reserved.

#include <fstream.h>
#include <iomanip.h>
#include <stdlib.h>
#include "select.h"

// Just for fun, here's a signum template function. Returns -1 if a < b; 1 if
// b < a; 0 if they're equal. I've ensured only operator< is used for
// comparisons, so operator> isn't defined for a user-defined class.
template<class T> int
Signum(T a, T b)
{
 if (a < b) return -1;
 else if (b < a) return 1;
 else return 0;
}
class Point2D {
 double x, y;
public:
 Point2D() { }
 Point2D(double X, double Y): x(X), y(Y) { }

 // operator< makes our points sorted in topological order: increasing order
 // in x, and increasing order in y if there is a tie in x.
 friend int operator<(const Point2D& x, const Point2D& y) {
 int ret = Signum(x.x, y.x);
 return (ret) ? ret < 0 : Signum(x.y, y.y) < 0;
 }
 friend ostream& operator<< (ostream& a, Point2D& b) {
 a << "(" << b.x << ", " << b.y << ")";
 return a;
 }
};
// Number of elements to allocate in the test arrays
const int NumInts = 10;
const int NumPts = 5;
int
main()

{
 int *x = new int[NumInts];
 int i;
 for (i = 0; i < NumInts; i++) {
 x[i] = rand() % 100;
 cout << x[i] << ' ';
 }
 cout << '\n';
 for (i = 0; i < NumInts; i++) {
 cout << Select(x, NumInts, i) << ' ';
 }
 cout << '\n';
 delete [] x;
 Point2D *y = new Point2D[NumPts];
 cout << setprecision(2);
 for (i = 0; i < NumPts; i++) {
 y[i] = Point2D( (double) rand() * 10.0 / RAND_MAX,
 (double) rand() * 10.0 / RAND_MAX);
 cout << y[i] << ' ';
 }
 cout << '\n';
 for (i = 0; i < NumPts; i++) {
 cout << Select(y, NumPts, i) << ' ';
 }
 cout << '\n';
 delete [] y;
 return 0;
}


[LISTING THREE]

// bintree.h -- Header file for simple binary tree class template.
// Copyright (C) 1992 by Nicholas Wilt. All rights reserved.
// BinaryNode is a helper class template for the BinaryTree class template. It
// should be needed only rarely by the end user of the BinaryTree class, if
// ever. BinaryTree<T> is BinaryNode<T>'s friend. The only other classes that
// can access BinaryNode's data are classes for which it is a base class.
template<class T>
class BinaryNode {
protected:
 T x;
 BinaryNode<T> *left, *right;
public:
 BinaryNode(const T& X): x(X) {
 left = right = 0;
 }
 BinaryNode(const T& X, BinaryNode<T> *l, BinaryNode<T> *r): x(X) {
 left = l;
 right = r;
 }
 friend class BinaryTree<T>;
};
// BinaryTree manipulates binary trees described by pointers to BinaryNode.
template<class T>
class BinaryTree {
protected:
// Pointer to the root node of the binary tree.
 BinaryNode<T> *root;
// Various functions to manipulate collections of binary

// tree nodes. These are the functions that do the real work.
 static BinaryNode<T> *InsertNode(BinaryNode<T> *r, T x);
 static BinaryNode<T> *DupNode(BinaryNode<T> *r);
 static void PostOrderDeletion(BinaryNode<T> *r);
 static T *QueryNode(BinaryNode<T> *r, const T& x);
 static void PreOrderNode(BinaryNode<T> *r, void (*f)(T *));
 static void InOrderNode(BinaryNode<T> *r, void (*f)(T *));
 static void PostOrderNode(BinaryNode<T> *r, void (*f)(T *));
public:
 // Constructors and assignment operator
 BinaryTree() { root = 0; }
 // Copy constructor duplicates all the nodes in the tree.
 BinaryTree(const BinaryTree<T>& x) { root = DupNode(x.root); }
 // Assignment operator deletes nodes already there, then
 // copies the ones in the tree to be copied.
 BinaryTree<T>& operator=(const BinaryTree<T>& x) {
 if (this == &x) return *this; // protect against self-assignment
 PostOrderDeletion(root);
 root = DupNode(x.root);
 return *this;
 }
 // Destructor frees all the nodes in the binary tree.
 ~BinaryTree() { PostOrderDeletion(root); }
 // Inserts a node containing x into the binary tree.
 void Insert(const T& x) { root = InsertNode(root, x); }
 // Returns a pointer to node equal to x, if in tree.
 // If none found Query returns 0.
 T *Query(const T& x) { return QueryNode(root, x); }

 // Various traversal functions perform the traversal
 // in the order given and call f with a pointer to the
 // node contents when visiting.
 void PreOrder(void (*f)(T *)) { PreOrderNode(root, f); }
 void InOrder(void (*f)(T *)) { InOrderNode(root, f); }
 void PostOrder(void (*f)(T *)) { PostOrderNode(root, f); }
};
// The following function declarations give examples of how to
// write function templates for member functions.
// Deletes the tree pointed to by r.
template<class T> void
BinaryTree<T>::PostOrderDeletion(BinaryNode<T> *r)
{
 if (r) {
 PostOrderDeletion(r->left);
 PostOrderDeletion(r->right);
 delete r;
 }
}
// Inserts a node with key x into the binary tree with the
// root given, and returns the new root.
template<class T> BinaryNode<T> *
BinaryTree<T>::InsertNode(BinaryNode<T> *r, T x)
{
 if (r) {
 if (x < r->x)
 r->left = InsertNode(r->left, x);
 else
 r->right = InsertNode(r->right, x);
 }
 else

 r = new BinaryNode<T>(x);
 return r;
}
// Duplicates the binary tree given and returns pointer to the new root.
template<class T> BinaryNode<T> *
BinaryTree<T>::DupNode(BinaryNode<T> *r)
{
 if (r)
 return new BinaryNode<T>(r->x,
 DupNode(r->left),
 DupNode(r->right));
 else
 return 0;
}
// Returns pointer to key given, if found in binary tree. Otherwise returns 0.
template<class T> T *
BinaryTree<T>::QueryNode(BinaryNode<T> *r, const T& x)
{
 if (r) {
 if (x == r->x)
 return &r->x;
 if (x < r->x) return QueryNode(r->left, x);
 else return QueryNode(r->right, x);
 }
 else
 return 0;
}
// Traversal functions. These three functions traverse the tree and call the
// pointer to function on the node contents when it's time to visit the node.
template<class T> void
BinaryTree<T>::PreOrderNode(BinaryNode<T> *r, void (*f)(T *))
{
 if (r) {
 (*f)(&r->x);
 PreOrderNode(r->left, f);
 PreOrderNode(r->right, f);
 }
}
template<class T> void
BinaryTree<T>::InOrderNode(BinaryNode<T> *r, void (*f)(T *))
{
 if (r) {
 InOrderNode(r->left, f);
 (*f)(&r->x);
 InOrderNode(r->right, f);
 }
}
template<class T> void
BinaryTree<T>::PostOrderNode(BinaryNode<T> *r, void (*f)(T *))
{
 if (r) {
 PostOrderNode(r->left, f);
 PostOrderNode(r->right, f);
 (*f)(&r->x);
 }
}





[LISTING FOUR]

// bintree.cpp -- Demonstration program for BinaryTree class template. Inserts
// 10 random numbers into a BinaryTree<int>, prints them out and then prints
// out the pre-, in- and post-order traversals of the resulting binary tree.
// Copyright (C) 1992 by Nicholas Wilt. All rights reserved.

#include <iostream.h>
#include <stdlib.h>
#include <alloc.h>

#include "bintree.h"

// Passed to BinaryTree<int> traversal functions. This gets called with a
// pointer to the node contents (int in this case, since it's a
// BinaryTree<int>) whenever it's time to visit a node in the tree.
void
PrintInt(int *x)
{
 cout << *x << ' ';
}
int
main()
{
 int i;
 BinaryTree<int> tree;
 cout << "Insertion:\t";
 for (i = 0; i < 10; i++) {
 int insertme = rand() % 100;
 tree.Insert(insertme);
 cout << insertme << ' ';
 }
 cout << '\n';
 cout << "Pre-order:\t"; tree.PreOrder(PrintInt); cout << '\n';
 cout << "In-order:\t"; tree.InOrder(PrintInt); cout << '\n';
 cout << "Post-order:\t"; tree.PostOrder(PrintInt); cout << '\n';
 return 0;
}








Special Issue, 1992
TOWARD A LESS OBJECT-ORIENTED VIEW OF C++


C++'s modular nature makes it a weak object language, but a strong
general-purpose one




Hank Shiffman


Hank is a technical specialist at SunPro, a division of Sun Microsystems, and
can be contacted at shiffman@eng.sun.com.


The last couple of years have seen a growing wave of enthusiasm for
object-oriented approaches to requirements analysis, application design, and
programming. This same period has been marked by the increasing popularity of
the C++ language and its acceptance as a logical successor to C. Since C++ was
designed to support object-oriented development, it seems only natural to see
a strong link between C++ and OOP. Developers interested in an object-oriented
language may look no further than C++, while programmers who move to C++ will
of course adopt an object-oriented style of programming.
Such a view is seriously mistaken. Anyone who associates C++ with OOP and OOP
with C++ misses two very important points: first, that C++ is not the best
language for object-oriented programming by a wide variety of measures, and
second, that it is possible to get tremendous advantage from C++ without using
object techniques.


OOP Languages: Same Approach to Different Goals


C++ is not a great object-oriented language, as any devotee of Smalltalk or
Lisp will be glad to tell you. These other languages are easier both to learn
and to use than C++. They make it possible to write better structured, more
comprehensible, and more maintainable programs. Class reuse between
applications, still a rarity in C++, is the rule in these languages.
Actually, the difference between C++ and these other languages is not (or at
least not only) one of quality. The larger difference is one of intent.
Although C++'s facilities are based on the same concepts as other
object-oriented programming languages, it uses them to achieve very different
goals. C++ uses objects to address the complexity problems inherent in large C
applications written by teams of programmers. Smalltalk and Lisp use objects
to make individual developers more productive. These highly interactive
languages offer great value in exploratory programming and prototyping.
In its simplest form, OOP simply states that data structures should be treated
more like objects in the real world. A data structure is much like a piece of
paper. Our programs write notes on the paper, read the notes back, erase them,
and write other notes.
Throughout this process there is a hidden assumption that the note will mean
the same thing to its reader that it did to the writer. When different parts
of a program interpret things differently we have problems. Frequently, these
problems show up when different developers on the same project attempt to
integrate their efforts.
By contrast, OOP associates a data structure with the set of operations which
act upon it. Once these operations or methods (the "interface" to the object)
have been defined, everyone is expected to use them to communicate with the
object. This offers important benefits:
Users of objects need not know much about their internal workings. This
reduces the number of complex relationships between data that they need to
manage and leaves them free to concentrate on the problem they are trying to
solve.
Should the implementation of an object need to change, the scope of that
change can be reduced. As long as the external view of the object and its
interface remain the same, code which uses the object will be isolated from
the change.
This is the data-abstraction concept, as supported by languages like Ada. OOP
goes one step further by providing an inheritance mechanism. New classes of
objects can be defined in terms of existing object classes. A class inherits
all of the structure and behavior of its ancestor classes. It may choose to
define additional structure and behavior of its own. It can also override any
behavior inherited from any of its ancestors by defining a replacement version
of the inherited operation.
At this point, object-oriented languages head off in a number of different
directions. A few stop here, satisfied with the benefits they receive from
more modular code. Most others provide a mechanism for selecting the
appropriate method for an object at execution time. These languages permit a
single piece of code to operate on data of many different types.
Smalltalk and the Common Lisp Object System (CLOS) go much further. In
addition to allowing a class to replace inherited methods, they let it
customize these methods through encapsulation. This is useful when a new
class's behavior is a variation on that of its ancestors. Smalltalk's
super-send mechanism lets a method perform some preprocessing, invoke the
behavior it inherited from its ancestors, then do some additional processing,
all without requiring a programmer to know precisely which ancestor provides
the desired behavior. CLOS's :BEFORE, :AFTER, and :AROUND method types and the
CALL-NEXT-METHOD procedure offer similar capabilities.
C++ lacks an equivalent to Smalltalk's super-send. Instead, it forces
programmers to hard code the name of the class whose method is being
encapsulated. For example, we might create an array class that implements
resizable arrays of integers and a protected_array class based on the array
class which adds a test for invalid subscripts. protected_array puts the
subscript test in its subscript operator (the operator[] defined in Example
1). If the subscript is within range, the subscript operator on
protected_array invokes the subscript it inherits from array (see Example 1).
Example 1: When subscripts are within range, the subscript operator invokes
the subscript it inherits.

 int& protected_array::operator[] (int x)
 { return (x >= 0 && x < size()) ?
 array::operator[] (x) : error_value; }

In this example there is one parent class and one child, so having to refer to
the parent class isn't a problem. It becomes steadily more serious as the
number of classes in the inheritance tree grows. When one class has dozens of
ancestors (not at all uncommon in Smalltalk and Lisp), it is too much to
expect a programmer to know the origin of every single method. The situation
becomes worse when a redesign of a set of classes makes it necessary to move
methods from one class to another. Suddenly it becomes necessary to locate and
modify every caller of the moved method.
This protected_array example illustrates another important difference between
C++ and more exploratory languages like Smalltalk and Lisp: Programmers in
these languages expect to have things like range-checked arrays provided for
them. Smalltalk is known and loved as much for the size and richness of its
class libraries as for the language itself.


C++: Why It Is The Way It Is


C++ began as "C with Classes," a set of object extensions to C. These
extensions were based on the Simula67 language, the first language to exhibit
characteristics which we would now call object oriented. In adding features to
C, Bjarne Stroustrup was determined not to compromise its efficiency and
expressiveness. He identified several important constraints for his new
language:
C++ should be a proper superset of C. Changes to C semantics are permitted
only where absolutely necessary. Maintaining compatibility with C permits easy
migration of both C code and C expertise to the new language. Correcting the
peculiarities of C was not a goal. (And if you don't think C has
peculiarities, try explaining the difference between a pointer and an array
some time.)
C++ should be type safe. The function prototype feature, later adopted by the
ANSI committee for C, permits the compiler to identify and handle data-type
mismatches across function calls, either by casting arguments to the
appropriate type or by reporting an error. By requiring that every member
function in a class first be declared in the class statement, C++ prevents
requests from being made of an object that can't handle them. Smalltalk and
Lisp are less able to detect errors like these at compile time and provide
mechanisms for dealing with inappropriate object requests during program
execution.
Wherever possible, C++ features should extend existing C mechanisms instead of
defining new ones. For example, C++ builds its object-class capabilities on
top of C structs. Programmers can choose the set of object capabilities they
need for each class they build. By contrast, most other object languages
(including other C extension languages) treat objects as a new primitive data
type. These languages require programmers to decide whether they want the
efficiency of data structures or the additional features available with
objects.
C++ language features must not impose a penalty in either space or time unless
they are actually being used. Object-oriented languages permit a program to
determine which of a number of member functions to execute, based on the type
of the object that receives the request. Most languages permit member-function
selection to be deferred until execution time, using type information inside
the object to determine the appropriate function to call. Although C++
supports such a runtime-operator identification facility, it uses it only on
member functions declared virtual and only when the appropriate function
cannot be identified at compile time. C++ will also include type information in
an object only if the object's class declares at least one virtual member
function. C++ also includes a second operator-identification mechanism called
"overloading." Overloading, which can choose a function based on both the
number and types of the arguments passed to it, always makes its choice at
compile time.
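The "pay only for what you use" rule can be seen directly in object sizes. The sketch below uses hypothetical types: a class with no virtual functions carries no hidden type information, while declaring even one virtual function forces the compiler (on typical implementations) to store a vtable pointer in every instance.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types: Plain has no virtual functions, so instances
// hold only their declared data.
struct Plain {
    int x;
};

// One virtual function is enough to make the compiler embed runtime
// type information (typically a vtable pointer) in each object.
struct WithVirtual {
    int x;
    virtual ~WithVirtual() {}
};
```

On most compilers, sizeof(WithVirtual) exceeds sizeof(Plain) by at least the size of a pointer, confirming that programs which never declare virtual functions pay nothing for the dispatch machinery.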
In general, a C++ program can be written to be just as small and fast as the
corresponding C program. In fact, language features like variable references
make it possible for C++ code to be even more efficient than C. (Both of these
assertions assume compilers that optimize C++ code as well as the C compilers
in use today. If they are not entirely true today they will be before too
long.)
C++ will be designed for the traditional batch compile/link/run model in which
C operates. This is most evident in the C++ class declaration, which contains
both the structure of class instances and a complete list of interface
functions. Such a design precludes the incremental style of other object
languages. C++ programs can't add methods to classes on-the-fly, since that
would violate the requirement that all methods be named in the class
statement. The class statement was clearly designed to be placed in a header
file and included by other source files. It was not intended for incremental,
rapid prototype sorts of environments familiar to Smalltalk and Lisp
programmers, which permit the structure and behavior of classes to be built up
a little at a time. In fact, an interactive environment for experimental
programming in C would probably be easier to build and to use than one based
on C++.
Where ambiguity exists, C++ will make it the programmer's responsibility to
resolve the conflict. Ambiguity can arise in object-oriented languages which
support multiple inheritance. If a method defined by multiple classes in a
complex inheritance tree is invoked, which class should be selected? Should
the inheritance tree be searched in depth first order? Breadth first? What if
different classes in the tree have some ancestor classes in common but include
them in a different order? C++ addresses this problem by generating errors at
compile time. If two or more ancestor classes define the same method, the
programmer must state explicitly which one should be used.
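A minimal sketch (with hypothetical class names) of how that explicit resolution looks in practice: two base classes define the same member, so the derived class must qualify which one it means.

```cpp
#include <cassert>

// Hypothetical bases that both define id().
struct Logger   { int id() const { return 1; } };
struct Recorder { int id() const { return 2; } };

struct Widget : public Logger, public Recorder {
    // An unqualified call to id() here would be rejected as ambiguous
    // at compile time; qualifying with Logger:: resolves it explicitly.
    int id() const { return Logger::id(); }
};
```

Callers can still reach the other version through the same qualification syntax, as in w.Recorder::id().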
C++'s handling of ambiguity also shows up in its method for selecting among
overloaded functions. Consider the code in Example 2, which defines two
functions for finding the smaller of a pair of numbers.
Example 2: Defining two functions for finding the smaller of a pair of
numbers.

 int min (int a, int b)

 { return a < b ? a : b; }

 double min (double a, double b)
 { return a < b ? a : b; }

 main ()
 { int x = min (5, 7);
 double y = min (3.2, 1.3);
 double z = min (16.3, 2); }

 % CC min.cc

 "min.cc", line 10: error: ambiguous call: min ( double , int )
 "min.cc", line 10: choice of min ()s:
 "min.cc", line 10: int min(int , int );
 "min.cc", line 10: double min(double , double );
 Compilation failed
 %

The first call to min uses the (int, int) version of min, since both arguments
are of type int. The second call to min uses the (double, double) version for
the same reason. The third call is ambiguous. Should the compiler cast the
first argument to an int? Or should it cast the second to a double? C++ leaves
the choice to the programmer, who must either cast one of the arguments or
define a third version of min that takes a double and an int.
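The cast-based resolution can be sketched as follows, using hypothetical names to avoid clashing with Example 2: the same pair of overloads, plus a call made unambiguous by converting the int argument.

```cpp
#include <cassert>

// The same two overloads as in Example 2, under a hypothetical name.
int    smaller(int a, int b)       { return a < b ? a : b; }
double smaller(double a, double b) { return a < b ? a : b; }

// smaller(16.3, 2) would not compile; casting the int argument
// selects the (double, double) version at compile time.
double resolved() { return smaller(16.3, (double) 2); }
```

The cast tells the compiler which conversion the programmer intends, so the choice remains a compile-time one.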
C++ demonstrates that the whole of a programming language can be far more
complex than the sum of its parts. Experienced C programmers can learn all of
the syntactic elements of the language and their semantics and usage in a
couple of days. This will be followed by several months of discovering what
happens when individual elements are combined. Can I combine overloaded and
virtual-member functions? How about overloaded functions, template functions,
and default values for function arguments? What happens if I provide
operations to convert from a built-in data type to an object class and back
again? How do I know that the compiler has written functions for me or
generated function calls that don't appear in my source?
Taken as a whole, C++ seems more the product of engineering than of art. Given
all the constraints on its design, perhaps that should not come as much of a
surprise.


C++ is a Better C (and Isn't that Enough?)


One measure of an object-oriented language is its ability to support an object
programming style. A pure object-oriented language should make it easy to
program with objects and difficult to program any other way. By this measure,
Smalltalk is the best object-oriented language available today. CLOS makes
Lisp a good object language. Although programmers can use Lisp to write
procedural code, it does give them plenty of reasons to use objects.
If C++ is a less-than-perfect object-oriented language by this measure, it is
also true that most programmers aren't trying to use it as one. C++ satisfies
a very real need for a better C language. C++ provides programmers with better
compile-time error detection and allows them to be more explicit in describing
their intentions than is possible in C.
The fact that so many of C++'s language elements can be used in isolation is
to the advantage of the more pragmatic developer. Many C++ shops seem to
settle on a subset of the language, whether as part of a formal process or
through a combination of comfort and inertia. Although they may not outlaw the
rest of C++, they don't feel an obligation to use features just because they
exist.
Where might one divide the C++ language? Here's one possible set of layers:
The C-language subset: This is a popular place to start, especially when
moving existing C code to C++. Once prototypes have been inserted into
function declarations, most C code will require little change to compile under
C++.
Improvements on C: C++ addresses many of the deficiencies of C. Features like
looser placement of variable declarations, variable references, const data
items, inline functions, and default values for function arguments offer
greater convenience to C++ programmers without forcing any change in their
programming style.
Memory management: Data types which use malloc and free can be redesigned as
simple classes to take advantage of automatic invocation of class constructor
(initialization) and destructor (cleanup) functions. These functions provide a
convenient home for code which requests and returns heap memory. In addition,
the ability to redefine the new and delete operators permits each class to
define its own scheme for memory management and garbage collection.
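The constructor/destructor pairing described above can be sketched with a minimal, hypothetical class: the constructor acquires heap memory and the destructor releases it, so callers never manage the allocation themselves.

```cpp
#include <cassert>
#include <cstddef>

// A minimal sketch (hypothetical class): acquisition in the
// constructor, release in the destructor.
class Buffer {
    char*       data_;
    std::size_t size_;
public:
    explicit Buffer(std::size_t n) : data_(new char[n]), size_(n) {}
    ~Buffer() { delete[] data_; }             // automatic cleanup
    std::size_t size() const { return size_; }
    char&       operator[](std::size_t i) { return data_[i]; }
private:
    Buffer(const Buffer&);                    // copying suppressed for brevity
    Buffer& operator=(const Buffer&);
};
```

Because the destructor runs automatically when a Buffer goes out of scope, the leak-prone free() call that a C version would need simply disappears.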
Data abstraction: Beyond their memory-management features, classes are
extremely useful for creating new data types. Overloading of function names
and basic operators like plus (+) and equals (=) makes it possible for
programs to treat these data types as if they were built into the language, as
in Example 3(a).
Example 3: (a) Overloading of function names and basic operators like plus (+)
and equals (=) makes it possible for programs to treat these data types as if
they were built into the language; (b) the template facility lets you write
general container classes which can be instantiated to hold any particular
kind of value.

 (a)

 #include <stream.h>
 #include "fraction.h"
 #include "string.h"

 main ()
 { Fraction d = .5, e(2, 3);
 Fraction f = d + e - 1;
 cout << "F is " << f << endl; // Prints "F is 1/6"

 String g = "Hello,", h = " there";
 String i = g + h;
 cout << "I is " << i << endl; } // Prints "I is Hello, there"

 (b)

 #include <stream.h>
 #include "stack.h"

 main ()
 { stack<double> s1; // Create a stack of doubles

 for (int i = 0; i < 10; i++)
 s1.push (i + .5);

 for (i = 0; i < 5; i++)
 cout << s1.pop() << " " << s1 << endl;

 stack<char*> s2; // Create a stack of character pointers
 s2.push ("abc");
 s2.push ("def");
 s2.push ("ghi");

 while (!s2.is_empty())
 cout << s2.pop() << " " << s2 << endl; }

These abstract data types are also easy to reuse in new applications. The
new template facility makes it possible to write general container classes
(arrays, linked lists, stacks, and the like) which can be instantiated to hold
any particular kind of value, as in Example 3(b).
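A hypothetical sketch of the kind of container template a header like stack.h might provide (the real class behind Example 3(b) would differ; std::vector is used here for brevity):

```cpp
#include <cassert>
#include <vector>

// A sketch of a general container template: one definition,
// instantiated per element type.
template <class T>
class Stack {
    std::vector<T> items_;
public:
    void push(const T& v) { items_.push_back(v); }
    T    pop()            { T v = items_.back(); items_.pop_back(); return v; }
    bool is_empty() const { return items_.empty(); }
};
```

Stack<double> and Stack<char*> are then distinct, fully type-checked classes generated from the same source.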
Object-oriented programming, inheritance, and runtime operator identification:
Although this is what first attracts us to C++, it's the hardest part of the
language for us to learn to use properly.
Needless to say, there are many other ways to divide up the features of C++.
The important point is not to think of C++ as just an object-oriented
language. It is really a vast collection of extensions to C, some of which are
based on object techniques. Its object features exist not to support the kind
of programming-in-the-small style seen in Smalltalk, Lisp, and languages like
Objective-C and Eiffel, but to address issues of integration and maintenance
encountered by large teams of programmers working on major projects.
C++ is an object-oriented language that a C programmer can appreciate,
especially the kind of C programmer who in an earlier age would have written
in assembly language. It is oriented first toward execution performance and
then toward flexibility. Most of the features that C++ adds to C involve no
runtime overhead. Those few that do can be avoided by the efficiency-oriented
programmer.
It is the modular nature of the elements of C++ that makes it both a weak
object language and a strong general-purpose language. OOP purists may decry
its limitations. More pragmatic developers are likely to decide that C++ is
just object oriented enough to help them get their work done.



Special Issue, 1992
WHY C++ WILL REPLACE FORTRAN


C++ has efficiency, speed, and a lot more




Thomas Keffer


Tom is president of Rogue Wave Software. He can be contacted at P.O. Box 2328,
Corvallis, OR 97339.


Fortran has long been the lingua franca of the numerics world. Yet Fortran's
shortcomings have become a tired joke amongst programmers. Its limited type
checking, lack of extensibility, and reliance on global data make it extremely
hard to maintain and debug. So why does it survive?
The reasons are simple: momentum and efficiency. Countless complicated
algorithms have been implemented in Fortran, and no one wants to rewrite them.
And, for all its warts, Fortran is still an extraordinarily efficient
language.
It's not enough to offer a language that's just as good as Fortran. If people
are to switch, the replacement language must not only be the equal of Fortran
in terms of efficiency and code reuse, but it must also be a lot better in
terms of productivity, maintenance, and power. A tall order!
C++ meets these criteria, making it (in my mind) the first serious contender
to challenge Fortran's supremacy. With care, C++ can meet Fortran's efficiency
(although not exceed it, except in the sense that it can make previously too
complicated algorithms feasible). It is also easy to call existing,
known-to-be-correct Fortran code from C++ (at least under UNIX and some
dialects on the PC). The kicker is that with C++, it can be much easier to
build and maintain the really big, gnarly code being contemplated today.
Here's why:
Encapsulation allows the grisly details of memory allocations, I/O parsing, Do
loops, and the like to be hidden from the user.
Operator overloading allows basic arithmetic operations to be extended to new,
more abstract atomics such as vectors and matrices. Error-prone Do loops over
vector and matrix elements are then eliminated. Dialects of C and Fortran
allow this, but they're currently nonstandard. Operator overloading can also
make it trivial to change the precision of a calculation from float to double.
Inheritance, in combination with encapsulation and operator overloading,
allows new user-defined types to be created with little extra work. If you
need a special vector--say, one that supports null values--you simply add the
missing functionality. The rest can be inherited.
It is easy to base these vectors and matrices on the Linpack Basic Linear
Algebra (BLA) package, for which highly optimized machine-language versions
have been written.
Complex numbers can be added to the language, correcting a major deficiency of
C.
Dynamic binding can allow large parts of the problem to be defined at run time
with little increase in code complexity.
With C++, programmers can concentrate on high-level architectural issues of
the code, not implementation details, and write code like that in Example 1.
In this example, it isn't necessary to know the size of the complex vector b
at compile time--it is determined automatically when the vector is read in.
Example 1: With code like this, it isn't necessary to know the size of the
complex vector b at compile time.

 ComplexVec b; // Declare a complex vector
 cin >> b; // Read it in
 FFTServer s; // Allocate an FFT server
 // Calculate the transform of b:
 ComplexVec theTransform = s.fourier(b);
 cout << theTransform; // Print it out



But Won't Fortran 90 Save Us?


Fortran 90 offers substantial improvements, giving the language a level of
portability and maintainability roughly equivalent to C. Among the
improvements the X3J3 standards committee included were recursion, dynamic
memory allocation, and pointers. In other areas the committee went further,
adding built-in support for vector arithmetic and structures (TYPE, in Fortran
vernacular) and eliminating Do loops in many situations (although the syntax
is not pleasing).
Still, many other modern concepts are missing. Fortran is far from supporting
key object-oriented concepts such as inheritance and encapsulation, let alone
polymorphism. Useful C++ concepts like constructors and destructors and
parameterized types are also missing. It is these features, plus a host of
other, smaller features (const variables come to mind) that give C++ its
unique blend of efficiency and maintainability.


Numerics in the Small


C++'s ability to define whole new types and an accompanying algebra gives the
language a chameleon-like quality. A class designer, for example, can make the
language look remarkably different, depending on what goals need to be
achieved (now it's a database language, now it's a graphical language...).
Several new types are particularly well suited for numerics, including
vectors, matrices, linear algebra, and transforms.
Vectors. Encapsulating arrays inside a vector class to give them a natural,
predictable arithmetic is an obvious place to start. How might such a class
look? This depends on how the vectors are constructed, what their arithmetic
looks like, and so on. For concreteness, let's look at how a vector of doubles
(call it class DoubleVec) might look, starting with a set of constructors that
allows us to create new vectors in a predictable way; see Example 2(a). You
can then add some basic arithmetic and assignment operations as in Example
2(b), and address individual elements of the vector as in Example 2(c). If,
however, you want to set every other element of a vector to some value, you
have to write something like Example 2(d), which defeats the advantages of
abstraction.
Example 2: (a) DoubleVec, a vector of doubles; (b) adding basic arithmetic and
assignment operations; (c) addressing individual elements of the vector; (d)
the advantages of abstraction are defeated if you have to write code like
this.

 (a)

 DoubleVec a; // Null vector: no elements at all, but can be resized
 DoubleVec b(8); // 8 elements long, uninitialized
 DoubleVec c(8,1); // 8 elements, initialized to 1.0
 DoubleVec d(8,1,2); // 8 elements, initialized to 1, 3, 5, ...


 (b)

 b = 2.0; // Set all elements of b to 2
 b = c + d; // Set b to the sum of c and d
 b *= 2; // Multiply each element in b by 2.

 (c)

 b[2] = 4.0; // Set the 3rd element of b to 4.0
 c[1] = b[3]; // Set the 2nd element of c to the 4th element of b

 (d)

 DoubleVec a(10, 0); // 10 element vector
 for (int i = 0; i<10; i+=2)
 a[i] = 1;

Having taken the trouble to encapsulate the array elements into a vector,
alarm bells should go off in your head if you start to take the vector back
apart and address individual elements. The results are likely to be slow and
hard for the compiler to optimize. Instead, you need to tell the program, in
some abstract sense, to "set every other element to 1."
The key to maintaining a high level of abstraction is the slice that allows
elements separated by a constant stride (say every second element) to be
addressed; see Example 3(a). The slice is an extremely powerful abstraction
that can be used to implement a variety of algorithms. As an added bonus, the
BLA routines have been programmed in terms of slices, so we can take advantage
of existing, highly optimized versions of this package to implement our slice
arithmetic.
Example 3: (a) The key to maintaining a high level of abstraction is the slice
that allows elements separated by a constant stride to be addressed; (b)
implementing slices by using a "helper" class; (c) arithmetic operators
implemented using DoubleSlices as arguments allows expressions like this; (d)
code that executes slowly because a simple vector must continuously undergo a
type conversion to the helper class.

 (a)

 a.slice(0, 2) = 1; // Starting with element 0, set every other element to 1

 (b)

 class DoubleVec {
 ...
 public:
 ...

 operator DoubleSlice(); // For type conversion
 DoubleSlice slice(int start, int step, unsigned N) const;
 };
 // The "helper class";
 class DoubleSlice {
 DoubleVec* theVector;
 int startElement;
 unsigned sliceLength;
 int step;
 public:

 ...
 friend DoubleVec operator+(const DoubleSlice&, const DoubleSlice&);
 ...
 };

 (c)

 DoubleVec b(10, 0), c(10, 1);
 DoubleVec d = b.slice (0, 2) + c.slice(1, 2);

 (d)

 DoubleVec g = b + c; // DoubleVec to DoubleSlice type conversion occurs


There are two architectural approaches to implementing slices: either using a
"helper class" or building the slices into the vector class. Each has
advantages, although the role of slices in algorithms is so fundamental that
the second approach tends to produce cleaner, more efficient code.
Nevertheless, it is useful to take a look at the first approach.
What's a helper class? To answer this, look at Example 3(b), where the actual
vector data has not been shown in the interest of clarity. In addition to the
basic vector class DoubleVec, there's a second ("helper") class named
DoubleSlice, which contains the data necessary to address the "sliced"
elements. The member function slice() returns an instance of this class. All
the arithmetic operators must be implemented using DoubleSlices as an argument
(as per the + operator in the example code) to allow expressions like Example
3(c). This leads to slow code, because a simple vector must continuously
undergo type conversion to the helper class; see Example 3(d).
Alternatively, you could implement two versions of the arithmetic operators:
one taking the vector as an argument, the other the helper class. This,
however, leads to type-conversion ambiguities. Building slices into the vector
class leads to simpler code, so type conversion becomes simpler as well.
The lesson here is that helper classes are fine, but when the code
fundamentally depends on them, the results are slow and type conversions are
always ambiguous. You should reexamine your approach to see if your problem
depends on what helper classes are trying to accomplish in some fundamental
way. If so, you might be better off finding a way to eliminate them, even if
that makes the remaining classes slightly more complex.
Listing One, page 47, illustrates how a vector class with built-in slices
might be implemented. (Error checking, efficiency optimization, and code that
deals with special cases have been omitted in the interest of clarity.) First
comes the vector data, which can be shared by more than one type of vector. It
contains a reference count and a pointer to an array of raw, untyped data. The
constructor specifies the number of elements in the vector and the size (in
bytes) of each individual element.
Listing Two, page 47, outlines the actual vector that includes a pointer to
the DataBlock (outlined in Listing One). The reference count in class
DataBlock is used to ensure that the vector data is not prematurely deleted if
more than one vector is using it. For every vector actively using the
DataBlock, the count is increased by 1. When a vector is done, the count is
decremented by 1. When the count is 0, no vector is using the block, and it
can safely be deleted.
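The counting rule just described can be sketched in a few lines. This is an illustration only; the actual DataBlock in Listing One differs in detail, and the member names here are assumptions.

```cpp
#include <cassert>
#include <cstdlib>

// A minimal sketch of reference-counted block ownership: the count
// starts at 1 for the creating vector; the block frees itself when
// the last reference is dropped.
class DataBlock {
    int         refs_;
    void*       data_;
    std::size_t bytes_;
public:
    DataBlock(std::size_t nelem, std::size_t elemsize)
        : refs_(1),
          data_(std::malloc(nelem * elemsize)),
          bytes_(nelem * elemsize) {}
    void addReference() { ++refs_; }
    // Returns true when the last reference is dropped and the
    // block has deleted itself.
    bool removeReference() {
        if (--refs_ > 0) return false;
        std::free(data_);
        delete this;
        return true;
    }
    int references() const { return refs_; }
};
```

Each vector calls addReference() when it attaches to a block and removeReference() in its destructor, so shared data survives exactly as long as someone is using it.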
There's also begin, a pointer to the start of data (and the slice), where the
actual data typing occurs. This design approach allows multiple types to share
the same DataBlock and eliminates one level of indirection at the expense of
some extra storage space. As you'll see, it also allows some highly expressive
statements.
There's one other wrinkle: the variable step. This is the stride length, the
step size between contiguous elements of the vector. A conventional vector is
a slice that starts with the 0th element and has a stride of 1.
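The begin/step addressing scheme can be sketched with a hypothetical view class: element i of the slice lives at begin[i * step], so a stride of 2 addresses every other element of the underlying storage (the real Math.h++ classes differ).

```cpp
#include <cassert>
#include <cstddef>

// A hypothetical sketch of stride-based addressing: a view over
// existing storage, defined by a start pointer and a step size.
class SliceView {
    double*     begin_;
    std::size_t length_;
    int         step_;
public:
    SliceView(double* begin, std::size_t length, int step)
        : begin_(begin), length_(length), step_(step) {}
    double&     operator[](std::size_t i) { return begin_[i * step_]; }
    std::size_t length() const { return length_; }
};
```

Because the view aliases the original array, assigning through it writes straight into the shared storage, which is exactly what makes slice() usable as an lvalue.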
A slice of an existing vector is created by calling the member function
slice(), which in turn, uses a special constructor; see Listing Three (page
47).
The combination of a generalized starting element and stride enables some very
powerful and intuitive expressions. For example, it's trivial to return all
the real parts of a complex vector as a DoubleVec: It's just a slice of every
other element in the complex vector. (Perhaps this isn't the safest approach
since it assumes a known structure for type complex. It would fail if someone
implemented complex using, say, polar notation. Nevertheless, the
functionality could be retained by using a helper class.) The result is that
the function in Example 4(a) can be used as an lvalue, as in Example 4(b).
Example 4: Addressing the real elements of a complex vector: The function in
Example 4(a) can be used as an lvalue, as in Example 4(b).


 (a)

 DoubleVec real(const ComplexVec&);

 (b)

 ComplexVec a(10, 0, 0); // (0,0), (0,0), (0,0), ...
 real(a) = 1.0; // (1,0), (1,0), (1,0), ...

Matrices. Matrices are an extremely important part of numerics that can be
created by inheriting from a vector; see Listing Four (page 47). Note the
member functions col(unsigned) and row(unsigned) that return a column or row,
respectively, as a vector slice, allowing expressions like Example 5(a). It's
even possible to return the diagonal as a slice; see Example 5(b).
Example 5: (a) Returning a column or row as a vector slice; (b) returning the
diagonal as a slice.

 (a)

 DoubleMatrix a(10, 10, 0); // 10 by 10 matrix, initialized to zero
 a.row(3) = 1; // Set row 3 to 1
 a.col(2) = a.col(4); // Copy column 4 to column 2

 (b)

 DoubleMatrix I(10, 10, 0); // 10x10 initialized to 0
 I.diagonal() = 1; // Create an identity matrix

Linear Algebra. Matrix decompositions--LU Decomposition and Singular Value
Decomposition (SVD)--occupy a central role in linear algebra. However, pivot
indexes, condition numbers, nullspace, and range vectors abound. Gathering
these into a C++ class makes them much easier to work with. Here's how to do
this with LU decompositions.
The LU decomposition of a matrix consists of finding two matrices such that A
= LU, where L is a lower-triangular matrix, and U is an upper-triangular
matrix. This decomposition can be used to solve sets of linear equations.
Listing Five (page 47) illustrates how an LU decomposition class might be
structured.
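Before looking at the class structure, a tiny worked instance of A = LU makes the factorization itself concrete. This is a 2 by 2 Doolittle sketch with no pivoting; production code pivots, as the row-permutation discussion below explains.

```cpp
#include <cassert>

// Factor a 2x2 matrix a into unit-lower-triangular l and
// upper-triangular u, so that a = l * u (no pivoting).
void lu2x2(const double a[2][2], double l[2][2], double u[2][2]) {
    l[0][0] = 1.0;               l[0][1] = 0.0;
    l[1][0] = a[1][0] / a[0][0]; l[1][1] = 1.0;
    u[0][0] = a[0][0];           u[0][1] = a[0][1];
    u[1][0] = 0.0;               u[1][1] = a[1][1] - l[1][0] * a[0][1];
}
```

For a = {{4, 3}, {6, 3}} this yields l[1][0] = 1.5 and u[1][1] = -1.5, and multiplying L by U reproduces a.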
Note the constructor: An LU decomposition is "constructed" from a matrix. For
computational reasons, what is actually calculated is the LU decomposition of
a row-wise permutation of the original matrix. The vector of ints permute is
used to keep track of the original index of each row. The rest of the
construction process consists of calculating the lower- and upper-diagonal
matrices L and U, which are then packed into a private matrix base class of
the same dimension as the original matrix.
Several of the more ugly details of LU decomposition can be hidden by
encapsulation. For example, it is of no interest to the user that the L and U
matrices are stuffed inside a single matrix; hence the private declaration of
the base class. Nor is it the user's concern that a row-wise permutation of
the original matrix is being decomposed. Example 6(a) shows how to use such a
decomposition class.
Example 6: (a) Using a matrix decomposition class; (b) using LU decomposition
to solve five different sets of equations; (c) requesting the inverse of the
original matrix and letting type conversion do the work; (d) specifying
conversion explicitly.

 (a)

 DoubleMatrix a(10, 10);
 // ... (initialize a somehow)
 // Construct the LU decomposition of a:
 LUDecomp aLU(a);

 // Now use it:
 double det = determinant (aLU);
 DoubleMatrix aInverse = inverse (aLU);

 (b)

 // 5 different sets of linear equations to be solved:
 DoubleVec b[5], x[5];
 // ... (set up the 5 vectors b and the 5 vectors x, each
 // with 10 elements as per the matrix a above)

 for (int i = 0; i < 5; i++)
 x[i] = solve (aLU, b[i]);

 (c)

 DoubleMatrix a(10, 10);
 // ... (initialize a)
 // Calculate the inverse directly from a.
 // A DoubleMatrix to LUDecomp type conversion takes place automatically:
 DoubleMatrix aInverse = inverse (a);

 (d)

 DoubleMatrix aInverse = inverse (LUDecomp (a));

You can also use the LU decomposition to solve a set of linear equations ax=b,
using the friend function solve(); see Example 6(b).
For the user, who doesn't even want to worry about LU decompositions, type
conversion can play an attractive and convenient role. In Example 6(a), the LU
decomposition was created first, then used to calculate, say, the inverse of
the matrix. However, you could just as well request the inverse of the
original matrix and let type conversion do the work, as in Example 6(c).
Seeing no prototype inverse(const DoubleMatrix&), the compiler will look for a
way to convert DoubleMatrix a into something for which it has a prototype.
When it discovers the constructor LUDecomp(const DoubleMatrix&), the compiler
will invoke it to call inverse (const LUDecomp&).
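The conversion path can be sketched with scalar stand-ins for the matrix classes (all names here are hypothetical): a single-argument constructor doubles as a type conversion, so a function that only accepts the decomposition still accepts the matrix.

```cpp
#include <cassert>

// Scalar stand-in for DoubleMatrix.
struct Matrix { double v; };

// Scalar stand-in for LUDecomp; the one-argument constructor is
// what the compiler uses as an implicit conversion.
struct Decomp {
    double v;
    Decomp(const Matrix& m) : v(m.v) {}
};

// There is no inverse(const Matrix&); calling inverse with a Matrix
// makes the compiler construct a Decomp first, as in Example 6(c).
double inverse(const Decomp& d) { return 1.0 / d.v; }
```

The call inverse(m) on a Matrix m therefore compiles and routes through the Decomp constructor automatically.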
There are, of course, limitations to this approach: If more than one
decomposition is possible (SVD, for example), the user must specify the
conversion explicitly, lest the compiler issue an error about ambiguous type
conversions; see Example 6(d).
Transforms (FFTs and All That Stuff). Any algorithm that requires expensive
precalculation before use is a good candidate to become a class. The fast
Fourier transform (FFT) is one such algorithm. To transform a vector of length
N, the Mth-order complex roots of 1 must be calculated for each prime factor
M of N. For example, if N=30, then the 2nd, 3rd, and 5th order
complex roots (2x3x5=30) of 1 must be calculated. This is an expensive
calculation. If you are going to transform many vectors of that length, you
don't want to throw the results away. The solution is to design a server class
like that in Listing Six (page 47) to hold these roots: At any given moment,
the server class can be configured to transform a vector of a certain length.
The "roots of one" of all the prime factors of a vector of length npts are
packed into the complex vector theRoots. They are calculated at three possible
times: when the server is constructed, when the user calls setOrder(unsigned),
or dynamically, when the server transforms a vector. Because of this last
capability, using such a server is a pleasure because you don't have to worry
about whether it is configured correctly to transform a given vector. If it's
not, it will automatically reconfigure, as in Example 7.
Example 7: Automatic reconfiguration of an FFT server.

 ComplexVec timeVector(30);
 FFTServer aServer; // Allocate a server
 // Will automagically reconfigure for a vector of length 30:
 ComplexVec spectrum = aServer.fourier (timeVector);

Of course, each reconfiguration is expensive, so if you plan to transform a
bunch of vectors of varying length, you will probably want to keep many
servers on hand. The bookkeeping to do this is far easier with self-contained
servers than with the equivalent Fortran approach. (You could even use a
hashed table lookup to find the correct server, making a super-server!)


What About Efficiency?


If the price were reduced efficiency, all of C++'s features would be of little
interest to Fortran programmers. Consequently, we'll look at two benchmarks
that highlight C++'s efficiency.
Nearly all numerical algorithms come down to performing binary operations on
large numbers of elements--that's why pipelined architectures on vectorizing
machines such as the Crays have been so effective and popular. Hence, it is
important to get this right.
Figure 1 compares the time and code required for C++ (using Rogue Wave's
Math.h++) versus Fortran to multiply together two vectors as a function of the
vector length. The figure demonstrates that, with care, C++ can be even faster
than Fortran!
Why? Because with C++ it's easy to isolate the crucial piece of code and treat
it right. In this program, the critical expression is DoubleVec c=a*b;. To
evaluate this, the compiler will call the function with prototype DoubleVec
operator*(const DoubleVec&, const DoubleVec&);. The sole job of this function
is to multiply the two operands together and return the results. It's
important to note that the context of this multiply is completely controlled,
freeing us from traditional C problems such as "aliased pointers": All indexes
and intermediate results can be held in registers. Inside the function you can
have highly optimized assembly code (as in this benchmark) or a call to a set
of specialized BLAS routines (Basic Linear Algebra Subprograms), if available.
The result is extraordinarily fast code.
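A minimal sketch of how such an operator* can be structured. DoubleVec here is a stripped-down stand-in for the Math.h++ class, and the plain C loop stands in for the hand-optimized assembly; the point is that the result vector is freshly constructed inside the function, so it cannot alias the operands:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for Math.h++'s DoubleVec, just enough to show the
// controlled context of the multiply.
class DoubleVec {
    std::vector<double> v;
public:
    explicit DoubleVec(std::size_t n, double val = 0.0) : v(n, val) {}
    std::size_t length() const { return v.size(); }
    double& operator()(std::size_t i)       { return v[i]; }
    double  operator()(std::size_t i) const { return v[i]; }
    const double* data() const { return &v[0]; }   // raw access for the loop
    double*       data()       { return &v[0]; }
};

// The critical binary operation lives in one function whose context is
// completely controlled: the result is freshly allocated, so no pointer
// aliasing is possible, and indexes and intermediates can live in
// registers. This loop could be replaced by assembly or a BLAS call.
DoubleVec operator*(const DoubleVec& a, const DoubleVec& b)
{
    DoubleVec c(a.length());
    const double* pa = a.data();
    const double* pb = b.data();
    double* pc = c.data();
    for (std::size_t i = 0, n = a.length(); i < n; ++i)
        pc[i] = pa[i] * pb[i];    // element-by-element product
    return c;
}
```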
Figure 2 shows a different measure of efficiency, the venerable Linpack
benchmark. This benchmark sets up a matrix, factors it (using LU
factorization), then solves a set of linear equations using that
factorization. Only the factorization and solution time is actually used in
the benchmark. (The matrix setup time is not measured.) Figure 2(a) shows a
C++ version (using Math.h++), and Figure 2(b) shows a standard Fortran
version. The code listings have been set up such that comparable statements
line up side-by-side.
Figure 2: The Linpack benchmark: (a) A C++ version (using Math.h++); (b)
standard Fortran version.

 (a)

 #include <dgenfct.h>
 #include <rstream.h>

 class DTestMatrix : public DoubleGenMat {
 public:
     DTestMatrix(unsigned order);
 };

 class DTestRHS : public DoubleVec {
 public:
     DTestRHS(const DTestMatrix&);
 };

 double second();
 double epslon(double);

 const unsigned N = 90;
 const unsigned long ops = 2.0*N*N*N/3.0 + 2.0*N*N;

 void main() {
     DTestMatrix a(N);
     double norma = maxVal(abs(a));
     DTestRHS b(a);

     double t1 = second();
     // Construct the LU Factorization:
     DoubleGenFact fact(a);
     t1 = second() - t1;

     double t2 = second();
     DoubleVec x = solve(fact, b);
     t2 = second() - t2;

     double total = t1 + t2;
     double mflops = ops / (1.0e6*total);

     DoubleVec tol = a.product(x) - b;
     double resid = maxVal(abs(tol));
     double normx = maxVal(abs(x));
     double eps = epslon(1.0);
     double residn = resid / (N*norma*normx*eps);

     cout << "Normalized residual = " << residn << NL;
     cout << "Residual            = " << resid  << NL;
     cout << "Machine precision   = " << eps    << NL;
     cout << "Factorization time  = " << t1     << NL;
     cout << "Solution time       = " << t2     << NL;
     cout << "Total time          = " << total  << NL;
     cout << "MFLOPS              = " << mflops << NL;
 }

 DTestMatrix::DTestMatrix(unsigned n) : DoubleGenMat(n, n) {
     long init = 1325;
     for (int j = 0; j < n; j++) {
         for (int i = 0; i < n; i++) {
             init = 3125*init % 65536;
             sub(i, j) = (init - 32768.0)/16384.0;
         }
     }
 }

 DTestRHS::DTestRHS(const DTestMatrix& a) :
     DoubleVec(a.rows(), 0.0) {
     for (int i = 0; i < length(); i++)
         (*this)(i) = sum(a.row(i));
 }

 (b)

       double precision a(90,90), b(90), x(90)
       double precision ops, mflops, norma, normx
       double precision resid, residn, eps
       double precision t1, t2, total
       integer ipvt(90)
       double precision second
       double precision epslon

       lda = 90
       n = 90
       ops = (2.0e0*n**3)/3.0e0 + 2.0e0*n**2

       call matgen (a,lda,n,b,norma)
       t1 = second()
       call dgefa (a,lda,n,ipvt,info)
       t1 = second() - t1
       t2 = second()
       call dgesl (a,lda,n,ipvt,b,0)
       t2 = second() - t2
       total = t1 + t2
       mflops = ops/(1.0e6*total)

       do 10 i = 1,n
          x(i) = b(i)
    10 continue
       call matgen (a,lda,n,b,norma)
       do 20 i = 1,n
          b(i) = -b(i)
    20 continue
       call dmxpy (n,b,n,lda,x,a)
       resid = 0.0
       normx = 0.0
       do 30 i = 1,n
          resid = amax1( resid, abs(b(i)) )
          normx = amax1( normx, abs(x(i)) )
    30 continue
       eps = epslon(1.0d0)
       residn = resid/( n*norma*normx*eps )

       write (6,1000) residn
  1000 format (' Normalized residual = ', g16.7)
       write (6,1001) resid
  1001 format (' Residual = ', g16.7)
       write (6,1002) eps
  1002 format (' Machine precision = ', g16.7)
       write (6,1003) t1
  1003 format (' Factorization time = ', g16.7)
       write (6,1004) t2
  1004 format (' Solution time = ', g16.7)
       write (6,1005) total
  1005 format (' Total time = ', g16.7)
       write (6,1006) mflops
  1006 format (' MFLOPS = ', g16.7)
       stop
       end

       subroutine matgen (a,lda,n,b,norma)
       double precision a(lda,1), b(1), norma

       init = 1325
       norma = 0.0
       do 30 j = 1,n
          do 20 i = 1,n
             init = mod(3125*init,65536)
             a(i,j) = (init - 32768.0)/16384.0
             norma = amax1(a(i,j), norma)
    20    continue
    30 continue
       do 35 i = 1,n
          b(i) = 0.0
    35 continue
       do 50 j = 1,n
          do 40 i = 1,n
             b(i) = b(i) + a(i,j)
    40    continue
    50 continue
       return
       end

                              (a) C++              (b) Fortran
  Total (nonblank) lines      52                   71
  Executable size             64868 bytes          87482 bytes
  Compiler                    Borland C++ V2.0     Microsoft Fortran V5.0
                              w. optimizer & 8087, w. optimizer & 8087
                              large memory model

  Results on a 16-MHz 386 w. 80387:

  Normalized residual         1.24482              1.212636
  Residual                    .4973799e-13         .4843265E-13
  Machine precision           .2220446e-15         .2220446E-15
  Factorization time          3.97                 4.230000
  Solution time               0.13                 .170000
  Total time                  4.10                 4.400000
  MFLOPS                      0.122                .1141364

First, note the use of inheritance to guarantee that the test matrix and the
"right-hand side" of the sets of linear equations (DTestMatrix and DTestRHS,
respectively) are set up correctly. While this uses a few extra lines of code,
it recognizes these two objects for what they are: unique and "special,"
requiring a certain (and, in this case, intricate) initialization sequence.
While we could have used the Fortran approach and created a "blank" vector and
matrix to be passed to a special initialization function, in a large project
we might neglect to do this, risking an improper initialization. Yet
because of inheritance, these special objects inherit all of the abilities of
their underlying base classes.
Second, note how much simpler and more intuitive the calculation of a
tolerance, residual, and norm becomes. Finally, note that despite the extra
abilities of the C++ code in terms of type checking, dynamic memory
allocation, and safe object construction, it still requires fewer lines of
code. It also executes faster!


Numerics in the Large


Up to this point, I've shown how the encapsulation and operator-overloading
properties of C++ can give rise to an impressive and pleasing economy of
expression for such "in the small" objects as vectors, matrices, FFT servers,
and the like. This is useful, but not enough for working with big, gnarly
projects where problems escape faster than they can be contained.
Suppose you wanted to model the motions of a vibrating string under the
influence of a spatially and temporally varying force of (as yet) unknown
origin. Figure 3 shows the governing equation, where u is the string
displacement, x is space, t is time, c{2} is the string tension over the
string density, and F(x,t) is the external force applied to the string.
How might we model such a problem? Example 8 shows one solution. Here's how
abstraction of the problem is enforced, line by line.
Example 8: Class declaration for a vibrating string.

 class Force; // 1
 class String { // 2
 public:
 String(double length, double tension, double density); // 3
 void setPoints(int nx); // 4
 void timeStep(double dt, const Force& force); // 5
 DoubleVec displacement() const; // 6
 private:
 DoubleVec u; // 7
 double cSquared;
 double length;
 };

1. Line #1 alerts the compiler that the keyword Force (to be defined later) is
actually a user-defined class.
2. Line #2 is where the declaration for the class String starts. In it, we
define all the external properties needed to create, manipulate, and observe a
string.
3. Line #3 shows how to construct a string. We need its length, tension, and
density.
4. Line #4 specifies the resolution of the string's numerical
representation--the number of points that will be used to represent it.
5. Line #5 time-steps the problem. To calculate the new position of the string
over a time step we must know the time step's length (dt) and the forcing
function (force).
6. Line #6 asks the string for its current displacement. This is returned as a
vector of doubles.
7. Line #7 is the private section of the declaration, where the actual
implementation details are hidden. Obviously useful variables are the string
displacement, tension, and length. We may have to introduce other variables as
intermediates in the calculations.
Of course, the actual solution procedure is not trivial; this is only the
barest of outlines. For example, we are starting with a single equation second
order in time, and we will probably want to change that to two equations first
order in time.
Now look at the abstract class Force. A fundamental assumption is that we do
not know much about its nature. Indeed, it may even depend on the string
displacement. (For example, the force of the wind on a bridge depends on the
displacement of the bridge.) See Example 9(a).
That's it. You can use inheritance to define the actual details of the force.
For example, a wind-type force acting on the string might look like Example
9(b).

You explicitly recognize that the force of the wind depends on the
displacement of the string by requiring that a specific string be used in the
constructor (as well as on a drag coefficient). This WindForceString object
will then track the string, asking it for its present displacement, before
calculating the resultant force of the wind on the string and returning it.
This is done by calling the virtual function value(), an example of
polymorphism (a fancy word for runtime binding). Example 9(c) shows how
value() might be implemented.
Example 9: (a) An abstract base class representing a force; (b) a specializing
class representing a wind force on a string; (c) sample calculation of the
resultant force.

 (a)

 class Force {
 public:
     virtual DoubleVec value() = 0; // 1
 };

 (b)

 class WindForceString : public Force {
 public:
     WindForceString(String& string, double dragCoeff);
     void setVelocity(double windspeed);
     virtual DoubleVec value(); // 1
 private:
     String& myString;  // The string we are tracking
     double wind;       // Present wind speed
     double drag;
 };

 (c)

 DoubleVec WindForceString::value()
 {
     // Get present string displacement:
     DoubleVec d = myString.displacement();
     return - drag*wind*wind*d; // Some (bogus) calculation
 }

It then becomes trivial to replace the forcing function, even at run time,
with another type of force. This is more than just passing a generic vector of
doubles to the String that represents the forcing function. You can have an
actual object, complete with feedback loops to the string, act as the forcing
function.
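A minimal sketch of this runtime substitution, with DoubleVec reduced to a typedef and two hypothetical concrete forces (ConstantForce and ZeroForce are illustrations, not classes from the article; the string's feedback loop is omitted):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> DoubleVec;   // stand-in for Math.h++'s class

// The abstract force, as in Example 9(a).
class Force {
public:
    virtual ~Force() {}
    virtual DoubleVec value() = 0;
};

// Two interchangeable concrete forces (hypothetical examples).
class ConstantForce : public Force {
    DoubleVec f;
public:
    ConstantForce(std::size_t n, double val) : f(n, val) {}
    virtual DoubleVec value() { return f; }
};

class ZeroForce : public Force {
    std::size_t n;
public:
    explicit ZeroForce(std::size_t npts) : n(npts) {}
    virtual DoubleVec value() { return DoubleVec(n, 0.0); }
};

// A time-stepper sees only the abstract interface; which concrete force
// acts on the string is decided, and can change, at run time.
double firstComponentOfForce(Force& force)
{
    return force.value()[0];   // virtual dispatch picks the real force
}
```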


Conclusions


C++ has tremendous potential in numerics, one that has gone largely unnoticed
by fans of object-oriented programming, perhaps because previous OOP languages
lacked the efficiency required to do numerics. C++ has this efficiency--and a
lot more.





















Special Issue, 1992
USING MULTIPLE INHERITANCE IN C++


When is it worth the effort?




Tom Cargill


Tom is a software consultant based in Boulder, Colorado, specializing in C++.
He started programming in C++ in 1983 at AT&T Bell Laboratories, Murray Hill,
New Jersey. He is the author of C++ Programming Style (Addison-Wesley, 1992)
and welcomes e-mail at cargill@csn.org on Internet and 76476,1422 on
CompuServe.


This article is about how multiple inheritance can be used in writing C++
programs. I devote little time to describing the details of language features;
these can be found in any C++ text. I concentrate instead on a more subtle
issue: identifying the kinds of programming problems for which multiple
inheritance really helps programmers.
When multiple inheritance was added to C++ in 1989, I was skeptical that the
feature was worth the complexity that it added to the language. Hoping to see
its value, I studied all the programs then available that claimed to
demonstrate how to use multiple inheritance. However, I discovered that I
could rewrite all those programs without using multiple inheritance, and that
the resulting programs were generally simpler and easier to understand. My
conclusion then was that multiple inheritance in C++ was not useful, and that
its complexity was an unnecessary burden for programmers and compiler writers.
More recently, I have seen some programs (and written some myself) that use
multiple inheritance in a style quite unlike the earlier ones. I would like to
explain why the early attempts to use multiple inheritance failed and what is
different about the more recent programs.
I assume only that you know the basic property of multiple inheritance in C++:
A derived class (subclass) may inherit from more than one base class
(superclass). The programs here do not use any of the tricky parts of multiple
inheritance, such as initialization and assignment of virtual base classes or
dominance of virtual functions.


Specialization Inheritance


Most inheritance in C++ programs is single inheritance used to express
specialization; that is, to model a relationship in which one abstraction is a
specialization of another. The more general abstraction is represented by a
base class and the specialization is represented by a derived class. Any good
text on object-oriented programming in C++ emphasizes specialization as the
motivation for using inheritance.
Modeling specialization relationships results in inheritance hierarchies that
mirror classification hierarchies from the problem domain. For example, the
abstraction of a car is a specialization of the abstraction of a vehicle.
Therefore, class Car inherits from class Vehicle. The relationship between
class Car and class Vehicle is known as the is-a relationship, or is-a-kind-of
relationship, because every Car object is a kind of Vehicle object. The
inheritance relationship may be shown graphically in a tree, usually with the
base class appearing above the derived class, as in Figure 1.


Inheritance for Communication


C++ is a statically typed language: The type of every object, pointer, and
reference must be declared at compilation time. In order for one object to
call a member function (invoke a method) of another object, the calling object
must know, at compilation time, the type of the called object. The static
type-checking of C++ poses a problem when a server object must perform a
callback to a client object. Normally, it is a client object that calls member
functions of a server object to obtain services, in which case the client
object knows the type of the server object, but not vice versa. A callback
occurs when a server object delivers part of its service asynchronously, and
must therefore call a member function of the client object.
To see where a callback might arise, consider a simulation that involves
engines that may be started by a user clicking with a mouse on an image of a
button on a display. The view of an engine object on the screen might be as
shown in Figure 2(a).
Assume that an object of class Engine models the engine, that the Engine
object has created a Button object for the Start button, and provided the
button object with the information necessary to display itself. The initial
communication required is the normal case in which the Engine object is the
client and calls member functions of the Button object, the server. In the
object diagram shown in Figure 2(b), the arrow represents member-function
calls from the Engine object to the Button object.
Such communication does not cause problems for the static type system of C++
because the Engine object knows the class of the server object that it
created. The Engine object knows that it is dealing with a Button object and
can call the required member functions of Button.
The callback problem arises when the Button object must call back to notify
the Engine object that the user has clicked on the button's image. From the
perspective of writing the Engine class, it would be convenient if there were
a start member function of Engine, which the Button object could invoke, as in
Example 1.
Example 1: Member function in the Engine class that should be "called" back.

 class Engine {
 . . .
 public:
 void start();
 . . .
 };

However, there is no way within the static type system of C++ for the Engine
object to tell the Button object, "Please notify me by calling my start member
function," unless the Button class knows at compilation time about the Engine
class. If Button is a special-purpose class, used only for the Start button of
engines, it can be coded with the necessary information about the Engine
class. However, it is more useful to create a general-purpose Button server
class, one that need know nothing about its clients.
The Button class may be made independent of its clients, if the Button class
specifies a callback protocol through an abstract base class. Let there be an
auxiliary class, declared alongside Button as part of the Button service,
called ButtonCallback, as in Example 2. The entire declaration of the
ButtonCallback class is shown. The class defines just one member function, a
pure virtual function (deferred method), denoted by the = 0 syntax in the
member-function prototype. The purpose of a pure virtual function is to
establish an interface; the abstract class declaring a pure virtual function
is not obliged to provide an implementation of the function. Classes derived
from the abstract class must define the member function. To execute its
callback to the client object, the Button object demands that it communicate
with an instance of a class derived from ButtonCallback. That derived class must
define the callback member function, as shown in the object diagram in Figure
3.
Example 2: Abstract base class that enables callback protocol via pure virtual
function callback().

 class ButtonCallback {
 public:
 virtual void callback() = 0;
 };

As a client receiving the callback, class Engine is declared as a derived
class of ButtonCallback and supplies a definition of the callback function.
Engine::callback simply calls the start member function of Engine, as shown in
Example 3.
Example 3: Engine::callback() in action.

 class Engine : public ButtonCallback

 {
 . . .
 public:
 void start();
 virtual void callback();
 . . .
 };

 void Engine::callback()
 {
 start();
 }

The inheritance relationship between Engine and ButtonCallback is motivated by
the need of two objects to communicate in a statically typed language. The
inheritance relationship is not motivated by any intrinsic properties of the
abstractions involved. There is no attempt to model an is-a-kind-of
relationship. Indeed, this use of inheritance is foreign to programmers of
dynamically typed languages, such as Smalltalk, where an object sending a
message need know nothing at all about the type of the receiving object.
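The Button side of the protocol is not shown in the article; a minimal runnable sketch of how it might look (display machinery omitted, and click() is a hypothetical stand-in for the mouse event):

```cpp
#include <cassert>

// The callback protocol from Example 2: one pure virtual function.
class ButtonCallback {
public:
    virtual ~ButtonCallback() {}
    virtual void callback() = 0;
};

// A general-purpose Button: it knows only the abstract callback
// interface, never the concrete client class.
class Button {
    ButtonCallback* client;
public:
    explicit Button(ButtonCallback* c) : client(c) {}
    void click() { client->callback(); }   // user clicked: notify client
};

// The client from Example 3: Engine receives the callback and starts.
class Engine : public ButtonCallback {
    bool running;
public:
    Engine() : running(false) {}
    void start() { running = true; }
    bool isRunning() const { return running; }
    virtual void callback() { start(); }
};
```

The Engine registers itself with the Button by passing its own address as a ButtonCallback*; Button needs no knowledge of class Engine.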


Multiple Inheritance


Multiple inheritance permits a class to be derived from two or more base
classes. The simplest multiple-inheritance class diagram has one class, say Z,
derived from two others, X and Y, as shown in the class diagram in Figure
4(a).
With this construction permitted in the language, class relationships can
become much more involved than with single inheritance. Under single
inheritance, the inheritance hierarchy is a tree; under multiple inheritance
the hierarchy is a directed acyclic graph, or DAG. For example, an indirect
base class can be reached by more than one path through the DAG. The minimal
example requires four classes, as shown in Figure 4(b).
Multiple paths through the DAG may give rise to various kinds of ambiguities,
which are addressed by further rules and language mechanisms. Fortunately, we
can discuss programming situations in which multiple inheritance is and is not
useful without studying these complexities.
Multiple inheritance has caused considerable confusion among programmers and
authors of C++ texts. The reaction of most programmers to multiple inheritance
is that it must be for expressing relationships among classes that belong to
rich classification hierarchies. That is, they look for classes corresponding
to abstractions that exhibit more than one is-a-kind-of relationship. The two
situations that superficially look most promising are multiple classification
and dynamic classification.
Multiple classification arises when an abstraction participates throughout its
lifetime in more than one is-a-kind-of relationship. Should the multiple
is-a-kind-of relationships be represented by multiple inheritance?
Unfortunately, multiple inheritance is an unwieldy way to model multiple
classification. On the other hand, multiple classification is simplified by
viewing the various attributes of the class independently, and composing those
attributes to form an object. Programming with a composition of attributes is
generally simpler, more flexible, and more expressive than modeling multiple
classification with multiple inheritance.
The following small example is typical of attempts to express multiple
classification with multiple inheritance. Suppose that class Car is
specialized by engine type, as shown in Figure 5(a). Further suppose that cars
are classified by origin of manufacture, as in Figure 5(b). Many programmers
try to represent such multiple classification by multiple inheritance within
the Car class hierarchy, as in Figure 5(c).
Building programs in this fashion is not viable because of the combinatorial
explosion in the number of classes: The number of classes grows as the product
of the variation in the attributes. With two attributes the hierarchy is
almost manageable, but a real application would quickly grow to hundreds or
thousands of classes.
A simpler way to approach this modeling problem is to view the type of the
engine and the origin of the car as orthogonal attributes of a car object. The
Car class can then declare a member object (or, more precisely, pointer or
reference to an object) of the appropriate type for each attribute, as in
Example 4.
Example 4: Modeling orthogonal attributes through object reference.

 class Car {
 . . .
 Origin &source;
 Power &engine;
 . . .
 };

Classes Origin and Power may still form independent inheritance hierarchies,
but the Car class is not complicated by specialization relationships among its
attribute classes. As further specialization is introduced to these classes,
the Car class remains unchanged.
Dynamic classification arises when an abstraction participates in different
is-a-kind-of relationships at different phases of its lifetime. For example,
sometimes a seaplane is a kind of boat; at other times, it is a kind of plane.
Dynamic classification is not supported by any of the inheritance mechanisms
of C++, because every object is of precisely one type, determined at the time
the object was created. No metamorphosis is permitted; an object cannot modify
its type dynamically. If we attempt to express this dynamic classification by
multiple inheritance, the class relationship is as shown in Figure 6.
The problem with this class hierarchy is that both class Boat and class Plane
are always base classes of SeaPlane. The behavior of a SeaPlane object cannot
vary over time; it must always be the union of the behaviors of Boat and
Plane.
Dynamic classification can be expressed in C++ by the use of delegation,
instead of inheritance. Delegation is relatively simple: One object receives
member function calls and propagates them to a member function of another
object that performs the delegated work. Using delegation, a SeaPlane object
may behave like a boat by delegating incoming calls to a Boat object, as in
Figure 7(a). At other times the same SeaPlane object may choose to delegate
incoming calls to a Plane object, and therefore behave like a plane, as in
Figure 7(b).
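A minimal sketch of such delegation (the move() member and the mode flag are hypothetical names, not from the article):

```cpp
#include <cassert>
#include <string>

// Independent Boat and Plane classes; no inheritance between them.
class Boat {
public:
    std::string move() { return "sailing"; }
};

class Plane {
public:
    std::string move() { return "flying"; }
};

// SeaPlane holds both behaviors as members and delegates incoming calls
// to whichever object currently models it; the choice can change at
// run time, which inheritance cannot express.
class SeaPlane {
    Boat boat;
    Plane plane;
    bool onWater;
public:
    SeaPlane() : onWater(true) {}
    void takeOff() { onWater = false; }
    void land()    { onWater = true; }
    std::string move() { return onWater ? boat.move() : plane.move(); }
};
```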
Multiple inheritance is not an effective way to program multiple
classification or dynamic classification. Quite simply, multiple inheritance
in C++ does not support the realization in a computer program of multiple
is-a-kind-of relationships.


Multiple Inheritance for Communication


Multiple inheritance does appear to be useful for establishing
multiple-communication relationships between objects. Consider a situation in
which a single client object must receive callbacks from multiple server
objects. An example might be a Clock object that must maintain the time of day
in a window on a screen. The Clock object needs the services of two servers: a
window server that provides space on the screen in which the image of the
clock can be displayed, and an interval timer server. The Clock object must
accept callbacks from both server objects: from the Window object, to receive
notification that the client must refresh the window's content for some reason,
and from the Timer object, to receive notification periodically of the elapsed
time, as in Figure 8(a).
By analogy with the inheritance mechanism shown above for communication
between Button and Engine, class Clock must be a derived class of both
TimerCallback and WindowCallback, as in Figure 8(b).
Notice that this inheritance hierarchy reflects callback-communication
relationships between objects. Class Clock is part of an inheritance hierarchy
because of the servers from which it requires callbacks, not because of any
intrinsic properties of its abstraction.
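A minimal sketch of this double derivation (tick() and refresh() are hypothetical member names for the two callback protocols; the article does not show the class bodies):

```cpp
#include <cassert>

// Two independent synthetic callback protocols, one per server.
class TimerCallback {
public:
    virtual ~TimerCallback() {}
    virtual void tick() = 0;       // hypothetical member name
};

class WindowCallback {
public:
    virtual ~WindowCallback() {}
    virtual void refresh() = 0;    // hypothetical member name
};

// Clock inherits both interfaces, so each server can call it back
// through the protocol that server understands.
class Clock : public TimerCallback, public WindowCallback {
    int ticks;
    int redraws;
public:
    Clock() : ticks(0), redraws(0) {}
    virtual void tick()    { ++ticks; ++redraws; }  // time elapsed: update display
    virtual void refresh() { ++redraws; }           // window damaged: redraw
    int tickCount() const   { return ticks; }
    int redrawCount() const { return redraws; }
};
```

Each server sees the Clock only through its own base-class pointer, which is exactly the multiple-communication relationship Figure 8(b) describes.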


Synthetic vs. Natural Classes


Class TimerCallback and class WindowCallback are synthetic classes: They do
not correspond to abstractions found in an application problem domain.
Synthetic classes emerge during design and coding of a system in response to
internal, synthetic needs of the software. This is in contrast to natural
classes, those that correspond to abstractions from the problem domain and
typically arise either during analysis or early in design. A simple criterion
for deciding whether a class is natural or synthetic is to ask end users if
they recognize the abstraction. Because a natural class comes from the problem
domain, an end user will understand its purpose; a synthetic class arises only
from software implementation considerations, so the end user will not
appreciate the need for it.
Callbacks are not the only reason for creating synthetic classes. They arise
anywhere we consider sophisticated implementation details, like object
persistence or reference-counted garbage collection.


Conclusion



The better we understand how we use our current programming languages, the
better we can guide future languages to meet our needs as programmers. Single
inheritance in C++ is useful for dealing with specialization, usually a
natural class relationship. Single inheritance is also useful in dealing with
synthetic class relationships, such as callbacks. However, multiple
inheritance does not appear to help in modeling richer specialization
relationships. Composition and delegation turn out to be more useful for that
purpose. Multiple inheritance does appear to be useful in managing multiple
synthetic class relationships, such as multiple-server callback communication.



























































Special Issue, 1992
IMPLEMENTING CURVES IN C++


Computer graphics benefit from class libraries




Stephen P. Johnson and Tom McReynolds


Steve is a graphics software engineer for Apple Computer. Tom is a computer
graphics engineer at Sun Microsystems and an Adjunct Professor at Santa Clara
University. They can be contacted through the DDJ offices.


Two-dimensional curves show up in various applications: EPUBS, MCAD, ECAD,
PostScript, and window systems, to name a few. To be useful, curves must be
easy to control and easy for a computer to render. Controlling lines is
simple: The user sets some control geometry (the endpoints) and knows
intuitively where the line will be drawn. Controlling a curve is more
difficult, since there is an infinite number of curved shapes between any two
endpoints. Restricting the available curves to a fixed set of curve types,
such as circles and ellipses, solves the problem, but this solution is not
powerful enough for most applications. Free-form curves, whose shape the user
can control without restriction, are required. Unfortunately, there is no
single, obvious way to create and shape them.
Parametric curve types are distinguished by the type of control geometry
describing the curve, and how it is used to generate the curve equation. A
different curve type, even if it uses the same control geometry, will be
interpreted into a different curve shape. The computer-graphics community
makes use of a variety of curve types, trading off their different strengths
and weaknesses. For example, a fast and simple Bezier curve may be ideal for
representing fonts in a PostScript printer, while a more expressive NURBS
curve would better represent the complex shapes created with a solids modeling
application. As a result, a sophisticated application may have to handle many
different curve representations. In this article, we show how to support a
wide variety of curve representations efficiently, implementing them in C++
using a class hierarchy and an object-oriented programming style.


A Class Hierarchy for Curves


The formulas describing free-form curves have much in common. In fact, only
two major types of curves are in common use: approximating and interpolating
curves. They differ in how they respond to the control geometry. An
application uses a set of geometric locations called "control points" to
define the shape of the curve. For approximating curves, the control points
provide a boundary called the "convex hull." The curve always remains within
this boundary, sometimes not even touching the control points. Interpolating
curves, however, always pass through their control points.
Both types of curves can be represented in a class/object design: The class
represents free-form curves containing the control-points array and the
operators that act on them. The curve object hierarchy has three features:
It provides data abstraction, which supplies a general definition for all
curves represented using the software implementation.
It contains data-hiding mechanisms. The curve's actual representation is
hidden from the application software.
The curve representation is encapsulated; that is, the attributes and
operators on a curve object are contained within an object.


Defining Curves in C++


Figure 1 shows the class hierarchy for curves, as represented in this article.
The base class is derived into two primary subclasses: the basis-matrix class
and the nonuniform B-spline class. A basis-matrix curve is represented by a
matrix derived using the techniques described in the accompanying text box,
"Derivation of Basis Matrix." The nonuniform B-spline class is used for curves
that are nonuniform in the step of the parametric variable t.
Listing One (page 60) shows the C++ header file for the curve-class hierarchy.
The Basis_matrix_curve class is used for the uniform beta-spline and to derive
the curve types Hermite, Bezier, uniform B-spline, and Catmull-Rom
interpolating curve. The Nub_curve class defines the nonuniform, nonrational
B-spline and is used to derive the nonuniform, rational B-spline (commonly
known as NURBS). Nonuniform B-splines require an extra piece of data called
the "knot vector," a floating-point array containing a nondecreasing list of
values that control how the curve is evaluated. Figure 2 shows the formulas
used to define nonuniform B-splines.
The implementation of curves involves defining the methods that use the
control points to render the curve. For the Basis_matrix_curve, the
constructor defines the matrix used to compute the coefficients of the
third-degree polynomial curve definition. So, to define a new type of curve,
it is necessary to derive the new type from the Basis_matrix_curve class and
define a constructor that computes the new basis matrix.
Note that the curves are two-dimensional; this is defined in the class
Point2d. It is trivial to extend the definition to three-dimensional curves by
defining a Point3d. The curves in this article are limited to fourth order or
third degree. The challenge of extending the classes to other orders is left
to the reader.
For a Hermite, Bezier, uniform B-spline, and Catmull-Rom spline, the
constructor defines the basis matrix used to compute the coefficients of the
polynomial. But for a uniform beta-spline, the bias and tension values must be
initialized. The default values are bias = 1.0 and tension = 0.0, which yield
a basis matrix equivalent to the uniform B-spline's. You can exert
precise control of the curve by modifying the bias and tension parameters of
the uniform beta-spline. Listing Two (page 60) shows the methods, including
the constructors, for each of the various curves.
When an application creates a curve object, it must supply the control points
that define the curve shape. For some curve types, the application may also
have to define the knot vector or the tension and bias values. The application
then invokes the display_curve method to render the curve.
All curves derived from the Curve class have a display_curve method. For the
Basis_matrix_curve class, the display_curve method renders a third-degree
polynomial by tessellating it into vectors. But for a nonuniform B-spline
class, the display_curve method is overridden to display the curve, using the
formulas in Figure 3.


Performance Comparison of Display Methods


For the Basis_matrix_curve class, the curve is converted to the coefficients
of a third-degree polynomial. This polynomial is evaluated into a number of
line segments, based on a fixed tessellation factor. The curve's tessellation
into line segments is performed within the class's display_curve method.
There are two popular methods for evaluating a third-degree polynomial. The
first technique is based on Horner's method, which rewrites the polynomial to
reduce the number of arithmetic operations needed to evaluate it. Figure 3
shows the formulas behind Horner's method, and Listing
Three (page 62) shows the C++ source code for displaying a third-degree
polynomial using Horner's method. The code reduces the number of multiplies
and adds so that we gain some performance, but we can do better.
A higher-performance method for evaluating polynomials surfaced in the 1970s.
This method is based on evaluating the first- and second-order differences of
the polynomial and performing additions within the tessellation inner loop.
Again, Figure 3 shows the formulas for this technique, and Listing Four (page
62) shows the C++ source code for displaying a curve using forward
differencing. The inner loop of the curve display is reduced to six
additions--certainly faster than the eight multiplications and seven additions
performed by Horner's method.


Demonstration Program


Listing Five (page 63) shows the main routine for displaying several different
curve types. Each curve displayed defines the geometry of its control points
and then invokes the display_curve method.
Using the same control points, the demonstration main routine displays several
different Hermite curves. The curves' shapes are modified by varying the
magnitude and direction of the tangent vectors at the first and last control
points.
A Bezier curve is rendered for the control points (20, 20), (50, 180), (300,
50), and (100, 10). This defines a simple Bezier-style curve. Immediately
after rendering the Bezier curve, a non-uniform, nonrational B-spline (NUB) is
rendered. The knot vector of this curve is set to (0, 0, 0, 0, 1, 1, 1, 1).
This interpolates the endpoints and approximates the interior control points,
thus displaying the same Bezier curve. The knot vector of the NUB curve is
then modified to (0, 0, 0, 1, 2, 3, 3, 3) and rendered. This curve shows the
approximation of the control points. A simple modification to the knot vector
yields a completely different curve.
The next curve displayed is a Catmull-Rom curve, which interpolates the control
points. It is useful in applications such as data analysis, where the control
points must lie on the curve.
The demo program renders several uniform beta-spline curves, manipulating
the bias and tension parameters to demonstrate how easily the shape of the
curve can be changed. Increasing the bias parameter pulls the curve toward a
sharp angle, while increasing the tension parameter yields a similar effect in
the opposite direction.
Finally, the demonstration program renders several NURB curves. The knot
vector is manipulated to produce several different curves from the same set
of control points.



Portability


Listing Six (page 63) is the source file utilTC.cc, which defines all the
Borland C++-specific rendering routines. Listing Seven (page 64), utilsxlib.c,
defines all the rendering routines specific to machines running the X Window
System. The current implementation compiles under Borland C++ 3.0 and under
UNIX/X on a Sun and an IBM RS/6000.
Porting this program to another system is easy: all you need is a C++
compiler and a display system. You write your own system-specific version of
the display-utilities source file, following the models in Listings Six and Seven.
To port this code, modify init_graphics_device to open and initialize your
graphics device. This code should create and map windows to the display and
clear the window or display to a background color. The function
close_graphics_device must be modified to destroy the window or reset the
hardware to the appropriate state before returning to the system.
If your hardware can scan-convert a line segment, modify the source code in
the line function, adding the appropriate code to call the system's
line-segment function, as in the Xlib port. If your system cannot
scan-convert a line segment, you can instead change the set_pixel function to
write the appropriate color into your display device. If your system cannot
display a pixel at a given x, y position with a given color, you are out of
luck.
The clear_window function sets the window or display to a background color.
The VGA implementation sets the background to pixel-value 0, which is black on
a standard display. The function text_output prints a string on the display.
The demonstration program uses these functions to help you view the curves
rendered.


Bibliography


Foley, J.D. et al. Computer Graphics: Principles and Practice. Reading, MA:
Addison-Wesley, 1990.
Hearn, Donald and Pauline M. Baker. Computer Graphics. Englewood Cliffs, NJ:
Prentice Hall, 1986.
Nye, Adrian. Xlib Reference Manual for Version 11. Sebastopol, CA: O'Reilly &
Associates, 1990.
Phoenix Technical Reference Series: System BIOS for IBM PC/XT/AT Computers and
Compatibles. Reading, MA: Addison-Wesley, 1989.
Rogers, David F. and J. Alan Adams. Mathematical Elements for Computer
Graphics, second edition. New York, NY: McGraw-Hill, 1990.
Turbo C++ Library Reference. Scotts Valley, CA: Borland International, 1990.
Wilton, Richard. Programmer's Guide to PC and PS/2 Video Systems. Redmond,
WA: Microsoft Press, 1987.


Derivation of Basis Matrix


In order to make them more manageable mathematically, the equations that draw
curves are written parametrically. The parametric form of an equation is
written as a function of one or more independent variables (in these examples,
the single variable t). For example, the line equation y = mx + b would be
split into two separate equations, both depending on t: x(t) = A[x] + B[x]t
and y(t) = A[y] + B[y]t. In this representation, when t=0, x and y are at one
endpoint of the line; when t=1, x and y are at the other.
Although the most common parametric curves are fourth order (containing t^3,
t^2, t, and 1), the examples here use second order, containing only t and 1.
This restricts our "curves" to straight lines.
Although a parametric equation is useful, it's hard to find the coefficients
needed to represent the desired curve. To make this easier, we'll break the
equation up into pieces. First, we must tie the equation to some geometry that
controls the type of curve drawn. The set of points that define the curve are
the control points. In this example, we want the line drawn between two
endpoints that we select. Assuming t ranges from 0 to 1, we need to figure out
A[x], A[y], B[x], and B[y] in terms of the curve endpoints, which we'll call
x[0], y[0] and x[1], y[1]. This leads to the equations in Example 1(a).
Now we have a parametric equation based on some understandable geometry.
Example 1(b) shows this rewritten so there's only one instance of each
geometry value.
Example 1(c) puts everything in matrix form. Splitting the last matrix into
two parts isolates the geometry from the curve type, as in Example 1(d).
The equations are now composed of three matrices: parametric, basis, and
geometry. The parametric matrix is relatively fixed--the number of elements
determines the order of the curve. The basis matrix determines how the
geometry matrix combines with the equation's parameters to draw the line. The
geometry part is different for every line.
To find the elements of the basis matrix, solve the matrix equation. Set the t
values at 0, then 1; solving for each gives the equation in Example 1(e). The
x and y values have been replaced with more general "geometry" entries.
Multiplying the parameter and basis matrices together results in a set of
blending functions. As the value of the parameter t changes, these equations
"blend" the elements of the geometry matrix together to form curves.
Most applications contain curves made up of many connected segments. The
places where the curve segments meet, as well as the ends of the complete
curve, are called "knots." To connect these segments smoothly, you must be
able to control the connection's continuity. C^0 continuity means the curve
knots connect; C^1 means the slopes of the curves match at the connection; and
C^2 means the curvatures match as well. C^n continuity is defined by the
equation in Example 1(f), where the nth derivatives of the curve segments
match where they meet. The level of continuity possible is limited by the
order of the curve used. To handle curves with multiple curve segments, we
need to change our general equation to that in Example 1(g). In our example,
the basis matrix didn't change; in most cases, it does. To keep the numbering
the same, the i's must be greater than 0. Now a curve is defined by a list of
control points. Each curve segment ranges from t[i] to t[i+1]. Since this
curve is uniform, t[i+1]-t[i], the difference between knot values, always
equals 1. Note that t ranges from 0 to 1 in each curve segment.
Moving to order-four curves gives us two more levels of continuity to work
with, allowing curve segments to be connected together smoothly. The equation
for fourth-order parametric curves changes surprisingly little; see Example
1(h). Different curves are formed by defining the basis and geometry matrices.
The basis matrix defines how the geometry will be blended: since it
determines how the geometry is mixed by the parameter t, the type of basis
matrix depends on the type of geometry in the geometry matrix. The basis
and geometry matrices for Hermite, Bezier, B-spline, and Catmull-Rom (an
interpolating spline) are given in Example 1(i). A P in the geometry matrix
indicates a control point, an R indicates a control vector.
The first three curves in Example 1(i) approximate their control points; the
Catmull-Rom curve interpolates them. The Beta-spline representation is a
generalization of the B-spline curve. It adds two parameters, bias (beta1) and
tension (beta2). If beta1 = 1 and beta2 = 0, it reduces to the B-spline, as in
Example 1(j). All the curves shown have been computed in 2-D (x and y)
coordinates. To make 3-D curves, add a z coordinate. Rational curves carry an
extra homogeneous coordinate, w; when rendering, convert back into nonrational
form by dividing each coordinate in a point by its w part, then discarding w.
--S.J. & T.M.

_IMPLEMENTING CURVES IN C++_
by Stephen P. Johnson and Tom McReynolds


[LISTING ONE]
// curve.h - Base class for curves
#include <stdio.h>
class Point2d {
protected:
 float x, y, w;
public:
 // set x, y
 void set_xy(float new_x, float new_y) {
 x = new_x; y = new_y; w = 1.;
 }
 // set x, y and w
 void set_xyw(float new_x, float new_y, float new_w) {
 x = new_x; y = new_y; w = new_w;
 }

 void set_x(float new_x) { x = new_x; }
 void set_y(float new_y) { y = new_y; }
 void set_w(float new_w) { w = new_w; }
 // get x, y
 void get_xy(float *ret_x, float *ret_y) {
 *ret_x = x; *ret_y = y;
 }
 // get x, y and w
 void get_xy(float *ret_x, float *ret_y, float *ret_w) {
 *ret_x = x; *ret_y = y; *ret_w = w;
 }
 float get_x(void) { return x; }
 float get_y(void) { return y; }
 float get_w(void) { return w; }
 // print out the current x, y
 void print(void) { printf("%f %f\n", x, y); }
};
class Curve {
protected:
 int num_geom;
 Point2d *geom;
public:
 // constructor
 Curve(void);
 // destructor
 ~Curve(void);
 // set the geometry vector which is also called the control points
 void set_geom_vector(int count, Point2d *g);
 // method to display a third degree curve
 void display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull);
};
class Basis_matrix_curve : public Curve {
protected:
 Point2d t_coeff[4];
 float basis_matrix[4][4];
public:
 // constructor
 Basis_matrix_curve(void);
 // get the coefficient matrix for the t's
 void get_t_coeff(Point2d tc[4]);
 // override set_geom_vector to add in the basis matrix multiply
 void set_geom_vector(int count, Point2d *g);
 // set a user defined matrix into the basis matrix
 void set_basis_matrix(float m[4][4]);
 // get the current basis matrix
 void get_basis_matrix(float m[4][4]);
 // multiply a 4x4 by 4 points that are a 2x4 matrix
 void multiply_basis_by_geometry(void);
 // method to display a third degree curve
 void display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull);
};
class Hermite_curve : public Basis_matrix_curve {
public:
 // constructor for Hermite type curve
 Hermite_curve(void);
};
class Bezier_curve : public Basis_matrix_curve {

public:
 // constructor for Bezier type curve
 Bezier_curve(void);
};
class Bspline_curve : public Basis_matrix_curve {
public:
 // constructor for Bspline type curve
 Bspline_curve(void);
};
class Catmull_Rom_curve : public Basis_matrix_curve {
public:
 // constructor for Catmull-Rom type curve
 Catmull_Rom_curve(void);
};
// uniformly shaped beta-spline
class Beta_spline_curve : public Basis_matrix_curve {
protected:
 float bias;
 float tension;
public:
 // constructor for uniformly shaped beta-spline
 Beta_spline_curve(void);
 // methods for setting and getting the bias and tension
 void set_bias(float new_bias) {
 bias = new_bias;
 update_basis_matrix();
 multiply_basis_by_geometry();
 }
 void set_tension(float new_tension) {
 tension = new_tension;
 update_basis_matrix();
 multiply_basis_by_geometry();
 }
 float get_bias(void) { return bias; }
 float get_tension(void) { return tension; }
 // method for updating the basis matrix
 void update_basis_matrix(void);
};
class Nub_curve : public Curve {
protected:
 int num_knots;
 float *knots;
public:
 // constructor for NUB curves
 Nub_curve(void);
 // destructor for NUB curves
 ~Nub_curve(void);
 // setup the knot vector with user information
 void set_knot_vector(int count, float *v);
 // method to display a third degree NUB curve
 void display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull);
};
class Nurb_curve : public Nub_curve {
public:
 // method to display a third degree NURB curve
 void display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull);
};





[LISTING TWO]

// curve.cc - Implementation of methods for base class for curves //
#include <iostream.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef __TURBOC__
#include <alloc.h>
#else
#include <malloc.h>
#endif
#include <math.h>
#include "curve.h"

#define CROSS_SIZE 5
extern "C" void line(short, short, short, short, short);

// constructor for curve
Curve::Curve(void)
{
 num_geom = 0;
 geom = (Point2d *)NULL;
}
// destructor for curve
Curve::~Curve(void)
{
 if (num_geom != 0 && geom) {
 free(geom);
 }
}
// set the geometry vector which is also called the control points
void Curve::set_geom_vector(int count, Point2d *g)
{
 if (count != num_geom && geom) {
 free(geom);
 }
 geom = (Point2d *)malloc(count * sizeof(Point2d));
 if (geom) {
 int i;
 num_geom = count;
 for (i = 0; i < num_geom; i++) {
 geom[i] = g[i];
 }
 }
}
// get the coefficient matrix for the t's
void Basis_matrix_curve::get_t_coeff(Point2d tc[4])
{
 tc[0] = t_coeff[0];
 tc[1] = t_coeff[1];
 tc[2] = t_coeff[2];
 tc[3] = t_coeff[3];
}
// override set_geom_vector to do the basis matrix multiply

void Basis_matrix_curve::set_geom_vector(int count, Point2d *g)
{
 Curve::set_geom_vector(count, g);
 multiply_basis_by_geometry();
}
// constructor for basis matrix class
Basis_matrix_curve::Basis_matrix_curve(void)
{
 int i, j;
 Point2d g[4];

 g[0].set_xy(0., 0.); g[1].set_xy(0., 0.);
 g[2].set_xy(0., 0.); g[3].set_xy(0., 0.);
 Curve::set_geom_vector(4, g);

 t_coeff[0].set_xy(0., 0.);
 t_coeff[1].set_xy(0., 0.);
 t_coeff[2].set_xy(0., 0.);
 t_coeff[3].set_xy(0., 0.);
 for (i = 0; i < 4; i++)
 for (j = 0; j < 4; j++)
 if (i == j)
 basis_matrix[i][j] = 1.;
 else
 basis_matrix[i][j] = 0.;
}
// set a user defined matrix into the basis matrix
void Basis_matrix_curve::set_basis_matrix(float m[4][4])
{
 int i, j;
 for (i = 0; i < 4; i++)
 for (j = 0; j < 4; j++)
 basis_matrix[i][j] = m[i][j];
}
// get the current basis matrix
void Basis_matrix_curve::get_basis_matrix(float m[4][4])
{
 int i, j;
 for (i = 0; i < 4; i++)
 for (j = 0; j < 4; j++)
 m[i][j] = basis_matrix[i][j];
}

// multiply a 4x4 by a 2x4 (2D geometric matrix)
void Basis_matrix_curve::multiply_basis_by_geometry()
{
 t_coeff[0].set_x(
 basis_matrix[0][0] * geom[0].get_x() +
 basis_matrix[0][1] * geom[1].get_x() +
 basis_matrix[0][2] * geom[2].get_x() +
 basis_matrix[0][3] * geom[3].get_x());

 t_coeff[1].set_x(
 basis_matrix[1][0] * geom[0].get_x() +
 basis_matrix[1][1] * geom[1].get_x() +
 basis_matrix[1][2] * geom[2].get_x() +
 basis_matrix[1][3] * geom[3].get_x());
 t_coeff[2].set_x(
 basis_matrix[2][0] * geom[0].get_x() +

 basis_matrix[2][1] * geom[1].get_x() +
 basis_matrix[2][2] * geom[2].get_x() +
 basis_matrix[2][3] * geom[3].get_x());
 t_coeff[3].set_x(
 basis_matrix[3][0] * geom[0].get_x() +
 basis_matrix[3][1] * geom[1].get_x() +
 basis_matrix[3][2] * geom[2].get_x() +
 basis_matrix[3][3] * geom[3].get_x());
 t_coeff[0].set_y(
 basis_matrix[0][0] * geom[0].get_y() +
 basis_matrix[0][1] * geom[1].get_y() +
 basis_matrix[0][2] * geom[2].get_y() +
 basis_matrix[0][3] * geom[3].get_y());
 t_coeff[1].set_y(
 basis_matrix[1][0] * geom[0].get_y() +
 basis_matrix[1][1] * geom[1].get_y() +
 basis_matrix[1][2] * geom[2].get_y() +
 basis_matrix[1][3] * geom[3].get_y());

 t_coeff[2].set_y(
 basis_matrix[2][0] * geom[0].get_y() +
 basis_matrix[2][1] * geom[1].get_y() +
 basis_matrix[2][2] * geom[2].get_y() +
 basis_matrix[2][3] * geom[3].get_y());

 t_coeff[3].set_y(
 basis_matrix[3][0] * geom[0].get_y() +
 basis_matrix[3][1] * geom[1].get_y() +
 basis_matrix[3][2] * geom[2].get_y() +
 basis_matrix[3][3] * geom[3].get_y());
}
// constructor for Hermite curve
Hermite_curve::Hermite_curve(void)
{
 float m[4][4];
 m[0][0] = 2.; m[0][1] = -2.; m[0][2] = 1.; m[0][3] = 1.;
 m[1][0] = -3.; m[1][1] = 3.; m[1][2] = -2.; m[1][3] = -1.;
 m[2][0] = 0.; m[2][1] = 0.; m[2][2] = 1.; m[2][3] = 0.;
 m[3][0] = 1.; m[3][1] = 0.; m[3][2] = 0.; m[3][3] = 0.;
 set_basis_matrix(m);
}
// constructor for Bezier curve
Bezier_curve::Bezier_curve(void)
{
 float m[4][4];
 m[0][0] = -1.; m[0][1] = 3.; m[0][2] = -3.; m[0][3] = 1.;
 m[1][0] = 3.; m[1][1] = -6.; m[1][2] = 3.; m[1][3] = 0.;
 m[2][0] = -3.; m[2][1] = 3.; m[2][2] = 0.; m[2][3] = 0.;
 m[3][0] = 1.; m[3][1] = 0.; m[3][2] = 0.; m[3][3] = 0.;
 set_basis_matrix(m);
}
// constructor for Uniform Nonrational Bspline curve
Bspline_curve::Bspline_curve(void)
{
 float m[4][4];
 m[0][0] = -1./6.; m[0][1] = 3./6.; m[0][2] = -3./6.; m[0][3] = 1./6.;
 m[1][0] = 3./6.; m[1][1] = -6./6.; m[1][2] = 3./6.; m[1][3] = 0.;
 m[2][0] = -3./6.; m[2][1] = 0.; m[2][2] = 3./6.; m[2][3] = 0.;
 m[3][0] = 1./6.; m[3][1] = 4./6.; m[3][2] = 1./6.; m[3][3] = 0.;

 set_basis_matrix(m);
}
Catmull_Rom_curve::Catmull_Rom_curve(void)
{
 float m[4][4];
 m[0][0] = -1./2.; m[0][1] = 3./2.; m[0][2] = -3./2.; m[0][3] = 1./2.;
 m[1][0] = 2./2.; m[1][1] = -5./2.; m[1][2] = 4./2.; m[1][3] = -1./2.;
 m[2][0] = -1./2.; m[2][1] = 0.; m[2][2] = 1./2.; m[2][3] = 0.;
 m[3][0] = 0.; m[3][1] = 2./2.; m[3][2] = 0.; m[3][3] = 0.;
 set_basis_matrix(m);
}
Beta_spline_curve::Beta_spline_curve(void)
{
 bias = 1.;
 tension = 0.;
 update_basis_matrix();
}
void Beta_spline_curve::update_basis_matrix(void)
{
 int i, j;
 float m[4][4];
 float bias2 = bias*bias;
 float bias3 = bias2*bias;
 float delta;

 m[0][0] = -2.*bias3; m[0][1] = 2.*(tension+bias3+bias2+bias);
 m[0][2] = -2.*(tension+bias2+bias+1.); m[0][3] = 2.;

 m[1][0] = 6.*bias3; m[1][1] = -3.*(tension+2.*bias3+2.*bias2);
 m[1][2] = 3.*(tension+2.*bias2); m[1][3] = 0.;

 m[2][0] = -6.*bias3; m[2][1] = 6.*(bias3-bias);
 m[2][2] = 6.*bias; m[2][3] = 0.;

 m[3][0] = 2.*bias3; m[3][1] = tension+4.*(bias2+bias);
 m[3][2] = 2.; m[3][3] = 0.;
 delta = tension + 2.*bias3 + 4*bias2 + 4*bias + 2.;
 for (i = 0; i < 4; i++)
 for (j = 0; j < 4; j++)
 m[i][j] /= delta;
 set_basis_matrix(m);
}
float b1(int i, float t, float *knots)
{
 if (knots[i] <= t && t < knots[i+1])
 return 1.;
 else
 return 0.;
}
float b2(int i, float t, float *knots)
{
 float n, d;
 float sum = 0.;

 n = t - knots[i];
 d = knots[i+1] - knots[i];
 if (d != 0.) {
 sum += (n / d) * b1(i, t, knots);
 }

 n = knots[i+2] - t;
 d = knots[i+2] - knots[i+1];
 if (d != 0.) {
 sum += (n / d) * b1(i+1, t, knots);
 }
 return sum;
}
float b3(int i, float t, float *knots)
{
 float n, d;
 float sum = 0.;

 n = t - knots[i];
 d = knots[i+2] - knots[i];
 if (d != 0.) {
 sum += (n / d) * b2(i, t, knots);
 }
 n = knots[i+3] - t;
 d = knots[i+3] - knots[i+1];
 if (d != 0.) {
 sum += (n / d) * b2(i+1, t, knots);
 }
 return sum;
}
float b4(int i, float t, float *knots)
{
 float n, d;
 float sum = 0.;

 n = t - knots[i];
 d = knots[i+3] - knots[i];
 if (d != 0.) {
 sum += (n / d) * b3(i, t, knots);
 }
 n = knots[i+4] - t;
 d = knots[i+4] - knots[i+1];
 if (d != 0.) {
 sum += (n / d) * b3(i+1, t, knots);
 }
 return sum;
}
// constructor for NUB curves
Nub_curve::Nub_curve(void)
{
 num_knots = 0;
 knots = (float *)NULL;
}
// destructor for curve
Nub_curve::~Nub_curve(void)
{
 if (num_knots != 0 && knots) {
 free(knots);
 }
}
// setup the knot vector with user information
void Nub_curve::set_knot_vector(int count, float *v) {
 if (count != num_knots && knots) {
 free(knots);
 }

 knots = (float *)malloc(count * sizeof(float));
 if (knots) {
 int i;
 num_knots = count;
 for (i = 0; i < num_knots; i++)
 knots[i] = v[i];
 }
}
// display method for a NUB curve
void Nub_curve::display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull)
{
 int i, j;
 float t;
 float delta;
 float x0, y0, x, y, x1, y1;
 float bt0, bt1, bt2, bt3;
#define NUBEVAL(X, Y) \
 bt0 = b4(i-3, t, knots); \
 bt1 = b4(i-2, t, knots); \
 bt2 = b4(i-1, t, knots); \
 bt3 = b4(i, t, knots); \
 X = geom[i-3].get_x() * bt0 + \
 geom[i-2].get_x() * bt1 + \
 geom[i-1].get_x() * bt2 + \
 geom[i].get_x() * bt3; \
 Y = geom[i-3].get_y() * bt0 + \
 geom[i-2].get_y() * bt1 + \
 geom[i-1].get_y() * bt2 + \
 geom[i].get_y() * bt3;
 delta = (knots[4] - knots[3]) / (float)n;
 i = 3;
 t = knots[3];
 NUBEVAL(x0, y0);
 for (j = 1; j <= n; j++) {
 t += delta;
 NUBEVAL(x, y);
 line(
 (short)x0, (short)(floor(y0)+0.5),
 (short)x, (short)(floor(y)+0.5),
 color);
 x0 = x; y0 = y;
 }
 if (show_geom_pts) {
 for (i = 0; i < 4; i++) {
 x = geom[i].get_x();
 y = floor(geom[i].get_y())+0.5;
 line((short)x-CROSS_SIZE, (short)y, (short)x+CROSS_SIZE,
 (short)y, color);
 line((short)x, (short)y-CROSS_SIZE, (short)x,
 (short)y+CROSS_SIZE, color);
 }
 }
 if (show_convex_hull) {
 for (i = 0; i < 3; i++) {
 x0 = geom[i].get_x();
 y0 = floor(geom[i].get_y())+0.5;
 x1 = geom[i+1].get_x();
 y1 = floor(geom[i+1].get_y())+0.5;

 line((short)x0, (short)y0, (short)x1, (short)y1, color);
 }
 }
}
// display method for a NURB curve
void Nurb_curve::display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull)
{
 int i, j;
 float t;
 float delta;
 float x0, y0, w, x, y, x1, y1;
 float bt0, bt1, bt2, bt3;
#define NURBEVAL(X, Y, W) \
 bt0 = b4(i-3, t, knots); \
 bt1 = b4(i-2, t, knots); \
 bt2 = b4(i-1, t, knots); \
 bt3 = b4(i, t, knots); \
 X = geom[i-3].get_x() * bt0 + \
 geom[i-2].get_x() * bt1 + \
 geom[i-1].get_x() * bt2 + \
 geom[i].get_x() * bt3; \
 Y = geom[i-3].get_y() * bt0 + \
 geom[i-2].get_y() * bt1 + \
 geom[i-1].get_y() * bt2 + \
 geom[i].get_y() * bt3; \
 W = geom[i-3].get_w() * bt0 + \
 geom[i-2].get_w() * bt1 + \
 geom[i-1].get_w() * bt2 + \
 geom[i].get_w() * bt3; \
 if (W != 1. && W != 0.) { \
 X /= W; Y /= W; \
 }
 delta = (knots[4] - knots[3]) / (float)n;
 i = 3;
 t = knots[3];
 NURBEVAL(x0, y0, w);
 for (j = 1; j <= n; j++) {
 t += delta;
 NURBEVAL(x, y, w);
 line(
 (short)x0, (short)(floor(y0)+0.5),
 (short)x, (short)(floor(y)+0.5),
 color);
 x0 = x; y0 = y;
 }
 if (show_geom_pts) {
 for (i = 0; i < 4; i++) {
 x = geom[i].get_x();
 y = floor(geom[i].get_y())+0.5;
 line((short)x-CROSS_SIZE, (short)y, (short)x+CROSS_SIZE,
 (short)y, color);
 line((short)x, (short)y-CROSS_SIZE, (short)x,
 (short)y+CROSS_SIZE, color);
 }
 }
 if (show_convex_hull) {
 for (i = 0; i < 3; i++) {
 x0 = geom[i].get_x();

 y0 = floor(geom[i].get_y())+0.5;
 x1 = geom[i+1].get_x();
 y1 = floor(geom[i+1].get_y())+0.5;
 line((short)x0, (short)y0, (short)x1, (short)y1, color);
 }
 }
}




[LISTING THREE]

// Horner's method for curve display method
void Basis_matrix_curve::display_curve(int n, int color, int show_geom_pts,
 int show_convex_hull)
{
 int i;
 float delta;
 float t, t2, t3;
 float x0, y0, x, y, x1, y1;

 x0 = t_coeff[3].get_x(); y0 = t_coeff[3].get_y();
 delta = 1. / (float)n;
 t = 0.;
 for (i = 1; i <= n; i++) {
 t += delta;
 t2 = t * t;
 t3 = t2 * t;
 x = t_coeff[0].get_x() * t3 +
 t_coeff[1].get_x() * t2 +
 t_coeff[2].get_x() * t +
 t_coeff[3].get_x();
 y = t_coeff[0].get_y() * t3 +
 t_coeff[1].get_y() * t2 +
 t_coeff[2].get_y() * t +
 t_coeff[3].get_y();
 line(
 (short)x0, (short)(floor(y0)+0.5),
 (short)x, (short)(floor(y)+0.5),
 color);
 x0 = x; y0 = y;
 }
 if (show_geom_pts) {
 for (i = 0; i < 4; i++) {
 x = (short)geom[i].get_x();
 y = (short)(floor(geom[i].get_y())+0.5);
 line(x-CROSS_SIZE, y, x+CROSS_SIZE, y, color);
 line(x, y-CROSS_SIZE, x, y+CROSS_SIZE, color);
 }
 }
 if (show_convex_hull) {
 for (i = 0; i < 3; i++) {
 x0 = geom[i].get_x();
 y0 = (floor(geom[i].get_y())+0.5);
 x1 = geom[i+1].get_x();
 y1 = (floor(geom[i+1].get_y())+0.5);
 line(x0, y0, x1, y1, color);

 }
 }
}




[LISTING FOUR]

// Forward differencing for curve display method
void Basis_matrix_curve::display_curve(int n,
 int color,
 int show_geom_pts,
 int show_convex_hull)
{
 int i;
 float d, delta, delta2, delta3;
 float x0, y0, x, y, x1, y1;
 float dx, d2x, d3x;
 float dy, d2y, d3y;

 delta = 1. / (float)n;
 delta2 = delta * delta;
 delta3 = delta2 * delta;

 dx = t_coeff[0].get_x() * delta3 +
 t_coeff[1].get_x() * delta2 +
 t_coeff[2].get_x() * delta;
 d2x = 6. * t_coeff[0].get_x() * delta3 +
 2. * t_coeff[1].get_x() * delta2;
 d3x = 6. * t_coeff[0].get_x() * delta3;
 dy = t_coeff[0].get_y() * delta3 +
 t_coeff[1].get_y() * delta2 +
 t_coeff[2].get_y() * delta;
 d2y = 6. * t_coeff[0].get_y() * delta3 +
 2. * t_coeff[1].get_y() * delta2;
 d3y = 6. * t_coeff[0].get_y() * delta3;

 x = x0 = t_coeff[3].get_x(); y = y0 = t_coeff[3].get_y();
 for (i = 1; i <= n; i++) {
 x += dx; dx += d2x; d2x += d3x;
 y += dy; dy += d2y; d2y += d3y;
 line(
 (short)x0, (short)(floor(y0)+0.5),
 (short)x, (short)(floor(y)+0.5),
 color);
 x0 = x; y0 = y;
 }
 if (show_geom_pts) {
 for (i = 0; i < 4; i++) {
 x = (short)geom[i].get_x();
 y = (short)(floor(geom[i].get_y())+0.5);
 line((short)x-CROSS_SIZE, (short)y,
 (short)x+CROSS_SIZE, (short)y, color);
 line((short)x, (short)y-CROSS_SIZE,
 (short)x, (short)y+CROSS_SIZE, color);
 }
 }
 if (show_convex_hull) {

 for (i = 0; i < 3; i++) {
 x0 = geom[i].get_x();
 y0 = floor(geom[i].get_y())+0.5;
 x1 = geom[i+1].get_x();
 y1 = floor(geom[i+1].get_y())+0.5;
 line((short)x0, (short)y0,
 (short)x1, (short)y1, color);
 }
 }
}



[LISTING FIVE]

// demo.cc - demonstrating different curves
// For Unix and X, compile with:

#include <iostream.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "curve.h"

extern "C" {
void init_graphics_device(void);
void close_graphics_device(void);
void clear_window(void);
void text_output(char *);
void text_output_and_wait(char *);
};

int main(void)
{
 float knot[8];
 Point2d geom[4];
 init_graphics_device();
 Hermite_curve herm;
 text_output("Hermite Curves");

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(300., 20.);
 geom[2].set_xy(0., 500.);
 geom[3].set_xy(0., -500.);
 herm.set_geom_vector(4, geom);
 herm.display_curve(32, 14, 0, 0);

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(300., 20.);
 geom[2].set_xy(0., 200.);
 geom[3].set_xy(0., -200.);
 herm.set_geom_vector(4, geom);
 herm.display_curve(32, 13, 0, 0);

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(300., 20.);
 geom[2].set_xy(0., 200.);
 geom[3].set_xy(-100., 0.);
 herm.set_geom_vector(4, geom);

 herm.display_curve(32, 7, 0, 0);

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(300., 20.);
 geom[2].set_xy(200., 200.);
 geom[3].set_xy(-200., -200.);
 herm.set_geom_vector(4, geom);
 herm.display_curve(32, 6, 0, 0);

 text_output_and_wait("Press return:");
 Bezier_curve bez;
 Nub_curve nub;
 clear_window();
 text_output("Bezier and NUB Curves");

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(50., 180.);
 geom[2].set_xy(300., 50.);
 geom[3].set_xy(100., 10.);
 bez.set_geom_vector(4, geom);
 bez.display_curve(32, 15, 1, 1);

 text_output_and_wait("Press return:");
 knot[0] = knot[1] = knot[2] = knot[3] = 0.;
 knot[4] = knot[5] = knot[6] = knot[7] = 1.;
 nub.set_knot_vector(8, knot);
 nub.set_geom_vector(4, geom);
 nub.display_curve(32, 6, 0, 0);

 knot[0] = knot[1] = knot[2] = 0.;
 knot[3] = 1.; knot[4] = 2.;
 knot[5] = knot[6] = knot[7] = 3.;
 nub.set_knot_vector(8, knot);
 nub.set_geom_vector(4, geom);
 nub.display_curve(32, 5, 0, 0);

 text_output_and_wait("Press return:");
 Catmull_Rom_curve cm;
 clear_window();

 text_output("Catmull-Rom Curve");

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(50., 180.);
 geom[2].set_xy(300., 50.);
 geom[3].set_xy(100., 10.);
 cm.set_geom_vector(4, geom);
 cm.display_curve(32, 15, 1, 1);

 text_output_and_wait("Press return:");
 Beta_spline_curve beta;
 clear_window();
 text_output("Beta-splines");

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(170., 180.);
 geom[2].set_xy(150., 180.);
 geom[3].set_xy(300., 20.);


 beta.set_geom_vector(4, geom);
 beta.display_curve(32, 7, 1, 1);

 text_output_and_wait("Press return:");
 Nurb_curve nurb;
 clear_window();
 text_output("NURB Curves with varying Knots");

 geom[0].set_xy(20., 20.);
 geom[1].set_xy(50., 180.);
 geom[2].set_xy(300., 50.);
 geom[3].set_xy(100., 10.);
 knot[0] = knot[1] = knot[2] = knot[3] = 0.;
 knot[4] = knot[5] = knot[6] = knot[7] = 1.;
 nurb.set_knot_vector(8, knot);
 nurb.set_geom_vector(4, geom);
 nurb.display_curve(32, 15, 1, 1);

 knot[0] = knot[1] = knot[2] = 0.; knot[3] = 1.;
 knot[4] = 2.; knot[5] = knot[6] = knot[7] = 3.;
 nurb.set_knot_vector(8, knot);
 nurb.set_geom_vector(4, geom);
 nurb.display_curve(32, 14, 0, 0);

 knot[0] = 0.; knot[1] = 1.; knot[2] = 2.; knot[3] = 3.;
 knot[4] = 4.; knot[5] = 5.; knot[6] = 6.; knot[7] = 7.;
 nurb.set_knot_vector(8, knot);
 nurb.set_geom_vector(4, geom);
 nurb.display_curve(32, 13, 0, 0);

 knot[0] = 0.; knot[1] = 0.; knot[2] = 1.; knot[3] = 1.;
 knot[4] = 2.; knot[5] = 2.; knot[6] = 3.; knot[7] = 3.;
 nurb.set_knot_vector(8, knot);
 nurb.set_geom_vector(4, geom);
 nurb.display_curve(32, 12, 0, 0);
 text_output_and_wait("Press return:");
 close_graphics_device();

 return 0;
}
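The demo above switches the NUB and NURB curves between a clamped knot vector (0,0,0,0,1,1,1,1) and uniform ones; the weighting each knot vector gives the control points comes from the Cox-de Boor recurrence. A minimal, self-contained sketch of that recurrence follows; `basis` is an illustrative name and is not part of the Nub_curve or Nurb_curve classes in the listings:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cox-de Boor recurrence: value of the B-spline basis function N_{i,p}(u)
// over the knot vector `knot`. Illustrative sketch, not from the listings.
double basis(int i, int p, double u, const std::vector<double>& knot)
{
    if (p == 0)
        return (u >= knot[i] && u < knot[i + 1]) ? 1.0 : 0.0;
    double left = 0.0, right = 0.0;
    double d1 = knot[i + p] - knot[i];
    double d2 = knot[i + p + 1] - knot[i + 1];
    if (d1 > 0.0)
        left = (u - knot[i]) / d1 * basis(i, p - 1, u, knot);
    if (d2 > 0.0)
        right = (knot[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knot);
    return left + right;
}
```

With the clamped cubic knot vector from the demo, the first basis function is 1 at u = 0 (so the curve interpolates the first control point), and at any interior u the four cubic basis functions sum to 1.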




[LISTING SIX]

// utilTC.cc - display utilities for Turbo C
// For Borland C++, compile with:
// bcc -c -mh -P -v -Ie:\bc\include utilTC.cc

#include <iostream.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <alloc.h>
#include <dos.h>
#include <math.h>
#include "curve.h"


#define WIDTH 320
#define HEIGHT 200
#define ABS(x) (((x) < 0) ? -(x) : (x))

void init_graphics_device(void);
void close_graphics_device(void);
void set_pixel(short, short, short, short, short, short, short);
void line(short, short, short, short, short);
void clear_window(void);
void text_output(char *);
void text_output_and_wait(char *);
void display_string(int, char *);

// Global variables associated with the VGA display
unsigned char far *vga_fb = (unsigned char far *)MK_FP(0xA000,0);
void init_graphics_device(void)
{
 union REGS inregs, outregs;

 inregs.h.ah = 0; // set video mode command
 inregs.h.al = 0x13; // 320x200x8 mode
 int86(0x10, &inregs, &outregs);

 inregs.h.ah = 0xf; // check video mode
 int86(0x10, &inregs, &outregs);
 if (inregs.h.al != 0x13) {
 printf("cannot get 320x200x8 VGA mode : got %x\n", inregs.h.al);
 exit(1);
 }
}
void close_graphics_device(void)
{
 union REGS inregs, outregs;
 inregs.h.ah = 0; // set video mode command
 inregs.h.al = 0x2; // 80x25 mode
 int86(0x10, &inregs, &outregs);
}
void line(short x1, short y1, short x2, short y2, short color)
{
 short x, y;
 short deltax, deltay;
 short temp;
 short err;
 short i;
 short swap;
 short s1, s2;

 x = x1; y = y1;
 deltax = (short)ABS(x2 - x1);
 deltay = (short)ABS(y2 - y1);

 if ((x2 - x1) < 0) s1 = -1; else s1 = 1;
 if ((y2 - y1) < 0) s2 = -1; else s2 = 1;
 if (deltay > deltax) {
 temp = deltax;
 deltax = deltay;
 deltay = temp;

 swap = 1;
 }
 else
 swap = 0;

 err = 2 * deltay - deltax;
 for (i = 1; i <= deltax; i++) {
 set_pixel(0, 0, WIDTH-1, HEIGHT-1, x, y, color);
 while (err >= 0) {
 if (swap) x += s1; else y += s2;
 err -= 2 * deltax;
 }
 if (swap) y += s2; else x += s1;
 err += 2 * deltay;
 }
}
void set_pixel(short xmin, short ymin, short xmax, short ymax,
 short x, short y, short color)
{
 unsigned char far *pixel;
 unsigned long offset;
 if (x >= xmin && x <= xmax && y >= ymin && y <= ymax) {
 offset = (unsigned long)WIDTH * (unsigned long)y + (unsigned long)x;
 pixel = vga_fb + offset;
 *pixel = (unsigned char)color;
 }
}
void clear_window()
{
 unsigned long far *pixel32;
 unsigned long count;

 // Clear four 8-bit pixels at a time by writing a 32-bit zero
 // directly into the linear mode 13h frame buffer.
 pixel32 = (unsigned long far *)vga_fb;
 // number of 8-bit pixels, divided by 4 for 32-bit writes
 count = (unsigned long)WIDTH * (unsigned long)HEIGHT;
 count = count >> 2;
 while (count > 0) {
 *pixel32 = (unsigned long)0;
 pixel32++;
 count--;
 }
}
void text_output(char *str)
{
 display_string(23, str);
}
void text_output_and_wait(char *str)
{
 display_string(24, str);
 getch();
}
void display_string(int row, char *str)
{
 int i;
 union REGS inregs, outregs;

 for (i = 0; i < strlen(str); i++) {
 inregs.h.ah = 0x2; // set cursor position
 inregs.h.bh = 0x0; // display page
 inregs.h.dh = row; // near bottom of display
 inregs.h.dl = i; // column
 int86(0x10, &inregs, &outregs);

 inregs.h.ah = 0x9; // write character
 inregs.h.bh = 0x0; // display page
 inregs.h.bl = 0x7; // black BG/white FG
 inregs.h.al = str[i]; // character
 inregs.x.cx = 1; // repeat count
 int86(0x10, &inregs, &outregs);
 }
}
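The line() routine in Listing Six is Bresenham's integer line algorithm. The same stepping logic can be checked off-screen by collecting the visited pixels into a vector instead of writing the VGA frame buffer; `bresenham` is an illustrative name, and unlike the listing's loop this sketch plots both endpoints:

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Bresenham stepping in the same style as line() in Listing Six, but
// returning the pixel coordinates instead of touching the frame buffer.
std::vector<std::pair<int, int>> bresenham(int x1, int y1, int x2, int y2)
{
    std::vector<std::pair<int, int>> pts;
    int x = x1, y = y1;
    int dx = std::abs(x2 - x1), dy = std::abs(y2 - y1);
    int s1 = (x2 >= x1) ? 1 : -1;       // step direction along x
    int s2 = (y2 >= y1) ? 1 : -1;       // step direction along y
    bool steep = dy > dx;               // iterate along the major axis
    if (steep) { int t = dx; dx = dy; dy = t; }
    int err = 2 * dy - dx;
    for (int i = 0; i <= dx; ++i) {     // dx + 1 pixels, endpoints included
        pts.push_back(std::make_pair(x, y));
        while (err >= 0) {
            if (steep) x += s1; else y += s2;
            err -= 2 * dx;
        }
        if (steep) y += s2; else x += s1;
        err += 2 * dy;
    }
    return pts;
}
```

For a 45-degree segment from (0,0) to (3,3) this visits exactly the four diagonal pixels.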


[LISTING SEVEN]

/* utilXlib.cc - utilities for Xlib */
#include <X11/Xlib.h>
#include <X11/X.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define WIDTH 320
#define HEIGHT 220

void init_graphics_device(void);
void close_graphics_device(void);
void line(short, short, short, short, short);
void clear_window(void);
void text_output(char *);
void text_output_and_wait(char *);

Display *dpy;
Window win;
GC gc;

void init_graphics_device(void)
{
 XEvent event;
 int screen;
 int done;

 /* Open the X display and get the screen */
 dpy = XOpenDisplay(NULL);
 if (!dpy) {
 printf("could not open display\n");
 exit(1);
 }
 screen = DefaultScreen(dpy);
 /* Create a simple window which will be a child to the root window of
 * the display. */
 win = XCreateSimpleWindow(dpy, RootWindow(dpy, screen),
 0, 0, WIDTH, HEIGHT, 2, BlackPixel(dpy, screen),
 WhitePixel(dpy, screen));
 /* Display the window */
 XMapWindow(dpy, win);
 XSelectInput(dpy, win, ExposureMask | KeyPressMask | ButtonPressMask);
 done = 0;
 while(!done) {
 XNextEvent(dpy, &event);
 switch(event.type) {
 case Expose:
 done = 1;
 break;
 }
 }
 /* Create an X graphics context for rendering the vectors that
 * comprise curves. */
 gc = XCreateGC(dpy, win, 0, NULL);
 XSetForeground(dpy, gc, BlackPixel(dpy, screen));
}
void close_graphics_device(void)
{
 exit(0);
}
void line(short x1, short y1, short x2, short y2, short color)
{
 /* Draw a 2D line using X calls and then flush the pixels to the server. */
 XDrawLine(dpy, win, gc, x1, y1, x2, y2);
 XFlush(dpy);
}
void clear_window()
{
 /* Clear the X window to the background color. */
 XClearWindow(dpy, win);
 XFlush(dpy);
}
void text_output(char *str)
{
 /* Put a string into the window. */
 XDrawString(dpy, win, gc, 0, 192, str, strlen(str));
 XFlush(dpy);
}
void text_output_and_wait(char *str)
{
 /* Put a string into the window and wait for user input. */
 XDrawString(dpy, win, gc, 0, 210, str, strlen(str));
 XFlush(dpy);
 getchar();
}




